Blind source separation based on
independent low-rank matrix
analysis and its extensions
Ohio State University Visiting
December 15th, 2017
The University of Tokyo, Japan
Project Research Associate
Daichi Kitamura
Self introduction
• Name: Daichi Kitamura
• Age: 27 (born in 1990)
– Born in Kagawa, Japan
• Background:
– NAIST, Japan
• Master's degree (received in 2014)
– SOKENDAI, Japan
• Ph.D. degree (received in 2017)
– The University of Tokyo, Japan
• Project Research Associate
• Research topics
– Acoustic signal processing, statistical signal processing, audio source separation, etc.
[Map: Japan, showing Kagawa (place of birth) and Tokyo (Univ. Tokyo)]
Contents
• Background
– Blind source separation (BSS) for audio signals
– Motivation
• Preliminaries
– Frequency-domain independent component analysis (FDICA)
– Independent vector analysis (IVA)
– Itakura–Saito nonnegative matrix factorization (ISNMF)
• Independent Low-Rank Matrix Analysis (ILRMA)
– Employ low-rank TF structures of each source in BSS
– Gaussian source model with TF-varying variance
– Relationship between ILRMA and multichannel NMF
– Theoretical extension of ILRMA for better optimization
• Conclusion
Background
• Blind source separation (BSS) for audio signals
– separates the original audio sources from a recorded mixture
– does not require prior information about the recording conditions
• locations of mics and sources, room geometry, timbres, etc.
– is applicable to many audio applications
• Consider only the “determined” situation (# of mics = # of sources)
[Figure: recorded mixture → BSS → separated guitar. Sources → mixing system → observed signals → demixing system → estimated signals]
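The determined model on this slide can be sketched numerically. This is only an illustration, not code from the talk: the mixing matrix `A`, the Laplace-distributed sources, and the use of the exact inverse as an oracle demixing matrix are assumptions made for the example. A BSS method has to estimate the demixing matrix without ever seeing `A`.

```python
import numpy as np

rng = np.random.default_rng(0)

n_sources = n_mics = 2      # "determined" situation: as many mics as sources
n_samples = 1000

# Hypothetical non-Gaussian sources and a fixed, invertible mixing matrix.
s = rng.laplace(size=(n_sources, n_samples))
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])

x = A @ s                   # observed signals (mixing system)
W = np.linalg.inv(A)        # oracle demixing matrix (unknown in real BSS)
y = W @ x                   # estimated signals (demixing system)

recovered = bool(np.allclose(y, s))
```

Because the system is determined and `A` is invertible, a linear demixing matrix can recover the sources exactly, which is why the talk restricts itself to this case.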
History of BSS for audio signals
• Basic theories and their evolution (*depicting only popular methods)
– 1994: Independent component analysis (ICA)
– 1998: Frequency-domain ICA (FDICA), followed by many permutation solvers for FDICA
– 1999: Nonnegative matrix factorization (NMF), followed by applications of NMF to many tasks, generative models in NMF, and many extensions of NMF
– 2006: Independent vector analysis (IVA)
– 2009: Itakura–Saito NMF (ISNMF)
– 2011: Auxiliary-function-based IVA (AuxIVA)
– 2012: Time-varying Gaussian IVA
– 2013: Multichannel NMF
– 2016: Independent low-rank matrix analysis (ILRMA)
Motivation of ILRMA
• Conventional BSS techniques based on ICA
– Minimum distortion (linear demixing)
– Relatively fast and stable optimization
• FastICA [A. Hyvarinen, 1999], natural gradient [S. Amari, 1996], and auxiliary function technique [N. Ono+, 2010], [N. Ono, 2011]
– Cannot use a “specific” assumption about the sources
• Only assumes a non-Gaussian p.d.f. for the sources
– The permutation problem is crucial and still difficult to solve
• IVA often fails, causing a “block permutation problem” [Y. Liang+, 2012]
• Better to use a “specific source model” in the TF domain
– Independent low-rank matrix analysis (ILRMA) employs a low-rank property
[Figure: the observed signal is the frequency-wise mixing matrix applied to the source signals; the estimated signal is the frequency-wise demixing matrix applied to the observed signal (axes: frequency bins, time frames)]
Related methods: ICA
• Independent component analysis (ICA) [P. Comon, 1994]
– estimates the demixing matrix without knowing the mixing matrix
– Source model (scalar)
• Each source is non-Gaussian, and the sources are mutually independent
– Spatial model
• Mixing system is a time-invariant matrix
• Mixing system in audio signals
– Convolutive mixture with room reverberation
[Figure: sources (source model) → mixing matrix (spatial model) → observed signals → demixing matrix → estimated signals]
Related methods: FDICA
• Frequency-domain ICA (FDICA) [P. Smaragdis, 1998]
– estimates a frequency-wise demixing matrix
– Source model (scalar)
• Each source component is complex-valued and non-Gaussian, and the sources are mutually independent
– Spatial model
• Frequency-wise mixing matrix is time-invariant
– Instantaneous mixture in each frequency band
– A.k.a. rank-1 spatial model [N.Q.K. Duong, 2010]
• Permutation problem?
– Order of estimated signals cannot be determined by ICA
– Alignment of frequency-wise estimated signals is required
• Many permutation solvers were proposed
[Figure: spectrograms (frequency bin vs. time frame) with a separate ICA applied in each frequency bin (ICA 1, ICA 2, …, ICA I)]
Permutation problem
• FDICA requires signal alignment across all frequency bins
– Order of estimated signals cannot be determined by ICA*
[Figure: sources 1 and 2 → observed signals 1 and 2 → ICA on all frequency components → permutation solver → estimated signals 1 and 2 (horizontal axis: time)]
*Signal scale also must be restored by applying a back-projection technique
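To illustrate why alignment is needed, the sketch below shuffles the source order independently in each frequency bin and then realigns the bins by correlating amplitude envelopes. The greedy centroid-correlation rule, the function name `align_permutations`, and the toy envelopes are assumptions chosen for brevity; this is not one of the published permutation solvers mentioned in the talk.

```python
import numpy as np
from itertools import permutations

def align_permutations(Y):
    """Greedy alignment. Y has shape (bins, sources, frames) and holds
    amplitude envelopes of the separated signals in each frequency bin."""
    n_bins, n_src, _ = Y.shape
    aligned = Y.copy()
    centroid = aligned[0].copy()          # bin 0 defines the reference order
    for i in range(1, n_bins):
        # Pick the source order whose envelopes correlate best with the centroid.
        best = max(
            permutations(range(n_src)),
            key=lambda p: sum(
                np.corrcoef(centroid[n], Y[i, list(p)][n])[0, 1]
                for n in range(n_src)
            ),
        )
        aligned[i] = Y[i, list(best)]
        centroid += aligned[i]
    return aligned

# Toy data: two sources that are active in different halves of the recording.
rng = np.random.default_rng(0)
n_bins, n_src, n_frames = 16, 2, 200
act = np.zeros((n_src, n_frames))
act[0, : n_frames // 2] = 1.0
act[1, n_frames // 2 :] = 1.0
gains = rng.uniform(0.5, 2.0, size=(n_bins, n_src, 1))
truth = gains * act[None] + 0.01 * rng.random((n_bins, n_src, n_frames))

# Simulate the permutation problem: swap the sources in random bins.
swap = rng.random(n_bins) < 0.5
swap[0] = False
shuffled = truth.copy()
shuffled[swap] = shuffled[swap][:, ::-1]

aligned = align_permutations(shuffled)
```

The rule works here because bins belonging to the same source share similar activations, which is the same cue that IVA and ILRMA build into their source models instead of applying it as a post-process.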
Related methods: IVA
• Independent vector analysis (IVA) [A. Hiroe, 2006], [T. Kim, 2006]
– extends ICA to a multivariate probabilistic model that treats the sourcewise frequency vector as a vector variable
– Source model (vector)
• Each source vector is multivariate, spherical, complex-valued, and non-Gaussian, and the source vectors are mutually independent
– Spatial model
• Mixing system is a time-invariant matrix (rank-1 spatial model)
– Permutation-free estimation of the demixing matrix is achieved!
[Figure: source vectors (multivariate non-Gaussian dist. with higher-order correlations) → mixing matrix → observed vectors → demixing matrix → estimated vectors]
Higher-order correlation assumed in IVA
• Spherical multivariate distribution [T. Kim+, 2007]
– Two mutually independent Laplace distributions: x1 and x2 are mutually independent
– Spherical Laplace distribution: x1 and x2 have higher-order correlation, and the probability depends only on the norm
• Why spherical distribution?
– Frequency bands that have similar activations are merged together as one source, which avoids the permutation problem
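The difference between the two bivariate models can be checked with a few lines of NumPy. This is a sketch under simplifying assumptions: real-valued variables, unit scale, and normalization constants dropped, so the functions return negative log-densities only up to an additive constant. The function names are made up for the example.

```python
import numpy as np

def neg_log_independent_laplace(x):
    # Product of scalar Laplace densities: -log p(x) = |x1| + |x2| + const
    return float(np.sum(np.abs(x)))

def neg_log_spherical_laplace(x):
    # Spherical Laplace density: -log p(x) = ||x||_2 + const
    return float(np.linalg.norm(x))

x = np.array([1.0, 0.0])
theta = np.pi / 4
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
x_rot = rotation @ x            # same norm, different direction

spherical_before = neg_log_spherical_laplace(x)
spherical_after = neg_log_spherical_laplace(x_rot)
independent_before = neg_log_independent_laplace(x)
independent_after = neg_log_independent_laplace(x_rot)
```

Rotating the point leaves the spherical value unchanged, because it depends only on the norm, while the independent model assigns a different value. That coupling through the norm is the higher-order correlation that IVA exploits across frequency bins.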
Comparison of source models
• Frequency-domain ICA (FDICA) [P. Smaragdis, 1998]
– Source model: scalar r.v.s with a non-Gaussian source distribution
– The separation filter (demixing matrix) is updated so that the estimated signals obey the assumed non-Gaussian distribution
• Independent vector analysis (IVA) [A. Hiroe, 2006], [T. Kim, 2006]
– Source model: vector (multivariate) r.v.s with a non-Gaussian spherical source distribution
– The separation filter (demixing matrix) is updated so that the estimated signals obey the assumed non-Gaussian distribution
• In both cases
– Each source obeys a non-Gaussian distribution, and the sources are mutually independent
– The mixture is close to a Gaussian signal because of the central limit theorem (CLT)
[Figure: STFT spectrograms (frequency vs. time) of the observed and estimated signals, comparing the current empirical distribution with the assumed source distribution]
Related method: NMF
• Nonnegative matrix factorization (NMF) [D. D. Lee, 1999]
– Low-rank decomposition with nonnegative constraint
• Limited number of nonnegative bases and their coefficients
– Spectrogram is decomposed in acoustic signal processing
• Frequently appearing spectral patterns and their activations
[Figure: a nonnegative matrix (power spectrogram; # of freq. bins × # of time frames) is approximated by the product of a basis matrix (spectral patterns; # of freq. bins × # of bases) and an activation matrix (time-varying gains; # of bases × # of time frames)]
Related method: ISNMF
• ISNMF [C. Févotte+, 2009]
– The complex-valued observed signal in each TF bin is modeled by a circularly symmetric complex Gaussian distribution with a nonnegative variance
– The signal can be decomposed into components using the “stable property” of the Gaussian distribution
– The variance is also decomposed!
Related method: ISNMF
• The power spectrogram corresponds to the variances in the TF plane
– Complex Gaussian distribution with TF-varying variance
– If we marginalize over time or frequency, the distribution becomes non-Gaussian even though each TF bin follows a Gaussian distribution
[Figure: power spectrogram (frequency bin vs. time frame); the grayscale shows the value of the variance, from small to large power]
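The claim that a TF-varying variance makes the marginal distribution non-Gaussian can be checked by sampling. The sketch below is an illustration under assumptions: the low-rank variance `R = T @ V` uses randomly drawn `T` and `V` with two bases, and non-Gaussianity is measured by the ratio E|x|^4 / (E|x|^2)^2, which equals 2 for a circularly symmetric complex Gaussian variable.

```python
import numpy as np

rng = np.random.default_rng(0)
n_freq, n_frames, n_bases = 64, 256, 2

# Hypothetical low-rank variance (power spectrogram model): R = T V
T = rng.exponential(size=(n_freq, n_bases))
V = rng.exponential(size=(n_bases, n_frames))
R = T @ V

# Zero-mean circularly symmetric complex Gaussian with variance R in each TF bin
Z = (rng.normal(size=R.shape) + 1j * rng.normal(size=R.shape)) / np.sqrt(2.0)
X = np.sqrt(R) * Z

def fourth_moment_ratio(x):
    """E|x|^4 / (E|x|^2)^2; equals 2 for a complex Gaussian variable."""
    power = np.abs(x) ** 2
    return float(np.mean(power ** 2) / np.mean(power) ** 2)

ratio_per_bin = fourth_moment_ratio(X / np.sqrt(R))   # variance normalized out
ratio_marginal = fourth_moment_ratio(X)               # pooled over the TF plane
```

After normalizing by the per-bin variance the ratio stays close to 2, whereas pooling all TF bins gives a clearly larger value, i.e., a heavier-tailed (non-Gaussian) marginal.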
Comparison of low-rankness
[Figure: spectrograms of drums, guitar, vocals, and speech]
• Low-rankness (simplicity of a matrix)
– can be measured by the cumulative singular value (CSV)
– Drums and guitar are quite low-rank
• Vocals and speech are also low-rank to some extent
– A music spectrogram can be modeled by only a few patterns
[Figure: CSV curves with a 95% line. Number of bases when the CSV reaches 95%: 7, 29, and around 90 (spectrogram size is 1025 × 1883)]
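A possible way to compute the number of bases at which the CSV reaches 95% is sketched below. The slides do not define the CSV precisely, so the definition used here (running sum of singular values divided by their total) and the function name `bases_for_csv` are assumptions; the matrices are synthetic stand-ins for spectrograms.

```python
import numpy as np

def bases_for_csv(X, threshold=0.95):
    """Smallest number of singular values whose cumulative sum,
    normalized by the total, reaches the threshold."""
    singular_values = np.linalg.svd(X, compute_uv=False)   # sorted, descending
    csv = np.cumsum(singular_values) / np.sum(singular_values)
    return int(np.searchsorted(csv, threshold) + 1)

rng = np.random.default_rng(0)

# Exactly rank-3 nonnegative matrix (a very "simple" spectrogram).
low_rank = rng.uniform(size=(50, 3)) @ rng.uniform(size=(3, 80))
# Unstructured matrix of the same size.
unstructured = rng.normal(size=(50, 80))

n_low = bases_for_csv(low_rank)
n_unstructured = bases_for_csv(unstructured)
```

A matrix that needs few bases to reach the threshold is well described by a small number of spectral patterns, which is the property ILRMA assumes for each source.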
Extension of source model in IVA
• Source model in IVA
– has a frequency-uniform scale
• Spherical multivariate Laplace
• Higher-order correlation among frequency
– Equivalent to NMF with one flat basis
• Source model in ISNMF[C. Févotte+, 2009]
– NMF with arbitrary number of bases
• can represent complicated TF structures
– can learn “co-occurrence” structure in the TF domain for each source
• Low-rank co-occurrence is captured as the variance
– The source-wise structure can be estimated by ISNMF
• Replace the source model assumed in ICA or IVA
[Figure: frequency-vs-time structures of the two source models]
Extension of source model in IVA
• Source model in IVA
– Spherical Laplace dist. (bivariate case shown)
– Frequency vector (I-dimensional) with a frequency-uniform scale
• Source model in ISNMF [C. Févotte+, 2009]
– Zero-mean complex Gaussian in each TF bin
– Time-frequency matrix (IJ-dimensional) with time-frequency-varying variance
– Low-rank decomposition of the variance with NMF
• Replace the source model assumed in ICA or IVA
Cost function in ILRMA and partitioning function
• Negative log-likelihood in ILRMA
– combines the cost function in ICA (estimates the demixing matrix) and the cost function in ISNMF (estimates the low-rank source model)
– The source model is replaced from the IVA model to the ISNMF model
– The estimated signal is obtained by applying the demixing matrix to the observed signal
• All the variables can easily be optimized by alternating updates
– Update rules in ICA
– Update rules in ISNMF
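The equations of this slide did not survive extraction. For reference, the negative log-likelihood of ILRMA is commonly written in the following form, here without the partitioning function and with constant terms omitted. The notation is an assumption and may differ from the original slide: i, j, n, and k index frequency bins, time frames, sources, and bases; x_ij is the observed multichannel vector; W_i is the frequency-wise demixing matrix whose n-th row is w_{i,n}^H; t and v are the NMF basis and activation entries.

```latex
\mathcal{J} = \sum_{i,j} \left[ \sum_{n} \left( \frac{|y_{ij,n}|^{2}}{r_{ij,n}} + \log r_{ij,n} \right) - 2 \log \left| \det \mathbf{W}_{i} \right| \right],
\qquad
r_{ij,n} = \sum_{k} t_{ik,n} v_{kj,n},
\qquad
y_{ij,n} = \mathbf{w}_{i,n}^{\mathsf{H}} \mathbf{x}_{ij}
```

The first two terms are the Itakura–Saito-type fit between the power of each estimated signal and its low-rank variance model (the ISNMF part), and the log-determinant term comes from the linear demixing transform (the ICA part).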
Update rules of ILRMA
• ML-based iterative update rules
– Spatial model (demixing matrix): the update rule is based on iterative projection [N. Ono, 2011]
– Source model (NMF source model): the update rules for the NMF variables are based on the MM algorithm
– The iterative projection step uses a one-hot vector that has 1 at the element of the source being updated
– Pseudo code is available at
• http://d-kitamura.net/pdf/misc/AlgorithmsForIndependentLowRankMatrixAnalysis.pdf
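The update rules can be sketched in NumPy as follows. This is a simplified illustration rather than the author's implementation: it uses the commonly described MM multiplicative updates for the NMF variables and iterative projection for the demixing matrix, leaves out the partitioning function and the scale normalization usually applied in practice, and the array layouts, function names, and toy data are all assumptions. The linked pseudo code remains the authoritative description.

```python
import numpy as np

def ilrma_cost(X, W, T, V):
    """Negative log-likelihood up to constants.
    X: (I, J, M) mixture, W: (I, N, M), T: (I, K, N), V: (K, J, N)."""
    Y = np.einsum('inm,ijm->ijn', W, X)          # y_ij,n = w_i,n^H x_ij
    P = np.abs(Y) ** 2
    R = np.einsum('ikn,kjn->ijn', T, V)          # low-rank variance model
    log_det = np.log(np.abs(np.linalg.det(W)))   # one value per frequency bin
    return float(np.sum(P / R + np.log(R)) - 2.0 * X.shape[1] * np.sum(log_det))

def ilrma_iteration(X, W, T, V, eps=1e-12):
    """One pass over all sources (updates W, T, V in place)."""
    I, J, M = X.shape
    P = np.abs(np.einsum('inm,ijm->ijn', W, X)) ** 2
    for n in range(W.shape[1]):
        Tn, Vn, Pn = T[:, :, n], V[:, :, n], P[:, :, n]
        # MM updates of the NMF variables (exponent 1/2)
        Rn = Tn @ Vn
        Tn = np.maximum(Tn * np.sqrt(((Pn / Rn**2) @ Vn.T) / ((1.0 / Rn) @ Vn.T)), eps)
        Rn = Tn @ Vn
        Vn = np.maximum(Vn * np.sqrt((Tn.T @ (Pn / Rn**2)) / (Tn.T @ (1.0 / Rn))), eps)
        Rn = Tn @ Vn
        T[:, :, n], V[:, :, n] = Tn, Vn
        # Iterative projection for the n-th demixing filter in every bin
        U = np.einsum('ij,ijm,ijl->iml', 1.0 / Rn, X, X.conj()) / J
        e_n = np.tile(np.eye(M, dtype=complex)[:, n].reshape(1, M, 1), (I, 1, 1))
        w = np.linalg.solve(W @ U, e_n)          # one-hot right-hand side
        scale = np.sqrt(np.einsum('imk,iml,ilk->i', w.conj(), U, w).real)
        W[:, n, :] = (w[:, :, 0] / scale[:, None]).conj()
    return W, T, V

# Toy determined mixture: 2 sources, 2 mics, frame-varying source power.
rng = np.random.default_rng(0)
I, J, N, K = 6, 80, 2, 3
S = (rng.normal(size=(I, J, N)) + 1j * rng.normal(size=(I, J, N)))
S = S * (0.1 + rng.gamma(1.0, size=(1, J, N)))
A = rng.normal(size=(I, N, N)) + 1j * rng.normal(size=(I, N, N))
X = np.einsum('imn,ijn->ijm', A, S)

W = np.tile(np.eye(N, dtype=complex), (I, 1, 1))   # identity initialization
T = rng.uniform(0.1, 1.0, size=(I, K, N))          # random NMF initialization
V = rng.uniform(0.1, 1.0, size=(K, J, N))

costs = [ilrma_cost(X, W, T, V)]
for _ in range(20):
    W, T, V = ilrma_iteration(X, W, T, V)
    costs.append(ilrma_cost(X, W, T, V))
```

Both kinds of update are designed not to increase the cost, so the recorded `costs` should be non-increasing up to floating-point error; separation quality on real recordings is a separate question that this toy example does not address.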
Optimization process in ILRMA
• The demixing matrix and the source model are alternately updated
– Precise modeling of low-rank TF structures will improve the estimation accuracy of the demixing matrix
[Figure: loop between estimating the demixing matrix (mixture → separated signals) and estimating the NMF variables (source model update with NMF for each separated signal)]
Comparison of source models
• FDICA source model: non-Gaussian scalar variable
• IVA source model: non-Gaussian vector variable with higher-order correlation
• ILRMA source model: non-Gaussian matrix variable with low-rank time-frequency structure
– Compares the rank of the TF matrix of the mixture with the rank of the TF matrix of each source
Multichannel extension of NMF
• Multichannel NMF [A. Ozerov+, 2010], [H. Sawada+, 2013]
– Observed multichannel signal: the instantaneous spatial covariance of the multichannel vector in each time-frequency slot
– Spatial model: spatial covariances of each source (spatial property of each source)
– Source model: basis matrix (spectral patterns) and activation matrix (gains), i.e., timbre patterns of all sources
– Partitioning function
Relationship b/w ILRMA and multichannel NMF
• Difference b/w ILRMA and multichannel NMF?
– Source distribution: complex Gaussian distribution (same)
– ILRMA assumes a rank-1 spatial covariance
– Multichannel NMF assumes a full-rank spatial covariance
• Assumption: rank-1 spatial model
– The spatial covariance of each source is a rank-1 matrix formed from the sourcewise steering vector
– Equivalent to the instantaneous mixing assumption in each frequency bin
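The rank-1 assumption can be made concrete for a single frequency bin. In the sketch below the steering vectors are random complex vectors, which is an assumption for illustration only; the point is that an outer product of a steering vector with itself has rank 1, and that stacking the steering vectors of all sources gives an invertible mixing matrix in the determined case.

```python
import numpy as np

rng = np.random.default_rng(0)
n_mics = n_sources = 2

# Hypothetical sourcewise steering vectors for one frequency bin (columns of A)
A = rng.normal(size=(n_mics, n_sources)) + 1j * rng.normal(size=(n_mics, n_sources))

# Rank-1 spatial covariance of source 0: a a^H
a0 = A[:, 0]
R_rank1 = np.outer(a0, a0.conj())

# A full-rank spatial covariance, as allowed in multichannel NMF
B = rng.normal(size=(n_mics, n_mics)) + 1j * rng.normal(size=(n_mics, n_mics))
R_full = B @ B.conj().T

rank_of_rank1 = int(np.linalg.matrix_rank(R_rank1))
rank_of_full = int(np.linalg.matrix_rank(R_full))

# Under the rank-1 model the mixing is instantaneous in this bin,
# so a demixing matrix exists as the inverse of the mixing matrix.
W = np.linalg.inv(A)
```

This is why the rank-1 model lets the problem be posed as the estimation of a demixing matrix, whereas a full-rank covariance cannot be undone by a single matrix inverse.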
Relationship b/w ILRMA and multichannel NMF
• Multichannel NMF with rank-1 spatial model
– Substitute the rank-1 spatial model into the cost function
– Transform the variables from the mixing system to the demixing matrix
Relationship b/w MNMF, IVA, and ILRMA
• From the multichannel NMF side,
– The rank-1 spatial model is introduced, which transforms the problem from the estimation of the mixing system to that of the demixing matrix
• From the IVA side,
– The number of spectral bases in the source model is increased
[Figure: methods arranged by the flexibility of the source model and of the spatial model (limited ↔ flexible). IVA leads to ILRMA by introducing the NMF source model; multichannel NMF leads to ILRMA by introducing the rank-1 spatial model]
Experimental evaluation
• Conditions
– Source signals: music signals obtained from SiSEC, convolved with impulse responses (two microphones and two sources)
– Window length: 512 ms (Hamming window)
– Shift length: 128 ms (1/4 shift)
– Number of bases: 30 per source
– Evaluation score: improvement of signal-to-distortion ratio (SDR)
[Figure: recording layout for impulse response E2A (reverberation time: 300 ms); labels: Source 1, Source 2, 2 m, 5.66 cm, 50, 50]
Result example
• Ultimate NZ tour (Guitar and Synthesizer, 14 s)
[Figure: bar chart of SDR improvement [dB] (0 to 20; higher is better) for guitar and synth., comparing IVA, multichannel NMF, and ILRMA]
Results: bearlin-roads
• Ultimate NZ tour (Guitar and Synthesizer, 14 s)
[Figure: SDR improvement [dB] (-2 to 12; higher is better) versus iteration steps (0 to 400) for IVA, MNMF, ILRMA without Z, and ILRMA with Z; annotated times: 11.5 s, 15.1 s, 60.7 s, and 7647.3 s]
Subjective evaluation
• Thurstone’s pairwise comparison
– Speech separation and music separation tasks
– 10 males and 4 females
[Figure: subjective scores (-1.2 to 1.6) of IVA, multichannel NMF, and ILRMA for speech signals and music signals]
Demonstration: music source separation
• Music source separation
– Listen carefully for the three parts in the mixture: guitar, vocal, and keyboard
[Demo: mixture of guitar, vocal, and keyboard → source separation → separated guitar, vocal, and keyboard]
Another demo is available at http://d-kitamura.net/en/index_en.html
Best optimization balance?
• “Alternating update” of the spatial model (ICA) and the source model (NMF) is used in ILRMA
– Sometimes the optimization in ILRMA is trapped in a poor solution (local minimum)
• There may exist a best optimization balance b/w the ICA and NMF models that avoids local minima
[Figure: ICA (demixing matrix) and NMF (low-rank source model), labeled “Identity” and “Randomized”, with NMF updates and ICA updates alternating]
Controlling optimization speed
• How can the optimization speed be controlled while ensuring the convergence of the algorithm?
– Parametric majorization-equalization (ME) algorithm
– Apply parametric ME to NMF optimization to find the best balance of optimization speeds between ICA and NMF
[Figure: the same alternating NMF update / ICA update diagram; the NMF update becomes controllable by parametric ME]
Majorization-based optimization algorithm
• NMF optimization is based on a majorizer-based algorithm (a.k.a. auxiliary function technique)
– Majorization-minimization (MM) algorithm [D. R. Hunter+, 2000]
– Majorization-equalization (ME) algorithm [C. Févotte+, 2011]
– Parametric ME algorithm [Y. Mitsui+, 2017]
[Figure: illustrations of the MM, ME, and parametric ME updates on the majorizer; labels: fast, slow]
Parametric-ME-based NMF optimization
• Comparison of NMF update rules
– Update rules of the basis matrix in the MM algorithm, the ME algorithm, and the parametric ME algorithm
– Only the exponent is different
– The optimization speed of the NMF model can be controlled by this exponent
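Since the three rules differ only in the exponent, they can be sketched with one function that takes the exponent as a parameter. The equations of this slide were lost, so the mapping used here is an assumption: exponent 1/2 is the well-known MM multiplicative update for Itakura–Saito NMF, and exponent 1 is the point at which the standard majorizer returns to its starting value, which is how this sketch interprets the ME rule. The function names and toy data are hypothetical.

```python
import numpy as np

def is_cost(P, T, V):
    """Itakura-Saito-type cost between P and the model T V (up to constants)."""
    R = T @ V
    return float(np.sum(P / R + np.log(R)))

def nmf_step(P, T, V, exponent):
    """One multiplicative update of the basis matrix T and then the activation
    matrix V; the exponent controls how far each update moves."""
    R = T @ V
    T = T * (((P / R**2) @ V.T) / ((1.0 / R) @ V.T)) ** exponent
    R = T @ V
    V = V * ((T.T @ (P / R**2)) / (T.T @ (1.0 / R))) ** exponent
    return T, V

rng = np.random.default_rng(0)
P = rng.gamma(1.0, size=(40, 60)) + 1e-3       # toy power spectrogram
T0 = rng.uniform(0.1, 1.0, size=(40, 4))
V0 = rng.uniform(0.1, 1.0, size=(4, 60))

costs = {}
for exponent in (0.25, 0.5, 1.0):              # smaller exponent = slower NMF update
    T, V = T0.copy(), V0.copy()
    trace = [is_cost(P, T, V)]
    for _ in range(30):
        T, V = nmf_step(P, T, V, exponent)
        trace.append(is_cost(P, T, V))
    costs[exponent] = trace
```

For any exponent between 0 and 1 the new value stays between the current value and the equalization point of the majorizer, so the cost should not increase; the exponent then acts as a knob for how quickly the NMF model adapts relative to the ICA updates.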
Parametric-ME-based ILRMA
• ILRMA of 2000 trials with various random seeds
[Figure: results for ultimate_nz_tour, ranging from fast to slow NMF optimization]
[Figure: results for another_dreamer-the_ones_we_love, ranging from fast to slow NMF optimization]
Parametric-ME-based ILRMA
• Slower NMF optimization (a small value of the exponent parameter) tends to provide better results in ILRMA
– But, why? We don’t know!
• Conjecture
– In the beginning of ILRMA, the NMF model is “random”
• Not reliable
– The demixing matrix can be updated without a source model to some extent (because even IVA works well)
• Statistical independence between sources is very powerful
[Figure: initialization → independence-based separation → precise modeling of source structure → improved separation; stage labels “Updated, Updated, Updated” for one model and “Slowly updated, Slowly updated, Updated” for the other]
Conclusion
• Independent low-rank matrix analysis (ILRMA)
– Permutation-free ICA-based blind source separation
– Assumption
• Statistical independence between sources
• Low-rank time-frequency structure of each source
– Equivalent to multichannel NMF
• when the mixing assumption (rank-1 spatial model) is valid
• Ongoing work!
– Relaxation of the rank-1 spatial model
– Extension of the source generative model
– Semi-/fully supervised ILRMA, user-guided ILRMA
– and collaboration with deep neural networks…
• Independent deeply learned matrix analysis (IDLMA)
• Maybe submitted to the next EUSIPCO…?
Conclusion
• Independent low-rank matrix analysis (ILRMA)
– will be published by Springer in March 2018!
• Daichi Kitamura, Nobutaka Ono, Hiroshi Sawada, Hirokazu Kameoka, and Hiroshi Saruwatari, “Determined blind source separation with independent low-rank matrix analysis,” in Audio Source Separation (Signals and Communication Technology), 1st ed. 2018 Edition, Shoji Makino (Editor)
Search in Amazon.com!
Conclusion
• Independent low-rank matrix analysis (ILRMA)
– will be presented at ICASSP 2018 as a tutorial session!
• Title (tentative): Blind Audio Source Separation on Tensor Representation
– Presenters: Hiroshi Sawada, Nobutaka Ono, Hirokazu Kameoka, Daichi Kitamura
Thank you so much
for your attention!

Robust music signal separation based on supervised nonnegative matrix factori...
 
Online divergence switching for superresolution-based nonnegative matrix fact...
Online divergence switching for superresolution-based nonnegative matrix fact...Online divergence switching for superresolution-based nonnegative matrix fact...
Online divergence switching for superresolution-based nonnegative matrix fact...
 
Hybrid NMF APSIPA2014 invited
Hybrid NMF APSIPA2014 invitedHybrid NMF APSIPA2014 invited
Hybrid NMF APSIPA2014 invited
 
Relaxation of rank-1 spatial constraint in overdetermined blind source separa...
Relaxation of rank-1 spatial constraint in overdetermined blind source separa...Relaxation of rank-1 spatial constraint in overdetermined blind source separa...
Relaxation of rank-1 spatial constraint in overdetermined blind source separa...
 
Hybrid multichannel signal separation using supervised nonnegative matrix fac...
Hybrid multichannel signal separation using supervised nonnegative matrix fac...Hybrid multichannel signal separation using supervised nonnegative matrix fac...
Hybrid multichannel signal separation using supervised nonnegative matrix fac...
 
DNN-based frequency-domain permutation solver for multichannel audio source s...
DNN-based frequency-domain permutation solver for multichannel audio source s...DNN-based frequency-domain permutation solver for multichannel audio source s...
DNN-based frequency-domain permutation solver for multichannel audio source s...
 
Linear multichannel blind source separation based on time-frequency mask obta...
Linear multichannel blind source separation based on time-frequency mask obta...Linear multichannel blind source separation based on time-frequency mask obta...
Linear multichannel blind source separation based on time-frequency mask obta...
 
Divergence optimization in nonnegative matrix factorization with spectrogram ...
Divergence optimization in nonnegative matrix factorization with spectrogram ...Divergence optimization in nonnegative matrix factorization with spectrogram ...
Divergence optimization in nonnegative matrix factorization with spectrogram ...
 
AMT overview
AMT overviewAMT overview
AMT overview
 
Superresolution-based stereo signal separation via supervised nonnegative mat...
Superresolution-based stereo signal separation via supervised nonnegative mat...Superresolution-based stereo signal separation via supervised nonnegative mat...
Superresolution-based stereo signal separation via supervised nonnegative mat...
 
Online Divergence Switching for Superresolution-Based Nonnegative Matrix Fa...
Online Divergence Switching for  Superresolution-Based  Nonnegative Matrix Fa...Online Divergence Switching for  Superresolution-Based  Nonnegative Matrix Fa...
Online Divergence Switching for Superresolution-Based Nonnegative Matrix Fa...
 
Koyama ASA ASJ joint meeting 2016
Koyama ASA ASJ joint meeting 2016Koyama ASA ASJ joint meeting 2016
Koyama ASA ASJ joint meeting 2016
 
A Generalization of Laplace Nonnegative Matrix Factorizationand Its Multichan...
A Generalization of Laplace Nonnegative Matrix Factorizationand Its Multichan...A Generalization of Laplace Nonnegative Matrix Factorizationand Its Multichan...
A Generalization of Laplace Nonnegative Matrix Factorizationand Its Multichan...
 
DNN-based frequency component prediction for frequency-domain audio source se...
DNN-based frequency component prediction for frequency-domain audio source se...DNN-based frequency component prediction for frequency-domain audio source se...
DNN-based frequency component prediction for frequency-domain audio source se...
 
Regularized superresolution-based binaural signal separation with nonnegative...
Regularized superresolution-based binaural signal separation with nonnegative...Regularized superresolution-based binaural signal separation with nonnegative...
Regularized superresolution-based binaural signal separation with nonnegative...
 
Lecture_1 (1).pptx
Lecture_1 (1).pptxLecture_1 (1).pptx
Lecture_1 (1).pptx
 
Digital Signal Processing-Digital Filters
Digital Signal Processing-Digital FiltersDigital Signal Processing-Digital Filters
Digital Signal Processing-Digital Filters
 

More from Daichi Kitamura

スペクトログラム無矛盾性を用いた 独立低ランク行列分析の実験的評価
スペクトログラム無矛盾性を用いた独立低ランク行列分析の実験的評価スペクトログラム無矛盾性を用いた独立低ランク行列分析の実験的評価
スペクトログラム無矛盾性を用いた 独立低ランク行列分析の実験的評価
Daichi Kitamura
 
Windowsマシン上でVisual Studio Codeとpipenvを使ってPythonの仮想実行環境を構築する方法(Jupyter notebookも)
Windowsマシン上でVisual Studio Codeとpipenvを使ってPythonの仮想実行環境を構築する方法(Jupyter notebookも)Windowsマシン上でVisual Studio Codeとpipenvを使ってPythonの仮想実行環境を構築する方法(Jupyter notebookも)
Windowsマシン上でVisual Studio Codeとpipenvを使ってPythonの仮想実行環境を構築する方法(Jupyter notebookも)
Daichi Kitamura
 
近接分離最適化によるブラインド⾳源分離(Blind source separation via proximal splitting algorithm)
近接分離最適化によるブラインド⾳源分離(Blind source separation via proximal splitting algorithm)近接分離最適化によるブラインド⾳源分離(Blind source separation via proximal splitting algorithm)
近接分離最適化によるブラインド⾳源分離(Blind source separation via proximal splitting algorithm)
Daichi Kitamura
 
音源分離における音響モデリング(Acoustic modeling in audio source separation)
音源分離における音響モデリング(Acoustic modeling in audio source separation)音源分離における音響モデリング(Acoustic modeling in audio source separation)
音源分離における音響モデリング(Acoustic modeling in audio source separation)
Daichi Kitamura
 
ICASSP2017読み会(関東編)・AASP_L3(北村担当分)
ICASSP2017読み会(関東編)・AASP_L3(北村担当分)ICASSP2017読み会(関東編)・AASP_L3(北村担当分)
ICASSP2017読み会(関東編)・AASP_L3(北村担当分)
Daichi Kitamura
 
半教師あり非負値行列因子分解における音源分離性能向上のための効果的な基底学習法
半教師あり非負値行列因子分解における音源分離性能向上のための効果的な基底学習法半教師あり非負値行列因子分解における音源分離性能向上のための効果的な基底学習法
半教師あり非負値行列因子分解における音源分離性能向上のための効果的な基底学習法
Daichi Kitamura
 
独立性基準を用いた非負値行列因子分解の効果的な初期値決定法(Statistical-independence-based efficient initia...
独立性基準を用いた非負値行列因子分解の効果的な初期値決定法(Statistical-independence-based efficient initia...独立性基準を用いた非負値行列因子分解の効果的な初期値決定法(Statistical-independence-based efficient initia...
独立性基準を用いた非負値行列因子分解の効果的な初期値決定法(Statistical-independence-based efficient initia...
Daichi Kitamura
 
音響メディア信号処理における独立成分分析の発展と応用, History of independent component analysis for sou...
音響メディア信号処理における独立成分分析の発展と応用, History of independent component analysis for sou...音響メディア信号処理における独立成分分析の発展と応用, History of independent component analysis for sou...
音響メディア信号処理における独立成分分析の発展と応用, History of independent component analysis for sou...
Daichi Kitamura
 
基底変形型教師ありNMFによる実楽器信号分離 (in Japanese)
基底変形型教師ありNMFによる実楽器信号分離 (in Japanese)基底変形型教師ありNMFによる実楽器信号分離 (in Japanese)
基底変形型教師ありNMFによる実楽器信号分離 (in Japanese)
Daichi Kitamura
 
Study on optimal divergence for superresolution-based supervised nonnegative ...
Study on optimal divergence for superresolution-based supervised nonnegative ...Study on optimal divergence for superresolution-based supervised nonnegative ...
Study on optimal divergence for superresolution-based supervised nonnegative ...
Daichi Kitamura
 
Music signal separation using supervised nonnegative matrix factorization wit...
Music signal separation using supervised nonnegative matrix factorization wit...Music signal separation using supervised nonnegative matrix factorization wit...
Music signal separation using supervised nonnegative matrix factorization wit...
Daichi Kitamura
 
擬似ハムバッキングピックアップの弦振動応答 (in Japanese)
擬似ハムバッキングピックアップの弦振動応答 (in Japanese)擬似ハムバッキングピックアップの弦振動応答 (in Japanese)
擬似ハムバッキングピックアップの弦振動応答 (in Japanese)
Daichi Kitamura
 
模擬ハムバッキング・ピックアップの弦振動応答 (in Japanese)
模擬ハムバッキング・ピックアップの弦振動応答 (in Japanese)模擬ハムバッキング・ピックアップの弦振動応答 (in Japanese)
模擬ハムバッキング・ピックアップの弦振動応答 (in Japanese)
Daichi Kitamura
 
Evaluation of separation accuracy for various real instruments based on super...
Evaluation of separation accuracy for various real instruments based on super...Evaluation of separation accuracy for various real instruments based on super...
Evaluation of separation accuracy for various real instruments based on super...
Daichi Kitamura
 
Divergence optimization based on trade-off between separation and extrapolati...
Divergence optimization based on trade-off between separation and extrapolati...Divergence optimization based on trade-off between separation and extrapolati...
Divergence optimization based on trade-off between separation and extrapolati...
Daichi Kitamura
 
Depth estimation of sound images using directional clustering and activation-...
Depth estimation of sound images using directional clustering and activation-...Depth estimation of sound images using directional clustering and activation-...
Depth estimation of sound images using directional clustering and activation-...
Daichi Kitamura
 

More from Daichi Kitamura (16)

スペクトログラム無矛盾性を用いた 独立低ランク行列分析の実験的評価
スペクトログラム無矛盾性を用いた独立低ランク行列分析の実験的評価スペクトログラム無矛盾性を用いた独立低ランク行列分析の実験的評価
スペクトログラム無矛盾性を用いた 独立低ランク行列分析の実験的評価
 
Windowsマシン上でVisual Studio Codeとpipenvを使ってPythonの仮想実行環境を構築する方法(Jupyter notebookも)
Windowsマシン上でVisual Studio Codeとpipenvを使ってPythonの仮想実行環境を構築する方法(Jupyter notebookも)Windowsマシン上でVisual Studio Codeとpipenvを使ってPythonの仮想実行環境を構築する方法(Jupyter notebookも)
Windowsマシン上でVisual Studio Codeとpipenvを使ってPythonの仮想実行環境を構築する方法(Jupyter notebookも)
 
近接分離最適化によるブラインド⾳源分離(Blind source separation via proximal splitting algorithm)
近接分離最適化によるブラインド⾳源分離(Blind source separation via proximal splitting algorithm)近接分離最適化によるブラインド⾳源分離(Blind source separation via proximal splitting algorithm)
近接分離最適化によるブラインド⾳源分離(Blind source separation via proximal splitting algorithm)
 
音源分離における音響モデリング(Acoustic modeling in audio source separation)
音源分離における音響モデリング(Acoustic modeling in audio source separation)音源分離における音響モデリング(Acoustic modeling in audio source separation)
音源分離における音響モデリング(Acoustic modeling in audio source separation)
 
ICASSP2017読み会(関東編)・AASP_L3(北村担当分)
ICASSP2017読み会(関東編)・AASP_L3(北村担当分)ICASSP2017読み会(関東編)・AASP_L3(北村担当分)
ICASSP2017読み会(関東編)・AASP_L3(北村担当分)
 
半教師あり非負値行列因子分解における音源分離性能向上のための効果的な基底学習法
半教師あり非負値行列因子分解における音源分離性能向上のための効果的な基底学習法半教師あり非負値行列因子分解における音源分離性能向上のための効果的な基底学習法
半教師あり非負値行列因子分解における音源分離性能向上のための効果的な基底学習法
 
独立性基準を用いた非負値行列因子分解の効果的な初期値決定法(Statistical-independence-based efficient initia...
独立性基準を用いた非負値行列因子分解の効果的な初期値決定法(Statistical-independence-based efficient initia...独立性基準を用いた非負値行列因子分解の効果的な初期値決定法(Statistical-independence-based efficient initia...
独立性基準を用いた非負値行列因子分解の効果的な初期値決定法(Statistical-independence-based efficient initia...
 
音響メディア信号処理における独立成分分析の発展と応用, History of independent component analysis for sou...
音響メディア信号処理における独立成分分析の発展と応用, History of independent component analysis for sou...音響メディア信号処理における独立成分分析の発展と応用, History of independent component analysis for sou...
音響メディア信号処理における独立成分分析の発展と応用, History of independent component analysis for sou...
 
基底変形型教師ありNMFによる実楽器信号分離 (in Japanese)
基底変形型教師ありNMFによる実楽器信号分離 (in Japanese)基底変形型教師ありNMFによる実楽器信号分離 (in Japanese)
基底変形型教師ありNMFによる実楽器信号分離 (in Japanese)
 
Study on optimal divergence for superresolution-based supervised nonnegative ...
Study on optimal divergence for superresolution-based supervised nonnegative ...Study on optimal divergence for superresolution-based supervised nonnegative ...
Study on optimal divergence for superresolution-based supervised nonnegative ...
 
Music signal separation using supervised nonnegative matrix factorization wit...
Music signal separation using supervised nonnegative matrix factorization wit...Music signal separation using supervised nonnegative matrix factorization wit...
Music signal separation using supervised nonnegative matrix factorization wit...
 
擬似ハムバッキングピックアップの弦振動応答 (in Japanese)
擬似ハムバッキングピックアップの弦振動応答 (in Japanese)擬似ハムバッキングピックアップの弦振動応答 (in Japanese)
擬似ハムバッキングピックアップの弦振動応答 (in Japanese)
 
模擬ハムバッキング・ピックアップの弦振動応答 (in Japanese)
模擬ハムバッキング・ピックアップの弦振動応答 (in Japanese)模擬ハムバッキング・ピックアップの弦振動応答 (in Japanese)
模擬ハムバッキング・ピックアップの弦振動応答 (in Japanese)
 
Evaluation of separation accuracy for various real instruments based on super...
Evaluation of separation accuracy for various real instruments based on super...Evaluation of separation accuracy for various real instruments based on super...
Evaluation of separation accuracy for various real instruments based on super...
 
Divergence optimization based on trade-off between separation and extrapolati...
Divergence optimization based on trade-off between separation and extrapolati...Divergence optimization based on trade-off between separation and extrapolati...
Divergence optimization based on trade-off between separation and extrapolati...
 
Depth estimation of sound images using directional clustering and activation-...
Depth estimation of sound images using directional clustering and activation-...Depth estimation of sound images using directional clustering and activation-...
Depth estimation of sound images using directional clustering and activation-...
 

Recently uploaded

SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
KrushnaDarade1
 
Equivariant neural networks and representation theory
Equivariant neural networks and representation theoryEquivariant neural networks and representation theory
Equivariant neural networks and representation theory
Daniel Tubbenhauer
 
Compexometric titration/Chelatorphy titration/chelating titration
Compexometric titration/Chelatorphy titration/chelating titrationCompexometric titration/Chelatorphy titration/chelating titration
Compexometric titration/Chelatorphy titration/chelating titration
Vandana Devesh Sharma
 
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills MN
 
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
AbdullaAlAsif1
 
Basics of crystallography, crystal systems, classes and different forms
Basics of crystallography, crystal systems, classes and different formsBasics of crystallography, crystal systems, classes and different forms
Basics of crystallography, crystal systems, classes and different forms
MaheshaNanjegowda
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
yqqaatn0
 
Medical Orthopedic PowerPoint Templates.pptx
Medical Orthopedic PowerPoint Templates.pptxMedical Orthopedic PowerPoint Templates.pptx
Medical Orthopedic PowerPoint Templates.pptx
terusbelajar5
 
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
Sérgio Sacani
 
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero WaterSharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Texas Alliance of Groundwater Districts
 
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdfTopic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
TinyAnderson
 
NuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyerNuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyer
pablovgd
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
bordetella pertussis.................................ppt
bordetella pertussis.................................pptbordetella pertussis.................................ppt
bordetella pertussis.................................ppt
kejapriya1
 
The debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically youngThe debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically young
Sérgio Sacani
 
Oedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptxOedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptx
muralinath2
 
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốtmô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
HongcNguyn6
 
Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...
Leonel Morgado
 
Cytokines and their role in immune regulation.pptx
Cytokines and their role in immune regulation.pptxCytokines and their role in immune regulation.pptx
Cytokines and their role in immune regulation.pptx
Hitesh Sikarwar
 
20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx
Sharon Liu
 

Recently uploaded (20)

SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
 
Equivariant neural networks and representation theory
Equivariant neural networks and representation theoryEquivariant neural networks and representation theory
Equivariant neural networks and representation theory
 
Compexometric titration/Chelatorphy titration/chelating titration
Compexometric titration/Chelatorphy titration/chelating titrationCompexometric titration/Chelatorphy titration/chelating titration
Compexometric titration/Chelatorphy titration/chelating titration
 
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
 
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
 
Basics of crystallography, crystal systems, classes and different forms
Basics of crystallography, crystal systems, classes and different formsBasics of crystallography, crystal systems, classes and different forms
Basics of crystallography, crystal systems, classes and different forms
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
 
Medical Orthopedic PowerPoint Templates.pptx
Medical Orthopedic PowerPoint Templates.pptxMedical Orthopedic PowerPoint Templates.pptx
Medical Orthopedic PowerPoint Templates.pptx
 
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
 
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero WaterSharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
 
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdfTopic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
 
NuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyerNuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyer
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
 
bordetella pertussis.................................ppt
bordetella pertussis.................................pptbordetella pertussis.................................ppt
bordetella pertussis.................................ppt
 
The debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically youngThe debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically young
 
Oedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptxOedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptx
 
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốtmô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
 
Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...
 
Cytokines and their role in immune regulation.pptx
Cytokines and their role in immune regulation.pptxCytokines and their role in immune regulation.pptx
Cytokines and their role in immune regulation.pptx
 
20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx
 

Blind source separation based on independent low-rank matrix analysis and its extensions

  • 1. Blind source separation based on independent low-rank matrix analysis and its extensions. Visit to Ohio State University, December 15th, 2017. Daichi Kitamura, Project Research Associate, The University of Tokyo, Japan
  • 2. Self introduction • Name: Daichi Kitamura • Age: 27 (born in 1990) – Born in Kagawa, Japan • Background: – NAIST, Japan • Master's degree (received in 2014) – SOKENDAI, Japan • Ph.D. degree (received in 2017) – The University of Tokyo, Japan • Project Research Associate • Research topics – Acoustic signal processing, statistical signal processing, audio source separation, etc. [Map of Japan: Kagawa (place of birth), Tokyo (Univ. Tokyo)]
  • 3. Contents • Background – Blind source separation (BSS) for audio signals – Motivation • Preliminaries – Frequency-domain independent component analysis (FDICA) – Independent vector analysis (IVA) – Itakura–Saito nonnegative matrix factorization (ISNMF) • Independent Low-Rank Matrix Analysis (ILRMA) – Employ low-rank TF structures of each source in BSS – Gaussian source model with TF-varying variance – Relationship between ILRMA and multichannel NMF – Theoretical extension of ILRMA for better optimization • Conclusion
  • 5. Background • Blind source separation (BSS) for audio signals – separates a recorded mixture into the original audio sources – does not require prior information about the recording conditions • locations of mics and sources, room geometry, timbres, etc. – is applicable to many audio applications • Consider only the “determined” situation (# of mics = # of sources) [Figure: sources → mixing system → observed → demixing system → estimated; example: recorded mixture → BSS → separated guitar]
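The determined setting on this slide can be illustrated with a toy example. The following is a minimal sketch, assuming an instantaneous (non-reverberant) mixture, Laplace-distributed sources, and a hand-picked 2 × 2 mixing matrix; none of these values come from the talk, and the names `A`, `W`, `s`, `x`, and `y` are chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_sources = 2      # determined case: number of mics equals number of sources
n_samples = 1000

# Hypothetical non-Gaussian sources (Laplace-distributed), one row per source
s = rng.laplace(size=(n_sources, n_samples))

# Hypothetical time-invariant mixing system (square because the case is determined)
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])

x = A @ s          # observed signals at the microphones

# If the mixing matrix were known, the ideal demixing system would be its inverse.
# BSS has to estimate W from x alone; this line only shows what it aims for.
W = np.linalg.inv(A)
y = W @ x          # estimated signals

max_error = np.max(np.abs(y - s))
```

Because the mixing matrix is square and invertible here, a linear demixing matrix can in principle recover the sources exactly, which is why the determined case is convenient to study.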
  • 6. History of BSS for audio signals • Basic theories and their evolution (*depicting only popular methods) [Timeline figure, years marked from 1994 to 2016: 1994, 1998, 1999, 2006, 2009, 2011, 2012, 2013, 2016. Methods shown: Independent component analysis (ICA); Frequency-domain ICA (FDICA); many permutation solvers for FDICA; Independent vector analysis (IVA); Auxiliary-function-based IVA (AuxIVA); Time-varying Gaussian IVA; Nonnegative matrix factorization (NMF); NMF applied to many tasks; generative models in NMF; Itakura–Saito NMF (ISNMF); many extensions of NMF; Multichannel NMF; Independent low-rank matrix analysis (ILRMA)]
  • 7. Motivation of ILRMA • Conventional BSS techniques based on ICA – Minimum distortion (linear demixing) – Relatively fast and stable optimization • FastICA [A. Hyvarinen, 1999], natural gradient [S. Amari, 1996], and auxiliary function technique [N. Ono+, 2010], [N. Ono, 2011] – Cannot use a “specific” assumption about the sources • Only assumes a non-Gaussian p.d.f. for the sources – The permutation problem is crucial and still difficult to solve • IVA often fails, causing a “block permutation problem” [Y. Liang+, 2012] • Better to use a “specific source model” in the TF domain – Independent low-rank matrix analysis (ILRMA) employs a low-rank property [Notation figure: observed signal, source signals, frequency-wise mixing matrix, estimated signal, frequency-wise demixing matrix; indices over frequency bins and time frames]
  • 9. Related methods: ICA • Independent component analysis (ICA) [P. Comon, 1994] – estimates the demixing matrix without knowing the mixing matrix – Source model (scalar) • Each source is non-Gaussian, and the sources are mutually independent – Spatial model • Mixing system is a time-invariant matrix • Mixing system in audio signals – Convolutive mixture with room reverberation [Figure: sources (source model) → mixing matrix (spatial model) → observed → demixing matrix → estimated]
  • 10. Related methods: FDICA • Frequency-domain ICA (FDICA) [P. Smaragdis, 1998] – estimates a frequency-wise demixing matrix – Source model (scalar) • Each source component is complex-valued and non-Gaussian, and the sources are mutually independent – Spatial model • Frequency-wise mixing matrix is time-invariant – Instantaneous mixture in each frequency band – A.k.a. rank-1 spatial model [N.Q.K. Duong, 2010] • Permutation problem? – Order of estimated signals cannot be determined by ICA – Alignment of frequency-wise estimated signals is required • Many permutation solvers have been proposed [Figure: spectrograms (frequency bin × time frame) with a separate ICA applied in each frequency bin: ICA 1, ICA 2, …, ICA I]
  • 11. Permutation problem • FDICA requires signal alignment across all frequencies – Order of estimated signals cannot be determined by ICA* [Figure: sources 1 and 2 → observed 1 and 2 → ICA on all frequency components → permutation solver → estimated signals 1 and 2, plotted over time] *Signal scale must also be restored by applying a back-projection technique
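The permutation ambiguity can be reproduced numerically. The sketch below is only an illustration under assumptions that are not from the talk: two sources, four frequency bins, randomly generated complex "spectrogram" values, and hypothetical well-conditioned mixing matrices. It takes the ideal demixing matrix in every bin and swaps its rows in some bins; the outputs stay perfectly separated within each bin, but their order differs from bin to bin, which is what a permutation solver has to undo.

```python
import numpy as np

rng = np.random.default_rng(0)
n_freq, n_src, n_frames = 4, 2, 200

# Hypothetical complex-valued source components, shape (freq, source, frame)
s = (rng.laplace(size=(n_freq, n_src, n_frames))
     + 1j * rng.laplace(size=(n_freq, n_src, n_frames)))

# Hypothetical frequency-wise mixing matrices: unit diagonal, off-diagonal
# entries of magnitude 0.5 with random phase, so every matrix is invertible
phases = rng.uniform(0.0, 2.0 * np.pi, size=(n_freq, 2))
A = np.tile(np.eye(n_src, dtype=complex), (n_freq, 1, 1))
A[:, 0, 1] = 0.5 * np.exp(1j * phases[:, 0])
A[:, 1, 0] = 0.5 * np.exp(1j * phases[:, 1])

x = A @ s                      # frequency-wise instantaneous mixture

# Ideal demixing matrix in each bin (batched inverse over the first axis)
W = np.linalg.inv(A)

# Independence alone cannot tell W from P @ W for a permutation matrix P,
# so each bin may come out in a different order. Swap the rows in bins 1 and 3.
swapped = np.array([False, True, False, True])
P = np.array([[0, 1],
              [1, 0]])
W_perm = W.copy()
W_perm[swapped] = P @ W[swapped]

y = W_perm @ x                 # separated in every bin, but inconsistently ordered
```

In the unswapped bins `y` matches `s` row by row, while in the swapped bins the two rows are exchanged, so stacking the first output of every bin would mix components of both sources.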
  • 12. Related methods: IVA • Independent vector analysis (IVA) [A. Hiroe, 2006], [T. Kim, 2006] – extends ICA to a multivariate probabilistic model that treats each source-wise frequency vector as a vector variable – Source model (vector) • Each source vector is multivariate, spherical, complex-valued, and non-Gaussian, and the source vectors are mutually independent – Spatial model • Mixing system is a time-invariant matrix (rank-1 spatial model) • Permutation-free estimation of the demixing matrix is achieved! [Figure: source vectors (multivariate non-Gaussian dist. with higher-order correlations) → mixing matrix → observed vector → demixing matrix → estimated vector]
  • 13. Higher-order correlation assumed in IVA • Spherical multivariate distribution [T. Kim+, 2007] • Why spherical distribution? – Frequency bands that have similar activations will be merged together as one source, which avoids the permutation problem [Figure, two panels. Two mutually independent Laplace distributions: x1 and x2 are mutually independent. Spherical Laplace distribution: x1 and x2 have higher-order correlation; the probability depends only on the norm]
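The contrast between the two panels can be written out for two variables. This is a sketch with unit scale parameters and normalization constants omitted, which are simplifications assumed here rather than taken from the slide:

```latex
% Two mutually independent Laplace variables: the density factorizes
p(x_1, x_2) = p(x_1)\, p(x_2) \propto \exp\bigl(-|x_1| - |x_2|\bigr)

% Spherical Laplace: the density depends only on the norm of (x_1, x_2)
p(x_1, x_2) \propto \exp\Bigl(-\sqrt{|x_1|^2 + |x_2|^2}\Bigr)
```

The second density does not factorize into a function of x1 times a function of x2, so x1 and x2 are dependent even though they are uncorrelated, which is the higher-order correlation the slide refers to.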
  • 14. Comparison of source models • Frequency-domain ICA (FDICA) [P. Smaragdis, 1998] – Scalar r.v.s with a non-Gaussian source distribution • Independent vector analysis (IVA) [A. Hiroe, 2006], [T. Kim, 2006] – Vector (multivariate) r.v.s with a non-Gaussian spherical source distribution • In both methods, the separation filter (demixing matrix) is updated so that the current empirical distribution of the estimated signals obeys the assumed non-Gaussian distribution and the estimated signals are mutually independent • The mixture is close to a Gaussian signal because of the central limit theorem (CLT), whereas each source obeys a non-Gaussian distribution [Figure, for each method: observed → STFT → demixing matrix → estimated, shown as frequency × time spectrograms]
  • 15. Related method: NMF • Nonnegative matrix factorization (NMF) [D. D. Lee, 1999] – Low-rank decomposition with nonnegative constraint • Limited number of nonnegative bases and their coefficients – Spectrogram is decomposed in acoustic signal processing • Frequently appearing spectral patterns and their activations [Figure: nonnegative matrix (power spectrogram, frequency × time) ≈ basis matrix (spectral patterns, amplitude over frequency) × activation matrix (time-varying gains, amplitude over time); dimensions: # of freq. bins, # of time frames, # of bases]
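A low-rank nonnegative decomposition of this kind can be sketched in a few lines. The example below uses the well-known multiplicative update rules for the squared Euclidean distance, which is one common choice and not necessarily the criterion used in the talk; the function name `nmf_euclidean`, the toy matrix, and all parameter values are assumptions made for illustration.

```python
import numpy as np

def nmf_euclidean(X, n_bases, n_iter=200, seed=0, eps=1e-12):
    """Factorize nonnegative X (freq x frames) as T @ V with T, V >= 0."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = X.shape
    T = rng.random((n_freq, n_bases)) + eps    # basis matrix (spectral patterns)
    V = rng.random((n_bases, n_frames)) + eps  # activation matrix (time-varying gains)
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates keep every entry nonnegative
        T *= (X @ V.T) / (T @ V @ V.T + eps)
        V *= (T.T @ X) / (T.T @ T @ V + eps)
        errors.append(np.linalg.norm(X - T @ V) ** 2)
    return T, V, errors

# Toy "spectrogram": two hypothetical spectral patterns with their activations
T_true = np.array([[1.0, 0.0], [0.8, 0.1], [0.0, 1.0],
                   [0.1, 0.9], [0.5, 0.5], [0.2, 0.0]])
V_true = np.array([[1.0, 0.0, 0.5, 0.0, 1.0, 0.2, 0.0, 0.7],
                   [0.0, 1.0, 0.5, 0.9, 0.0, 0.3, 1.0, 0.1]])
X = T_true @ V_true

T, V, errors = nmf_euclidean(X, n_bases=2)
```

The columns of `T` play the role of spectral patterns and the rows of `V` the role of their time-varying gains; only `n_bases` patterns are available, which is what enforces the low-rank structure.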
  • 16. Related method: ISNMF • ISNMF [C. Févotte, 2009] – The complex-valued observed signal can be decomposed using the “stable property” of the circularly symmetric complex Gaussian distribution • The model is equivalent to one in which the nonnegative variance is also decomposed!
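The equations of this slide did not survive in the transcript. As a hedged reconstruction of the standard ISNMF generative model, the relation can be written as follows, where the component signals c_{ijk} and the index names i (frequency bin), j (time frame), and k (basis) are assumed notation:

```latex
x_{ij} = \sum_{k} c_{ijk}, \qquad
c_{ijk} \sim \mathcal{N}_c\!\left(0,\; t_{ik} v_{kj}\right)
\quad\Longleftrightarrow\quad
x_{ij} \sim \mathcal{N}_c\!\Bigl(0,\; \sum_{k} t_{ik} v_{kj}\Bigr)
```

The equivalence holds because a sum of independent zero-mean circularly symmetric complex Gaussian variables is again of that type, with the variances added. Maximizing the likelihood of this model over t_{ik} and v_{kj} corresponds to minimizing the Itakura–Saito divergence between |x_{ij}|^2 and the model variance.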
  • 17. Related method: ISNMF • The power spectrogram corresponds to the variances in the TF plane – Complex Gaussian distribution with TF-varying variance – If we marginalize over time or frequency, the distribution becomes non-Gaussian even though each TF bin follows a Gaussian distribution [Figure: power spectrogram over frequency bin and time frame; the grayscale shows the value of the variance, from small to large power.]
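The claim that the marginal distribution is non-Gaussian can be illustrated numerically. The sketch below draws complex Gaussian samples whose variance follows a random low-rank pattern and compares the normalized fourth moment E|x|^4 / (E|x|^2)^2 with the value 2 expected for a circular complex Gaussian with a single fixed variance. The sizes and the random variance pattern are assumptions chosen only for illustration.

```python
import numpy as np

def complex_gaussian(variance, rng):
    """Draw zero-mean circular complex Gaussian samples with given variances."""
    std = np.sqrt(variance / 2.0)
    shape = variance.shape
    return std * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))

def normalized_fourth_moment(x):
    """E|x|^4 / (E|x|^2)^2, pooled over all time-frequency bins."""
    power = np.abs(x) ** 2
    return float(np.mean(power ** 2) / np.mean(power) ** 2)

def demo(seed=0, num_freq=64, num_frames=2000, num_bases=4):
    rng = np.random.default_rng(seed)
    T = rng.exponential(size=(num_freq, num_bases))
    V = rng.exponential(size=(num_bases, num_frames))
    varying = complex_gaussian(T @ V, rng)                 # TF-varying variance
    flat = complex_gaussian(np.ones((num_freq, num_frames)), rng)
    return normalized_fourth_moment(flat), normalized_fourth_moment(varying)

if __name__ == "__main__":
    flat_moment, varying_moment = demo()
    print(f"fixed variance:      {flat_moment:.2f}")
    print(f"TF-varying variance: {varying_moment:.2f}")
```

A value above 2 for the TF-varying case indicates heavier tails than a Gaussian, which is why this model can serve as a source model in ICA-based methods.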
  • 18. Comparison of low-rankness [Figure: panels for drums, guitar, vocals, and speech.]
  • 19. Comparison of low-rankness • Low-rankness (simplicity of a matrix) – can be measured by the cumulative singular value (CSV) – Drums and guitar are quite low-rank • Vocals and speech are also low-rank to some extent – A music spectrogram can be modeled by only a few patterns [Figure: CSV curves with a 95% line; number of bases when the CSV reaches 95%: 7, 29, and around 90 (spectrogram size is 1025 × 1883).]
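A cumulative-singular-value measurement of this kind can be sketched as follows. The exact CSV definition used for the slide is not given in the transcript, so this sketch assumes the normalized cumulative sum of singular values; the function name is hypothetical.

```python
import numpy as np

def bases_for_csv(spectrogram, threshold=0.95):
    """Number of singular values needed for the normalized CSV to reach threshold."""
    singular_values = np.linalg.svd(spectrogram, compute_uv=False)
    csv = np.cumsum(singular_values) / np.sum(singular_values)
    # First index where the cumulative ratio reaches the threshold, as a count.
    return int(np.searchsorted(csv, threshold) + 1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic matrices for illustration only (not the data of the slide).
    low_rank = rng.uniform(size=(100, 3)) @ rng.uniform(size=(3, 200))
    full_rank = rng.uniform(size=(100, 200))
    print("low-rank matrix: ", bases_for_csv(low_rank))
    print("full-rank matrix:", bases_for_csv(full_rank))
```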
  • 21. Extension of source model in IVA • Source model in IVA – has a frequency-uniform scale • Spherical multivariate Laplace • Higher-order correlation among frequencies – Equivalent to NMF with one flat basis • Source model in ISNMF [C. Févotte+, 2009] – NMF with an arbitrary number of bases • can represent complicated TF structures – can learn the “co-occurrence” structure in the TF domain for each source • Low-rank co-occurrence is captured as the variance – The source-wise structure can be estimated by ISNMF • Replace the source model assumed in ICA or IVA with the ISNMF model [Figure: frequency × time scale patterns of the two source models.]
  • 22. Extension of source model in IVA • Source model in IVA – Spherical Laplace dist. (bivariate case shown) with a frequency-uniform scale – Frequency vector (I-dimensional) • Source model in ISNMF [C. Févotte+, 2009] – Zero-mean complex Gaussian in each TF bin with a time-frequency-varying variance – Low-rank decomposition of the variance with NMF – Time-frequency matrix (IJ-dimensional) • Replace the source model assumed in ICA or IVA with the ISNMF model
  • 23. Cost function in ILRMA and partitioning function • Negative log-likelihood in ILRMA – The source model is replaced from the IVA model to the ISNMF model – The cost function consists of the cost function in ICA (estimates the demixing matrix) and the cost function in ISNMF (estimates the low-rank source model) – All the variables can easily be optimized by alternating the update rules in ICA and the update rules in ISNMF
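The cost function itself is missing from the transcript. A hedged reconstruction of the negative log-likelihood for the variant without the partitioning function, consistent with the ICA and ISNMF terms named on the slide, is shown below. The index names n (source), i (frequency bin), j (time frame), and k (basis), and the symbols y, r, and w, are assumed notation; W, T, and V follow the speaker notes.

```latex
\mathcal{L} = \sum_{i,j} \Bigl[ \sum_{n} \Bigl( \frac{|y_{ij,n}|^{2}}{r_{ij,n}} + \log r_{ij,n} \Bigr) - 2 \log \bigl|\det W_{i}\bigr| \Bigr] + \mathrm{const.},
\qquad
y_{ij,n} = w_{i,n}^{\mathsf{H}} x_{ij},
\quad
r_{ij,n} = \sum_{k} t_{ik,n}\, v_{kj,n}
```

With the variances r fixed, the terms in y and det W form an ICA-type cost for the demixing matrix. With W fixed, the terms in r form the ISNMF cost of each separated power spectrogram.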
  • 24. Update rules of ILRMA • ML-based iterative update rules – The update rule for the demixing matrix is based on iterative projection [N. Ono, 2011] – The update rules for the NMF variables are based on the MM algorithm – Pseudo code is available at • http://d-kitamura.net/pdf/misc/AlgorithmsForIndependentLowRankMatrixAnalysis.pdf [Equations: update rules for the spatial model (demixing matrix) and the source model (NMF source model); the demixing update uses a one-hot vector that has 1 at the element of the source being updated.]
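The update equations are not legible in the transcript, so the following is a minimal sketch of one ILRMA iteration written from the two ingredients the slide names: MM-based multiplicative updates for the NMF variables and iterative projection for the demixing matrix. It assumes the variant without the partitioning function, and the array layouts, function names, and the flooring constant `eps` are assumptions of this sketch rather than the author's pseudo code.

```python
import numpy as np

def ilrma_cost(X, W, T, V):
    """Negative log-likelihood up to a constant.

    X: (I, J, M) observed STFT, W: (I, N, M) demixing matrices (rows are w^H),
    T: (N, I, K) bases, V: (N, K, J) activations, with N == M.
    """
    Y = np.einsum("inm,ijm->nij", W, X)           # separated signals
    R = T @ V                                     # model variances, (N, I, J)
    num_frames = X.shape[1]
    logdet = np.log(np.abs(np.linalg.det(W)))     # one value per frequency bin
    return float(np.sum(np.abs(Y) ** 2 / R + np.log(R))
                 - 2.0 * num_frames * np.sum(logdet))

def ilrma_iteration(X, W, T, V, eps=1e-12):
    """One alternating update of the source model and the spatial model (in place)."""
    I, J, M = X.shape
    for n in range(W.shape[1]):
        # Source model: MM updates of the NMF variables of source n.
        P = np.abs(np.einsum("im,ijm->ij", W[:, n, :], X)) ** 2
        R = T[n] @ V[n]
        T[n] = np.maximum(
            T[n] * np.sqrt(((P / R**2) @ V[n].T) / ((1.0 / R) @ V[n].T)), eps)
        R = T[n] @ V[n]
        V[n] = np.maximum(
            V[n] * np.sqrt((T[n].T @ (P / R**2)) / (T[n].T @ (1.0 / R))), eps)
        R = T[n] @ V[n]
        # Spatial model: iterative projection for the n-th demixing filter.
        U = np.einsum("ij,ijm,ijl->iml", 1.0 / R, X, X.conj()) / J
        e_n = np.zeros((I, M, 1), dtype=complex)  # one-hot vector for source n
        e_n[:, n, 0] = 1.0
        w = np.linalg.solve(W @ U, e_n)[..., 0]
        norm = np.sqrt(np.real(np.einsum("im,iml,il->i", w.conj(), U, w)))
        W[:, n, :] = (w / norm[:, None]).conj()
    return W, T, V

def run_demo(num_iters=10, seed=0):
    """Run a few iterations on random data and return the cost history."""
    rng = np.random.default_rng(seed)
    I, J, M, K = 8, 40, 2, 3
    X = rng.standard_normal((I, J, M)) + 1j * rng.standard_normal((I, J, M))
    W = np.tile(np.eye(M, dtype=complex), (I, 1, 1))  # identity initialization
    T = rng.uniform(0.1, 1.0, (M, I, K))              # random NMF initialization
    V = rng.uniform(0.1, 1.0, (M, K, J))
    costs = [ilrma_cost(X, W, T, V)]
    for _ in range(num_iters):
        W, T, V = ilrma_iteration(X, W, T, V)
        costs.append(ilrma_cost(X, W, T, V))
    return costs

if __name__ == "__main__":
    print([round(c, 1) for c in run_demo()])
```

The demo uses random data only to show that the cost does not increase from one iteration to the next; it says nothing about separation quality.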
  • 25. Optimization process in ILRMA • The demixing matrix and the source model are alternately updated – Precise modeling of the low-rank TF structures improves the estimation accuracy of the demixing matrix [Figure: loop in which the mixture is separated by the estimated demixing matrix, the NMF variables are estimated from the separated signals, and the updated source model is fed back to the estimation of the demixing matrix.]
  • 26. Comparison of source models • FDICA source model: non-Gaussian scalar variable • IVA source model: non-Gaussian vector variable with higher-order correlation • ILRMA source model: non-Gaussian matrix variable with low-rank time-frequency structure • Rank of TF matrix of mixture > rank of TF matrix of each source
  • 27. Multichannel extension of NMF • Multichannel NMF [A. Ozerov+, 2010], [H. Sawada+, 2013] – The observed multichannel signal (multichannel vector) gives an instantaneous spatial covariance in each time-frequency slot – Spatial model: spatial covariances of each source (spatial property of each source) – Source model: basis matrix (spectral patterns) and activation matrix (gains), i.e., the timbre patterns of all sources – A partitioning function assigns the bases to the sources
  • 28. Relationship b/w ILRMA and multichannel NMF • Difference b/w ILRMA and multichannel NMF? – Source distribution: complex Gaussian distribution (same) – ILRMA assumes a rank-1 spatial covariance – Multichannel NMF assumes a full-rank spatial covariance • Assumption: rank-1 spatial model – The spatial covariance of each source is a rank-1 matrix given by the sourcewise steering vector – Equivalent to the instantaneous mixing assumption in each frequency bin
  • 29. Relationship b/w ILRMA and multichannel NMF • Multichannel NMF with rank-1 spatial model – Substitute the rank-1 spatial model into the cost function – Transform the variables from the mixing system to the demixing matrix
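The substitution on this slide is not legible in the transcript. The following is a hedged sketch of the standard argument, using assumed notation: a_{i,n} is the steering vector of source n, A_i collects the steering vectors as columns, D_{ij} is the diagonal matrix of the source variances r_{ij,n}, and W_i is the inverse of A_i.

```latex
\begin{aligned}
x_{ij} &\sim \mathcal{N}_c\bigl(0,\; \mathsf{X}_{ij}\bigr), \qquad
\mathsf{X}_{ij} = \sum_{n} r_{ij,n}\, a_{i,n} a_{i,n}^{\mathsf{H}} = A_{i} D_{ij} A_{i}^{\mathsf{H}}, \\
x_{ij}^{\mathsf{H}} \mathsf{X}_{ij}^{-1} x_{ij} + \log\det \mathsf{X}_{ij}
&= \sum_{n} \frac{|y_{ij,n}|^{2}}{r_{ij,n}} + \sum_{n} \log r_{ij,n} - 2 \log\bigl|\det W_{i}\bigr|,
\qquad W_{i} = A_{i}^{-1}, \quad y_{ij} = W_{i} x_{ij}.
\end{aligned}
```

The first term follows from the inverse of the covariance being W_i^H D_{ij}^{-1} W_i, and the remaining terms from det(A_i D_{ij} A_i^H) = |det A_i|^2 times the product of the variances. Summed over i and j, the right-hand side has the same form as the ILRMA cost, which is the sense in which the two models coincide under the rank-1 assumption.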
  • 30. Relationship b/w MNMF, IVA, and ILRMA • From the multichannel NMF side, – the rank-1 spatial model is introduced, which transforms the problem from the estimation of the mixing system to that of the demixing matrix • From the IVA side, – the number of spectral bases in the source model is increased [Figure: map of IVA, multichannel NMF, and ILRMA on two axes, source model (limited to flexible) and spatial model (limited to flexible); ILRMA is reached from IVA via the NMF source model and from multichannel NMF via the rank-1 spatial model.]
  • 31. Experimental evaluation • Conditions – Source signals: music signals obtained from SiSEC, convolved with impulse responses (two microphones and two sources) – Window length: 512 ms (Hamming window) – Shift length: 128 ms (1/4 shift) – Number of bases: 30 per source – Evaluation score: improvement of signal-to-distortion ratio (SDR) – Impulse response: E2A (reverberation time: 300 ms) [Figure: recording layout with source 1, source 2, and two microphones; labels: 2 m, 5.66 cm, 50, 50.]
  • 32. Result example • Ultimate NZ tour (Guitar and Synthesizer, 14 s) [Figure: bar chart of SDR improvement [dB] (axis from 0 to 20, higher is better) for guitar and synth., comparing IVA, multichannel NMF, and ILRMA.]
  • 33. Results: bearlin-roads • Ultimate NZ tour (Guitar and Synthesizer, 14 s) [Figure: SDR improvement [dB] (axis from -2 to 12, higher is better) versus iteration steps (0 to 400) for IVA, MNMF, ILRMA without Z, and ILRMA with Z; time labels in the plot: 11.5 s, 15.1 s, 60.7 s, 7647.3 s.]
  • 34. Subjective evaluation • Thurstone’s pairwise comparison – Speech separation and music separation tasks – 10 males and 4 females [Figure: subjective score (axis from -1.2 to 1.6) of IVA, multichannel NMF, and ILRMA for speech signals and music signals.]
  • 35. Demonstration: music source separation • Music source separation – Listen for the three parts in the mixture: guitar, vocal, and keyboard – Source separation yields the separated guitar, vocal, and keyboard signals • Another demo is available at http://d-kitamura.net/en/index_en.html
  • 36. Best optimization balance? • “Alternating update” of the spatial model (ICA) and the source model (NMF) is used in ILRMA – Sometimes the optimization in ILRMA is trapped in a poor solution (local minimum) • There may exist a best optimization balance b/w the ICA and NMF models that avoids local minima [Figure: the demixing matrix (ICA) is initialized with the identity matrix and the low-rank source model (NMF) is randomized; the NMF update and the ICA update are then alternated.]
  • 37. Controlling optimization speed • How can we control the optimization speed while ensuring the convergence of the algorithm? – Parametric majorization-equalization (ME) algorithm – Apply parametric ME to the NMF optimization • Find the best balance of optimization speeds between NMF and ICA [Figure: after the identity and randomized initialization, the NMF update and the ICA update are alternated; the NMF update becomes controllable by parametric ME.]
  • 38. Majorization-based optimization algorithm • NMF optimization is based on a majorizer-based algorithm (a.k.a. auxiliary function technique) – Majorization-minimization (MM) algorithm [D. R. Hunter+, 2000] 39
  • 40. Majorization-based optimization algorithm • NMF optimization is based on a majorizer-based algorithm (a.k.a. auxiliary function technique) – Majorization-equalization (ME) algorithm [C. Févotte+, 2011] 41
  • 42. Majorization-based optimization algorithm • NMF optimization is based on a majorizer-based algorithm (a.k.a. auxiliary function technique) – Parametric ME algorithm [Y. Mitsui+, 2017] 43
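The difference between MM, ME, and parametric ME steps can be seen on a scalar toy problem. The cost f(θ) = a/θ + log θ below is an assumption chosen for illustration (it has the same shape as one Itakura–Saito term), not the function drawn on the slides. Majorizing log θ by its tangent at the current point gives the MM step θ·(a/θ)^0.5, and the update with a general exponent p is the parametric form, with p = 1 playing the role of the ME step.

```python
import math

def cost(theta, a):
    """Toy cost with its minimum at theta == a."""
    return a / theta + math.log(theta)

def parametric_step(theta, a, p):
    """Multiplicative update; p = 0.5 is the MM step, p = 1 the ME step."""
    return theta * (a / theta) ** p

def run(theta0, a, p, num_iters):
    """Return the sequence of parameter values for a given exponent p."""
    thetas = [theta0]
    for _ in range(num_iters):
        thetas.append(parametric_step(thetas[-1], a, p))
    return thetas

if __name__ == "__main__":
    a, theta0 = 2.0, 10.0
    for p in (0.25, 0.5, 1.0):
        thetas = run(theta0, a, p, num_iters=5)
        print(f"p = {p}: " + ", ".join(f"{t:.3f}" for t in thetas))
```

In this toy case a smaller p takes smaller steps toward the minimizer, and every p in (0, 1] keeps the cost non-increasing. That the p = 1 step lands exactly on the minimizer is a property of this particular toy cost only.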
  • 43. Parametric-ME-based NMF optimization • Comparison of NMF update rules – Update rules of the basis matrix for the MM algorithm, the ME algorithm, and the parametric ME algorithm – Only the exponent is different – The optimization speed of the NMF model can be controlled by the parameter p
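The update rules compared on this slide are not legible in the transcript. Based on the statement that only the exponent differs, and on the speaker notes that p = 0.5 gives the MM update and p = 1 the ME update, a sketch of ISNMF with an exponent parameter could look as follows. Applying the same exponent to the activation matrix, the function names, and the test data are assumptions of this sketch.

```python
import numpy as np

def is_divergence(P, R):
    """Itakura-Saito divergence between power spectrogram P and model R."""
    ratio = P / R
    return float(np.sum(ratio - np.log(ratio) - 1.0))

def isnmf_parametric(P, num_bases, p=0.5, num_iters=50, seed=0, eps=1e-12):
    """ISNMF with multiplicative updates raised to the exponent p."""
    rng = np.random.default_rng(seed)
    num_freq, num_frames = P.shape
    T = rng.uniform(0.1, 1.0, (num_freq, num_bases))
    V = rng.uniform(0.1, 1.0, (num_bases, num_frames))
    costs = [is_divergence(P, T @ V)]
    for _ in range(num_iters):
        R = T @ V
        # Basis update: the exponent p sets the step size (0.5: MM, 1: ME).
        T = np.maximum(T * (((P / R**2) @ V.T) / ((1.0 / R) @ V.T)) ** p, eps)
        R = T @ V
        # Activation update with the same exponent (assumption of this sketch).
        V = np.maximum(V * ((T.T @ (P / R**2)) / (T.T @ (1.0 / R))) ** p, eps)
        costs.append(is_divergence(P, T @ V))
    return T, V, costs

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic strictly positive "power spectrogram" for illustration only.
    P = rng.exponential(size=(40, 3)) @ rng.exponential(size=(3, 60)) + 1e-3
    for p in (0.25, 0.5, 1.0):
        _, _, costs = isnmf_parametric(P, num_bases=3, p=p, num_iters=20)
        print(f"p = {p}: cost after 20 iterations = {costs[-1]:.3f}")
```

On its own, a larger p usually lowers the NMF cost faster; the slides that follow report that inside ILRMA the slower setting tends to give better separation.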
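  • 44. Parametric-ME-based ILRMA • ILRMA of 2000 trials with various random seeds [Figure: ultimate_nz_tour; separation performance of each trial versus the parameter of parametric ME, with the axis marked from fast to slow NMF optimization.]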
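  • 45. Parametric-ME-based ILRMA • ILRMA of 2000 trials with various random seeds [Figure: another_dreamer-the_ones_we_love; separation performance of each trial versus the parameter of parametric ME, with the axis marked from fast to slow NMF optimization.]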
  • 46. Parametric-ME-based ILRMA • Slower NMF optimization (a small value of p) tends to provide better results in ILRMA – But why? We don’t know! • Conjecture – In the beginning of ILRMA, the NMF model is “random” • Not reliable – The demixing matrix can be updated without the source model to some extent (because even IVA works well) • Statistical independence between sources is very powerful [Figure: initialization → independence-based separation (demixing matrix updated, source model slowly updated) → precise modeling of source structure → improved separation.]
  • 48. Conclusion • Independent low-rank matrix analysis (ILRMA) – Permutation-free ICA-based blind source separation – Assumptions • Statistical independence between sources • Low-rank time-frequency structure of each source – Equivalent to multichannel NMF • when the mixing assumption is valid • Ongoing work! – Relaxation of the rank-1 spatial model – Extension of the source generative model – Semi/full-supervised ILRMA, user-guided ILRMA – and combination with deep neural networks… • Independent deeply learned matrix analysis (IDLMA) • Maybe submitted to the next EUSIPCO…?
  • 49. Conclusion • Independent low-rank matrix analysis (ILRMA) – will be published by Springer in March 2018! Audio Source Separation (Signals and Communication Technology), 1st ed. 2018, Shoji Makino (Editor): Daichi Kitamura, Nobutaka Ono, Hiroshi Sawada, Hirokazu Kameoka, and Hiroshi Saruwatari, “Determined blind source separation with independent low-rank matrix analysis.” Search on Amazon.com!
  • 50. Conclusion • Independent low-rank matrix analysis (ILRMA) – will be presented at ICASSP 2018 as a tutorial session! • Title (tentative): Blind Audio Source Separation on Tensor Representation – Presenters: Hiroshi Sawada, Nobutaka Ono, Hirokazu Kameoka, Daichi Kitamura • Thank you so much for your attention!

Editor's Notes

  1. This talk deals with the blind source separation (BSS) problem, that is, the separation of individual sources from a recorded mixture. The word “blind” means that the method does not require any prior information about the recording conditions, such as the locations of the microphones and sources or the room geometry. This kind of technique is very useful as front-end processing for many applications. In this talk, we consider only the “determined” situation, namely, the numbers of microphones and sources are equal.
  2. This is a history of the basic theories in the audio BSS field. For acoustic signals, independent component analysis (ICA) was applied to frequency-domain signals as FDICA. After that, many permutation solvers for FDICA were proposed, and eventually an elegant solution, independent vector analysis (IVA), was proposed. It is still being extended to more flexible models. On the other hand, nonnegative matrix factorization (NMF) was also developed and extended to multichannel signals for source separation problems. Recently, we have developed a new framework that unifies these two powerful theories, called independent low-rank matrix analysis (ILRMA). I will explain the details.
  3. Here I explain the motivation of this talk.
  4. I briefly explain the separation mechanism in FDICA and IVA. In FDICA, ICA is applied to each frequency bin, treating the scalar time series as a random variable, and we maximize its non-Gaussianity to estimate the frequency-wise demixing matrix. In IVA, we consider a vector time-series variable that contains all frequencies, as in this figure, and assume a multivariate non-Gaussian distribution that has a spherical property. Since the spherical property ensures higher-order correlation among the frequency bins, the permutation problem can be avoided.
  5. This is a graphical image of the source model in ISNMF. In each time-frequency slot, a zero-mean complex Gaussian distribution is defined, and the slots are mutually independent over all time and frequency. The variances of these Gaussians correspond to the power spectrogram. Therefore, in a slot that has strong power, such as a spectral peak and its harmonics, a Gaussian with a large variance becomes the generative model. Note that, even though each slot is Gaussian, the marginal distribution is non-Gaussian because the variance fluctuates. So, we can use this model as a source model in ICA-based methods.
  6. Again, the source generative model assumed in IVA is a spherical Laplace distribution, which has a frequency-uniform scale because it is “spherical.” The higher-order correlation is ensured by this uniform scale, which avoids the permutation problem. The scales in the model can be represented as in this figure, and as you can see, this model is equivalent to NMF with one flat spectral basis, a rank-1 matrix. Then, we replace this model with the source model in ISNMF. In the ISNMF model, an arbitrary number of bases can be used, and a more complicated low-rank structure can be treated. So, the number of bases is extended from one to an arbitrary number, and the vector model is extended to a low-rank matrix. This complicated time-frequency structure of each source can easily be estimated by ISNMF.
  7. Here are the detailed mathematical forms of each source model. IVA assumes a frequency-vector variable, and its distribution is a spherical Laplace distribution with a frequency-uniform scale. Now it is replaced with ISNMF, which assumes a time-frequency-matrix variable. In each time-frequency slot, a zero-mean complex Gaussian is defined, and its variance can change over time and frequency. Here, r is the variance with the indexes i and j, and the variance has a low-rank time-frequency structure represented by the basis T and the activation V.
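  8. The iterative update rules of ILRMA can be derived as shown here. All the variables are optimized by alternately updating the spatial demixing filter and the source model. Since these iterations are guaranteed to increase the likelihood monotonically, convergence to a local solution near the initial value is guaranteed.
  9. In other words, the proposed method first learns the spatial demixing filter, then learns the timbre structure of the separated signals with NMF, and reuses the resulting source model for learning the spatial demixing filter, which yields more accurate separated signals. By repeating this process many times, a clear timbre structure is captured for each source, and the performance of the spatial demixing filter is expected to improve.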
  10. This is a comparison of the source models in FDICA, IVA, and ILRMA again. The important idea in ILRMA is that the rank of the TF matrix of the mixture signal is greater than the rank of the TF matrix of each source. So, if we assume not only the independence between sources but also a low-rank TF structure for each source, the separation will be done accurately.
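  11. In addition, the paper reveals that ILRMA is closely related to multichannel NMF, an extension of NMF to multichannel signals. Briefly, ILRMA is equivalent to conventional multichannel NMF when the rank of the “spatial covariance matrix,” the model of spatial information defined in multichannel NMF, is constrained to be one. However, multichannel NMF estimates the mixing system, unlike ILRMA and IVA, which estimate the demixing system. For this reason, multichannel NMF is somewhat impractical in terms of computational efficiency and stability. We show this in the comparative experiments.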
  12. Here we have a mixture signal of three parts. It’s just like typical music. Please listen for the three parts: guitar, vocal, and keyboard, OK? Let’s listen. Then, if we apply source separation, we can obtain these kinds of signals. So, we can remix them, re-edit them, or do anything we want. This is source separation.
  13. OK, this is another issue, a kind of theoretical extension of ILRMA. In the previous experiment, we used an alternating update of the spatial model, ICA, and the source model, NMF. Again, this is the cost function of ILRMA, which consists of ICA terms and NMF terms. To optimize it, the demixing matrix W and the source models TV are first initialized as the identity matrix and randomized matrices. Then, after the initialization, we can start with either one, but for example we update the NMF model T and V, and then W is updated. This process is iterated many times, that is, an alternating update. However, we often encounter the problem that the separation result is sometimes really poor. The optimization in ILRMA is trapped in a poor solution that is a local minimum. So, we thought that there may exist a best optimization balance between the ICA and NMF models that can avoid such a local minimum. Therefore, the motivation of the extension is that we want to find a good solution, a good separation result, by optimizing the cost function while avoiding a poor local minimum. To achieve this, we investigate the best optimization balance between the ICA and NMF models by focusing on the optimization speeds of ICA and NMF.
  14. But how? How can we control the optimization speed while ensuring the convergence of the algorithm? What is an elegant way to control the speed of the variable update? Here we derived a new optimization algorithm called the “parametric majorization-equalization (ME) algorithm,” in which a new parameter is added that controls the speed of optimization. Then, we apply this new parametric ME algorithm to the update of the NMF variables. So, this block becomes controllable by the parametric ME. And we find the best balance of optimization speed between NMF and ICA.
  15. All right. I explain about the details of the new optimization scheme. The typical optimization algorithm used in NMF is called majorization-minimization, MM algorithm. Or, sometimes it is called auxiliary function technique. Here we have an original function that we want to minimize. But it is difficult to find the solution in closed form, or even the calculation of its gradient is difficult, or sometimes impossible. For such a problem, MM algorithm is carried out as follows. First, we put the initial value, here.
  16. Then, we design an upper-bound function, which is a majorizing function, the red line. This upper-bound function touches the original function at the current parameter point and lies above it at the other points. Also, the minimum point of the upper-bound function must be solvable in closed form. For example, a quadratic function is a good candidate when the original function is convex. Then, by solving for the minimum point of the upper-bound function, we can update the parameter from here to here. In the next step, we again design the upper-bound function that touches the original function at the new parameter point, and do the same thing. By iterating these procedures, we can find the minimum point of the original function. This is the MM algorithm: majorization and minimization. The convergence is guaranteed because the monotonic decrease of the cost function value is guaranteed in every iteration. But, from another point of view, we can update the parameter not to the minimum point of the upper-bound function,
  17. but to the equalization point, from here to here. This is called the majorization-equalization (ME) algorithm. Since the step size of the parameter update in the ME algorithm is always larger than that of the MM algorithm, it is basically faster than MM, and the convergence is still guaranteed. This fact means that we can update the parameter to any point in this range while ensuring the convergence of the algorithm.
  18. Of course, the equalization point provides the fastest optimization, but the important point is that we can control the speed of parameter update without losing the theoretical convergence of algorithm.
  19. So, we design a further upper-bound function, shown as the blue lines, with a parameter p, where p corresponds to the rate of the step size. p = 0.5 corresponds to the minimum point of the red line, and when p = 1, the blue line and the red line coincide. We call this approach the “parametric ME algorithm.”
  20. By applying this algorithm to the update of the NMF part in ILRMA, we can obtain the update rule. Here I list the update rules with the MM, ME, and parametric ME algorithms. They are the update rules of the basis matrix. The difference among them is just the exponent of the multiplicative coefficient. The result is very simple and intuitive. And finally, the optimization speed of the NMF model can be controlled by the parameter p.
  21. And here is the result of ILRMA with the parameter p in NMF. We ran ILRMA 2000 times with various random initializations. Each blue point shows the separation performance. The horizontal axis indicates the parameter p, and the vertical axis is the separation performance. p = 0.5 is the result of conventional ILRMA. We can clearly see two clusters, which correspond to different local minima. If we use the faster update of NMF in ILRMA, the optimization tends to be trapped in the bad solution, but if we slow down the NMF update, the score becomes higher.
  22. This is the result of another song. In this data, we can see some local minima, but still, this area becomes dense, which means that the slower is better.
  23. Again, the slower NMF optimization tends to provide better results in ILRMA. But why? We don’t know yet. But this is our conjecture. In the beginning of ILRMA, the NMF source model, TV, is random, and it is not reliable. But the demixing matrix W can be updated in the correct direction without the NMF source model to some extent. This is because the statistical independence between sources is very powerful. So, in the beginning, the separation should be done without taking the NMF model into account. Then, after some iterations, the NMF modeling catches up with the optimization of the demixing matrix, the precise modeling of the source structures is done, and the separation is further improved. So, the optimization balance is important for avoiding the local minima. As I said, this is a conjecture, and we must investigate this issue further, for example, using a toy model.
  24. The source model in IVA, the spherical Laplace distribution, was extended to this ISNMF model, resulting in independent low-rank matrix analysis (ILRMA). So, ILRMA is a unification of IVA and ISNMF, and we employed the NMF source model to capture the low-rank time-frequency structures of each source. This source model can improve the estimation accuracy of the demixing matrix.
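  25. The log-likelihood function of the proposed method, ILRMA, is obtained as shown here. The spatial demixing filter W, circled in blue, and the NMF source model TV, circled in red, are the variables to be estimated. Furthermore, the first half of this expression is equivalent to the cost function of conventional IVA, and the second half is equivalent to the cost function of conventional NMF. Therefore, all the variables can easily be estimated by alternately iterating the update rules of IVA and NMF. In addition, the appropriate number of bases (rank) for each source can be determined adaptively by a latent variable. As shown at the beginning, even within music signals, vocals are not very low-rank while drums are low-rank, so the appropriate rank differs from source to source. The role of this latent variable is to allocate the bases automatically in such situations on the basis of likelihood maximization.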
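  27. We conducted a music signal separation experiment. These are the experimental conditions. Two music signals were played in this arrangement and recorded with two microphones. The reverberation time was 300 ms. We use the SDR as the evaluation score, which is a measure of overall performance that covers both the sound quality and the degree of separation.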
  28. As I already explained, the window length in the STFT affects the performance of ICA-based separation. If we use a short window, x = As no longer holds, and if we use a long window, the estimation becomes unstable because the number of time frames J decreases. However, ILRMA employs full time-frequency modeling of the sources. This model may improve the robustness to a decrease in J. This is our expectation. Let’s check this issue.
  29. Here we used 4 music and 4 speech signals obtained from the SiSEC database, and we produced the observed signal by convolving the impulse responses shown at the bottom. We used two types of impulse responses: one has a 300-ms-long reverberation, and the other has 470 ms.
  30. We compare 4 methods: FDICA + ideal permutation solver (IPS), FDICA + DOA-based permutation solver, IVA, and ILRMA. In FDICA+IPS, we used the reference, the oracle source spectrogram, so this is an upper limit of FDICA. FDICA+DOA is a blind method that uses DOA clustering for solving the permutation problem. Of course, IVA and ILRMA are also blind methods. Then, we used a Hamming window with various window lengths.
  31. First, we show the results with ideal initialization. Namely, we first give a correct answer of demixing matrix using the oracle source. So, the initial value provides the best separation performance here. In addition, only for ILRMA, we set the initial value of NMF model T and V as the oracle values. Therefore, FDICA+DOA and IVA are using the spatial oracle initialization, and FDICA+IPS and ILRMA are using spatial and spectral oracle initialization.
  32. This is the result. The left panels are music, the right panels are speech, and the reverberation time is short (top) and long (bottom). The horizontal axis shows the window length, and the vertical axis shows the separation performance. The colored lines are the results of ILRMA with various numbers of NMF bases. In the music results, we can see that FDICA and IVA could not achieve good separation when the window becomes long. In ILRMA, the performance is maintained even with very long windows. This comes from the full modeling of the time-frequency structure of each source. However, for the speech signals, the performance of ILRMA becomes worse. We guess this is because speech is not low-rank, and the source model could not capture the precise TF structures.
  33. Next, we show the results with fully blind situation. Initial W is set to identity matrix, and the initial source model is randomized. Note that FDICA+IPS still uses the oracle spectrogram for solving the permutation.
  34. This is the result. We could not obtain the same results as the previous one. The performance of all the methods is degraded when the window length becomes long. Therefore, at least we can say that, ILRMA has a good potential to separate the sources even in a long window case, but in practice, the blind estimation of precise source model is a difficult problem.
  36. I do not explain about the detailed derivation of update rules, but they can easily be derived by the same manner as the previous Gaussian ILRMA.