Divergence optimization in nonnegative matrix
factorization with spectrogram restoration for
multichannel signal separation
Daichi Kitamura, Hiroshi Saruwatari, Satoshi Nakamura
(Nara Institute of Science and Technology, Japan)
Yu Takahashi, Kazunobu Kondo
(Yamaha Corporation, Japan)
Hirokazu Kameoka
(The University of Tokyo, Japan)
4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays
Oral session 2 – Microphone array processing
Outline
• 1. Research background
• 2. Conventional methods
– Directional clustering
– Nonnegative matrix factorization
– Supervised nonnegative matrix factorization
– Hybrid method
• 3. Analysis of restoration ability
– Generalized cost function
– Analysis based on generation model
• 4. Experiments
• 5. Conclusions
Research background
• Signal separation has received much attention.
• Music signal separation based on nonnegative matrix
factorization (NMF) is a very active research area.
• Supervised NMF (SNMF) achieves the highest
separation performance.
• To improve its performance, an SNMF-based
multichannel signal separation method is required.
Applications
• Automatic music transcription
• 3D audio system, etc.
We have proposed a new SNMF and its hybrid
separation method for multichannel signals.
Research background
• Our proposed hybrid method
Input stereo signal (L, R)
→ Spatial separation method (Directional clustering)
→ SNMF-based separation method (SNMF with spectrogram restoration)
→ Separated signal
Research background
• Divergence criterion in SNMF strongly affects
separation performance.
– Euclidean distance (EUC-distance)
– Kullback-Leibler divergence (KL-divergence)
– Itakura-Saito divergence (IS-divergence)
• The optimal divergence for SNMF with spectrogram
restoration has not been clarified.
We extend our new SNMF to a more generalized form.
We give a theoretical analysis for the optimization of
the divergence.
Directional clustering [Araki, et al., 2007]
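• Directional clustering
– Unsupervised spatial separation method
• Problems
– Cannot separate sources in the same direction
– Artificial distortion arises owing to the binary masking.
[Figure: input signal (stereo) with sources in the left, center, and right directions, and the separated signal containing only the center direction after binary masking]
Spectrogram grids labeled by clustered direction (rows: frequency, columns: time; C = center, L = left, R = right):
C C C R L R
C L L L R R
C C C C R R
C R R L L L
C C C C C C
Binary mask for the center direction (rows: frequency, columns: time), applied to the spectrogram as an entry-wise product:
1 1 1 0 0 0
1 0 0 0 0 0
1 1 1 1 0 0
1 0 0 0 0 0
1 1 1 1 1 1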
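The binary masking step on this slide can be sketched in a few lines. This is a minimal illustration, assuming the direction labels have already been obtained by some clustering step; the function names `binary_mask` and `apply_mask` and the random placeholder spectrogram are hypothetical and not part of the original slides.

```python
import numpy as np

# Direction labels per time-frequency grid, copied from the slide
# (rows: frequency bins, columns: time frames).
labels = np.array([
    list("CCCRLR"),
    list("CLLLRR"),
    list("CCCCRR"),
    list("CRRLLL"),
    list("CCCCCC"),
])


def binary_mask(labels, target="C"):
    """Return 1 where the grid was clustered to the target direction, else 0."""
    return (labels == target).astype(int)


def apply_mask(spectrogram, mask):
    """Entry-wise product of a nonnegative spectrogram and a binary mask."""
    return spectrogram * mask


mask = binary_mask(labels, "C")

# Placeholder amplitudes standing in for a real spectrogram (assumption).
rng = np.random.default_rng(0)
spectrogram = rng.uniform(0.1, 1.0, size=labels.shape)

# Grids assigned to other directions become zero ("holes").
center = apply_mask(spectrogram, mask)
print(mask)
```

The zeros left by the mask are the holes that the later restoration step tries to fill.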
NMF [Lee, et al., 2001]
• NMF can extract significant spectral patterns.
– The basis matrix contains the spectral patterns that appear frequently
in the observed matrix.
• Observed matrix (spectrogram, Ω × 𝑇) ≈ Basis matrix (spectral patterns, Ω × 𝐾) × Activation matrix (time-varying gains, 𝐾 × 𝑇)
Ω: Number of frequency bins
𝑇: Number of time frames
𝐾: Number of bases
Divergence criterion in NMF
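• Cost function in NMF: an entry-wise divergence between the observed matrix and its approximation (the product of the basis matrix and the activation matrix), summed over all entries
– Euclidean distance (EUC-distance)
– Kullback-Leibler divergence (KL-divergence)
– Itakura-Saito divergence (IS-divergence)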
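For reference, the three criteria are commonly written entry-wise as follows. The symbols are assumptions introduced here for illustration: $y$ stands for an entry of the observed matrix and $x$ for the corresponding entry of the model. The slide's own notation and scaling may differ.

```latex
\begin{align}
D_{\mathrm{EUC}}(y \,\|\, x) &= (y - x)^2, \\
D_{\mathrm{KL}}(y \,\|\, x)  &= y \log\frac{y}{x} - y + x, \\
D_{\mathrm{IS}}(y \,\|\, x)  &= \frac{y}{x} - \log\frac{y}{x} - 1.
\end{align}
```

All three vanish when $y = x$ and are positive otherwise. They differ in how strongly they penalize errors on large versus small entries.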
Supervised NMF [Smaragdis, et al., 2007]
• SNMF
– Supervised spectral separation method
• Training process: the supervised basis matrix (spectral dictionary) is trained from sample sounds of the target signal.
• Separation process: the mixed signal is decomposed into the target signal and the other signal. The supervised basis matrix is fixed, and the remaining matrices are optimized.
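The two processes can be sketched with standard multiplicative updates for the KL-divergence. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `train_bases` and `separate`, the matrix names `F`, `G`, `H`, `U`, the toy data, and all sizes are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def kl_div(Y, X):
    """Generalized KL divergence summed over all entries."""
    return float(np.sum(Y * np.log((Y + EPS) / (X + EPS)) - Y + X))


def train_bases(Y_train, n_bases, n_iter=200, seed=0):
    """Training process: plain KL-NMF on the sample sounds, Y_train ~ F @ G."""
    rng = np.random.default_rng(seed)
    F = rng.uniform(0.1, 1.0, (Y_train.shape[0], n_bases))
    G = rng.uniform(0.1, 1.0, (n_bases, Y_train.shape[1]))
    ones = np.ones_like(Y_train)
    for _ in range(n_iter):
        X = F @ G + EPS
        F *= ((Y_train / X) @ G.T) / (ones @ G.T + EPS)
        X = F @ G + EPS
        G *= (F.T @ (Y_train / X)) / (F.T @ ones + EPS)
    return F


def separate(Y_mix, F, n_other, n_iter=200, seed=1):
    """Separation process: Y_mix ~ F @ G + H @ U, with F kept fixed."""
    rng = np.random.default_rng(seed)
    G = rng.uniform(0.1, 1.0, (F.shape[1], Y_mix.shape[1]))
    H = rng.uniform(0.1, 1.0, (Y_mix.shape[0], n_other))
    U = rng.uniform(0.1, 1.0, (n_other, Y_mix.shape[1]))
    ones = np.ones_like(Y_mix)
    costs = []
    for _ in range(n_iter):
        X = F @ G + H @ U + EPS
        G *= (F.T @ (Y_mix / X)) / (F.T @ ones + EPS)
        X = F @ G + H @ U + EPS
        H *= ((Y_mix / X) @ U.T) / (ones @ U.T + EPS)
        X = F @ G + H @ U + EPS
        U *= (H.T @ (Y_mix / X)) / (H.T @ ones + EPS)
        costs.append(kl_div(Y_mix, F @ G + H @ U))
    return F @ G, H @ U, costs


# Toy data: a target built from 3 spectral patterns plus an interfering part.
rng = np.random.default_rng(42)
F_true = rng.uniform(0.0, 1.0, (20, 3))
Y_train = F_true @ rng.uniform(0.0, 1.0, (3, 40))
Y_mix = (F_true @ rng.uniform(0.0, 1.0, (3, 30))
         + rng.uniform(0.0, 1.0, (20, 2)) @ rng.uniform(0.0, 1.0, (2, 30)))

F = train_bases(Y_train, n_bases=3)
target_est, other_est, costs = separate(Y_mix, F, n_other=2)
```

Only `G`, `H`, and `U` are updated during separation, so the dictionary learned from the sample sounds decides which part of the mixture is attributed to the target.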
Hybrid method [Kitamura, et al., 2013]
• We have proposed a new SNMF called SNMF with
spectrogram restoration and its hybrid method.
Hybrid method: stereo input (L, R)
→ Spatial separation (directional clustering)
→ Spectral separation (SNMF with spectrogram restoration)
SNMF with spectrogram restoration
• SNMF with spectrogram restoration can separate the
target and restore the spectrogram simultaneously.
[Figure: two time-frequency spectrograms. The first is the spectrogram after directional clustering, which contains target components, non-target components, and holes. The second is the spectrogram after SNMF with spectrogram restoration, obtained with the supervised bases (dictionary of the target).]
Decomposition model and cost function
• The divergence is defined at all grids except for the
holes by using the binary masking matrix obtained from directional clustering.
• Decomposition model: the supervised bases are fixed.
• Cost function: the divergence, a regularization term, and a penalty term
– Notation: weighting parameters, the binary complement of the mask, and the Frobenius norm
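A cost function consistent with the labels on this slide can be sketched as follows. Every symbol is an assumption introduced for illustration: $y_{\omega,t}$ for the observed spectrogram, $f_{\omega,k}$ for the fixed supervised bases, $g_{k,t}$, $h_{\omega,l}$, and $u_{l,t}$ for the variable matrices, $i_{\omega,t}$ for the binary mask with complement $\bar{i}_{\omega,t} = 1 - i_{\omega,t}$, $\lambda$ and $\mu$ for the weighting parameters, and $R(\cdot)$ for an unspecified regularizer. The exact form used by the authors may differ.

```latex
\begin{align}
\boldsymbol{Y} &\approx \boldsymbol{F}\boldsymbol{G} + \boldsymbol{H}\boldsymbol{U}, \\
\mathcal{J} &= \sum_{\omega,t} i_{\omega,t}\,
  \mathcal{D}\Bigl( y_{\omega,t} \,\Big\|\,
    \sum_{k} f_{\omega,k}\, g_{k,t} + \sum_{l} h_{\omega,l}\, u_{l,t} \Bigr)
  + \lambda \sum_{\omega,t} \bar{i}_{\omega,t}\,
    R\Bigl( \sum_{k} f_{\omega,k}\, g_{k,t} \Bigr)
  + \mu \bigl\| \boldsymbol{F}^{\mathsf{T}} \boldsymbol{H} \bigr\|_{\mathrm{Fr}}^{2}.
\end{align}
```

Under these assumptions, the first term measures the fit only on grids where the mask is 1, the second term acts on the target model at the holes, and the third term penalizes overlap between the supervised bases and the other bases.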
Generalized divergence: β-divergence
• β-divergence [Eguchi, et al., 2001]
– β = 2: EUC-distance
– β = 1: KL-divergence
– β = 0: IS-divergence
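The way one parameter covers all three criteria can be checked numerically. The sketch below uses one common parametrization of the β-divergence; the function name `beta_divergence` is hypothetical, and the scaling is an assumption (in this form β = 2 gives half the squared Euclidean distance), so constants may differ from the slide's definition.

```python
import numpy as np


def beta_divergence(y, x, beta):
    """Entry-wise beta-divergence d_beta(y || x) for positive y and x."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    if beta == 0:
        # Limit beta -> 0: Itakura-Saito divergence.
        return y / x - np.log(y / x) - 1.0
    if beta == 1:
        # Limit beta -> 1: generalized Kullback-Leibler divergence.
        return y * np.log(y / x) - y + x
    # General case; beta = 2 gives half the squared Euclidean distance.
    return (y**beta + (beta - 1.0) * x**beta
            - beta * y * x**(beta - 1.0)) / (beta * (beta - 1.0))


y = np.array([0.5, 1.0, 2.0])
x = np.array([1.0, 1.5, 0.7])
for beta in (0, 1, 2, 3):
    print(beta, beta_divergence(y, x, beta))
```

Values of β above 2, such as β = 3, weight errors on large entries more heavily than the Euclidean case does.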
Decomposition model and cost function
• We introduced the β-divergence to extend the cost
function to a generalized form.
• Decomposition model: the supervised bases are fixed.
• Cost function: generalized with the β-divergence
Update rules
• We can obtain the update rules for the optimization of
the three variable matrices.
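To show the general shape of such updates, here is a simplified sketch. It is not the authors' update rules: it uses the widely known heuristic multiplicative updates for a masked β-divergence, omits the regularization and penalty terms, and all names (`masked_snmf`, `F`, `G`, `H`, `U`) and the toy data are assumptions.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def masked_beta_cost(Y, X, mask, beta):
    """Beta-divergence summed over grids where mask == 1 (beta not 0 or 1)."""
    d = (Y**beta + (beta - 1.0) * X**beta
         - beta * Y * X**(beta - 1.0)) / (beta * (beta - 1.0))
    return float(np.sum(mask * d))


def masked_snmf(Y, mask, F, n_other, beta=2.0, n_iter=100, seed=0):
    """Fit Y ~ F @ G + H @ U on the unmasked grids, keeping F fixed."""
    rng = np.random.default_rng(seed)
    G = rng.uniform(0.1, 1.0, (F.shape[1], Y.shape[1]))
    H = rng.uniform(0.1, 1.0, (Y.shape[0], n_other))
    U = rng.uniform(0.1, 1.0, (n_other, Y.shape[1]))

    def ratio_terms():
        # Masked numerator and denominator shared by all three updates.
        X = F @ G + H @ U + EPS
        return mask * Y * X**(beta - 2.0), mask * X**(beta - 1.0)

    costs = []
    for _ in range(n_iter):
        num, den = ratio_terms()
        G *= (F.T @ num) / (F.T @ den + EPS)
        num, den = ratio_terms()
        H *= (num @ U.T) / (den @ U.T + EPS)
        num, den = ratio_terms()
        U *= (H.T @ num) / (H.T @ den + EPS)
        costs.append(masked_beta_cost(Y, F @ G + H @ U + EPS, mask, beta))
    return G, H, U, costs


rng = np.random.default_rng(1)
F = rng.uniform(0.1, 1.0, (16, 4))                      # stand-in for trained bases
Y = F @ rng.uniform(0.1, 1.0, (4, 25))                  # toy target-only spectrogram
mask = (rng.uniform(size=Y.shape) > 0.5).astype(float)  # 0 marks a hole
G, H, U, costs = masked_snmf(Y, mask, F, n_other=2, beta=2.0)
restored = F @ G                                        # also defined at the holes
```

Because `F @ G` is defined on every grid, the fixed bases extrapolate values into the holes even though the holes never enter the cost.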
SNMF with spectrogram restoration
• This SNMF has two tasks: source separation and basis extrapolation.
• The optimal divergence for source separation has
already been investigated.
– KL-divergence (β = 1) is suitable for source separation.
• The optimal divergence for basis extrapolation has not
been investigated.
• We analyze the optimal divergence for basis
extrapolation based on a generation model in NMF.
Analysis of extrapolation ability
• The decomposition of NMF is equivalent to a
maximum likelihood estimation that implicitly assumes a
generation model for the input data.
– IS-divergence: exponential distribution
– KL-divergence: Poisson distribution
– EUC-distance: Gaussian distribution
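The correspondence can be checked from the negative log-likelihoods. In the sketch below, $y$ is an observed entry and $x$ is the model value acting as the distribution parameter; both symbols are assumptions, and "const." collects terms that do not depend on $x$.

```latex
\begin{align}
\text{Gaussian (mean } x \text{, fixed variance } \sigma^2 \text{):}\quad
  -\log p(y \mid x) &= \frac{(y - x)^2}{2\sigma^2} + \text{const.} \\
\text{Poisson (mean } x \text{):}\quad
  -\log p(y \mid x) &= x - y \log x + \text{const.} \\
\text{Exponential (mean } x \text{):}\quad
  -\log p(y \mid x) &= \frac{y}{x} + \log x
\end{align}
```

Up to scaling and terms independent of $x$, these match the $x$-dependent parts of the EUC-distance, the KL-divergence, and the IS-divergence respectively, so minimizing each divergence is a maximum likelihood estimation under the paired distribution.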
Analysis of extrapolation ability
• To compare the net extrapolation ability, we generate
random data that obey each generation model.
• We also prepare binary-masked random
data and attempt to restore them.
• In the training, 100 bases are created.
• The binary mask was randomly generated.
– We generate two types of binary mask whose densities of
holes are 75% and 98%.
• SAR indicates the accuracy of restoration.
[Figure: input random data → binary masking → binary-masked data → restoration → restored data. SAR [dB] is computed using entry-wise squares.]
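The data preparation for this simulation can be sketched as follows. Several details are assumptions rather than facts from the slides: the function names, the distribution parameters, the clipping of negative Gaussian draws, and in particular the SAR-like score, which is taken here as the ratio of the reference energy to the restoration-error energy.

```python
import numpy as np


def generate_data(model, shape, rng, scale=1.0):
    """Draw nonnegative random data from one of the three generation models."""
    if model == "exponential":   # pairs with IS-divergence
        return rng.exponential(scale, size=shape)
    if model == "poisson":       # pairs with KL-divergence
        return rng.poisson(scale, size=shape).astype(float)
    if model == "gaussian":      # pairs with EUC-distance
        # Assumption: negative draws are clipped so the data stay nonnegative.
        return np.clip(rng.normal(scale, scale / 3.0, size=shape), 0.0, None)
    raise ValueError(f"unknown model: {model}")


def random_mask(shape, hole_density, rng):
    """Binary mask with 0 at holes; hole_density is the fraction of holes."""
    return (rng.uniform(size=shape) >= hole_density).astype(float)


def sar_db(reference, restored):
    """Assumed SAR-like score: reference energy over error energy, in dB."""
    error = np.sum((reference - restored) ** 2)
    return 10.0 * np.log10(np.sum(reference ** 2) / error)


rng = np.random.default_rng(0)
Y = generate_data("exponential", (200, 200), rng)
mask = random_mask(Y.shape, 0.75, rng)   # 75% of the grids become holes
masked = mask * Y                        # binary-masked data to be restored
baseline = sar_db(Y, masked)             # score when nothing is restored
print(baseline)
```

A restoration method is useful only if its score exceeds this unrestored baseline.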
Results of restoration analysis
• Simulated result of the restoration ability
• The optimal divergence for the basis extrapolation
(restoration) is around β = 3!
[Figure: two plots of SAR [dB] (0 to 25, higher is better) against β_NMF (0 to 4), one for the 75%-binary-masked data and one for the 98%-binary-masked data, each with curves for β_reg = 0, 1, 2, and 3. The optimal divergence for source separation (KL-divergence) is marked for comparison.]
Trade-off between separation and restoration
• The optimal divergence for SNMF with spectrogram
restoration and its hybrid method is based on the
trade-off between separation and restoration abilities.
[Figure: two spectra, amplitude [dB] (−10 to 0) against frequency [kHz] (0 to 5), labeled "Sparseness: strong" and "Sparseness: weak".]
[Figure: performance against the divergence parameter (0 to 4), with curves for separation, restoration, and the total performance of the hybrid method.]
Experimental condition
• Mixed signal includes four melodies (sources).
• Three compositions of instruments
– We evaluated the average score of 36 patterns.
• Supervision signal: 24 notes that cover all the notes in the target melody
[Figure: spatial arrangement of sources 1 to 4 across the left, center, and right directions, with the target source marked]
Dataset   Melody 1   Melody 2   Midrange      Bass
No. 1     Oboe       Flute      Piano         Trombone
No. 2     Trumpet    Violin     Harpsichord   Fagotto
No. 3     Horn       Clarinet   Piano         Cello
Experimental result
• Signal-to-distortion ratio (SDR)
– Total quality of the separation, which includes the degree of
separation and the absence of artificial distortion.
[Figure: SDR [dB] (0 to 14, higher is better) against β_NMF (0 to 4) for the supervised methods (conventional SNMF and the proposed hybrid method), with KL-divergence and EUC-distance marked on the axis, compared with the unsupervised methods (directional clustering and multichannel NMF [Sawada]).]
Multichannel NMF is an integrated method.
Experiment for real-recorded signal
• We recorded a binaural signal using a dummy head.
• Reverberation time: 200 ms
• The other conditions are the same as those in the previous
experiment with the instantaneous mixture signal.
[Figure: recording layout with sources 1 to 4 placed at the left, center, and right around the dummy head, with the target signal marked. Labeled dimensions: 1.5 m, 1.5 m, 1.5 m, and 2.5 m.]
Experimental result
• Result for real-recorded signals
[Figure: SDR [dB] (0 to 14, higher is better) against β_NMF (0 to 4) for the supervised methods (conventional SNMF and the proposed hybrid method), with KL-divergence and EUC-distance marked on the axis, compared with the unsupervised methods (directional clustering and multichannel NMF [Sawada]).]
Multichannel NMF is an integrated method.
Conclusions
• Restoration requires an anti-sparse criterion (β = 3).
• There is a trade-off between separation and
restoration abilities.
• The optimal divergence is EUC-distance for SNMF
with spectrogram restoration,
– whereas KL-divergence is the best for conventional
SNMF.
Thank you for your attention!

More Related Content

What's hot

Online Divergence Switching for Superresolution-Based Nonnegative Matrix Fa...
Online Divergence Switching for  Superresolution-Based  Nonnegative Matrix Fa...Online Divergence Switching for  Superresolution-Based  Nonnegative Matrix Fa...
Online Divergence Switching for Superresolution-Based Nonnegative Matrix Fa...
奈良先端大 情報科学研究科
 
Depth Estimation of Sound Images Using Directional Clustering and Activation...
Depth Estimation of Sound Images Using  Directional Clustering and Activation...Depth Estimation of Sound Images Using  Directional Clustering and Activation...
Depth Estimation of Sound Images Using Directional Clustering and Activation...
奈良先端大 情報科学研究科
 
Robust Sound Field Reproduction against Listener’s Movement Utilizing Image ...
Robust Sound Field Reproduction against  Listener’s Movement Utilizing Image ...Robust Sound Field Reproduction against  Listener’s Movement Utilizing Image ...
Robust Sound Field Reproduction against Listener’s Movement Utilizing Image ...
奈良先端大 情報科学研究科
 

What's hot (20)

Hybrid NMF APSIPA2014 invited
Hybrid NMF APSIPA2014 invitedHybrid NMF APSIPA2014 invited
Hybrid NMF APSIPA2014 invited
 
Koyama AES Conference SFC 2016
Koyama AES Conference SFC 2016Koyama AES Conference SFC 2016
Koyama AES Conference SFC 2016
 
Ica2016 312 saruwatari
Ica2016 312 saruwatariIca2016 312 saruwatari
Ica2016 312 saruwatari
 
Apsipa2016for ss
Apsipa2016for ssApsipa2016for ss
Apsipa2016for ss
 
Blind source separation based on independent low-rank matrix analysis and its...
Blind source separation based on independent low-rank matrix analysis and its...Blind source separation based on independent low-rank matrix analysis and its...
Blind source separation based on independent low-rank matrix analysis and its...
 
Koyama ASA ASJ joint meeting 2016
Koyama ASA ASJ joint meeting 2016Koyama ASA ASJ joint meeting 2016
Koyama ASA ASJ joint meeting 2016
 
Blind source separation based on independent low-rank matrix analysis and its...
Blind source separation based on independent low-rank matrix analysis and its...Blind source separation based on independent low-rank matrix analysis and its...
Blind source separation based on independent low-rank matrix analysis and its...
 
DNN-based permutation solver for frequency-domain independent component analy...
DNN-based permutation solver for frequency-domain independent component analy...DNN-based permutation solver for frequency-domain independent component analy...
DNN-based permutation solver for frequency-domain independent component analy...
 
Prior distribution design for music bleeding-sound reduction based on nonnega...
Prior distribution design for music bleeding-sound reduction based on nonnega...Prior distribution design for music bleeding-sound reduction based on nonnega...
Prior distribution design for music bleeding-sound reduction based on nonnega...
 
Blind audio source separation based on time-frequency structure models
Blind audio source separation based on time-frequency structure modelsBlind audio source separation based on time-frequency structure models
Blind audio source separation based on time-frequency structure models
 
DNN-based frequency component prediction for frequency-domain audio source se...
DNN-based frequency component prediction for frequency-domain audio source se...DNN-based frequency component prediction for frequency-domain audio source se...
DNN-based frequency component prediction for frequency-domain audio source se...
 
Linear multichannel blind source separation based on time-frequency mask obta...
Linear multichannel blind source separation based on time-frequency mask obta...Linear multichannel blind source separation based on time-frequency mask obta...
Linear multichannel blind source separation based on time-frequency mask obta...
 
Online Divergence Switching for Superresolution-Based Nonnegative Matrix Fa...
Online Divergence Switching for  Superresolution-Based  Nonnegative Matrix Fa...Online Divergence Switching for  Superresolution-Based  Nonnegative Matrix Fa...
Online Divergence Switching for Superresolution-Based Nonnegative Matrix Fa...
 
Audio Source Separation Based on Low-Rank Structure and Statistical Independence
Audio Source Separation Based on Low-Rank Structure and Statistical IndependenceAudio Source Separation Based on Low-Rank Structure and Statistical Independence
Audio Source Separation Based on Low-Rank Structure and Statistical Independence
 
Depth Estimation of Sound Images Using Directional Clustering and Activation...
Depth Estimation of Sound Images Using  Directional Clustering and Activation...Depth Estimation of Sound Images Using  Directional Clustering and Activation...
Depth Estimation of Sound Images Using Directional Clustering and Activation...
 
Experimental analysis of optimal window length for independent low-rank matri...
Experimental analysis of optimal window length for independent low-rank matri...Experimental analysis of optimal window length for independent low-rank matri...
Experimental analysis of optimal window length for independent low-rank matri...
 
Dsp2015for ss
Dsp2015for ssDsp2015for ss
Dsp2015for ss
 
Robust Sound Field Reproduction against Listener’s Movement Utilizing Image ...
Robust Sound Field Reproduction against  Listener’s Movement Utilizing Image ...Robust Sound Field Reproduction against  Listener’s Movement Utilizing Image ...
Robust Sound Field Reproduction against Listener’s Movement Utilizing Image ...
 
A Generalization of Laplace Nonnegative Matrix Factorizationand Its Multichan...
A Generalization of Laplace Nonnegative Matrix Factorizationand Its Multichan...A Generalization of Laplace Nonnegative Matrix Factorizationand Its Multichan...
A Generalization of Laplace Nonnegative Matrix Factorizationand Its Multichan...
 
コサイン類似度罰則条件付き半教師あり非負値行列因子分解と音源分離への応用
コサイン類似度罰則条件付き半教師あり非負値行列因子分解と音源分離への応用コサイン類似度罰則条件付き半教師あり非負値行列因子分解と音源分離への応用
コサイン類似度罰則条件付き半教師あり非負値行列因子分解と音源分離への応用
 

Similar to Divergence optimization in nonnegative matrix factorization with spectrogram restoration for multichannel signal separation

Using Feature Grouping as a Stochastic Regularizer for High Dimensional Noisy...
Using Feature Grouping as a Stochastic Regularizer for High Dimensional Noisy...Using Feature Grouping as a Stochastic Regularizer for High Dimensional Noisy...
Using Feature Grouping as a Stochastic Regularizer for High Dimensional Noisy...
WiMLDSMontreal
 
NON-LINEAR FULLY-CONSTRAINED SPECTRAL UNMIXING
NON-LINEAR FULLY-CONSTRAINED SPECTRAL UNMIXINGNON-LINEAR FULLY-CONSTRAINED SPECTRAL UNMIXING
NON-LINEAR FULLY-CONSTRAINED SPECTRAL UNMIXING
grssieee
 
Ijieh_Ebholo_Poster
Ijieh_Ebholo_PosterIjieh_Ebholo_Poster
Ijieh_Ebholo_Poster
Ebholo Ijieh
 
In it seminar_r_d_mos_cut
In it seminar_r_d_mos_cutIn it seminar_r_d_mos_cut
In it seminar_r_d_mos_cut
jpdacosta
 
Implemetation of parallelism in HMM DNN based state of the art kaldi ASR Toolkit
Implemetation of parallelism in HMM DNN based state of the art kaldi ASR ToolkitImplemetation of parallelism in HMM DNN based state of the art kaldi ASR Toolkit
Implemetation of parallelism in HMM DNN based state of the art kaldi ASR Toolkit
Shubham Verma
 
Relevance-Based Compression of Cataract Surgery Videos Using Convolutional Ne...
Relevance-Based Compression of Cataract Surgery Videos Using Convolutional Ne...Relevance-Based Compression of Cataract Surgery Videos Using Convolutional Ne...
Relevance-Based Compression of Cataract Surgery Videos Using Convolutional Ne...
Alpen-Adria-Universität
 

Similar to Divergence optimization in nonnegative matrix factorization with spectrogram restoration for multichannel signal separation (20)

Learning the Statistical Model of the NMF Using the Deep Multiplicative Updat...
Learning the Statistical Model of the NMF Using the Deep Multiplicative Updat...Learning the Statistical Model of the NMF Using the Deep Multiplicative Updat...
Learning the Statistical Model of the NMF Using the Deep Multiplicative Updat...
 
Cross-validation aggregation for forecasting
Cross-validation aggregation for forecastingCross-validation aggregation for forecasting
Cross-validation aggregation for forecasting
 
SPECFORMER: SPECTRAL GRAPH NEURAL NETWORKS MEET TRANSFORMERS.pptx
SPECFORMER: SPECTRAL GRAPH NEURAL NETWORKS MEET TRANSFORMERS.pptxSPECFORMER: SPECTRAL GRAPH NEURAL NETWORKS MEET TRANSFORMERS.pptx
SPECFORMER: SPECTRAL GRAPH NEURAL NETWORKS MEET TRANSFORMERS.pptx
 
L046056365
L046056365L046056365
L046056365
 
Using Feature Grouping as a Stochastic Regularizer for High Dimensional Noisy...
Using Feature Grouping as a Stochastic Regularizer for High Dimensional Noisy...Using Feature Grouping as a Stochastic Regularizer for High Dimensional Noisy...
Using Feature Grouping as a Stochastic Regularizer for High Dimensional Noisy...
 
NON-LINEAR FULLY-CONSTRAINED SPECTRAL UNMIXING
NON-LINEAR FULLY-CONSTRAINED SPECTRAL UNMIXINGNON-LINEAR FULLY-CONSTRAINED SPECTRAL UNMIXING
NON-LINEAR FULLY-CONSTRAINED SPECTRAL UNMIXING
 
[20240422_LabSeminar_Huy]Taming_Effect.pptx
[20240422_LabSeminar_Huy]Taming_Effect.pptx[20240422_LabSeminar_Huy]Taming_Effect.pptx
[20240422_LabSeminar_Huy]Taming_Effect.pptx
 
A multivariate approach for process variograms
A multivariate approach for process variogramsA multivariate approach for process variograms
A multivariate approach for process variograms
 
Cuantificacion de Amorfo por Difraccion de Rayos X
Cuantificacion de Amorfo por Difraccion de Rayos XCuantificacion de Amorfo por Difraccion de Rayos X
Cuantificacion de Amorfo por Difraccion de Rayos X
 
PREDICTION OF ANTIMICROBIAL PEPTIDES USING MACHINE LEARNING METHODS
PREDICTION OF ANTIMICROBIAL PEPTIDES USING MACHINE LEARNING METHODSPREDICTION OF ANTIMICROBIAL PEPTIDES USING MACHINE LEARNING METHODS
PREDICTION OF ANTIMICROBIAL PEPTIDES USING MACHINE LEARNING METHODS
 
Ijieh_Ebholo_Poster
Ijieh_Ebholo_PosterIjieh_Ebholo_Poster
Ijieh_Ebholo_Poster
 
In it seminar_r_d_mos_cut
In it seminar_r_d_mos_cutIn it seminar_r_d_mos_cut
In it seminar_r_d_mos_cut
 
Empirical Study of ANN Based Prediction of Resonant Frequency and Bandwidth o...
Empirical Study of ANN Based Prediction of Resonant Frequency and Bandwidth o...Empirical Study of ANN Based Prediction of Resonant Frequency and Bandwidth o...
Empirical Study of ANN Based Prediction of Resonant Frequency and Bandwidth o...
 
Recovery of low frequency Signals from noisy data using Ensembled Empirical M...
Recovery of low frequency Signals from noisy data using Ensembled Empirical M...Recovery of low frequency Signals from noisy data using Ensembled Empirical M...
Recovery of low frequency Signals from noisy data using Ensembled Empirical M...
 
Approximation of Dynamic Convolution Exploiting Principal Component Analysis:...
Approximation of Dynamic Convolution Exploiting Principal Component Analysis:...Approximation of Dynamic Convolution Exploiting Principal Component Analysis:...
Approximation of Dynamic Convolution Exploiting Principal Component Analysis:...
 
time based ranging via uwb radios
time based ranging via uwb radiostime based ranging via uwb radios
time based ranging via uwb radios
 
Machine Learning Techniques for the Smart Grid – Modeling of Solar Energy usi...
Machine Learning Techniques for the Smart Grid – Modeling of Solar Energy usi...Machine Learning Techniques for the Smart Grid – Modeling of Solar Energy usi...
Machine Learning Techniques for the Smart Grid – Modeling of Solar Energy usi...
 
Implemetation of parallelism in HMM DNN based state of the art kaldi ASR Toolkit
Implemetation of parallelism in HMM DNN based state of the art kaldi ASR ToolkitImplemetation of parallelism in HMM DNN based state of the art kaldi ASR Toolkit
Implemetation of parallelism in HMM DNN based state of the art kaldi ASR Toolkit
 
M.sc. presentation t.bagheri fashkhami
M.sc. presentation t.bagheri fashkhamiM.sc. presentation t.bagheri fashkhami
M.sc. presentation t.bagheri fashkhami
 
Relevance-Based Compression of Cataract Surgery Videos Using Convolutional Ne...
Relevance-Based Compression of Cataract Surgery Videos Using Convolutional Ne...Relevance-Based Compression of Cataract Surgery Videos Using Convolutional Ne...
Relevance-Based Compression of Cataract Surgery Videos Using Convolutional Ne...
 

More from Daichi Kitamura

More from Daichi Kitamura (20)

独立低ランク行列分析に基づく音源分離とその発展(Audio source separation based on independent low-rank...
独立低ランク行列分析に基づく音源分離とその発展(Audio source separation based on independent low-rank...独立低ランク行列分析に基づく音源分離とその発展(Audio source separation based on independent low-rank...
独立低ランク行列分析に基づく音源分離とその発展(Audio source separation based on independent low-rank...
 
スペクトログラム無矛盾性を用いた 独立低ランク行列分析の実験的評価
スペクトログラム無矛盾性を用いた独立低ランク行列分析の実験的評価スペクトログラム無矛盾性を用いた独立低ランク行列分析の実験的評価
スペクトログラム無矛盾性を用いた 独立低ランク行列分析の実験的評価
 
Windowsマシン上でVisual Studio Codeとpipenvを使ってPythonの仮想実行環境を構築する方法(Jupyter notebookも)
Windowsマシン上でVisual Studio Codeとpipenvを使ってPythonの仮想実行環境を構築する方法(Jupyter notebookも)Windowsマシン上でVisual Studio Codeとpipenvを使ってPythonの仮想実行環境を構築する方法(Jupyter notebookも)
Windowsマシン上でVisual Studio Codeとpipenvを使ってPythonの仮想実行環境を構築する方法(Jupyter notebookも)
 
独立低ランク行列分析に基づくブラインド音源分離(Blind source separation based on independent low-rank...
独立低ランク行列分析に基づくブラインド音源分離(Blind source separation based on independent low-rank...独立低ランク行列分析に基づくブラインド音源分離(Blind source separation based on independent low-rank...
独立低ランク行列分析に基づくブラインド音源分離(Blind source separation based on independent low-rank...
 
独立深層学習行列分析に基づく多チャネル音源分離の実験的評価(Experimental evaluation of multichannel audio s...
独立深層学習行列分析に基づく多チャネル音源分離の実験的評価(Experimental evaluation of multichannel audio s...独立深層学習行列分析に基づく多チャネル音源分離の実験的評価(Experimental evaluation of multichannel audio s...
独立深層学習行列分析に基づく多チャネル音源分離の実験的評価(Experimental evaluation of multichannel audio s...
 
独立深層学習行列分析に基づく多チャネル音源分離(Multichannel audio source separation based on indepen...
独立深層学習行列分析に基づく多チャネル音源分離(Multichannel audio source separation based on indepen...独立深層学習行列分析に基づく多チャネル音源分離(Multichannel audio source separation based on indepen...
独立深層学習行列分析に基づく多チャネル音源分離(Multichannel audio source separation based on indepen...
 
近接分離最適化によるブラインド⾳源分離(Blind source separation via proximal splitting algorithm)
近接分離最適化によるブラインド⾳源分離(Blind source separation via proximal splitting algorithm)近接分離最適化によるブラインド⾳源分離(Blind source separation via proximal splitting algorithm)
近接分離最適化によるブラインド⾳源分離(Blind source separation via proximal splitting algorithm)
 
非負値行列因子分解に基づくブラインド及び教師あり音楽音源分離の効果的最適化法
非負値行列因子分解に基づくブラインド及び教師あり音楽音源分離の効果的最適化法非負値行列因子分解に基づくブラインド及び教師あり音楽音源分離の効果的最適化法
非負値行列因子分解に基づくブラインド及び教師あり音楽音源分離の効果的最適化法
 
音源分離における音響モデリング(Acoustic modeling in audio source separation)
音源分離における音響モデリング(Acoustic modeling in audio source separation)音源分離における音響モデリング(Acoustic modeling in audio source separation)
音源分離における音響モデリング(Acoustic modeling in audio source separation)
 
ICASSP2017読み会(関東編)・AASP_L3(北村担当分)
ICASSP2017読み会(関東編)・AASP_L3(北村担当分)ICASSP2017読み会(関東編)・AASP_L3(北村担当分)
ICASSP2017読み会(関東編)・AASP_L3(北村担当分)
 
独立性に基づくブラインド音源分離の発展と独立低ランク行列分析 History of independence-based blind source sep...
独立性に基づくブラインド音源分離の発展と独立低ランク行列分析 History of independence-based blind source sep...独立性に基づくブラインド音源分離の発展と独立低ランク行列分析 History of independence-based blind source sep...
独立性に基づくブラインド音源分離の発展と独立低ランク行列分析 History of independence-based blind source sep...
 
統計的独立性と低ランク行列分解理論に基づく ブラインド音源分離 –独立低ランク行列分析– Blind source separation based on...
統計的独立性と低ランク行列分解理論に基づくブラインド音源分離 –独立低ランク行列分析– Blind source separation based on...統計的独立性と低ランク行列分解理論に基づくブラインド音源分離 –独立低ランク行列分析– Blind source separation based on...
統計的独立性と低ランク行列分解理論に基づく ブラインド音源分離 –独立低ランク行列分析– Blind source separation based on...
 
半教師あり非負値行列因子分解における音源分離性能向上のための効果的な基底学習法
半教師あり非負値行列因子分解における音源分離性能向上のための効果的な基底学習法半教師あり非負値行列因子分解における音源分離性能向上のための効果的な基底学習法
半教師あり非負値行列因子分解における音源分離性能向上のための効果的な基底学習法
 
独立性基準を用いた非負値行列因子分解の効果的な初期値決定法(Statistical-independence-based efficient initia...
独立性基準を用いた非負値行列因子分解の効果的な初期値決定法(Statistical-independence-based efficient initia...独立性基準を用いた非負値行列因子分解の効果的な初期値決定法(Statistical-independence-based efficient initia...
独立性基準を用いた非負値行列因子分解の効果的な初期値決定法(Statistical-independence-based efficient initia...
 
音響メディア信号処理における独立成分分析の発展と応用, History of independent component analysis for sou...
音響メディア信号処理における独立成分分析の発展と応用, History of independent component analysis for sou...音響メディア信号処理における独立成分分析の発展と応用, History of independent component analysis for sou...
音響メディア信号処理における独立成分分析の発展と応用, History of independent component analysis for sou...
 
非負値行列分解の確率的生成モデルと 多チャネル音源分離への応用 (Generative model in nonnegative matrix facto...
非負値行列分解の確率的生成モデルと多チャネル音源分離への応用 (Generative model in nonnegative matrix facto...非負値行列分解の確率的生成モデルと多チャネル音源分離への応用 (Generative model in nonnegative matrix facto...
非負値行列分解の確率的生成モデルと 多チャネル音源分離への応用 (Generative model in nonnegative matrix facto...
 
ランク1空間近似を用いたBSSにおける音源及び空間モデルの考察 Study on Source and Spatial Models for BSS wi...
ランク1空間近似を用いたBSSにおける音源及び空間モデルの考察 Study on Source and Spatial Models for BSS wi...ランク1空間近似を用いたBSSにおける音源及び空間モデルの考察 Study on Source and Spatial Models for BSS wi...
ランク1空間近似を用いたBSSにおける音源及び空間モデルの考察 Study on Source and Spatial Models for BSS wi...
 
基底変形型教師ありNMFによる実楽器信号分離 (in Japanese)
基底変形型教師ありNMFによる実楽器信号分離 (in Japanese)基底変形型教師ありNMFによる実楽器信号分離 (in Japanese)
基底変形型教師ありNMFによる実楽器信号分離 (in Japanese)
 
Study on optimal divergence for superresolution-based supervised nonnegative ...
Study on optimal divergence for superresolution-based supervised nonnegative ...Study on optimal divergence for superresolution-based supervised nonnegative ...
Study on optimal divergence for superresolution-based supervised nonnegative ...
 
Music signal separation using supervised nonnegative matrix factorization wit...
Music signal separation using supervised nonnegative matrix factorization wit...Music signal separation using supervised nonnegative matrix factorization wit...
Music signal separation using supervised nonnegative matrix factorization wit...
 

Recently uploaded

21P35A0312 Internship eccccccReport.docx
21P35A0312 Internship eccccccReport.docx21P35A0312 Internship eccccccReport.docx
21P35A0312 Internship eccccccReport.docx
rahulmanepalli02
 
Degrees of freedom for the robots 1.pptx
Degrees of freedom for the robots 1.pptxDegrees of freedom for the robots 1.pptx
Degrees of freedom for the robots 1.pptx
Mostafa Mahmoud
 
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
AldoGarca30
 

Recently uploaded (20)

Computer Graphics Introduction To Curves
Computer Graphics Introduction To CurvesComputer Graphics Introduction To Curves
Computer Graphics Introduction To Curves
 
Signal Processing and Linear System Analysis
Signal Processing and Linear System AnalysisSignal Processing and Linear System Analysis
Signal Processing and Linear System Analysis
 
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdf
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdflitvinenko_Henry_Intrusion_Hong-Kong_2024.pdf
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdf
 
21P35A0312 Internship eccccccReport.docx
21P35A0312 Internship eccccccReport.docx21P35A0312 Internship eccccccReport.docx
21P35A0312 Internship eccccccReport.docx
 
Overview of Transformation in Computer Graphics
Overview of Transformation in Computer GraphicsOverview of Transformation in Computer Graphics
Overview of Transformation in Computer Graphics
 
analog-vs-digital-communication (concept of analog and digital).pptx
analog-vs-digital-communication (concept of analog and digital).pptxanalog-vs-digital-communication (concept of analog and digital).pptx
analog-vs-digital-communication (concept of analog and digital).pptx
 
Danikor Product Catalog- Screw Feeder.pdf
Danikor Product Catalog- Screw Feeder.pdfDanikor Product Catalog- Screw Feeder.pdf
Danikor Product Catalog- Screw Feeder.pdf
 
Autodesk Construction Cloud (Autodesk Build).pptx
Autodesk Construction Cloud (Autodesk Build).pptxAutodesk Construction Cloud (Autodesk Build).pptx
Autodesk Construction Cloud (Autodesk Build).pptx
 

Divergence optimization in nonnegative matrix factorization with spectrogram restoration for multichannel signal separation

  • 1. Divergence optimization in nonnegative matrix factorization with spectrogram restoration for multichannel signal separation Daichi Kitamura, Hiroshi Saruwatari, Satoshi Nakamura (Nara Institute of Science and Technology, Japan) Yu Takahashi, Kazunobu Kondo (Yamaha Corporation, Japan) Hirokazu Kameoka (The University of Tokyo, Japan) 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays Oral session 2 – Microphone array processing
  • 2. Outline • 1. Research background • 2. Conventional methods – Directional clustering – Nonnegative matrix factorization – Supervised nonnegative matrix factorization – Hybrid method • 3. Analysis of restoration ability – Generalized cost function – Analysis based on generation model • 4. Experiments • 5. Conclusions
  • 3. Outline • 1. Research background • 2. Conventional methods – Directional clustering – Nonnegative matrix factorization – Supervised nonnegative matrix factorization – Hybrid method • 3. Analysis of restoration ability – Generalized cost function – Analysis based on generation model • 4. Experiments • 5. Conclusions
  • 4. Research background • Signal separation has received much attention. • Music signal separation based on nonnegative matrix factorization (NMF) is a very active research area. • Supervised NMF (SNMF) achieves the highest separation performance. • To improve its performance, an SNMF-based multichannel signal separation method is required. • Applications: automatic music transcription, 3D audio systems, etc. • We have proposed a new SNMF and its hybrid separation method for multichannel signals.
  • 5. Research background • Our proposed hybrid method: input stereo signal (L, R) → spatial separation method (directional clustering) → SNMF-based separation method (SNMF with spectrogram restoration) → separated signal.
  • 6. Research background • The divergence criterion in SNMF strongly affects separation performance. – Euclidean distance (EUC-distance) – Kullback-Leibler divergence (KL-divergence) – Itakura-Saito divergence (IS-divergence) • The optimal divergence for SNMF with spectrogram restoration is not apparent. • We extend our new SNMF to a more generalized form. • We give a theoretical analysis for the optimization of the divergence.
  • 7. Outline • 1. Research background • 2. Conventional methods – Directional clustering – NMF – Supervised NMF – Hybrid method • 3. Analysis of restoration ability – Generalized cost function – Analysis based on generation model • 4. Experiments • 5. Conclusions • Hybrid method: stereo signal → spatial separation → spectral separation → separated signal.
  • 8. Directional clustering [Araki, et al., 2007] • Directional clustering – Unsupervised spatial separation method • Problems – Cannot separate sources in the same direction – Artificial distortion arises owing to the binary masking. • [Figure: each time-frequency grid of the stereo input spectrogram is clustered as left (L), center (C), or right (R); the resulting binary mask (1 for the target direction, 0 otherwise) is applied to the spectrogram by an entry-wise product to obtain the separated signal.]
  • 9. NMF [Lee, et al., 2001] • NMF can extract significant spectral patterns. – The basis matrix F has frequently-appearing spectral patterns in the observed matrix Y. • Decomposition: Y ≈ FG, with observed matrix Y (spectrogram, Ω × T), basis matrix F (spectral patterns, Ω × K), and activation matrix G (time-varying gains, K × T). • Ω: number of frequency bins, T: number of time frames, K: number of bases.
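  • 10. Divergence criterion in NMF • Cost function in NMF: a distance or divergence between the observed matrix Y and the decomposed matrix FG, minimized with respect to F and G – Euclidean distance (EUC-distance) – Kullback-Leibler divergence (KL-divergence) – Itakura-Saito divergence (IS-divergence) • The criteria are written in terms of the matrix entries.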
  • 11. Supervised NMF [Smaragdis, et al., 2007] • SNMF – Supervised spectral separation method • Training process: sample sounds of the target signal are used to construct the supervised basis matrix F (spectral dictionary). • Separation process: the mixed signal is decomposed as FG + HU with F fixed and the other matrices optimized; FG gives the target signal and HU the other signal.
  • 12. Hybrid method [Kitamura, et al., 2013] • We have proposed a new SNMF called SNMF with spectrogram restoration and its hybrid method. • Hybrid method: directional clustering of the stereo input (L, R) for spatial separation, followed by SNMF with spectrogram restoration for spectral separation.
  • 13. SNMF with spectrogram restoration • SNMF with spectrogram restoration can separate the target and restore the spectrogram simultaneously. • [Figure: time-frequency spectrogram after directional clustering, containing target components, non-target components, and holes; after SNMF with spectrogram restoration, the target is separated from the non-target components and the holes are restored using the supervised bases (dictionary of the target).]
  • 14. Decomposition model and cost function • The divergence is defined at all grids except for the holes by using the binary mask matrix I, which is obtained from directional clustering. • Decomposition model: the supervised bases F are fixed. • Cost function: a divergence term on the observed grids, plus a regularization term and a penalty term; its notation includes weighting parameters, the binary complement of I, and the Frobenius norm.
  • 15. Outline • 1. Research background • 2. Conventional methods – Directional clustering – Nonnegative matrix factorization – Supervised nonnegative matrix factorization – Hybrid method • 3. Analysis of restoration ability – Generalized cost function – Analysis based on generation model • 4. Experiments • 5. Conclusions
  • 16. Generalized divergence: β-divergence [Eguchi, et al., 2001] • The β-divergence includes EUC-distance (β = 2), KL-divergence (β = 1), and IS-divergence (β = 0) as special cases.
  • 17. Decomposition model and cost function • We introduced the β-divergence to extend the cost function to a generalized form. • The supervised bases remain fixed in the decomposition model.
  • 18. Update rules • We can obtain the update rules for the optimization of the variable matrices G, H, and U.
  • 19. SNMF with spectrogram restoration • This SNMF has two tasks: source separation and basis extrapolation. • The optimal divergence for source separation has been investigated. – KL-divergence (β = 1) is suitable for source separation. • The optimal divergence for basis extrapolation has not been investigated. • We analyze the optimal divergence for basis extrapolation based on a generation model in NMF.
  • 20. Analysis of extrapolation ability • The decomposition of NMF is equivalent to a maximum likelihood estimation, which implicitly assumes a generation model of the input data Y. • Generation model assumed by each cost function in NMF: IS-divergence (exponential distribution), KL-divergence (Poisson distribution), EUC-distance (Gaussian distribution). • [Figure: the three distributions, annotated with the maximum of the data.]
  • 21. Analysis of extrapolation ability • To compare net extrapolation ability, we generate random data Y that obey each generation model. • We also prepare binary-masked random data and attempt to restore them. • Training: 100 bases are created from the random data. • Restoration: the binary-masked data are restored using the trained bases.
  • 22. Analysis of extrapolation ability • The binary mask was randomly generated. – We generate two types of binary masks whose densities of holes are 75% and 98%. • SAR [dB] indicates the accuracy of restoration (its definition uses entry-wise squares). • Procedure: input random data → binary masking → binary-masked data → restoration → restored data.
  • 23. Results of restoration analysis • Simulated result of the restoration ability • The optimal divergence for the basis extrapolation (restoration) is around β = 3! • [Figure: SAR [dB] (0 to 25, higher is better) versus βNMF (0 to 4) for βreg = 0, 1, 2, 3; left panel: 75%-binary-masked data, right panel: 98%-binary-masked data; the optimal divergence for source separation (KL-divergence) is marked for comparison.]
  • 24. Trade-off between separation and restoration • The optimal divergence for SNMF with spectrogram restoration and its hybrid method is based on the trade-off between separation and restoration abilities. • [Figure: two example spectra, amplitude [dB] (-10 to 0) versus frequency [kHz] (0 to 5), one with strong sparseness and one with weak sparseness.] • [Figure: separation performance, restoration performance, and total performance of the hybrid method over the range 0 to 4.]
  • 25. Outline • 1. Research background • 2. Conventional methods – Directional clustering – Nonnegative matrix factorization – Supervised nonnegative matrix factorization – Hybrid method • 3. Analysis of restoration ability – Generalized cost function – Analysis based on generation model • 4. Experiments • 5. Conclusions
  • 26. Experimental condition • Mixed signal includes four melodies (sources). • Three compositions of instruments – We evaluated the average score of 36 patterns. • Source positions: sources 1 to 4 are placed at the left, center, and right; the target source is in the center. • Supervision signal: 24 notes that cover all the notes in the target melody. • Dataset No. 1: Melody 1 = Oboe, Melody 2 = Flute, Midrange = Piano, Bass = Trombone. • Dataset No. 2: Melody 1 = Trumpet, Melody 2 = Violin, Midrange = Harpsichord, Bass = Fagotto. • Dataset No. 3: Melody 1 = Horn, Melody 2 = Clarinet, Midrange = Piano, Bass = Cello.
  • 27. Experimental result • Signal-to-distortion ratio (SDR) – total quality of the separation, which includes the degree of separation and absence of artificial distortion. • Multichannel NMF is an integrated method. • [Figure: SDR [dB] (0 to 14, higher is better) versus βNMF (0 to 4) for the proposed hybrid method and conventional SNMF (supervised methods), with directional clustering and multichannel NMF [Sawada] (unsupervised methods) for comparison; KL-divergence and EUC-distance are marked on the βNMF axis.]
  • 28. Experiment for real-recorded signal • We recorded a binaural signal using a dummy head. • Reverberation time: 200 ms • The other conditions are the same as those in the previous experiment with instantaneous mixture signals. • [Figure: recording layout with sources 1 to 4 placed to the left, center, and right of the dummy head; distances of 1.5 m and 2.5 m are marked; the target signal is indicated.]
  • 29. Experimental result • Result for real-recorded signals • Multichannel NMF is an integrated method. • [Figure: SDR [dB] (0 to 14, higher is better) versus βNMF (0 to 4) for the proposed hybrid method and conventional SNMF (supervised methods), with directional clustering and multichannel NMF [Sawada] (unsupervised methods) for comparison; KL-divergence and EUC-distance are marked on the βNMF axis.]
  • 30. Conclusions • Restoration requires an anti-sparse criterion (β = 3). • There is a trade-off between separation and restoration abilities. • The optimal divergence is EUC-distance for SNMF with spectrogram restoration – whereas KL-divergence is the best for conventional SNMF. • Thank you for your attention!
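
The generalized divergence in the slides can be made concrete with a small numerical sketch. The slide equations are not part of this transcript, so the code below assumes the commonly used entry-wise form of the β-divergence; the function name `beta_divergence` and the normalization (which gives half the squared error at β = 2) are assumptions for illustration, not the authors' exact definition. It shows the three special cases named in the slides: β = 2 (EUC-distance), β = 1 (KL-divergence), and β = 0 (IS-divergence).

```python
import numpy as np

def beta_divergence(y, x, beta):
    """Sum of entry-wise beta-divergences between positive arrays y and x.

    Assumed (commonly used) definition; not taken from the slides.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    if beta == 0:
        # Itakura-Saito divergence
        d = y / x - np.log(y / x) - 1.0
    elif beta == 1:
        # (generalized) Kullback-Leibler divergence
        d = y * np.log(y / x) - y + x
    else:
        # General case; beta = 2 reduces to 0.5 * (y - x) ** 2
        d = (y ** beta + (beta - 1.0) * x ** beta
             - beta * y * x ** (beta - 1.0)) / (beta * (beta - 1.0))
    return float(np.sum(d))
```

For example, `beta_divergence(Y, F @ G, 1)` would evaluate a KL-type cost between a spectrogram `Y` and a model `F @ G`, provided all entries are strictly positive.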

Editor's Notes

  1. This is the outline of my talk.
  2. First, I will talk about the research background.
  3. Recently, signal separation technologies have received much attention. These technologies are useful for many applications, such as automatic music transcription and 3D audio systems. Music signal separation based on nonnegative matrix factorization, NMF for short, has been a very active area of research. In particular, supervised NMF (SNMF) achieves the highest separation performance. However, SNMF can be used only for single-channel signals. If we could use the multichannel information, we could improve the performance further. Therefore, an SNMF-based multichannel signal separation method is required.
  4. Our proposed hybrid method concatenates a spatial separation method called directional clustering and an SNMF-based separation method called SNMF with spectrogram restoration. In this hybrid method, the target direction is first separated by directional clustering. Then the target signal is separated by this SNMF.
  5. In previous studies, we confirmed that the divergence criterion in SNMF strongly affects separation performance. For source separation, the KL-divergence criterion is often used and achieves the highest separation performance. However, the optimal divergence for our new SNMF with spectrogram restoration is not apparent. Therefore, in this presentation, we extend this method to a more generalized form. In addition, we give a theoretical analysis for the optimization of the divergence to achieve the highest separation performance.
  6. Next, I will talk about conventional methods.
  7. Directional clustering is an unsupervised spatial separation method. This method utilizes differences between the left and right channels as a clustering cue, so we can separate the sources direction-wise. This is equivalent to binary masking in the spectrogram domain: we obtain the binary mask from the clustering result and take its entry-wise product with the spectrogram to obtain the separated signal. However, this method cannot separate sources in the same direction, like this. In addition, the separated signal has artificial distortion owing to the binary masking.
  8. The next method is NMF. NMF is a powerful method for extracting significant features from a spectrogram. NMF decomposes the input spectrogram Y into the product of a basis matrix F and an activation matrix G, where F has frequently-appearing spectral patterns as basis vectors, like this, and G has the time-varying gains of each basis vector.
  9. In NMF decomposition, the cost function is defined as a distance or divergence between the input matrix Y and the decomposed matrix FG. This equation is the cost function in NMF, and we minimize it to find F and G. These three criteria are often used in NMF decomposition, and KL-divergence is the best one for acoustic signal separation.
  10. To separate the target signal using NMF, supervised NMF has been proposed. SNMF is a supervised spectral separation method. In SNMF, we first train on sample sounds of the target signal, such as a musical scale, and construct the supervised basis F. This is a spectral dictionary of the target sound. Next, we separate the mixed signal using the supervised basis F, as FG+HU. The target signal is therefore obtained as FG, and the other signal is reconstructed by HU. This method can separate the target signal well, but it can be used only for single-channel signals.
  11. To apply the SNMF-based method to multichannel signals, we have proposed a new SNMF called “SNMF with spectrogram restoration” and its hybrid method. In this hybrid method, directional clustering is first applied to the input stereo signal to separate the target direction. Then the target signal is separated by this new SNMF.
  12. Here, the spectrogram separated by directional clustering has many spectral holes, like this. This is due to the binary masking in directional clustering. However, our new SNMF can restore such a damaged spectrogram by using a spectral dictionary of the target sound; namely, this SNMF can extrapolate the supervised basis F. Simultaneously, the non-target signal is separated.
  13. This is the decomposition model of SNMF with spectrogram restoration, and this equation is the cost function. In this cost function, the divergence is defined at all spectrogram grids except for the spectral holes by using the binary mask I. For the grids of the holes, we impose a regularization term to avoid extrapolation error. In previous studies, we used EUC-distance and KL-divergence in this cost function. In this presentation, we introduce a generalized divergence into it and extend this method.
  14. Next, I will talk about the analysis of restoration ability.
  15. In the extension, we introduce a generalized divergence function called beta-divergence. This function has a parameter beta, and includes EUC-distance, KL-divergence, and IS-divergence when beta equals 2, 1, and 0, respectively.
  16. By using beta-divergence, we can extend the cost function to a more generalized form. This cost function includes EUC-distance, KL-divergence, and so on.
  17. From the minimization of the cost function, we can obtain the update rules for the optimization of the variable matrices G, H, and U.
  18. This SNMF has two tasks: separation of the target signal and basis extrapolation for the restoration of the spectrogram. The optimal divergence for source separation has been investigated by many researchers, and it has been clarified that KL-divergence is suitable for source separation. However, nobody has investigated the optimal divergence for basis extrapolation. So, we analyze the optimal one based on a generation model in NMF.
  19. The decomposition of NMF is equivalent to a maximum likelihood estimation, which implicitly assumes a generation model of the input data Y. Once we select the parameter beta, the assumed generation model is fixed. In other words, the parameter beta defines the generation model of the input data.
  20. In this analysis, to compare the net extrapolation ability, we generated random input data Y that obey each generation model. We also prepared the binary-masked random data YI and attempted to restore them. In the training process, we construct the supervised basis F using the random data Y. Then we attempt to restore the binary-masked data using the trained basis F.
  21. The binary mask I was generated randomly in a uniform manner, and we generated two types of binary masks whose densities of holes are 75% and 98%. By calculating the similarity between the input data Y and the restored data FG, we can evaluate the extrapolation ability and the accuracy of restoration. So SAR indicates the accuracy of restoration.
  22. These are the results of the analysis. The left one is the result for 75%-binary-masked data, and the right one is for 98%-binary-masked data. Beta equals 1, which means KL-divergence, is the optimal divergence for source separation. Surprisingly, however, the optimal divergence for the restoration is around beta equals 3.
  23. Therefore, the optimal divergence for the hybrid method is around EUC-distance because of the trade-off between separation and restoration abilities, as in this figure. This is because a sparse basis is not suitable for extrapolation using only the observable data.
  24. Next, I will talk about the experiments.
  25. This is the experimental condition. The mixed signal includes four melodies. Each sound source is located as in this figure. The target source is always located in the center direction, together with another interfering source. We prepared three compositions of instruments and evaluated the average score of 36 patterns. In addition, the supervision signal has 24 notes, like this score, which cover all the notes in the target melody.
  26. This is the result of the experiment. We show the average SDR score, where SDR indicates the total quality of the separation. Directional clustering cannot separate sources in the same direction, so its result was not good. Multichannel NMF is an integrated method proposed by Sawada. This method utilizes an integrated cost function that handles spatial and spectral separation simultaneously. However, it is a quite difficult optimization problem because many variables must be optimized using only one cost function. So, this method strongly depends on the initial values, and its average score is poor. The conventional SNMF achieves the highest score when beta equals 1, that is, KL-divergence. However, the optimal divergence for our hybrid method was beta equals 2, because of the trade-off between separation and restoration abilities.
  27. We also conducted an experiment using real-recorded signals. In this experiment, the binaural mixed signal was recorded in a real environment. The other conditions are the same as those in the previous experiment.
  28. This is the result of the experiment using real-recorded signals. From this result, we can confirm that the optimal divergence for the hybrid method is EUC-distance.
  29. These are the conclusions of my talk. Thank you for your attention.
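  30. The other experimental conditions are as shown here. We compare the evaluation scores of all combinations obtained when the NMF divergence criterion βNMF is varied from 0 to 4. For the divergence criterion of the regularization term, only βreg=1, which gave the highest performance, is shown. SDR is used as the evaluation score. SDR is the total separation accuracy, which includes the degree of separation and the absence of artificial distortion.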
  31. The supervised method has an inherent problem: we cannot get a perfect supervision sound of the target signal. Even if the supervision sounds come from the same type of instrument as the target sound, these sounds differ according to various conditions, for example, individual styles of playing and the timbre individuality of each instrument. When we want to separate this piano sound from a mixed signal, we may only be able to prepare a similar piano sound whose timbre is slightly different. In that case, supervised NMF cannot separate the target because of the difference in spectra from the target sound.
  32. To solve this problem, we have proposed a new supervised method that adapts the supervised bases to the target spectra by basis deformation. This is the decomposition model in this method. We introduce the deformable term, which has both positive and negative values, like this. Then we optimize the matrices D, G, H, and U. This figure indicates the spectral difference between the real sound and the artificial sound.
  33. This figure shows the directional distribution of the input stereo signal. The target source is in the center direction, and other interfering sources are distributed like this. After directional clustering, left and right source components leak into the center cluster, and center sources lose some of their components. These lost components correspond to the spectral chasms in the spectrogram domain. After SNMF with spectrogram restoration, the target components are separated and restored using supervised bases of the target sound trained in advance. In other words, the resolution of the target spectrogram is recovered in a superresolution manner by the supervised basis extrapolation.
  34. As another means of addressing multichannel signal separation, multichannel NMF has also been proposed by Ozerov and Sawada. This method is a natural extension of NMF and uses spectral and spatial cues. However, this unified method is a very difficult optimization problem mathematically because many variables must be optimized by one cost function. So, this method strongly depends on the initial values.
  35. This SNMF is for a single-channel signal. Therefore, we cannot use the information about the correlation between channels. However, almost all music signals are in stereo format, so we should extend SNMF to multichannel signals. In addition, when many interfering sources exist, the separation performance of SNMF markedly degrades. This is because many spectral patterns similar to the target sound arise.
  36. Nonnegative matrix factorization is a very powerful and useful method for extracting significant features from the input matrix. NMF decomposes the input nonnegative matrix Y into two matrices F and G, like this, where F and G cannot have negative entries. Therefore, all the entries in Y, F, and G are nonnegative. In addition, K is usually set to a smaller value than Ω and T, so this is a kind of low-rank approximation. The nonnegative constraint and dimensional reduction result in a basis matrix that has the distinctive components of the observed matrix.
  37. The optimization of the variables F and G in NMF is based on the minimization of the cost function. The cost function is defined as the divergence between the observed spectrogram Y and the reconstructed spectrogram FG. This minimization is an inequality-constrained optimization problem.
  38. This is the result of the experiment using real-recorded signals. From this result, we can confirm that the optimal divergence for the hybrid method is EUC-distance.
  39. This spectrum is obtained by directional clustering. There are many spectral chasms owing to the binary masking. SNMF with spectrogram restoration treats these chasms as unseen observations, like this, and extrapolates the fittest target basis from the supervised bases F. As a result, the lost components are restored by the supervised basis extrapolation.
  40. SDR is the total evaluation score for the separation performance.
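
The basis extrapolation described in these notes (fit the activations only on the observed grids, then read the holes off the model FG) can be illustrated with a deliberately simplified sketch. This is not the authors' algorithm: it omits the non-target matrices H and U, the regularization term on the holes, and the penalty term, and it uses the standard heuristic multiplicative update for a masked β-divergence with F held fixed. The names `restore_with_fixed_bases` and `restoration_ratio_db` are hypothetical, and the ratio in dB is an assumed stand-in for illustration rather than the SAR definition from the slides.

```python
import numpy as np

def restore_with_fixed_bases(Y, mask, F, beta=2.0, n_iter=200, eps=1e-12, seed=0):
    """Estimate activations G from observed entries only and return F @ G.

    Y    : nonnegative data matrix (frequency x time)
    mask : binary matrix, 1 = observed grid, 0 = hole
    F    : fixed (pre-trained) nonnegative basis matrix
    Simplified sketch: no H, U, regularization, or penalty term.
    """
    rng = np.random.default_rng(seed)
    G = rng.random((F.shape[1], Y.shape[1])) + eps
    for _ in range(n_iter):
        X = F @ G + eps
        # Heuristic multiplicative update for the masked beta-divergence
        numer = F.T @ (mask * Y * X ** (beta - 2.0))
        denom = F.T @ (mask * X ** (beta - 1.0)) + eps
        G *= numer / denom
    return F @ G

def restoration_ratio_db(Y, Y_hat):
    """Energy of Y over energy of the restoration error, in dB (assumed metric)."""
    return 10.0 * np.log10(np.sum(Y ** 2) / np.sum((Y - Y_hat) ** 2))
```

A possible use is to build `Y = F @ G_true` from random nonnegative matrices, draw a random binary `mask`, call `restore_with_fixed_bases(Y, mask, F, beta=b)` for several values of `b`, and compare `restoration_ratio_db(Y, Y_hat)` across them. Whether such a toy setup reproduces the trend reported in the slides is not claimed here.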