H. Nakajima (UTokyo), D. Kitamura (SOKENDAI),
N. Takamune (UTokyo), S. Koyama (UTokyo), H. Saruwatari (UTokyo),
Y. Takahashi (Yamaha R&D), K. Kondo (Yamaha R&D)
Audio Signal Separation Using Supervised NMF with
Time-Variant All-Pole-Model-Based Basis Deformation
APSIPA2016 Organized Session on Advances in Acoustic Signal Processing
Nonnegative Matrix Factorization (NMF) [Lee, et al., 2001]
• Feature extraction based on low-rank representation
[Figure: observed spectrogram 𝒀 (frequency × time, amplitude) ≈ basis matrix 𝑭 (frequently appearing spectra) × activation matrix 𝑮 (time-varying gains)]
𝑓 : frequency bin, 𝑡 : time frame, 𝑘 : number of bases
The extracted bases can be used for informed source separation,
e.g., music demixing and speech enhancement.
Supervised NMF (SNMF) [Smaragdis, et al., 2007]
• Source separation using target-signal basis (supervision)
Training: the basis is trained using target-signal samples.
Separation: given the supervised basis, the remaining parameters are estimated from the mixture 𝒀mix, which yields the separated spectrogram.
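The separation stage can be sketched as follows. The sketch assumes the mixture model 𝒀mix ≈ 𝑭𝑮 + 𝑯𝑼, where 𝑭 is the fixed supervised basis and 𝑯𝑼 models the interference, together with KL multiplicative updates; both are assumptions made for illustration, and `snmf_separate` is a hypothetical name.

```python
import numpy as np

def snmf_separate(Y_mix, F, n_interf, n_iter=200, eps=1e-12, seed=0):
    """Estimate G, H, U for Y_mix ~ F @ G + H @ U while keeping F fixed.

    Assumption: KL multiplicative updates; F is never modified.
    Returns the target estimate F @ G and the interference estimate H @ U.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_time = Y_mix.shape
    G = rng.random((F.shape[1], n_time)) + eps   # target activations
    H = rng.random((n_freq, n_interf)) + eps     # interference bases
    U = rng.random((n_interf, n_time)) + eps     # interference activations
    for _ in range(n_iter):
        R = Y_mix / (F @ G + H @ U + eps)
        G *= (F.T @ R) / (F.sum(axis=0)[:, None] + eps)
        R = Y_mix / (F @ G + H @ U + eps)
        H *= (R @ U.T) / (U.sum(axis=1)[None, :] + eps)
        R = Y_mix / (F @ G + H @ U + eps)
        U *= (H.T @ R) / (H.sum(axis=0)[:, None] + eps)
    return F @ G, H @ U
```

Because 𝑭 is fixed, the quality of the target estimate depends on how well the trained basis matches the target sound in the mixture.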
Objective of This Study
• Drawback of SNMF
→ Separation accuracy decreases when the trained basis does not match the target sound in the open data.
We propose a new algorithm that deforms the trained basis
so that it fits the open data.
SNMF with Additive Basis Deformation (SNMF-ABD)
[Kitamura, et al., 2013]
• Open-data adaptation by modifying supervised
basis 𝑭 with additive term 𝑫
Signal model: 𝒀mix ≈ (𝑭 + 𝑫)𝑮 + 𝑯𝑼
• Many orthogonality penalty parameters are needed, and they are hard to control.
• Strong sensitivity to the initial values
SNMF with Time-Invariant Basis Deformation (TID)
[Nakajima, et al., EUSIPCO2016]
・Source separation and basis deformation are processed independently.
・Basis deformation is performed toward a target given by the generalized MMSE-STSA estimator [Breithaupt, et al., 2008].
・Iterative basis deformation
Processing flow of TID (training → separation with supervision 𝑭org → generation of target → basis deformation):
・Interference estimate: 𝒀mix − 𝑭𝑮
・Generation of target: the estimated target 𝒀 is obtained by the generalized MMSE-STSA estimator.
・Binary mask 𝑰: extracts the convincing components of 𝒀.
・Basis deformation: 𝑰 ∘ 𝒀 ≈ 𝑰 ∘ (𝑨𝑭org 𝑮), then 𝑭 ← 𝑨𝑭org, where 𝑨 is a diagonal matrix with all-pole-model-based deformation.
Hereafter we propose an improved algorithm that introduces time variance.
Proposed Discriminative Time-Variant Deformation
① The supervised basis is classified into two parts to capture the time-variant nature.
② Excessive deformation is avoided by discriminative training.
Compared with the TID flow, two parts are changed:
① The supervision is split as 𝑭org = [𝑭atk, 𝑭sus], and each part is deformed separately: 𝑭 ← [𝑨𝑭atk, 𝑩𝑭sus].
② The basis deformation becomes a discriminative basis deformation that considers the interference.
Proposed ①: Time Variance in Instruments
• The physical mechanism of musical instruments differs between attack and sustain. [N. H. Fletcher, 1991]
• The basis deformation model should therefore change in accordance with the difference in the physical mechanism of articulation.
Ex: Piano articulation (hammer and string): initial state → flip string (transitional) → free vibration
Proposed ①: Basis Classification
• Bases are classified in accordance with how frequently each basis appears in the attack part and in the sustain part.
• A different deformation model is applied to each basis group.
[Figure: the sustain part of the training sample is truncated to obtain an attack-only spectrogram ≈ 𝑭org 𝑮atk, and the attack part is truncated to obtain a sustain-only spectrogram ≈ 𝑭org 𝑮sus. The frequency of the attack part and of the sustain part is computed for each basis.]
𝑭org is classified into 𝑭1 and 𝑭2 by the k-means method.
Proposed ①: Deformation Model
• We prepare different deformation models for attack and sustain.
𝑰 ∘ 𝒀 ≈ 𝑰 ∘ (𝑨𝑭1𝑮1 + 𝑩𝑭2𝑮2)
𝒀 : Estimated target by generalized MMSE-STSA estimator
𝑰 : Binary mask for sampling convincing components
𝑭1 : Supervised basis trained using attack part only
𝑭2 : Supervised basis trained using sustain part only
𝑨 : Diagonal matrix with all-pole-model spectrum to deform 𝑭1 (deformation parameter)
𝑩 : Diagonal matrix with all-pole-model spectrum to deform 𝑭2 (deformation parameter)
𝑮1, 𝑮2 : Activation matrices corresponding to 𝑭1, 𝑭2
∘ : Hadamard product
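As an illustration of a diagonal matrix with an all-pole-model spectrum, the sketch below builds such a matrix from autoregressive coefficients. The slides do not give the exact parameterization of 𝑨 and 𝑩, so the amplitude-domain form gain / |1 + Σ a_p e^{-jωp}| and the names `all_pole_diag`, `ar_coefs`, and `gain` are assumptions made only for this example.

```python
import numpy as np

def all_pole_diag(ar_coefs, n_fft, gain=1.0):
    """Diagonal matrix of an all-pole amplitude spectrum on bins 0..n_fft/2.

    Assumed form: gain / |1 + sum_p a_p * exp(-1j * w * p)|.
    The model is assumed stable, so the denominator is never zero.
    """
    # Polynomial [1, a_1, ..., a_P] evaluated on the unit circle via a zero-padded FFT
    poly = np.concatenate(([1.0], np.asarray(ar_coefs, dtype=float)))
    denom = np.abs(np.fft.rfft(poly, n=n_fft))
    return np.diag(gain / denom)

# Deforming a basis: every frequency row of F is scaled by the all-pole envelope.
F = np.ones((5, 2))                      # toy basis with 5 frequency bins
A = all_pole_diag([-0.5], n_fft=8)       # one-pole envelope that boosts low frequencies
F_deformed = A @ F
```

Because the deformation is controlled by a few coefficients rather than one free gain per frequency bin, the envelope stays smooth; the deck later notes that even this degree of freedom can let the deformed basis absorb interference.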
Proposed ①: Parameter Update
• Cost function: based on the KL divergence
• Parameter update: by the auxiliary-function method
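The equations of this slide did not survive extraction. As a reminder of what a KL-divergence-based cost looks like for the masked deformation model, the following assumes the generalized KL divergence commonly used in NMF; the lowercase symbols are the entries of the corresponding matrices and are introduced here for illustration.

```latex
D\left(\boldsymbol{I}\circ\boldsymbol{Y}\,\|\,\boldsymbol{I}\circ\boldsymbol{X}\right)
  = \sum_{f,t} i_{ft}\left( y_{ft}\log\frac{y_{ft}}{x_{ft}} - y_{ft} + x_{ft} \right),
\qquad
\boldsymbol{X} = \boldsymbol{A}\boldsymbol{F}_1\boldsymbol{G}_1 + \boldsymbol{B}\boldsymbol{F}_2\boldsymbol{G}_2,
\qquad i_{ft}\in\{0,1\}
```

Only the time-frequency bins selected by the binary mask contribute to the cost.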
Proposed ②: Discriminative Basis Deformation
• The large degree of freedom in 𝑨 and 𝑩 often allows the deformed basis to represent the interference as well, which degrades the separation accuracy.
• Discriminative deformation can mitigate such side effects.
Formulation as Bilevel Optimization
𝑨, 𝑩 = arg min_{𝑨,𝑩} 𝐷(𝑰 ∘ 𝒀 || 𝑰 ∘ (𝑨𝑭1𝑮1 + 𝑩𝑭2𝑮2))   (fitness for target 𝒀 only)
subject to
𝑮1, 𝑮2 = arg min_{𝑮1,𝑮2,𝑯,𝑼} 𝐷(𝑰 ∘ 𝒀mix || 𝑰 ∘ (𝑨𝑭1𝑮1 + 𝑩𝑭2𝑮2 + 𝑯𝑼))   (fitness for mixture 𝒀mix)
In the constraint, 𝑨𝑭1𝑮1 + 𝑩𝑭2𝑮2 is the target component and 𝑯𝑼 is the interference component.
→ Owing to this cost, the target and interference components are modeled separately, so 𝑨𝑭1𝑮1 + 𝑩𝑭2𝑮2 can hardly represent the interference component in 𝒀.
Unfortunately, this problem is hard to solve, so we propose
an approximate solver.
Proposed ②: Approximated Algorithm
• Step 1: Initialization (the same as the conventional one)
  min_{𝑨,𝑮1,𝑩,𝑮2} 𝐷(𝑰 ∘ 𝒀 || 𝑰 ∘ (𝑨𝑭1𝑮1 + 𝑩𝑭2𝑮2))
• Step 2: Modeling of mixture 𝒀mix
  min_{𝑮1,𝑮2,𝑯,𝑼} 𝐷(𝑰 ∘ 𝒀mix || 𝑰 ∘ (𝑨𝑭1𝑮1 + 𝑩𝑭2𝑮2 + 𝑯𝑼))
  Fixing the basis deformation matrices, we estimate the activations.
• Step 3: Modeling of target 𝒀
  min_{𝑨,𝑩} 𝐷(𝑰 ∘ 𝒀 || 𝑰 ∘ (𝑨𝑭1𝑮1 + 𝑩𝑭2𝑮2))
  Fixing the activation matrices, we estimate the deformation matrices.
We iteratively search for a set of deformation matrices that represent
the target spectrogram in the vicinity of those that fit the mixture.
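The three steps can be sketched in code under strong simplifications. This is not the authors' update rule: the all-pole parameterization is replaced by free nonnegative per-bin gains `a` and `b` (the diagonals of 𝑨 and 𝑩), the cost is assumed to be the masked generalized KL divergence, plain multiplicative updates stand in for the auxiliary-function updates of the deck, and Steps 2 and 3 are assumed to alternate after Step 1. All function names are hypothetical.

```python
import numpy as np

EPS = 1e-12

def masked_kl(Y, X, I):
    """Generalized KL divergence restricted to bins where the mask I is 1."""
    return float(np.sum(I * (Y * np.log((Y + EPS) / (X + EPS)) - Y + X)))

def fit_activations(Y, I, W1, W2, G1, G2, H, U, n_iter):
    """Step-2-type update: G1, G2, H, U change; deformed bases W1, W2 stay fixed."""
    def ratio():
        return I * Y / (W1 @ G1 + W2 @ G2 + H @ U + EPS)
    for _ in range(n_iter):
        G1 = G1 * (W1.T @ ratio()) / (W1.T @ I + EPS)
        G2 = G2 * (W2.T @ ratio()) / (W2.T @ I + EPS)
        H = H * (ratio() @ U.T) / (I @ U.T + EPS)
        U = U * (H.T @ ratio()) / (H.T @ I + EPS)
    return G1, G2, H, U

def fit_gains(Y, I, a, b, P, Q, n_iter):
    """Step-3-type update: gains a, b change; P = F1 @ G1 and Q = F2 @ G2 stay fixed."""
    a, b = a.copy(), b.copy()
    for _ in range(n_iter):
        for gains, part in ((a, P), (b, Q)):
            R = I * Y / (a[:, None] * P + b[:, None] * Q + EPS)
            num = (R * part).sum(axis=1)
            den = (I * part).sum(axis=1)
            ok = den > 0                      # leave fully masked bins untouched
            gains[ok] *= num[ok] / den[ok]
    return a, b

def discriminative_deformation(Y_hat, Y_mix, I, F1, F2, n_interf=5,
                               n_outer=10, n_inner=20, seed=0):
    rng = np.random.default_rng(seed)
    n_freq, n_time = Y_mix.shape
    a, b = np.ones(n_freq), np.ones(n_freq)
    G1 = rng.random((F1.shape[1], n_time)) + EPS
    G2 = rng.random((F2.shape[1], n_time)) + EPS
    # Step 1: fit the target estimate only (empty interference model)
    H0, U0 = np.zeros((n_freq, 0)), np.zeros((0, n_time))
    for _ in range(n_outer):
        G1, G2, H0, U0 = fit_activations(Y_hat, I, a[:, None] * F1, b[:, None] * F2,
                                         G1, G2, H0, U0, n_inner)
        a, b = fit_gains(Y_hat, I, a, b, F1 @ G1, F2 @ G2, n_inner)
    H = rng.random((n_freq, n_interf)) + EPS
    U = rng.random((n_interf, n_time)) + EPS
    for _ in range(n_outer):
        # Step 2: model the mixture with the deformation fixed
        G1, G2, H, U = fit_activations(Y_mix, I, a[:, None] * F1, b[:, None] * F2,
                                       G1, G2, H, U, n_inner)
        # Step 3: model the target with the activations fixed
        a, b = fit_gains(Y_hat, I, a, b, F1 @ G1, F2 @ G2, n_inner)
    return a, b, G1, G2, H, U
```

The design point this sketch tries to convey is that the activations are always estimated together with an explicit interference model 𝑯𝑼, so the deformation in Step 3 is fitted around activations that already explain the mixture.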
Experimental Evaluation: Condition
• Different MIDI generators were used for training and open data.
• Source separation for 2-sound mixture using supervised basis.
Instruments: Oboe (Ob.), Piano (Pf.), Trombone (Tb.)
Training (MIDI): Garritan Professional Orchestra
Open target (MIDI): Microsoft GS Wavetable SW Synth
Sampling freq.: 44100 Hz
FFT length: 4096 points (100 ms)
Shift length: 512 points (15 ms)
# of bases: Target: 100, Interference: 30
Truncation period for extraction of attack: 50 ms
Comparison: conventional methods (SNMF, SNMF-ABD, TID) and the proposed method
Evaluation score: Signal-to-Distortion Ratio (SDR) [dB] (for evaluating total quality of separated signal)
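For readers unfamiliar with the evaluation score, the sketch below computes a simplified signal-to-distortion ratio in dB. The deck does not state which SDR implementation was used; this plain energy ratio is an assumption for illustration and is not the BSS Eval definition, which additionally decomposes the error into several terms. The name `simple_sdr` is hypothetical.

```python
import numpy as np

def simple_sdr(reference, estimate, eps=1e-12):
    """Simplified SDR in dB: energy of the reference over energy of the error."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    distortion = reference - estimate
    return 10.0 * np.log10((np.sum(reference ** 2) + eps)
                           / (np.sum(distortion ** 2) + eps))
```

A higher value means that the separated signal is closer to the reference target signal.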
Music Score Used in Experiment
・Training samples: 2-octave chromatic scale (Oboe, Piano, Trombone)
・Open data (mixture): test song for NMF research [Kitamura, 2014] (Oboe, Piano, Trombone)
Results 1: Example
Ex. Piano-sound extraction from mixture of oboe and piano
The proposed method achieves a better SDR than the conventional methods.
Results 2: Overall Evaluation
SDR [dB] for each task:
Task         SNMF    SNMF-ABD    TID    Proposed
Ob. & Pf.    6.7     8.1         6.7    7.0
Ob. & Tb.    2.4     2.6         2.8    2.9
Pf. & Ob.    4.1     3.6         5.2    6.1
Pf. & Tb.    3.1     3.2         4.5    4.5
Tb. & Ob.    0.7     0.2         2.4    2.8
Tb. & Pf.    2.9     2.6         3.9    4.4
“A & B” means the task of extracting A from a mixture of A and B.
SNMF-ABD: Basis deformation NMF in parallel with separation
TID: Time-invariant deformation NMF without considering interference
The proposed method matches or outperforms SNMF and TID in all combinations
(it ties with TID for Pf. & Tb.).
SNMF-ABD wins in only one case (Ob. & Pf.)
and loses in the other cases.
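The comparison can be checked with a few lines of arithmetic on the SDR values reported in the table. The averages below are derived here and are not reported in the slides; the variable names are chosen for illustration.

```python
# SDR [dB] per task, in the order (SNMF, SNMF-ABD, TID, Proposed), copied from the table
methods = ("SNMF", "SNMF-ABD", "TID", "Proposed")
results = {
    "Ob. & Pf.": (6.7, 8.1, 6.7, 7.0),
    "Ob. & Tb.": (2.4, 2.6, 2.8, 2.9),
    "Pf. & Ob.": (4.1, 3.6, 5.2, 6.1),
    "Pf. & Tb.": (3.1, 3.2, 4.5, 4.5),
    "Tb. & Ob.": (0.7, 0.2, 2.4, 2.8),
    "Tb. & Pf.": (2.9, 2.6, 3.9, 4.4),
}

# Average SDR of each method over the six tasks
means = {m: sum(row[i] for row in results.values()) / len(results)
         for i, m in enumerate(methods)}

# Number of tasks in which the proposed method is best or tied for best
best_or_tied = sum(1 for row in results.values() if row[3] >= max(row))

for m in methods:
    print(f"{m}: {means[m]:.2f} dB")
print("Proposed best or tied in", best_or_tied, "of", len(results), "tasks")
```

Rounded to two decimals, the averages are 3.32 dB (SNMF), 3.38 dB (SNMF-ABD), 4.25 dB (TID), and 4.62 dB (Proposed), and the proposed method is best or tied in 5 of the 6 tasks.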
Conclusion
• In this study, we propose a new advanced SNMF that
includes time-variant (attack & sustain) deformation of
the trained basis to make it fit the target sound.
• Also, to avoid excessive deformation, we propose
a discriminative basis deformation. To solve
the bilevel optimization problem, we introduce an
approximate algorithm.
• From the experimental results, it was confirmed that
the proposed method outperforms the conventional
methods in many cases.
Thank you for your attention!

More Related Content

What's hot

Online Divergence Switching for Superresolution-Based Nonnegative Matrix Fa...
Online Divergence Switching for  Superresolution-Based  Nonnegative Matrix Fa...Online Divergence Switching for  Superresolution-Based  Nonnegative Matrix Fa...
Online Divergence Switching for Superresolution-Based Nonnegative Matrix Fa...
奈良先端大 情報科学研究科
 
Depth Estimation of Sound Images Using Directional Clustering and Activation...
Depth Estimation of Sound Images Using  Directional Clustering and Activation...Depth Estimation of Sound Images Using  Directional Clustering and Activation...
Depth Estimation of Sound Images Using Directional Clustering and Activation...
奈良先端大 情報科学研究科
 
Robust Sound Field Reproduction against Listener’s Movement Utilizing Image ...
Robust Sound Field Reproduction against  Listener’s Movement Utilizing Image ...Robust Sound Field Reproduction against  Listener’s Movement Utilizing Image ...
Robust Sound Field Reproduction against Listener’s Movement Utilizing Image ...
奈良先端大 情報科学研究科
 

What's hot (20)

Regularized superresolution-based binaural signal separation with nonnegative...
Regularized superresolution-based binaural signal separation with nonnegative...Regularized superresolution-based binaural signal separation with nonnegative...
Regularized superresolution-based binaural signal separation with nonnegative...
 
Online divergence switching for superresolution-based nonnegative matrix fact...
Online divergence switching for superresolution-based nonnegative matrix fact...Online divergence switching for superresolution-based nonnegative matrix fact...
Online divergence switching for superresolution-based nonnegative matrix fact...
 
Blind audio source separation based on time-frequency structure models
Blind audio source separation based on time-frequency structure modelsBlind audio source separation based on time-frequency structure models
Blind audio source separation based on time-frequency structure models
 
Hybrid multichannel signal separation using supervised nonnegative matrix fac...
Hybrid multichannel signal separation using supervised nonnegative matrix fac...Hybrid multichannel signal separation using supervised nonnegative matrix fac...
Hybrid multichannel signal separation using supervised nonnegative matrix fac...
 
Efficient initialization for nonnegative matrix factorization based on nonneg...
Efficient initialization for nonnegative matrix factorization based on nonneg...Efficient initialization for nonnegative matrix factorization based on nonneg...
Efficient initialization for nonnegative matrix factorization based on nonneg...
 
Linear multichannel blind source separation based on time-frequency mask obta...
Linear multichannel blind source separation based on time-frequency mask obta...Linear multichannel blind source separation based on time-frequency mask obta...
Linear multichannel blind source separation based on time-frequency mask obta...
 
Prior distribution design for music bleeding-sound reduction based on nonnega...
Prior distribution design for music bleeding-sound reduction based on nonnega...Prior distribution design for music bleeding-sound reduction based on nonnega...
Prior distribution design for music bleeding-sound reduction based on nonnega...
 
DNN-based frequency component prediction for frequency-domain audio source se...
DNN-based frequency component prediction for frequency-domain audio source se...DNN-based frequency component prediction for frequency-domain audio source se...
DNN-based frequency component prediction for frequency-domain audio source se...
 
Superresolution-based stereo signal separation via supervised nonnegative mat...
Superresolution-based stereo signal separation via supervised nonnegative mat...Superresolution-based stereo signal separation via supervised nonnegative mat...
Superresolution-based stereo signal separation via supervised nonnegative mat...
 
Online Divergence Switching for Superresolution-Based Nonnegative Matrix Fa...
Online Divergence Switching for  Superresolution-Based  Nonnegative Matrix Fa...Online Divergence Switching for  Superresolution-Based  Nonnegative Matrix Fa...
Online Divergence Switching for Superresolution-Based Nonnegative Matrix Fa...
 
Depth estimation of sound images using directional clustering and activation-...
Depth estimation of sound images using directional clustering and activation-...Depth estimation of sound images using directional clustering and activation-...
Depth estimation of sound images using directional clustering and activation-...
 
Blind source separation based on independent low-rank matrix analysis and its...
Blind source separation based on independent low-rank matrix analysis and its...Blind source separation based on independent low-rank matrix analysis and its...
Blind source separation based on independent low-rank matrix analysis and its...
 
Depth Estimation of Sound Images Using Directional Clustering and Activation...
Depth Estimation of Sound Images Using  Directional Clustering and Activation...Depth Estimation of Sound Images Using  Directional Clustering and Activation...
Depth Estimation of Sound Images Using Directional Clustering and Activation...
 
DNN-based permutation solver for frequency-domain independent component analy...
DNN-based permutation solver for frequency-domain independent component analy...DNN-based permutation solver for frequency-domain independent component analy...
DNN-based permutation solver for frequency-domain independent component analy...
 
Blind source separation based on independent low-rank matrix analysis and its...
Blind source separation based on independent low-rank matrix analysis and its...Blind source separation based on independent low-rank matrix analysis and its...
Blind source separation based on independent low-rank matrix analysis and its...
 
Robust Sound Field Reproduction against Listener’s Movement Utilizing Image ...
Robust Sound Field Reproduction against  Listener’s Movement Utilizing Image ...Robust Sound Field Reproduction against  Listener’s Movement Utilizing Image ...
Robust Sound Field Reproduction against Listener’s Movement Utilizing Image ...
 
Audio Source Separation Based on Low-Rank Structure and Statistical Independence
Audio Source Separation Based on Low-Rank Structure and Statistical IndependenceAudio Source Separation Based on Low-Rank Structure and Statistical Independence
Audio Source Separation Based on Low-Rank Structure and Statistical Independence
 
Learning the Statistical Model of the NMF Using the Deep Multiplicative Updat...
Learning the Statistical Model of the NMF Using the Deep Multiplicative Updat...Learning the Statistical Model of the NMF Using the Deep Multiplicative Updat...
Learning the Statistical Model of the NMF Using the Deep Multiplicative Updat...
 
Experimental analysis of optimal window length for independent low-rank matri...
Experimental analysis of optimal window length for independent low-rank matri...Experimental analysis of optimal window length for independent low-rank matri...
Experimental analysis of optimal window length for independent low-rank matri...
 
コサイン類似度罰則条件付き半教師あり非負値行列因子分解と音源分離への応用
コサイン類似度罰則条件付き半教師あり非負値行列因子分解と音源分離への応用コサイン類似度罰則条件付き半教師あり非負値行列因子分解と音源分離への応用
コサイン類似度罰則条件付き半教師あり非負値行列因子分解と音源分離への応用
 

Viewers also liked

Moment matching networkを用いた音声パラメータのランダム生成の検討
Moment matching networkを用いた音声パラメータのランダム生成の検討Moment matching networkを用いた音声パラメータのランダム生成の検討
Moment matching networkを用いた音声パラメータのランダム生成の検討
Shinnosuke Takamichi
 

Viewers also liked (10)

HMMに基づく日本人英語音声合成における中学生徒の英語音声を用いた評価
HMMに基づく日本人英語音声合成における中学生徒の英語音声を用いた評価HMMに基づく日本人英語音声合成における中学生徒の英語音声を用いた評価
HMMに基づく日本人英語音声合成における中学生徒の英語音声を用いた評価
 
ILRMA 20170227 danwakai
ILRMA 20170227 danwakaiILRMA 20170227 danwakai
ILRMA 20170227 danwakai
 
Slp201702
Slp201702Slp201702
Slp201702
 
Ea2015 7for ss
Ea2015 7for ssEa2015 7for ss
Ea2015 7for ss
 
Asj2017 3 bileveloptnmf
Asj2017 3 bileveloptnmfAsj2017 3 bileveloptnmf
Asj2017 3 bileveloptnmf
 
Moment matching networkを用いた音声パラメータのランダム生成の検討
Moment matching networkを用いた音声パラメータのランダム生成の検討Moment matching networkを用いた音声パラメータのランダム生成の検討
Moment matching networkを用いた音声パラメータのランダム生成の検討
 
Asj2017 3invited
Asj2017 3invitedAsj2017 3invited
Asj2017 3invited
 
Discriminative SNMF EA201603
Discriminative SNMF EA201603Discriminative SNMF EA201603
Discriminative SNMF EA201603
 
数値解析と物理学
数値解析と物理学数値解析と物理学
数値解析と物理学
 
独立性に基づくブラインド音源分離の発展と独立低ランク行列分析 History of independence-based blind source sep...
独立性に基づくブラインド音源分離の発展と独立低ランク行列分析 History of independence-based blind source sep...独立性に基づくブラインド音源分離の発展と独立低ランク行列分析 History of independence-based blind source sep...
独立性に基づくブラインド音源分離の発展と独立低ランク行列分析 History of independence-based blind source sep...
 

Similar to Apsipa2016for ss

Grds international conference on pure and applied science (5)
Grds international conference on pure and applied science (5)Grds international conference on pure and applied science (5)
Grds international conference on pure and applied science (5)
Global R & D Services
 

Similar to Apsipa2016for ss (20)

Paper study: Attention, learn to solve routing problems!
Paper study: Attention, learn to solve routing problems!Paper study: Attention, learn to solve routing problems!
Paper study: Attention, learn to solve routing problems!
 
Icmmse slides
Icmmse slidesIcmmse slides
Icmmse slides
 
MSSISS riBART 20160321
MSSISS riBART 20160321MSSISS riBART 20160321
MSSISS riBART 20160321
 
Learning stochastic neural networks with Chainer
Learning stochastic neural networks with ChainerLearning stochastic neural networks with Chainer
Learning stochastic neural networks with Chainer
 
Visual Impression Localization of Autonomous Robots_#CASE2015
Visual Impression Localization of Autonomous Robots_#CASE2015Visual Impression Localization of Autonomous Robots_#CASE2015
Visual Impression Localization of Autonomous Robots_#CASE2015
 
A deep learning approach for twitter spam detection lijie zhou
A deep learning approach for twitter spam detection lijie zhouA deep learning approach for twitter spam detection lijie zhou
A deep learning approach for twitter spam detection lijie zhou
 
Sampling method : MCMC
Sampling method : MCMCSampling method : MCMC
Sampling method : MCMC
 
Grds international conference on pure and applied science (5)
Grds international conference on pure and applied science (5)Grds international conference on pure and applied science (5)
Grds international conference on pure and applied science (5)
 
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Show, Attend and Tell: Neural Image Caption Generation with Visual AttentionShow, Attend and Tell: Neural Image Caption Generation with Visual Attention
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
 
GMMNに基づく音声合成におけるグラム行列の
スパース近似の検討
GMMNに基づく音声合成におけるグラム行列の
スパース近似の検討GMMNに基づく音声合成におけるグラム行列の
スパース近似の検討
GMMNに基づく音声合成におけるグラム行列の
スパース近似の検討
 
Paper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipelinePaper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipeline
 
[02] Quantum Error Correction for Beginners
[02] Quantum  Error Correction for Beginners[02] Quantum  Error Correction for Beginners
[02] Quantum Error Correction for Beginners
 
Unit-1 Basic Concept of Algorithm.pptx
Unit-1 Basic Concept of Algorithm.pptxUnit-1 Basic Concept of Algorithm.pptx
Unit-1 Basic Concept of Algorithm.pptx
 
Software Development for Space-group Analysis: Magnetic Space Group and Irred...
Software Development for Space-group Analysis: Magnetic Space Group and Irred...Software Development for Space-group Analysis: Magnetic Space Group and Irred...
Software Development for Space-group Analysis: Magnetic Space Group and Irred...
 
Learning a nonlinear embedding by preserving class neibourhood structure 최종
Learning a nonlinear embedding by preserving class neibourhood structure   최종Learning a nonlinear embedding by preserving class neibourhood structure   최종
Learning a nonlinear embedding by preserving class neibourhood structure 최종
 
文献紹介:Learning From Noisy Labels With Deep Neural Networks: A Survey
文献紹介:Learning From Noisy Labels With Deep Neural Networks: A Survey文献紹介:Learning From Noisy Labels With Deep Neural Networks: A Survey
文献紹介:Learning From Noisy Labels With Deep Neural Networks: A Survey
 
Ds33717725
Ds33717725Ds33717725
Ds33717725
 
Ds33717725
Ds33717725Ds33717725
Ds33717725
 
12 l1-harmonic methodology
12 l1-harmonic methodology12 l1-harmonic methodology
12 l1-harmonic methodology
 
Ensemble Empirical Mode Decomposition: An adaptive method for noise reduction
Ensemble Empirical Mode Decomposition: An adaptive method for noise reduction Ensemble Empirical Mode Decomposition: An adaptive method for noise reduction
Ensemble Empirical Mode Decomposition: An adaptive method for noise reduction
 

Recently uploaded

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
ssuser89054b
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
dollysharma2066
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
ankushspencer015
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Christo Ananth
 

Recently uploaded (20)

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
 
Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdf
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
NFPA 5000 2024 standard .
NFPA 5000 2024 standard                                  .NFPA 5000 2024 standard                                  .
NFPA 5000 2024 standard .
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 

Apsipa2016for ss

  • 1. H. Nakajima (UTokyo), D. Kitamura (SOKENDAI), N. Takamune (UTokyo), S. Koyama (UTokyo), H. Saruwatari (UTokyo), Y. Takahashi (Yamaha R&D), K. Kondo (Yamaha R&D) Audio Signal Separation Using Supervised NMF with Time-Variant All-Pole-Model-Based Basis Deformation APSIPA2016 Organized Session on Advances in Acoustic Signal Processing
  • 2. Nonnegative Matrix Factorization (NMF) [Lee, et al., 2001] • Feature extraction based on low-rank representation Amplitude Amplitude Observation (spectrogram) Basis matrix (frequently appeared spectrum) Activation matrix (gain variation) Time 𝑓 : frequency bin 𝑡 : time frame k: # of bases Time Frequency Frequency 𝑭 𝑮 𝑡 𝒀 𝑡 Extracted basis can be used for infromed source separation, e.g., music demixing, speech enhancement, etc.
  • 3. • Source separation using target-signal basis (supervision) Supervised NMF (SNMF) [Smaragdis, et al., 2007] Basis trained using target-signal samples Separation Estimate given supervised basis Separated spectrogram 𝒀mix Training
  • 4. Objective of This Study • Drawback of SNMF →Accuracy decreases when variant trained basis is used. We propose a new algorithm for deformation of trained basis to make it fit to open data. Training Separation
  • 5. SNMF with Additive Basis Deformation (SNMF-ABD) [Kitamura, et al., 2013] • Open-data adaptation by modifying supervised basis 𝑭 with additive term 𝑫 Signal model: Many orthogonal penalty parameters are needed but uncontrollable. Strong sensitivity to initial value 𝒀mix ≈ 𝑭 + 𝑫 𝑮 + 𝑯𝑼 𝑭 𝑯 𝑫
  • 6. SNMF with Time-Invariant Basis Deformation (TID) [Nakajima, et al., EUSIPCO2016] Training Separation Supervision 𝑭org ・Source separation and basis deformation are independently processed. ・Basis deformation is performed via target given by generalized MMSE-STSA estimator. ・Iterative basis deformation [Breithaupt, et al., 2008]
  • 7. SNMF with Time-Invariant Basis Deformation (TID) [Nakajima, et al., EUSIPCO2016] Training Separation Generation of target by generalized MMSE-STSA estimator Basis deformation Supervision 𝑭org Interference 𝒀mix − 𝑭𝑮 Estimated target 𝒀 Binary mask 𝑰 𝑭 ← 𝑨𝑭org 𝑰 ○ 𝒀 ≈ 𝑰 ○ (𝑨𝑭org 𝑮) Hereafter we propose an improved algorithm introducing time variance. Diagonal matrix with all-pole- model-based deformation ・Source separation and basis deformation are independently processed. ・Basis deformation is performed via target given by generalized MMSE-STSA estimator. ・Iterative basis deformation To extract convincing 𝒀 [Breithaupt, et al., 2008]
  • 8. Proposed Discriminative Time-Variant Deformation ① Supervised basis is classified to 2 parts, capturing time-variant nature. ② Exceeding deformation is avoided by discriminative training. Training Separation Generation of target by generalized MMSE-STSA estimator Basis deformation Supervision 𝑭org Interference 𝒀mix − 𝑭𝑮 Estimated target 𝒀 Binary mask 𝑰 𝑭 ← 𝑨𝑭org 𝑰 ○ 𝒀 ≈ 𝑰 ○ (𝑨𝑭org 𝑮)
  • 9. Proposed Discriminative Time-Variant Deformation Supervision 𝑭org = [𝑭atk, 𝑭sus] 𝑭 ← [𝑨𝑭atk, 𝑩𝑭sus] ① Supervised basis is classified to 2 parts, capturing time-variant nature. ② Exceeding deformation is avoided by discriminative training. Training Separation Generation of target by generalized MMSE-STSA estimator Interference 𝒀mix − 𝑭𝑮 Estimated target 𝒀 Binary mask 𝑰 𝑰 ○ 𝒀 ≈ 𝑰 ○ (𝑨𝑭org 𝑮) Discriminative basis deformation considering interference ① ②
  • 10. Proposed ①: Time Variance in Instruments • The physical mechanism differs between the attack and the sustain in musical instruments [N. H. Fletcher, 1991]. Ex: Piano articulation (string and hammer): initial state, flip string (transitional), free vibration. The basis deformation model should therefore be changed in accordance with the difference in the physical mechanism of articulation.
  • 11. Proposed ①: Basis Classification • Bases are classified in accordance with how frequently they are used to generate the attack and the sustain. • A different deformation model is applied to each basis group. Procedure: truncate the sustain part of the training sample and fit the remainder as ≈ 𝑭org𝑮atk to obtain the frequency of the attack part for each basis; truncate the attack part of the training sample and fit the remainder as ≈ 𝑭org𝑮sus to obtain the frequency of the sustain part for each basis; then classify 𝑭org into 𝑭₁ and 𝑭₂ by the k-means method.
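The classification described on this slide can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the per-basis feature (mean activation on attack-only and sustain-only training data, normalized per basis), the plain two-cluster k-means loop, and the names `classify_bases`, `G_atk`, and `G_sus` are all hypothetical.

```python
import numpy as np

def classify_bases(G_atk, G_sus, n_iter=50):
    """Split K bases into two groups with a 2-cluster k-means.

    G_atk, G_sus: (K, T) activations of F_org on attack-only and
    sustain-only training data (hypothetical inputs).
    Returns a boolean array of length K, True = attack group.
    """
    # Assumed feature: relative use of each basis in attack vs. sustain.
    use = np.stack([G_atk.mean(axis=1), G_sus.mean(axis=1)], axis=1)
    feat = use / (use.sum(axis=1, keepdims=True) + 1e-12)
    # Deterministic init: the most attack-like and most sustain-like bases.
    centers = np.stack([feat[np.argmax(feat[:, 0])],
                        feat[np.argmax(feat[:, 1])]])
    for _ in range(n_iter):
        # Squared Euclidean distance of every basis to both centers.
        dist = ((feat[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(2):
            if np.any(labels == c):
                centers[c] = feat[labels == c].mean(axis=0)
    return labels == 0
```

With the returned mask `is_atk`, the two groups would be taken as `F_org[:, is_atk]` and `F_org[:, ~is_atk]`.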
  • 12. Proposed ①: Deformation Model • We prepare different deformation models for attack and sustain: 𝑰 ∘ 𝒀 ≈ 𝑰 ∘ (𝑨𝑭₁𝑮₁ + 𝑩𝑭₂𝑮₂), where 𝑨 and 𝑩 are the deformation parameters.
      𝒀 : Estimated target by generalized MMSE-STSA estimator
      𝑰 : Binary mask for sampling reliable components
      𝑭₁ : Supervised basis trained using attack part only
      𝑭₂ : Supervised basis trained using sustain part only
      𝑨 : Diagonal matrix with all-pole-model spectrum to deform 𝑭₁
      𝑩 : Diagonal matrix with all-pole-model spectrum to deform 𝑭₂
      𝑮₁, 𝑮₂ : Activation matrices corresponding to 𝑭₁, 𝑭₂
      ∘ : Hadamard product
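To make the role of 𝑨 and 𝑩 concrete, the sketch below builds a diagonal matrix from an all-pole-model spectrum and applies it to a toy basis. The slides do not state the exact parameterization, so the amplitude response 1/|1 + Σ a_p e^{-jωp}| sampled on the frequency bins, the function name `all_pole_diag`, and the coefficient values are assumptions for illustration only.

```python
import numpy as np

def all_pole_diag(coefs, n_bins):
    """Diagonal matrix whose entries sample an all-pole amplitude spectrum.

    coefs: coefficients a_1..a_P of 1 / (1 + sum_p a_p z^-p)
           (assumed to have no poles on the unit circle).
    n_bins: number of frequency bins F, from DC to Nyquist inclusive.
    """
    denom = np.concatenate(([1.0], np.asarray(coefs, dtype=float)))
    n_fft = 2 * (n_bins - 1)
    # rfft zero-pads the denominator polynomial and evaluates it
    # at n_bins equally spaced frequencies.
    response = 1.0 / np.abs(np.fft.rfft(denom, n=n_fft))
    return np.diag(response)

rng = np.random.default_rng(0)
F1 = rng.random((5, 3))        # toy attack basis: 5 bins x 3 bases
A = all_pole_diag([-0.5], 5)   # single pole, emphasizes low frequencies
F1_deformed = A @ F1           # every frequency bin (row) is rescaled
```

Because the matrix is diagonal, the deformation rescales each frequency bin by the same factor for all bases in the group, so a few all-pole coefficients control the whole spectral envelope.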
  • 13. Proposed ①: Parameter Update • Cost function: based on the KL divergence. • Parameter update: derived by the auxiliary-function method.
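The slide's equations are not reproduced in this transcript. As a hedged illustration of the kind of update the auxiliary-function method yields, the sketch below evaluates a masked generalized KL divergence and applies the standard multiplicative update to an activation matrix with the basis held fixed. The updates for the all-pole parameters of 𝑨 and 𝑩 are not shown, and the names `masked_kl` and `update_activation` are assumptions.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log(0)

def masked_kl(Y, X, I):
    """Generalized KL divergence between I∘Y and I∘X, summed over all entries."""
    return float(np.sum(I * (Y * np.log((Y + EPS) / (X + EPS)) - Y + X)))

def update_activation(Y, W, G, I):
    """One multiplicative update of G for I∘Y ≈ I∘(W G), with W fixed."""
    X = W @ G + EPS
    return G * (W.T @ (I * Y / X)) / (W.T @ I + EPS)

rng = np.random.default_rng(0)
W = rng.random((8, 3)) + 0.1                     # fixed toy basis
Y = W @ (rng.random((3, 10)) + 0.1)              # toy target spectrogram
I = (rng.random((8, 10)) > 0.2).astype(float)    # toy binary mask
G = rng.random((3, 10)) + 0.1                    # initial activations
costs = [masked_kl(Y, W @ G, I)]
for _ in range(20):
    G = update_activation(Y, W, G, I)
    costs.append(masked_kl(Y, W @ G, I))
```

The update keeps every entry of `G` nonnegative without a step size, and the recorded cost should not increase from one iteration to the next.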
  • 14. Proposed ②: Discriminative Basis Deformation • The large degree of freedom in 𝑨 and 𝑩 often allows them to represent interference, which degrades separation accuracy. • Discriminative deformation can mitigate this side effect. Formulation as bilevel optimization:
      𝑨, 𝑩 = arg min_{𝑨,𝑩} 𝐷(𝑰 ∘ 𝒀 || 𝑰 ∘ (𝑨𝑭₁𝑮₁ + 𝑩𝑭₂𝑮₂))   (fitness for target 𝒀 only)
      subject to 𝑮₁, 𝑮₂ = arg min_{𝑮₁,𝑮₂,𝑯,𝑼} 𝐷(𝑰 ∘ 𝒀mix || 𝑰 ∘ (𝑨𝑭₁𝑮₁ + 𝑩𝑭₂𝑮₂ + 𝑯𝑼))   (fitness for mixture 𝒀mix)
      Here 𝑨𝑭₁𝑮₁ + 𝑩𝑭₂𝑮₂ is the target component and 𝑯𝑼 is the interference component. Owing to the mixture cost, the target and interference components are modeled separately, so 𝑨𝑭₁𝑮₁ + 𝑩𝑭₂𝑮₂ can hardly represent the interference component in 𝒀.
      This problem is hard to solve directly, so we propose an approximate solver.
  • 15. Proposed ②: Approximate Algorithm
      • Step 1: Initialization (the same as the conventional method): min_{𝑨,𝑮₁,𝑩,𝑮₂} 𝐷(𝑰 ∘ 𝒀 || 𝑰 ∘ (𝑨𝑭₁𝑮₁ + 𝑩𝑭₂𝑮₂))
      • Step 2: Modeling of mixture 𝒀mix: min_{𝑮₁,𝑮₂,𝑯,𝑼} 𝐷(𝑰 ∘ 𝒀mix || 𝑰 ∘ (𝑨𝑭₁𝑮₁ + 𝑩𝑭₂𝑮₂ + 𝑯𝑼)). Fixing the basis deformation matrices, we estimate the activations.
      • Step 3: Modeling of target 𝒀: min_{𝑨,𝑩} 𝐷(𝑰 ∘ 𝒀 || 𝑰 ∘ (𝑨𝑭₁𝑮₁ + 𝑩𝑭₂𝑮₂)). Fixing the activation matrices, we estimate the deformation matrices.
      We iteratively search for a set of deformation matrices that represent the target spectrogram in the vicinity of those that fit the mixture.
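The alternation between Step 2 and Step 3 can be sketched as below. This is a simplified illustration under explicit assumptions, not the authors' solver: 𝑨 and 𝑩 are treated as free nonnegative diagonal matrices (the all-pole constraint is dropped), each subproblem is handled with a few standard KL multiplicative updates, Step 1 is replaced by identity initialization, and all function names and iteration counts are hypothetical.

```python
import numpy as np

EPS = 1e-12

def kl(Y, X, I):
    """Masked generalized KL divergence."""
    return float(np.sum(I * (Y * np.log((Y + EPS) / (X + EPS)) - Y + X)))

def fit_mixture(Ymix, I, W, H, G, U, n_iter=30):
    """Step 2: deformed basis W = [A F1, B F2] is fixed; update G, H, U."""
    for _ in range(n_iter):
        R = I * Ymix / (W @ G + H @ U + EPS)
        G = G * (W.T @ R) / (W.T @ I + EPS)
        R = I * Ymix / (W @ G + H @ U + EPS)
        U = U * (H.T @ R) / (H.T @ I + EPS)
        R = I * Ymix / (W @ G + H @ U + EPS)
        H = H * (R @ U.T) / (I @ U.T + EPS)
    return G, H, U

def fit_deformation(Y, I, P, Q, a, b, n_iter=30):
    """Step 3: activations fixed (P = F1 G1, Q = F2 G2); update diag(A), diag(B)."""
    for _ in range(n_iter):
        R = I * Y / (a[:, None] * P + b[:, None] * Q + EPS)
        a_new = a * (R * P).sum(axis=1) / ((I * P).sum(axis=1) + EPS)
        b_new = b * (R * Q).sum(axis=1) / ((I * Q).sum(axis=1) + EPS)
        a, b = a_new, b_new
    return a, b

def discriminative_deformation(Y, Ymix, I, F1, F2,
                               n_interf=2, n_outer=5, seed=0):
    rng = np.random.default_rng(seed)
    n_bins, n_frames = Y.shape
    k1 = F1.shape[1]
    a, b = np.ones(n_bins), np.ones(n_bins)   # stand-in for Step 1
    G = rng.random((k1 + F2.shape[1], n_frames)) + 0.1
    H = rng.random((n_bins, n_interf)) + 0.1
    U = rng.random((n_interf, n_frames)) + 0.1
    for _ in range(n_outer):
        W = np.hstack([a[:, None] * F1, b[:, None] * F2])
        G, H, U = fit_mixture(Ymix, I, W, H, G, U)                    # Step 2
        a, b = fit_deformation(Y, I, F1 @ G[:k1], F2 @ G[k1:], a, b)  # Step 3
    return a, b, G
```

Because the activations come from the mixture fit, where 𝑯𝑼 is available to absorb interference, the deformation step only ever sees activations that were estimated with interference modeled separately.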
  • 19. Experimental Evaluation: Condition • Different MIDI generators were used for training and open data. • Source separation was performed for two-sound mixtures using the supervised basis.
      Instruments: Oboe (Ob.), Piano (Pf.), Trombone (Tb.)
      Training (MIDI): Garritan Professional Orchestra
      Open target (MIDI): Microsoft GS Wavetable SW Synth
      Sampling freq.: 44100 Hz
      FFT length: 4096 points (100 ms)
      Shift length: 512 points (15 ms)
      # of bases: Target: 100, Interference: 30
      Truncation period for extraction of attack: 50 ms
      Comparison: conventional methods (SNMF, SNMF-ABD, TID) and the proposed method
      Evaluation score: Signal-to-Distortion Ratio (SDR) [dB] (for evaluating the total quality of the separated signal)
  • 20. Music Score Used in Experiment ・Open data (mixture) ・Training samples (scores are shown for oboe, piano, and trombone in each case) • Two-octave chromatic scale • Test song for NMF research [Kitamura, 2014]
  • 21. Results 1: Example. Ex.: Piano-sound extraction from a mixture of oboe and piano. The proposed method achieves a better SDR than the conventional methods.
  • 22. Results 2: Overall Evaluation (SDR [dB])
      Task         SNMF   SNMF-ABD   TID   Proposed
      Ob. & Pf.    6.7    8.1        6.7   7.0
      Ob. & Tb.    2.4    2.6        2.8   2.9
      Pf. & Ob.    4.1    3.6        5.2   6.1
      Pf. & Tb.    3.1    3.2        4.5   4.5
      Tb. & Ob.    0.7    0.2        2.4   2.8
      Tb. & Pf.    2.9    2.6        3.9   4.4
      “A & B” means the task of extracting “A” from a mixture of A and B.
      SNMF-ABD: Basis deformation NMF in parallel with separation
      TID: Time-invariant deformation NMF without considering interference
  • 23. Results 2: Overall Evaluation (same SDR table as slide 22). The proposed method outperforms SNMF in all combinations and matches or outperforms TID in all combinations (the two are equal at 4.5 dB for Pf. & Tb.). SNMF-ABD wins in only one case (Ob. & Pf.) and loses in the others.
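The comparison statements can be checked directly against the table values. The short script below transcribes the SDR table and counts wins, ties, and losses per task; the dictionary layout and the helper name `compare` are illustrative choices, and the numbers are only those given on the slides.

```python
# SDR values [dB] transcribed from the results table.
# Row "A & B" = extraction of A from a mixture of A and B.
SDR_DB = {
    "Ob. & Pf.": {"SNMF": 6.7, "SNMF-ABD": 8.1, "TID": 6.7, "Proposed": 7.0},
    "Ob. & Tb.": {"SNMF": 2.4, "SNMF-ABD": 2.6, "TID": 2.8, "Proposed": 2.9},
    "Pf. & Ob.": {"SNMF": 4.1, "SNMF-ABD": 3.6, "TID": 5.2, "Proposed": 6.1},
    "Pf. & Tb.": {"SNMF": 3.1, "SNMF-ABD": 3.2, "TID": 4.5, "Proposed": 4.5},
    "Tb. & Ob.": {"SNMF": 0.7, "SNMF-ABD": 0.2, "TID": 2.4, "Proposed": 2.8},
    "Tb. & Pf.": {"SNMF": 2.9, "SNMF-ABD": 2.6, "TID": 3.9, "Proposed": 4.4},
}

def compare(method, baseline):
    """Count tasks where `method` beats, ties, or loses to `baseline`."""
    wins = ties = losses = 0
    for row in SDR_DB.values():
        if row[method] > row[baseline]:
            wins += 1
        elif row[method] == row[baseline]:
            ties += 1
        else:
            losses += 1
    return wins, ties, losses

for baseline in ("SNMF", "SNMF-ABD", "TID"):
    print(baseline, compare("Proposed", baseline))
```

The counts show six wins over SNMF, five wins and one tie against TID, and five wins and one loss against SNMF-ABD.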
  • 24. Conclusion • In this study, we propose a new SNMF that includes time-variant (attack and sustain) deformation of the trained basis to make it fit the target sound. • To avoid excessive deformation, we also propose a discriminative basis deformation. To solve the resulting bilevel optimization problem, we introduce an approximate algorithm. • The experimental results confirm that the proposed method outperforms the conventional methods in many cases. Thank you for your attention!