Linear multichannel blind source separation
based on time-frequency mask obtained by
harmonic/percussive sound separation
Soichiro Oyabu (1), Daichi Kitamura (1), and Kohei Yatabe (2)
(1) National Institute of Technology, Kagawa College, Japan
(2) Waseda University, Japan
2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Background
• Blind source separation (BSS)
– aims at separating audio sources such as speech, noise,
musical instruments, and so on
– can be used for many audio applications
– extracts individual sources without any prior information or
training (unsupervised technique)
[Figure: mixture signal and separated drum source]
Background
• Multichannel BSS
– estimates the demixing system (inverse system of the mixing system)
– achieves high separation quality because it exploits spatial information
• Independent vector analysis (IVA) [Kim+, 2007]
• Independent low-rank matrix analysis (ILRMA) [Kitamura+, 2016]
• Time-frequency-masking-based BSS (TFMBSS) [Yatabe+, 2019]
• Single-channel BSS
– Monaural audio source separation problem
– More difficult than multichannel BSS
• Harmonic/percussive sound separation (HPSS) [Ono+, 2008]
[Figure: mixing system and demixing system]
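The relation between the mixing and demixing systems can be illustrated with a minimal NumPy sketch. It assumes a determined, frequency-wise instantaneous mixture in the STFT domain; the array names (`S`, `A`, `X`, `W`, `Y`) and sizes are hypothetical. The demixing matrices here are oracle inverses for illustration only; a BSS algorithm has to estimate them blindly, and only up to scale and permutation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_freq, n_time, n_src = 3, 8, 2  # toy sizes (assumption)

# Hypothetical complex source spectrograms S[f, t, n]
S = rng.standard_normal((n_freq, n_time, n_src)) \
    + 1j * rng.standard_normal((n_freq, n_time, n_src))

# Hypothetical mixing matrices A[f, m, n], one per frequency bin
A = rng.standard_normal((n_freq, n_src, n_src)) \
    + 1j * rng.standard_normal((n_freq, n_src, n_src))

# Mixing system: x[f, t] = A[f] s[f, t]
X = np.einsum("fmn,ftn->ftm", A, S)

# Demixing system: W[f] is the inverse of A[f] (oracle, for illustration)
W = np.linalg.inv(A)

# Linear spatial demixing: y[f, t] = W[f] x[f, t]
Y = np.einsum("fnm,ftm->ftn", W, X)
```

Because the separation is a linear filter per frequency bin, it does not introduce the artificial distortion that non-linear masking of the output can cause.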
Background
• HPSS [Ono+, 2008], [FitzGerald, 2010], [Duong+, 2011], [Tachibana+, 2012], etc.
– separates harmonic and percussive sound sources in a
fully blind manner
– can be used for music analysis and remixing
• e.g., estimation of chords, tempo, rhythm, notes, and genre
[Figure: mixture signal separated into a harmonic estimate and a percussive estimate]
Aim: high-quality multichannel blind HPSS
Source Models in BSS
• BSS requires an assumption about the time-frequency structure of each source (a source model)
– IVA [Kim+, 2007]
• All frequency components of each source have large power at the same time
• Linear spatial demixing
– ILRMA [Kitamura+, 2016]
• The power spectrogram of each source has a low-rank time-frequency structure
• Linear spatial demixing
– HPSS [Ono+, 2008], [Tachibana+, 2012]
• Harmonic: time-continuous structure
• Percussive: freq.-continuous structure
• Non-linear separation
[Figure: three time-frequency panels (axes: Time, Freq.); two panels labeled Source1/Source2 and one labeled Harmonic src./Percussive src.]
Conventional Method: TFMBSS [Yatabe+, 2019]
• Time-frequency-masking-based BSS (TFMBSS)
– Linear multichannel BSS with plug-and-play source models
– The source model is supplied as a time-frequency mask
[Figure: TFMBSS iteration; a time-frequency mask is generated from the temporarily estimated sources and applied in the masking process (entrywise product)]
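The plug-and-play idea can be sketched as a function that accepts any mask generator and applies its output by an entrywise product. This is only the masking step, not the full TFMBSS optimization (which also updates the demixing matrices). The names `masking_step` and `power_ratio_mask` are hypothetical, and the power-ratio mask is a stand-in chosen for illustration, not the mask used by the authors.

```python
import numpy as np

def masking_step(Y, mask_fn):
    """Apply a plug-in source model to the temporary estimates.

    Y: complex array of shape (n_src, n_freq, n_time).
    mask_fn: callable returning a real-valued mask with the same shape as Y.
    """
    M = mask_fn(Y)
    return M * Y  # entrywise product

def power_ratio_mask(Y, eps=1e-12):
    """Illustrative mask (assumption): each source's share of the total power."""
    power = np.abs(Y) ** 2
    return power / (power.sum(axis=0, keepdims=True) + eps)

# Two sources, one frequency bin, one time frame
Y_demo = np.array([[[3.0 + 0.0j]], [[4.0 + 0.0j]]])
Z_demo = masking_step(Y_demo, power_ratio_mask)
```

Swapping `power_ratio_mask` for another callable changes the source model without touching the rest of the procedure.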
Conventional Method: HPSS [Ono+, 2008]
• Harmonic/percussive sound separation (HPSS)
– Separates sources by focusing on the “smoothness” of the spectrogram along the time or frequency direction
– Estimates the harmonic and percussive components by iteratively minimizing a cost function
[Figure: spectrogram (axes: Time, Frequency) of a mixture signal decomposed into harmonic components (harmonic estimate) and percussive components (percussive estimate)]
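The smoothness idea can be made concrete with a small sketch of a cost of this general kind: squared differences of the harmonic component along time plus squared differences of the percussive component along frequency. The function name, the weights `sigma_h` and `sigma_p`, and the toy arrays are assumptions; the exact cost, constraints, and update rules used in [Ono+, 2008] are not given on the slide.

```python
import numpy as np

def hpss_smoothness_cost(H, P, sigma_h=1.0, sigma_p=1.0):
    """Smoothness cost for nonnegative arrays of shape (n_freq, n_time)."""
    dH = np.diff(H, axis=1)  # change of H between adjacent time frames
    dP = np.diff(P, axis=0)  # change of P between adjacent frequency bins
    return (dH ** 2).sum() / (2 * sigma_h ** 2) + (dP ** 2).sum() / (2 * sigma_p ** 2)

# Toy spectrograms: a horizontal ridge (sustained tone) and a vertical ridge (drum hit)
ridge_time = np.zeros((4, 5))
ridge_time[1, :] = 1.0
ridge_freq = np.zeros((4, 5))
ridge_freq[:, 2] = 1.0

cost_correct = hpss_smoothness_cost(H=ridge_time, P=ridge_freq)
cost_swapped = hpss_smoothness_cost(H=ridge_freq, P=ridge_time)
```

Assigning the sustained tone to the harmonic part and the drum hit to the percussive part gives a lower cost than the swapped assignment, which is what drives the separation.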
Proposed Algorithm: Process Flow
[Figure: block diagram with blocks STFT, TFMBSS, back projection, HPSS (×2), smoothing (×2), and inverse STFT; signals: observed signal (Ch 1, Ch 2), temporal estimates, masks, old masks, smoothed masks, harmonic estimate, and percussive estimate]
Proposed Algorithm: Mask Calculation
• HPSS is applied independently to each of the two temporarily estimated signals
• Two Wiener-like masks are constructed from the HPSS results
– These masks enhance the harmonic or percussive components by suppressing the other components
[Figure: two HPSS blocks, one per temporarily estimated signal]
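A Wiener-like mask can be sketched as a power ratio between the component to keep and the sum of both components. The pairing below (harmonic mask from the HPSS result of the first estimate, percussive mask from the second) and all names are assumptions for illustration; the slide does not give the exact formula.

```python
import numpy as np

def wiener_like_masks(H1, P1, H2, P2, eps=1e-12):
    """Build two masks from HPSS results of two temporary estimates.

    H1, P1: harmonic/percussive magnitudes obtained from estimate 1.
    H2, P2: harmonic/percussive magnitudes obtained from estimate 2.
    eps avoids division by zero in silent time-frequency bins.
    """
    mask_harmonic = H1 ** 2 / (H1 ** 2 + P1 ** 2 + eps)    # keeps harmonic parts of estimate 1
    mask_percussive = P2 ** 2 / (H2 ** 2 + P2 ** 2 + eps)  # keeps percussive parts of estimate 2
    return mask_harmonic, mask_percussive

# Toy magnitudes (1 frequency bin, 2 time frames)
H1 = np.array([[3.0, 0.0]])
P1 = np.array([[4.0, 1.0]])
H2 = np.array([[1.0, 0.0]])
P2 = np.array([[1.0, 2.0]])
m_h, m_p = wiener_like_masks(H1, P1, H2, P2)
```

Each mask lies in [0, 1]: values near 1 pass a bin almost unchanged, values near 0 suppress it.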
Proposed Algorithm: Mask Smoothing
• In TFMBSS, a drastic change of the masks between iterations makes the parameter optimization unstable
• A mask smoothing process based on the weighted geometric mean is introduced
– The intensity of smoothing is controlled by the weights on the previous and current masks (β_old weights the previous mask)
[Equation figure: the smoothed mask is the entrywise product of the mask calculated in the previous iteration and the mask calculated in the current iteration, each raised to its weight]
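The weighted geometric mean can be sketched in a few lines. The function and argument names are hypothetical, and the default `beta_new = 1 - beta_old` is an assumption; the slide only indicates that two weights control the smoothing.

```python
import numpy as np

def smooth_mask(mask_old, mask_new, beta_old=0.75, beta_new=None):
    """Weighted geometric mean of the previous and current masks (entrywise)."""
    if beta_new is None:
        beta_new = 1.0 - beta_old  # assumption: the two weights sum to one
    return mask_old ** beta_old * mask_new ** beta_new

mask_prev = np.array([[0.25, 0.8]])
mask_curr = np.array([[1.0, 0.8]])
smoothed = smooth_mask(mask_prev, mask_curr, beta_old=0.5)
unsmoothed = smooth_mask(mask_prev, mask_curr, beta_old=0.0)
```

With `beta_old = 0` the current mask is used as-is (no smoothing); values of `beta_old` closer to 1 make the mask change more slowly across iterations.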
Experiments: Conditions
Music dataset (dry sources): SiSEC2016 MUS [Liutkus+, 2016], “Drums” and “Other” sources of 20 songs
Windowing in STFT: 128-ms-long Hann window with half-overlap shifting
Number of iterations in TFMBSS: 500
Evaluation score: improvement of source-to-distortion ratio (SDR) [Vincent+, 2006]
• Mixing condition of dry sources
– Impulse response: E2A in RWCP database [Nakamura+, 2000] (reverberation time: 300 ms)
– Sources: “Other” source (harmonic) and drum source (percussive)
[Figure: recording geometry; labels: 2 m, 5.66 cm, 50, 50]
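The evaluation score can be illustrated with a simplified sketch: SDR improvement is the SDR of the separated estimate minus the SDR of the unprocessed mixture, both measured against the same reference source. The energy-ratio SDR below is a simplification and an assumption; the BSS Eval definition of [Vincent+, 2006] additionally decomposes the error by projections. All function names are hypothetical.

```python
import numpy as np

def sdr_db(reference, estimate):
    """Simplified SDR in dB: reference energy over error energy."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    error = estimate - reference
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(error ** 2))

def sdr_improvement_db(reference, estimate, mixture):
    """SDR of the estimate minus SDR of the mixture (input SDR)."""
    return sdr_db(reference, estimate) - sdr_db(reference, mixture)

# Toy signals: the estimate keeps 10% of the interference amplitude
target = np.array([1.0, 0.0, -1.0, 0.0])
interference = np.array([1.0, 1.0, 1.0, 1.0])
mixture = target + interference
estimate = target + 0.1 * interference
improvement = sdr_improvement_db(target, estimate, mixture)
```

Reporting the improvement rather than the raw SDR removes the dependence on how hard each input mixture was to begin with.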
Experiment 1: Conditions
• Investigate the optimal number of iterations in the HPSS blocks
• Compare the average SDR improvement of the proposed method
Experiment 1: Results
• Average SDR improvements of the proposed method with various numbers of iterations in the HPSS blocks
[Figure: horizontal axis: number of iterations in HPSS blocks (1, 3, 5, 7, 9, 11, 13, 15, 20); vertical axis: average SDR improvement [dB], 0 to 12]
Experiment 2: Conditions
• Investigate the optimal smoothing parameter
• Compare the average SDR improvement of the proposed method
Experiment 2: Results
• Typical example of SDR behaviors in the proposed method with various smoothing parameters
[Figure: horizontal axis: number of iterations of the proposed method (0 to 500); vertical axis: SDR improvement [dB], -2 to 16; curves for β_old = 0, 0.5, 0.75, 0.875, and 0.9375]
Experiment 2: Results
• Average SDR improvements of the proposed method with various smoothing parameters
[Figure: horizontal axis: smoothing parameter (0, 0.5, 0.75, 0.875, 0.9375); vertical axis: average SDR improvement [dB], 0 to 12]
Experiment 3: Conditions
• Compare five BSS methods, including the proposed method
– Single-channel HPSS [Ono+, 2008], [Tachibana+, 2010], [Tachibana+, 2012]
– Multichannel HPSS [Duong+, 2011]
– AuxIVA [Ono+, 2011]
– ILRMA [Kitamura+, 2016]
– Proposed method
Experiment 3: Results
• Average SDR improvement for HPSS and BSS algorithms
[Figure: average SDR improvement [dB] (0 to 12) for single-channel HPSS, multichannel HPSS, AuxIVA, ILRMA, and the proposed method; group labels: “Non-linear single-channel BSS” and “Linear multichannel BSS”]
Demonstration
• Music sample: SiSEC2016 Forkupines - Semantics
[Audio demo: observed mixture, and estimated harmonic and percussive sources for single-channel HPSS, AuxIVA, ILRMA, and the proposed method]
Conclusion
• Novelty
– Integration of conventional HPSS with TFMBSS
– Wiener-filter-based iterative mask update
– Iterative mask smoothing for stabilizing optimization
• Results
– Mask smoothing process drastically stabilizes the TFMBSS
optimization and improves the separation performance
– Achieved high-quality linear multichannel HPSS
Thank you for your attention!

System Simulation and Modelling with types and Event Scheduling
 
Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substation
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.ppt
 
welding defects observed during the welding
welding defects observed during the weldingwelding defects observed during the welding
welding defects observed during the welding
 
Virtual memory management in Operating System
Virtual memory management in Operating SystemVirtual memory management in Operating System
Virtual memory management in Operating System
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptx
 
POWER SYSTEMS-1 Complete notes examples
POWER SYSTEMS-1 Complete notes  examplesPOWER SYSTEMS-1 Complete notes  examples
POWER SYSTEMS-1 Complete notes examples
 
Steel Structures - Building technology.pptx
Steel Structures - Building technology.pptxSteel Structures - Building technology.pptx
Steel Structures - Building technology.pptx
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 

Linear multichannel blind source separation using harmonic/percussive separation time-frequency masks

  • 1. Linear multichannel blind source separation based on time-frequency mask obtained by harmonic/percussive sound separation Oyabu Soichiro1, Daichi Kitamura1, and Kohei Yatabe2 1National Institute of Technology, Kagawa College, Japan 2The University of Waseda, Japan 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2. Background • Blind source separation (BSS) – aims at separating audio sources such as speech, noise, and musical instruments – can be used for many audio applications – extracts individual sources without any prior information or training (unsupervised technique) [Figure: mixture signal and separated drum source]
  • 3. Background • Multichannel BSS – estimates the demixing system W (the inverse of the mixing system A) – High separation quality because of spatial information • Independent vector analysis (IVA) [Kim+, 2007] • Independent low-rank matrix analysis (ILRMA) [Kitamura+, 2016] • Time-frequency-masking-based BSS (TFMBSS) [Yatabe+, 2019] • Single-channel BSS – Monaural audio source separation problem – More difficult than multichannel BSS • Harmonic/percussive sound separation (HPSS) [Ono+, 2008] [Figure: mixing system and demixing system]
  • 4. Background • HPSS [Ono+, 2008], [FitzGerald, 2010], [Duong+, 2011], [Tachibana+, 2012], etc. – separates harmonic and percussive sound sources in a fully blind manner – can be used for music analysis and remixing • Ex. estimation of chords, tempo, rhythm, notes, and genre [Figure: mixture signal, harmonic estimate, percussive estimate] Aim: high-quality multichannel blind HPSS
  • 5. Source Models in BSS • In BSS, an assumption about the time-frequency structure of each source (source model) is required – IVA [Kim+, 2007] • All the frequencies of each source simultaneously have large power • Linear spatial demixing – ILRMA [Kitamura+, 2016] • Power spectrogram of each source has a low-rank time-frequency structure • Linear spatial demixing – HPSS [Ono+, 2008], [Tachibana+, 2012] • Harmonic: time-continuous structure • Percussive: freq.-continuous structure • Non-linear separation [Figure: time-frequency spectrograms illustrating each source model, showing Source 1 and Source 2 for IVA and ILRMA and harmonic and percussive sources for HPSS]
  • 6. Conventional Method: TFMBSS [Yatabe+, 2019] • Time-frequency-masking-based BSS (TFMBSS) – Linear multichannel BSS with plug-and-play source models – Source model is input as a time-frequency mask [Figure: TFMBSS algorithm listing, annotated “generate time-frequency mask based on temporarily estimated sources” and “masking process (entrywise product)”]
  • 7. Conventional Method: HPSS [Ono+, 2008] • Harmonic/percussive sound separation (HPSS) – Separate sources by focusing on “smoothness” along the time or frequency direction in the spectrogram – Estimate H and P by iteratively minimizing the cost function [Figure: spectrogram (time versus frequency) of the mixture signal with harmonic and percussive components marked, and the resulting harmonic and percussive estimates]
  • 8. Proposed Algorithm: Process Flow [Figure: block diagram. The observed signal (Ch 1, Ch 2) passes through STFT into TFMBSS; the temporary estimates are fed to two HPSS blocks (each with an initialization step); masks are calculated, smoothed with the old masks, and returned to TFMBSS as smoothed masks; back projection and inverse STFT give the harmonic and percussive estimates]
  • 9. Proposed Algorithm: Mask Calculation • Two HPSS processes are independently applied to the temporarily estimated signals ZH and ZP • Two Wiener-like masks, MH and MP, are constructed using the results of HPSS – These masks enhance the harmonic or percussive components by eliminating the other components
  • 10. Proposed Algorithm: Mask Smoothing • In TFMBSS, a drastic change of the masks in each iteration will cause instability of the parameter optimization • Introduce a mask smoothing process based on the weighted geometric mean – Intensity of smoothing can be controlled by β and βold [Equation annotations: mask calculated in the previous iteration, mask calculated in the current iteration, entrywise product]
  • 11. Proposed Algorithm: Process Flow [Figure: same block diagram as slide 8]
  • 12. Experiments: Conditions
      – Music dataset (dry sources): SiSEC2016 MUS [Liutkus+, 2016], “Drums” and “Other” sources of 20 songs
      – Windowing in STFT: 128-ms-long Hann window with half-overlap shifting
      – Number of iterations in TFMBSS: 500
      – Evaluation score: improvement of source-to-distortion ratio (SDR) [Vincent+, 2006]
      – Mixing condition of dry sources: impulse response E2A in RWCP database [Nakamura+, 2000] (reverberation time: 300 ms) [Figure: recording layout with the other source (harmonic) and the drum source (percussive); labels 2 m, 5.66 cm, 50, 50]
  • 13. Experiment 1: Conditions • Investigate the optimal number of iterations in HPSS blocks • Compare average SDR imp. of the proposed method 13
  • 14. Experiment 1: Results • Average SDR improvements of the proposed method with various numbers of iterations in HPSS blocks [Figure: average SDR improvement [dB] versus number of iterations in HPSS blocks (1, 3, 5, 7, 9, 11, 13, 15, 20)]
  • 15. Experiment 2: Conditions • Investigate the optimal smoothing parameter • Compare average SDR imp. of the proposed method
  • 16. Experiment 2: Results • Typical example of SDR behaviors in the proposed method with various smoothing parameters [Figure: SDR improvement [dB] versus number of iterations of the proposed method (0 to 500), one curve each for βold = 0, 0.5, 0.75, 0.875, 0.9375]
  • 17. Experiment 2: Results • Average SDR improvements of the proposed method with various smoothing parameters [Figure: average SDR improvement [dB] versus smoothing parameter (0, 0.5, 0.75, 0.875, 0.9375)]
  • 18. Experiment 3: Conditions • Compare five BSS methods, including the proposed method – Single-channel HPSS [Ono+, 2008], [Tachibana+, 2010], [Tachibana+, 2012] – Multichannel HPSS [Duong+, 2011] – AuxIVA [Ono+, 2011] – ILRMA [Kitamura+, 2016] – Proposed method
  • 19. Experiment 3: Results • Average SDR improvement for HPSS and BSS algorithms [Figure: bar chart of average SDR improvement [dB] for single-channel HPSS (non-linear single-channel BSS) and for multichannel HPSS, AuxIVA, ILRMA, and the proposed method (linear multichannel BSS)]
  • 20. Demonstration • Music sample: SiSEC2016 Forkupines - Semantics [Audio demo: observed mixture, followed by the estimated harmonic and percussive sources from single-channel HPSS, AuxIVA, ILRMA, and the proposed method]
  • 21. Conclusion • Novelty – Integration of conventional HPSS with TFMBSS – Wiener-filter-based iterative mask update – Iterative mask smoothing for stabilizing optimization • Results – Mask smoothing process drastically stabilizes the TFMBSS optimization and improves the separation performance – Achieved high-quality linear multichannel HPSS • Thank you for your attention!
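
The mask calculation and mask smoothing steps on slides 9 and 10 can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the slide equations did not survive extraction, so the power-ratio form of the Wiener-like mask is an assumption, and the names `wiener_like_mask`, `smooth_mask`, and `EPS` are hypothetical. The smoothing follows what the slides state: an entrywise weighted geometric mean of the current and previous masks, with the two weights summing to one.

```python
import numpy as np

EPS = 1e-12  # hypothetical floor to avoid division by zero and 0 ** 0


def wiener_like_mask(target, other, eps=EPS):
    """Assumed power-ratio mask in [0, 1]: near 1 where `target` dominates."""
    p_target = np.abs(target) ** 2
    p_other = np.abs(other) ** 2
    return p_target / (p_target + p_other + eps)


def smooth_mask(mask_new, mask_old, beta_old):
    """Entrywise weighted geometric mean, with beta + beta_old = 1."""
    beta = 1.0 - beta_old
    return np.maximum(mask_new, EPS) ** beta * np.maximum(mask_old, EPS) ** beta_old


# Toy usage: random stand-ins for the HPSS outputs of the harmonic estimate ZH.
rng = np.random.default_rng(0)
H_of_ZH = rng.random((4, 6))   # harmonic part extracted from ZH (placeholder)
P_of_ZH = rng.random((4, 6))   # percussive part extracted from ZH (placeholder)
M_H_old = np.ones((4, 6))      # mask from the previous iteration (placeholder)

M_H_new = wiener_like_mask(H_of_ZH, P_of_ZH)           # enhance harmonic part
M_H = smooth_mask(M_H_new, M_H_old, beta_old=0.75)     # damp abrupt changes
```

Setting `beta_old=0` returns the new mask unchanged, which corresponds to the "no smoothing" condition in Experiment 2; larger values keep more of the previous mask.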

Editor's Notes

  1. 1
  2. Blind source separation, or BSS for short, aims at separating audio sources such as speech, noise, and musical instruments. This technique can be used for many audio applications. BSS is an unsupervised method that extracts individual sources without any prior information or training.
  3. When multiple microphones are used in a recording, the BSS problem is called “multichannel BSS.” This method estimates the demixing system W, which is the inverse of the mixing system A, as shown in this figure. (point) Multichannel BSS provides better separation quality because “spatial information” can be used to estimate the demixing system W. Many algorithms have been proposed; the most successful are independent vector analysis (IVA), independent low-rank matrix analysis (ILRMA), and time-frequency-masking-based BSS (TFMBSS). On the other hand, when the observed mixture is monaural, single-channel BSS must be applied to achieve the separation. This is a more difficult problem than multichannel BSS. Among single-channel methods, this presentation focuses only on “harmonic/percussive sound separation,” or HPSS for short.
  4. HPSS aims to separate harmonic and percussive audio sources, such as vocals and drums, in a fully blind manner. Such techniques are crucial for many applications, including music analysis, for example, the estimation of chords, tempo, rhythm, notes, and genre. Our research aim is to propose a high-quality multichannel blind HPSS method that is useful for these applications.
  5. To achieve source separation in a fully blind manner, an assumption about the time-frequency structure of each source is required; this assumption is called the “source model.” For example, IVA assumes that all the frequencies of each source simultaneously have large power, as in this figure. (point) ILRMA assumes that the power spectrogram of each source has a low-rank time-frequency structure. (point) Both IVA and ILRMA estimate a spatial demixing matrix, which provides multichannel linear separation results. HPSS assumes two source models. (point) A time-continuous structure is assumed for the harmonic source, whereas a frequency-continuous structure is assumed for the percussive source. Since HPSS is a single-channel BSS, the separation mechanism is non-linear, and artificial distortions are sometimes generated.
  6. Time-frequency-masking-based BSS, or TFMBSS for short, has been proposed as a generalized multichannel linear BSS framework. In TFMBSS, the source models can be easily replaced in a plug-and-play manner, where the source model is input as a time-frequency mask. This slide shows the optimization algorithm in TFMBSS. For a detailed explanation of this algorithm, please see the reference paper. In the fourth line, we generate a time-frequency mask, that is, a source model, based on the temporarily estimated sources z. Then, we apply the time-frequency masking in the fifth line to update the estimated sources. Thus, any kind of time-frequency mask can be used as a source model in this BSS framework, and we can achieve linear multichannel BSS just as IVA and ILRMA do.
  7. Next, I introduce the details of HPSS. HPSS separates harmonic and percussive sounds by focusing on the “smoothness” along the time or frequency direction in the spectrogram. This is because the harmonic components form horizontal stripe patterns, and the percussive components form vertical stripe patterns. (point to each) HPSS optimizes the separated signals H and P by iteratively minimizing this cost function (point), which enforces the smoothness of H along the time (horizontal) direction and the smoothness of P along the frequency (vertical) direction. The smoothness assumption in HPSS is reasonable for separating harmonic and percussive sounds. However, since this single-channel BSS is a non-linear process, artificial distortions may be noticeable. To achieve a high-quality linear multichannel HPSS, we propose to integrate conventional HPSS with the TFMBSS framework.
  8. This is the process flow of the proposed linear multichannel HPSS. In our method, TFMBSS (point) estimates the linear demixing system using iterative optimization. In each iteration of TFMBSS, the time-frequency masks of the harmonic and percussive sources are updated by the processes shown above (point). In each iteration of TFMBSS, the temporary estimates of the harmonic and percussive sources, ZH and ZP, are respectively input to the HPSS blocks (point). Then we obtain the H and P components for each of ZH and ZP (point). From these components, we calculate new time-frequency masks, MH and MP, that enhance the harmonic and percussive components in ZH and ZP, respectively (point). After that, a smoothing process is applied with the old masks calculated in the previous iteration. Finally, the smoothed masks are returned to TFMBSS (point). This operation is iterated until TFMBSS converges. The estimated harmonic and percussive source signals are obtained by inverse STFT. In the next two slides, we explain the details of “the mask calculation,” here (point), and “the smoothing process,” here (point).
  9. In the proposed algorithm, two HPSS processes are independently applied to the temporarily estimated signals ZH and ZP. From ZH, we obtain H superscript H and P superscript H. Likewise, from ZP, we obtain H superscript P and P superscript P. Using these HPSS results, we construct two Wiener-like masks, MH and MP, as shown in these equations. These masks enhance the harmonic or percussive components by eliminating the other components.
  10. In TFMBSS, since the optimization is performed by the primal-dual splitting algorithm, a drastic change of the masks in each iteration will cause instability of the parameter optimization. To avoid this problem, we introduce a “mask smoothing process” based on the weighted geometric mean, like this calculation (point), where M on the left-hand side is the mask of the current iteration, Mold is the mask of the previous iteration, and this operation (point) is an entrywise product. Beta and beta old are the parameters that control the intensity of smoothing, and the sum of beta and beta old equals unity.
  11. After the mask calculation and smoothing processes, we input the new mask to TFMBSS. This mask update is iterated until TFMBSS converges.
  12. To evaluate our proposed method, we conducted a BSS experiment. This slide shows the experimental conditions. For the dry sources, we used the SiSEC2016 MUS dataset. In particular, we used the “drums” and “other” musical sources of 20 songs, where the “other” sources include various types of harmonic musical instruments, such as guitar and synthesizer. These dry sources were convolved with the impulse responses recorded in this condition (point) and spatially mixed, producing two-channel, two-source observed signals. As the evaluation score, we used the improvement of the source-to-distortion ratio, SDR, which reflects both the degree of separation and the sound quality of the separated signals.
  13. We conducted three experiments. In the first experiment, we investigated the optimal number of iterations in the HPSS blocks. Since HPSS is also an iterative optimization algorithm, we compare the average SDR improvements of the proposed method for various numbers of HPSS iterations.
  14. This is the result of the first experiment. The horizontal axis shows the number of iterations in the HPSS blocks, and the vertical axis shows the average SDR improvements. We can confirm that too few iterations in the HPSS blocks are not preferable (point), and that the performance saturates beyond 9 iterations. (point) For the other experiments, we set the number of iterations in HPSS to 15.
  15. In the second experiment, we investigated the optimal smoothing parameter “beta old” in the smoothing blocks. This parameter controls the intensity of mask smoothing. If we set “beta old” to zero, no smoothing is applied. Again, we compare the SDR improvements of the proposed method.
  16. This graph shows an example of the SDR behavior of the proposed method; the horizontal axis shows the global iteration count of the proposed method. The colors show the intensity of the smoothing process. The brightest color corresponds to “beta old equals zero,” that is, no smoothing, and the darkest color represents strong smoothing. We can confirm that the smoothing process stabilizes the behavior of the SDR improvement, although the convergence slows down. This behavior was common to all the songs.
  17. This figure shows the average SDR results over 20 songs after 500 global iterations. The horizontal axis shows the value of “beta old.” The condition “beta old equals 0.75” (point) achieves the highest performance. Thus, in the third experiment, we used this value.
  18. In the final experiment, we compared the proposed method with four conventional BSS methods, namely, single-channel HPSS, multichannel HPSS, AuxIVA, and ILRMA.
  19. This is the average result of each method. Please note that only the single-channel HPSS is a non-linear method. This result shows that the proposed method greatly outperforms the other methods on average over the 20 songs.
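  20. Finally, we demonstrate some examples of blind HPSS. The observed signal is a mixture of drums and guitar. (Play in sequence: mixture, then H of single-channel HPSS, H of AuxIVA, H of ILRMA, and H of the proposed method; no narration needed. Then play P of single-channel HPSS, P of AuxIVA, P of ILRMA, and P of the proposed method; no narration needed. ILRMA may be skipped if time is short. When playback ends, advance to the summary slide without speaking.)
  21. This is the conclusion. (Explain if time permits.) That’s all. Thank you for your attention.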
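  22. Rather than fixing the characteristics of each source in advance in this way, I introduce a framework that estimates the source model itself: time-frequency-masking-based BSS, or TFMBSS. This method takes properties such as the low-rankness or the smoothness along one axis that we considered earlier and assumes them as a source model in the form of a time-frequency mask. As in the figure, for a mixture of two sources, red and blue, a time-frequency mask is a mask that is 1 where the red source is present and 0 elsewhere, (click) and applying it entrywise extracts only the red source. TFMBSS uses this mask as its source model, and because it is a multichannel BSS, it can achieve linear separation.
  23. First, I introduce the algorithm of the conventional method, already proposed in my graduation research, which uses the HPSS and TFMBSS introduced earlier. The observed signal is first transformed by STFT and then passed to TFMBSS. The intermediate variables generated there, zH for the harmonic components and zP for the percussive components, are each passed to HPSS as an observed signal and separated into harmonic and percussive components. Masks are generated from these and smoothed with the past masks to obtain new masks. This smoothing is a process for running the algorithm stably. The masks are returned to TFMBSS, and after this operation is repeated for an arbitrary number of iterations, the result is converted to time-domain signals by inverse STFT, giving linearly separated percussive and harmonic sources.
  24. Next, I show how the SDR evolves over the iterations for one song with the conventional method in Experiment 2. The horizontal axis is the overall number of iterations in TFMBSS, and the vertical axis is the source separation performance. The lighter the line color, the larger the change in the mask between iterations; the darker the line color, the smaller the change. Even without smoothing, the SDR converges almost stably. Conversely, too much smoothing tends to reduce both the convergence speed and the converged value.
  25. Next, I explain the transform commonly used in the signal processing field of source separation. A segment of the observed time-domain signal, with a length equal to a chosen Fourier transform length, is transformed by the discrete Fourier transform to produce one vector. (click) Then another, (click) and another, (click) and arranging these along the time axis produces the spectrogram, a time-frequency representation with complex-valued entries. In source separation, it is common to treat this spectrogram as the object of signal processing. Next, I explain blind source separation, an actively studied topic within source separation.
  26. Details of smoothing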
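
The spectrogram construction in note 25 and the binary time-frequency mask in note 22 can be illustrated with a short NumPy sketch. It is a toy example under stated assumptions, not part of the proposed method: the function names `stft` and `binary_mask`, the 8 kHz sampling rate, the two sine-wave "sources," and the frame settings are all hypothetical, and the mask here is computed from the known sources (an oracle), whereas a blind method has to estimate it. The Hann window with half-overlap shifting mirrors the windowing style listed in the experimental conditions.

```python
import numpy as np


def stft(x, frame_len, hop):
    """DFT each Hann-windowed frame and stack the vectors along time.

    Returns a complex array of shape (frequency bins, time frames).
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = [
        np.fft.rfft(window * x[i * hop : i * hop + frame_len])
        for i in range(n_frames)
    ]
    return np.stack(frames, axis=1)


def binary_mask(spec_target, spec_other):
    """1 where the target source has the larger magnitude, 0 elsewhere."""
    return (np.abs(spec_target) > np.abs(spec_other)).astype(float)


# Toy "red" and "blue" sources: two sine waves at different frequencies.
fs = 8000                               # assumed sampling rate [Hz]
t = np.arange(fs) / fs                  # one second of samples
red = np.sin(2 * np.pi * 440 * t)
blue = np.sin(2 * np.pi * 2000 * t)

frame_len, hop = 256, 128               # half-overlap shifting
S_red = stft(red, frame_len, hop)
S_blue = stft(blue, frame_len, hop)
X = stft(red + blue, frame_len, hop)    # spectrogram of the mixture

mask = binary_mask(S_red, S_blue)       # oracle mask for the red source
red_estimate = mask * X                 # entrywise product extracts red
```

Because the STFT is linear, the mixture spectrogram `X` equals `S_red + S_blue`, so the mask keeps the bins dominated by the red source and zeroes the rest.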