SlideShare a Scribd company logo
1 of 20
DNN-based frequency component
prediction for frequency-domain
audio source separation
Rui Watanabe, Daichi Kitamura (National Institute of Technology, Japan)
Hiroshi Saruwatari (The University of Tokyo, Japan)
Yu Takahashi, Kazunobu Kondo (Yamaha Corporation, Japan)
28th European Signal Processing Conference (EUSIPCO) SS-2.4
Background
 Audio source separation
– aims to separate audio sources such as speech, singing
voice, and musical instruments.
 Products with audio source separation
– Intelligent speaker
– Hearing-aid system
– Music editing by users etc.
1
Background
 Multichannel audio source separation (MASS)
– estimates separation system using multichannel
observation without knowing mixing system
 Popular methods for each condition
– Underdetermined (number of mics < number of sources)
• Multichannel nonnegative matrix factorization (MNMF) [Sawada+, 2013]
• Approaches based on deep neural networks (DNNs)
– Overdetermined (number of mics ≥ number of sources)
• Frequency-domain Independent component analysis [Smaragdis, 1998]
• Independent vector analysis [Kim+, 2007]
• Independent low-rank matrix (ILRMA) [Kitamura+, 2016]
2
Background
 Frequency-domain MASS
– performs a short-time Fourier transform to the
observed time-domain signal to obtain the
spectrograms
– estimates frequency-wise separation filter
3
Conventional frequency-domain MASS
 Multichannel nonnegative matrix factorization
(MNMF) [Sawada+, 2013]
– Unsupervised source separation algorithm without any
prior information or training
– High quality MASS can be achieved
– Huge computational cost for estimating the
parameters
4
Proposed method: motivation
 High-quality MASS with low computational cost
 A new framework combining frequency-domain MASS and
DNN
– Separate specific frequencies via MNMF and obtain separated
source components
– The estimated source components of the other frequencies will
be predicted by DNN
5
Proposed method: interpretation of DNN
 DNN in proposed framework can be interpreted in two
ways
1. Audio source separation of specific frequencies (high-
frequency band)
• Low-frequency bands can be used for predicting high-frequency separated
components
2. Audio bandwidth expansion of each source
• High-frequency band of the mixture is a strong cue for expanding bandwidth
6
Proposed method: details of framework
 Observed multichannel spectrograms and are
divided into low- and high-frequency bands
 Apply MNMF to the low frequency band and to
obtain the separated source components and
– High-frequency band and are not separated in
this step
7
Proposed method: details of framework
 Input , , and to DNN
– DNN outputs softmasks and such that the high-
frequency bands and are estimated from
8
Apply softmasks
Proposed method: input vector of DNN
 DNN prediction is performed for each time frame
(each column of spectrograms)
– Input vector is a concatenation of several time frames
around th frame in , , and
– Normalize the concatenated vector
9
Proposed method: DNN architecture
 Simple full-connected networks
– Four hidden layers with Swish or Softmax functions
10
Experiment 1: bandwidth expansion
 Validation of the proposed framework
– Evaluate bandwidth expansion performance from the low-
frequency band of true sources with/without mixture
– Confirm validity of the proposed framework that utilizes
mixture components for predicting the separated sources
– Use sources-to-artifact ratio (SAR) [Vincent+, 2006]
11
Experiment 1: bandwidth expansion
 Training conditions of DNN
 Test dataset (SiSEC2011) [Araki+, 2012] for evaluation
12
Training dataset
100 drums (Dr.) and vocal (Vo.) songs in SiSEC2016
Database [Liutkus+, 2016]
FFT length/Shift length 128 ms/64 ms
Boundary frequency 4 kHz (Half of Nyquist frequency)
Epochs/batch size 1000/128
Optimizer Adam (learning rate=0.001)
Song ID Song name Signal length [s]
1 dev1__bearlin-roads (Dr. & Vo.) 14.0
2 dev2__another_dreamer-the_ones_we_love (Dr. & Vo.) 25.0
3 dev2__fort_minor-remember_the_name (Dr. & Vo.) 24.0
4 dev2_ultimate_nz_tour (Dr. & Vo.) 18.0
Experiment 1: bandwidth expansion
 Mixture components help to predict the high-
frequency band of the separated sources
13
Song ID DNN w/o mixture DNN w/ mixture
1
Dr. : 21.1 dB Dr. : 28.0 dB
Vo. : 21.8 dB Vo. : 31.5 dB
2
Dr. : 22.0 dB Dr. : 21.8 dB
Vo. : 12.7 dB Vo. : 19.6 dB
3
Dr. : 15.0 dB Dr. : 20.4 dB
Vo. : 11.2 dB Vo. : 18.5 dB
4
Dr. : 11.0 dB Dr. : 18.2 dB
Vo. : 10.4 dB Vo. : 15.3 dB
Experiment 2: evaluate proposed MASS framework
 Compare conventional fullband MNMF and the
proposed framework
– In terms of separation accuracy (source-to-distortion
ratio: SDR [Vincent+, 2006]) and computational efficiency
14
Experiment 2: evaluate proposed MASS framework
 Experimental conditions of MNMF
15
Multichannel observed
signal
Produce two-channel mixture by convoluting
E2A impulse responses to the sources of the
test dataset
Boundary frequency 4 kHz
Number of bases in MNMF 13
16
Experiment 2: evaluate proposed MASS framework
 Song ID 4
– Since the number of frequencies is reduced by half,
the proposed method is twice faster
– In Fullband MNMF, 13dB
was achieved in 120s
– Proposed method
achieved 13 dB in less
than 50s
17
Experiment 2: evaluate proposed MASS framework
18
Experiment 2: evaluate proposed MASS framework
Conclusion
 In this paper
– We proposed a computationally efficient audio source
separation framework combined frequency-domain
MASS and frequency component prediction based on
DNN
– In the proposed framework, MASS is applied to only
the limited frequencies, and DNN predicts the other
frequency components of the sources
– By comparing fullband MNMF, the proposed method
can achieve almost the same quality with the half-
reduced computational cost
19
Thank you for your attention!

More Related Content

What's hot

Experimental analysis of optimal window length for independent low-rank matri...
Experimental analysis of optimal window length for independent low-rank matri...Experimental analysis of optimal window length for independent low-rank matri...
Experimental analysis of optimal window length for independent low-rank matri...Daichi Kitamura
 
Online Divergence Switching for Superresolution-Based Nonnegative Matrix Fa...
Online Divergence Switching for  Superresolution-Based  Nonnegative Matrix Fa...Online Divergence Switching for  Superresolution-Based  Nonnegative Matrix Fa...
Online Divergence Switching for Superresolution-Based Nonnegative Matrix Fa...奈良先端大 情報科学研究科
 
Depth Estimation of Sound Images Using Directional Clustering and Activation...
Depth Estimation of Sound Images Using  Directional Clustering and Activation...Depth Estimation of Sound Images Using  Directional Clustering and Activation...
Depth Estimation of Sound Images Using Directional Clustering and Activation...奈良先端大 情報科学研究科
 
Depth estimation of sound images using directional clustering and activation-...
Depth estimation of sound images using directional clustering and activation-...Depth estimation of sound images using directional clustering and activation-...
Depth estimation of sound images using directional clustering and activation-...Daichi Kitamura
 
Robust music signal separation based on supervised nonnegative matrix factori...
Robust music signal separation based on supervised nonnegative matrix factori...Robust music signal separation based on supervised nonnegative matrix factori...
Robust music signal separation based on supervised nonnegative matrix factori...Daichi Kitamura
 
Regularized superresolution-based binaural signal separation with nonnegative...
Regularized superresolution-based binaural signal separation with nonnegative...Regularized superresolution-based binaural signal separation with nonnegative...
Regularized superresolution-based binaural signal separation with nonnegative...Daichi Kitamura
 
Online divergence switching for superresolution-based nonnegative matrix fact...
Online divergence switching for superresolution-based nonnegative matrix fact...Online divergence switching for superresolution-based nonnegative matrix fact...
Online divergence switching for superresolution-based nonnegative matrix fact...Daichi Kitamura
 
Robust Sound Field Reproduction against Listener’s Movement Utilizing Image ...
Robust Sound Field Reproduction against  Listener’s Movement Utilizing Image ...Robust Sound Field Reproduction against  Listener’s Movement Utilizing Image ...
Robust Sound Field Reproduction against Listener’s Movement Utilizing Image ...奈良先端大 情報科学研究科
 
Blind source separation based on independent low-rank matrix analysis and its...
Blind source separation based on independent low-rank matrix analysis and its...Blind source separation based on independent low-rank matrix analysis and its...
Blind source separation based on independent low-rank matrix analysis and its...Daichi Kitamura
 
Divergence optimization in nonnegative matrix factorization with spectrogram ...
Divergence optimization in nonnegative matrix factorization with spectrogram ...Divergence optimization in nonnegative matrix factorization with spectrogram ...
Divergence optimization in nonnegative matrix factorization with spectrogram ...Daichi Kitamura
 
Superresolution-based stereo signal separation via supervised nonnegative mat...
Superresolution-based stereo signal separation via supervised nonnegative mat...Superresolution-based stereo signal separation via supervised nonnegative mat...
Superresolution-based stereo signal separation via supervised nonnegative mat...Daichi Kitamura
 
Hybrid multichannel signal separation using supervised nonnegative matrix fac...
Hybrid multichannel signal separation using supervised nonnegative matrix fac...Hybrid multichannel signal separation using supervised nonnegative matrix fac...
Hybrid multichannel signal separation using supervised nonnegative matrix fac...Daichi Kitamura
 
コサイン類似度罰則条件付き半教師あり非負値行列因子分解と音源分離への応用
コサイン類似度罰則条件付き半教師あり非負値行列因子分解と音源分離への応用コサイン類似度罰則条件付き半教師あり非負値行列因子分解と音源分離への応用
コサイン類似度罰則条件付き半教師あり非負値行列因子分解と音源分離への応用Kitamura Laboratory
 
Speaker recognition systems
Speaker recognition systemsSpeaker recognition systems
Speaker recognition systemsNamratha Dcruz
 
Isolated words recognition using mfcc, lpc and neural network
Isolated words recognition using mfcc, lpc and neural networkIsolated words recognition using mfcc, lpc and neural network
Isolated words recognition using mfcc, lpc and neural networkeSAT Journals
 

What's hot (20)

Experimental analysis of optimal window length for independent low-rank matri...
Experimental analysis of optimal window length for independent low-rank matri...Experimental analysis of optimal window length for independent low-rank matri...
Experimental analysis of optimal window length for independent low-rank matri...
 
Online Divergence Switching for Superresolution-Based Nonnegative Matrix Fa...
Online Divergence Switching for  Superresolution-Based  Nonnegative Matrix Fa...Online Divergence Switching for  Superresolution-Based  Nonnegative Matrix Fa...
Online Divergence Switching for Superresolution-Based Nonnegative Matrix Fa...
 
Depth Estimation of Sound Images Using Directional Clustering and Activation...
Depth Estimation of Sound Images Using  Directional Clustering and Activation...Depth Estimation of Sound Images Using  Directional Clustering and Activation...
Depth Estimation of Sound Images Using Directional Clustering and Activation...
 
Depth estimation of sound images using directional clustering and activation-...
Depth estimation of sound images using directional clustering and activation-...Depth estimation of sound images using directional clustering and activation-...
Depth estimation of sound images using directional clustering and activation-...
 
Ica2016 312 saruwatari
Ica2016 312 saruwatariIca2016 312 saruwatari
Ica2016 312 saruwatari
 
Robust music signal separation based on supervised nonnegative matrix factori...
Robust music signal separation based on supervised nonnegative matrix factori...Robust music signal separation based on supervised nonnegative matrix factori...
Robust music signal separation based on supervised nonnegative matrix factori...
 
Regularized superresolution-based binaural signal separation with nonnegative...
Regularized superresolution-based binaural signal separation with nonnegative...Regularized superresolution-based binaural signal separation with nonnegative...
Regularized superresolution-based binaural signal separation with nonnegative...
 
Online divergence switching for superresolution-based nonnegative matrix fact...
Online divergence switching for superresolution-based nonnegative matrix fact...Online divergence switching for superresolution-based nonnegative matrix fact...
Online divergence switching for superresolution-based nonnegative matrix fact...
 
Robust Sound Field Reproduction against Listener’s Movement Utilizing Image ...
Robust Sound Field Reproduction against  Listener’s Movement Utilizing Image ...Robust Sound Field Reproduction against  Listener’s Movement Utilizing Image ...
Robust Sound Field Reproduction against Listener’s Movement Utilizing Image ...
 
Koyama AES Conference SFC 2016
Koyama AES Conference SFC 2016Koyama AES Conference SFC 2016
Koyama AES Conference SFC 2016
 
Blind source separation based on independent low-rank matrix analysis and its...
Blind source separation based on independent low-rank matrix analysis and its...Blind source separation based on independent low-rank matrix analysis and its...
Blind source separation based on independent low-rank matrix analysis and its...
 
Divergence optimization in nonnegative matrix factorization with spectrogram ...
Divergence optimization in nonnegative matrix factorization with spectrogram ...Divergence optimization in nonnegative matrix factorization with spectrogram ...
Divergence optimization in nonnegative matrix factorization with spectrogram ...
 
Hybrid NMF APSIPA2014 invited
Hybrid NMF APSIPA2014 invitedHybrid NMF APSIPA2014 invited
Hybrid NMF APSIPA2014 invited
 
Apsipa2016for ss
Apsipa2016for ssApsipa2016for ss
Apsipa2016for ss
 
Superresolution-based stereo signal separation via supervised nonnegative mat...
Superresolution-based stereo signal separation via supervised nonnegative mat...Superresolution-based stereo signal separation via supervised nonnegative mat...
Superresolution-based stereo signal separation via supervised nonnegative mat...
 
Hybrid multichannel signal separation using supervised nonnegative matrix fac...
Hybrid multichannel signal separation using supervised nonnegative matrix fac...Hybrid multichannel signal separation using supervised nonnegative matrix fac...
Hybrid multichannel signal separation using supervised nonnegative matrix fac...
 
コサイン類似度罰則条件付き半教師あり非負値行列因子分解と音源分離への応用
コサイン類似度罰則条件付き半教師あり非負値行列因子分解と音源分離への応用コサイン類似度罰則条件付き半教師あり非負値行列因子分解と音源分離への応用
コサイン類似度罰則条件付き半教師あり非負値行列因子分解と音源分離への応用
 
Dsp2015for ss
Dsp2015for ssDsp2015for ss
Dsp2015for ss
 
Speaker recognition systems
Speaker recognition systemsSpeaker recognition systems
Speaker recognition systems
 
Isolated words recognition using mfcc, lpc and neural network
Isolated words recognition using mfcc, lpc and neural networkIsolated words recognition using mfcc, lpc and neural network
Isolated words recognition using mfcc, lpc and neural network
 

Similar to DNN-based frequency component prediction for frequency-domain audio source separation

DNN-based frequency-domain permutation solver for multichannel audio source s...
DNN-based frequency-domain permutation solver for multichannel audio source s...DNN-based frequency-domain permutation solver for multichannel audio source s...
DNN-based frequency-domain permutation solver for multichannel audio source s...Kitamura Laboratory
 
Introduction to deep learning based voice activity detection
Introduction to deep learning based voice activity detectionIntroduction to deep learning based voice activity detection
Introduction to deep learning based voice activity detectionNAVER Engineering
 
Sound event detection using deep neural networks
Sound event detection using deep neural networksSound event detection using deep neural networks
Sound event detection using deep neural networksTELKOMNIKA JOURNAL
 
Attention gated encoder-decoder for ultrasonic signal denoising
Attention gated encoder-decoder for ultrasonic signal denoisingAttention gated encoder-decoder for ultrasonic signal denoising
Attention gated encoder-decoder for ultrasonic signal denoisingIAESIJAI
 
A Generalization of Laplace Nonnegative Matrix Factorizationand Its Multichan...
A Generalization of Laplace Nonnegative Matrix Factorizationand Its Multichan...A Generalization of Laplace Nonnegative Matrix Factorizationand Its Multichan...
A Generalization of Laplace Nonnegative Matrix Factorizationand Its Multichan...Hiroki_Tanji
 
Performance analysis of the convolutional recurrent neural network on acousti...
Performance analysis of the convolutional recurrent neural network on acousti...Performance analysis of the convolutional recurrent neural network on acousti...
Performance analysis of the convolutional recurrent neural network on acousti...journalBEEI
 
IRJET- Implementing Musical Instrument Recognition using CNN and SVM
IRJET- Implementing Musical Instrument Recognition using CNN and SVMIRJET- Implementing Musical Instrument Recognition using CNN and SVM
IRJET- Implementing Musical Instrument Recognition using CNN and SVMIRJET Journal
 
RACE1059 paper.pdf
RACE1059 paper.pdfRACE1059 paper.pdf
RACE1059 paper.pdfsurbhi jha
 
Design of dfe based mimo communication system for mobile moving with high vel...
Design of dfe based mimo communication system for mobile moving with high vel...Design of dfe based mimo communication system for mobile moving with high vel...
Design of dfe based mimo communication system for mobile moving with high vel...Made Artha
 
Deep Learning Based Voice Activity Detection and Speech Enhancement
Deep Learning Based Voice Activity Detection and Speech EnhancementDeep Learning Based Voice Activity Detection and Speech Enhancement
Deep Learning Based Voice Activity Detection and Speech EnhancementNAVER Engineering
 
20575-38936-1-PB.pdf
20575-38936-1-PB.pdf20575-38936-1-PB.pdf
20575-38936-1-PB.pdfIjictTeam
 
MULTI-STAGES CO-OPERATIVE/NONCOOPERATIVE SCHEMES OF SPECTRUM SENSING FOR COGN...
MULTI-STAGES CO-OPERATIVE/NONCOOPERATIVE SCHEMES OF SPECTRUM SENSING FOR COGN...MULTI-STAGES CO-OPERATIVE/NONCOOPERATIVE SCHEMES OF SPECTRUM SENSING FOR COGN...
MULTI-STAGES CO-OPERATIVE/NONCOOPERATIVE SCHEMES OF SPECTRUM SENSING FOR COGN...ijwmn
 
MULTI-STAGES CO-OPERATIVE/NONCOOPERATIVE SCHEMES OF SPECTRUM SENSING FOR COGN...
MULTI-STAGES CO-OPERATIVE/NONCOOPERATIVE SCHEMES OF SPECTRUM SENSING FOR COGN...MULTI-STAGES CO-OPERATIVE/NONCOOPERATIVE SCHEMES OF SPECTRUM SENSING FOR COGN...
MULTI-STAGES CO-OPERATIVE/NONCOOPERATIVE SCHEMES OF SPECTRUM SENSING FOR COGN...ijwmn
 

Similar to DNN-based frequency component prediction for frequency-domain audio source separation (20)

DNN-based frequency-domain permutation solver for multichannel audio source s...
DNN-based frequency-domain permutation solver for multichannel audio source s...DNN-based frequency-domain permutation solver for multichannel audio source s...
DNN-based frequency-domain permutation solver for multichannel audio source s...
 
Introduction to deep learning based voice activity detection
Introduction to deep learning based voice activity detectionIntroduction to deep learning based voice activity detection
Introduction to deep learning based voice activity detection
 
Sound event detection using deep neural networks
Sound event detection using deep neural networksSound event detection using deep neural networks
Sound event detection using deep neural networks
 
Attention gated encoder-decoder for ultrasonic signal denoising
Attention gated encoder-decoder for ultrasonic signal denoisingAttention gated encoder-decoder for ultrasonic signal denoising
Attention gated encoder-decoder for ultrasonic signal denoising
 
A Generalization of Laplace Nonnegative Matrix Factorizationand Its Multichan...
A Generalization of Laplace Nonnegative Matrix Factorizationand Its Multichan...A Generalization of Laplace Nonnegative Matrix Factorizationand Its Multichan...
A Generalization of Laplace Nonnegative Matrix Factorizationand Its Multichan...
 
Performance analysis of the convolutional recurrent neural network on acousti...
Performance analysis of the convolutional recurrent neural network on acousti...Performance analysis of the convolutional recurrent neural network on acousti...
Performance analysis of the convolutional recurrent neural network on acousti...
 
T26123129
T26123129T26123129
T26123129
 
COGNITIVE RADIO
COGNITIVE RADIOCOGNITIVE RADIO
COGNITIVE RADIO
 
IRJET- Implementing Musical Instrument Recognition using CNN and SVM
IRJET- Implementing Musical Instrument Recognition using CNN and SVMIRJET- Implementing Musical Instrument Recognition using CNN and SVM
IRJET- Implementing Musical Instrument Recognition using CNN and SVM
 
Lecture 9
Lecture 9Lecture 9
Lecture 9
 
RACE1059 paper.pdf
RACE1059 paper.pdfRACE1059 paper.pdf
RACE1059 paper.pdf
 
Design of dfe based mimo communication system for mobile moving with high vel...
Design of dfe based mimo communication system for mobile moving with high vel...Design of dfe based mimo communication system for mobile moving with high vel...
Design of dfe based mimo communication system for mobile moving with high vel...
 
Deep Learning Based Voice Activity Detection and Speech Enhancement
Deep Learning Based Voice Activity Detection and Speech EnhancementDeep Learning Based Voice Activity Detection and Speech Enhancement
Deep Learning Based Voice Activity Detection and Speech Enhancement
 
20575-38936-1-PB.pdf
20575-38936-1-PB.pdf20575-38936-1-PB.pdf
20575-38936-1-PB.pdf
 
MULTI-STAGES CO-OPERATIVE/NONCOOPERATIVE SCHEMES OF SPECTRUM SENSING FOR COGN...
MULTI-STAGES CO-OPERATIVE/NONCOOPERATIVE SCHEMES OF SPECTRUM SENSING FOR COGN...MULTI-STAGES CO-OPERATIVE/NONCOOPERATIVE SCHEMES OF SPECTRUM SENSING FOR COGN...
MULTI-STAGES CO-OPERATIVE/NONCOOPERATIVE SCHEMES OF SPECTRUM SENSING FOR COGN...
 
MULTI-STAGES CO-OPERATIVE/NONCOOPERATIVE SCHEMES OF SPECTRUM SENSING FOR COGN...
MULTI-STAGES CO-OPERATIVE/NONCOOPERATIVE SCHEMES OF SPECTRUM SENSING FOR COGN...MULTI-STAGES CO-OPERATIVE/NONCOOPERATIVE SCHEMES OF SPECTRUM SENSING FOR COGN...
MULTI-STAGES CO-OPERATIVE/NONCOOPERATIVE SCHEMES OF SPECTRUM SENSING FOR COGN...
 
Aa04606162167
Aa04606162167Aa04606162167
Aa04606162167
 
Final presentation
Final presentationFinal presentation
Final presentation
 
N017428692
N017428692N017428692
N017428692
 
Topic: Spread Spectrum
Topic: Spread SpectrumTopic: Spread Spectrum
Topic: Spread Spectrum
 

More from Kitamura Laboratory

付け爪センサによる生体信号を用いた深層学習に基づく心拍推定
付け爪センサによる生体信号を用いた深層学習に基づく心拍推定付け爪センサによる生体信号を用いた深層学習に基づく心拍推定
付け爪センサによる生体信号を用いた深層学習に基づく心拍推定Kitamura Laboratory
 
STEM教育を目的とした動画像処理による二重振り子の軌跡推定
STEM教育を目的とした動画像処理による二重振り子の軌跡推定STEM教育を目的とした動画像処理による二重振り子の軌跡推定
STEM教育を目的とした動画像処理による二重振り子の軌跡推定Kitamura Laboratory
 
ギタータブ譜からのギターリフ抽出アルゴリズム
ギタータブ譜からのギターリフ抽出アルゴリズムギタータブ譜からのギターリフ抽出アルゴリズム
ギタータブ譜からのギターリフ抽出アルゴリズムKitamura Laboratory
 
時間微分スペクトログラムに基づくブラインド音源分離
時間微分スペクトログラムに基づくブラインド音源分離時間微分スペクトログラムに基づくブラインド音源分離
時間微分スペクトログラムに基づくブラインド音源分離Kitamura Laboratory
 
Amplitude spectrogram prediction from mel-frequency cepstrum coefficients and...
Amplitude spectrogram prediction from mel-frequency cepstrum coefficients and...Amplitude spectrogram prediction from mel-frequency cepstrum coefficients and...
Amplitude spectrogram prediction from mel-frequency cepstrum coefficients and...Kitamura Laboratory
 
周波数双方向再帰に基づく深層パーミュテーション解決法
周波数双方向再帰に基づく深層パーミュテーション解決法周波数双方向再帰に基づく深層パーミュテーション解決法
周波数双方向再帰に基づく深層パーミュテーション解決法Kitamura Laboratory
 
Heart rate estimation of car driver using radar sensors and blind source sepa...
Heart rate estimation of car driver using radar sensors and blind source sepa...Heart rate estimation of car driver using radar sensors and blind source sepa...
Heart rate estimation of car driver using radar sensors and blind source sepa...Kitamura Laboratory
 
双方向LSTMによるラウドネス及びMFCCからの振幅スペクトログラム予測と評価
双方向LSTMによるラウドネス及びMFCCからの振幅スペクトログラム予測と評価双方向LSTMによるラウドネス及びMFCCからの振幅スペクトログラム予測と評価
双方向LSTMによるラウドネス及びMFCCからの振幅スペクトログラム予測と評価Kitamura Laboratory
 
深層ニューラルネットワークに基づくパーミュテーション解決法の基礎的検討
深層ニューラルネットワークに基づくパーミュテーション解決法の基礎的検討深層ニューラルネットワークに基づくパーミュテーション解決法の基礎的検討
深層ニューラルネットワークに基づくパーミュテーション解決法の基礎的検討Kitamura Laboratory
 
多重解像度時間周波数表現に基づく独立低ランク行列分析,
多重解像度時間周波数表現に基づく独立低ランク行列分析,多重解像度時間周波数表現に基づく独立低ランク行列分析,
多重解像度時間周波数表現に基づく独立低ランク行列分析,Kitamura Laboratory
 
深層パーミュテーション解決法の基礎的検討
深層パーミュテーション解決法の基礎的検討深層パーミュテーション解決法の基礎的検討
深層パーミュテーション解決法の基礎的検討Kitamura Laboratory
 
深層学習に基づく音響特徴量からの振幅スペクトログラム予測
深層学習に基づく音響特徴量からの振幅スペクトログラム予測深層学習に基づく音響特徴量からの振幅スペクトログラム予測
深層学習に基づく音響特徴量からの振幅スペクトログラム予測Kitamura Laboratory
 
音楽信号処理における基本周波数推定を応用した心拍信号解析
音楽信号処理における基本周波数推定を応用した心拍信号解析音楽信号処理における基本周波数推定を応用した心拍信号解析
音楽信号処理における基本周波数推定を応用した心拍信号解析Kitamura Laboratory
 
調波打撃音モデルに基づく線形多チャネルブラインド音源分離
調波打撃音モデルに基づく線形多チャネルブラインド音源分離調波打撃音モデルに基づく線形多チャネルブラインド音源分離
調波打撃音モデルに基づく線形多チャネルブラインド音源分離Kitamura Laboratory
 
コサイン類似度罰則条件付き非負値行列因子分解に基づく音楽音源分離
コサイン類似度罰則条件付き非負値行列因子分解に基づく音楽音源分離コサイン類似度罰則条件付き非負値行列因子分解に基づく音楽音源分離
コサイン類似度罰則条件付き非負値行列因子分解に基づく音楽音源分離Kitamura Laboratory
 
非負値行列因子分解を用いた被り音の抑圧
非負値行列因子分解を用いた被り音の抑圧非負値行列因子分解を用いた被り音の抑圧
非負値行列因子分解を用いた被り音の抑圧Kitamura Laboratory
 
独立成分分析に基づく信号源分離精度の予測
独立成分分析に基づく信号源分離精度の予測独立成分分析に基づく信号源分離精度の予測
独立成分分析に基づく信号源分離精度の予測Kitamura Laboratory
 
深層学習に基づく間引きインジケータ付き周波数帯域補間手法による音源分離処理の高速化
深層学習に基づく間引きインジケータ付き周波数帯域補間手法による音源分離処理の高速化深層学習に基づく間引きインジケータ付き周波数帯域補間手法による音源分離処理の高速化
深層学習に基づく間引きインジケータ付き周波数帯域補間手法による音源分離処理の高速化Kitamura Laboratory
 
独立低ランク行列分析を用いたインタラクティブ音源分離システム
独立低ランク行列分析を用いたインタラクティブ音源分離システム独立低ランク行列分析を用いたインタラクティブ音源分離システム
独立低ランク行列分析を用いたインタラクティブ音源分離システムKitamura Laboratory
 
局所時間周波数構造に基づく深層パーミュテーション解決法の実験的評価
局所時間周波数構造に基づく深層パーミュテーション解決法の実験的評価局所時間周波数構造に基づく深層パーミュテーション解決法の実験的評価
局所時間周波数構造に基づく深層パーミュテーション解決法の実験的評価Kitamura Laboratory
 

More from Kitamura Laboratory (20)

付け爪センサによる生体信号を用いた深層学習に基づく心拍推定
付け爪センサによる生体信号を用いた深層学習に基づく心拍推定付け爪センサによる生体信号を用いた深層学習に基づく心拍推定
付け爪センサによる生体信号を用いた深層学習に基づく心拍推定
 
STEM教育を目的とした動画像処理による二重振り子の軌跡推定
STEM教育を目的とした動画像処理による二重振り子の軌跡推定STEM教育を目的とした動画像処理による二重振り子の軌跡推定
STEM教育を目的とした動画像処理による二重振り子の軌跡推定
 
ギタータブ譜からのギターリフ抽出アルゴリズム
ギタータブ譜からのギターリフ抽出アルゴリズムギタータブ譜からのギターリフ抽出アルゴリズム
ギタータブ譜からのギターリフ抽出アルゴリズム
 
時間微分スペクトログラムに基づくブラインド音源分離
時間微分スペクトログラムに基づくブラインド音源分離時間微分スペクトログラムに基づくブラインド音源分離
時間微分スペクトログラムに基づくブラインド音源分離
 
Amplitude spectrogram prediction from mel-frequency cepstrum coefficients and...
Amplitude spectrogram prediction from mel-frequency cepstrum coefficients and...Amplitude spectrogram prediction from mel-frequency cepstrum coefficients and...
Amplitude spectrogram prediction from mel-frequency cepstrum coefficients and...
 
周波数双方向再帰に基づく深層パーミュテーション解決法
周波数双方向再帰に基づく深層パーミュテーション解決法周波数双方向再帰に基づく深層パーミュテーション解決法
周波数双方向再帰に基づく深層パーミュテーション解決法
 
Heart rate estimation of car driver using radar sensors and blind source sepa...
Heart rate estimation of car driver using radar sensors and blind source sepa...Heart rate estimation of car driver using radar sensors and blind source sepa...
Heart rate estimation of car driver using radar sensors and blind source sepa...
 
双方向LSTMによるラウドネス及びMFCCからの振幅スペクトログラム予測と評価
双方向LSTMによるラウドネス及びMFCCからの振幅スペクトログラム予測と評価双方向LSTMによるラウドネス及びMFCCからの振幅スペクトログラム予測と評価
双方向LSTMによるラウドネス及びMFCCからの振幅スペクトログラム予測と評価
 
深層ニューラルネットワークに基づくパーミュテーション解決法の基礎的検討
深層ニューラルネットワークに基づくパーミュテーション解決法の基礎的検討深層ニューラルネットワークに基づくパーミュテーション解決法の基礎的検討
深層ニューラルネットワークに基づくパーミュテーション解決法の基礎的検討
 
多重解像度時間周波数表現に基づく独立低ランク行列分析,
多重解像度時間周波数表現に基づく独立低ランク行列分析,多重解像度時間周波数表現に基づく独立低ランク行列分析,
多重解像度時間周波数表現に基づく独立低ランク行列分析,
 
深層パーミュテーション解決法の基礎的検討
深層パーミュテーション解決法の基礎的検討深層パーミュテーション解決法の基礎的検討
深層パーミュテーション解決法の基礎的検討
 
深層学習に基づく音響特徴量からの振幅スペクトログラム予測
深層学習に基づく音響特徴量からの振幅スペクトログラム予測深層学習に基づく音響特徴量からの振幅スペクトログラム予測
深層学習に基づく音響特徴量からの振幅スペクトログラム予測
 
音楽信号処理における基本周波数推定を応用した心拍信号解析
音楽信号処理における基本周波数推定を応用した心拍信号解析音楽信号処理における基本周波数推定を応用した心拍信号解析
音楽信号処理における基本周波数推定を応用した心拍信号解析
 
調波打撃音モデルに基づく線形多チャネルブラインド音源分離
調波打撃音モデルに基づく線形多チャネルブラインド音源分離調波打撃音モデルに基づく線形多チャネルブラインド音源分離
調波打撃音モデルに基づく線形多チャネルブラインド音源分離
 
コサイン類似度罰則条件付き非負値行列因子分解に基づく音楽音源分離
コサイン類似度罰則条件付き非負値行列因子分解に基づく音楽音源分離コサイン類似度罰則条件付き非負値行列因子分解に基づく音楽音源分離
コサイン類似度罰則条件付き非負値行列因子分解に基づく音楽音源分離
 
非負値行列因子分解を用いた被り音の抑圧
非負値行列因子分解を用いた被り音の抑圧非負値行列因子分解を用いた被り音の抑圧
非負値行列因子分解を用いた被り音の抑圧
 
独立成分分析に基づく信号源分離精度の予測
独立成分分析に基づく信号源分離精度の予測独立成分分析に基づく信号源分離精度の予測
独立成分分析に基づく信号源分離精度の予測
 
深層学習に基づく間引きインジケータ付き周波数帯域補間手法による音源分離処理の高速化
深層学習に基づく間引きインジケータ付き周波数帯域補間手法による音源分離処理の高速化深層学習に基づく間引きインジケータ付き周波数帯域補間手法による音源分離処理の高速化
深層学習に基づく間引きインジケータ付き周波数帯域補間手法による音源分離処理の高速化
 
独立低ランク行列分析を用いたインタラクティブ音源分離システム
独立低ランク行列分析を用いたインタラクティブ音源分離システム独立低ランク行列分析を用いたインタラクティブ音源分離システム
独立低ランク行列分析を用いたインタラクティブ音源分離システム
 
局所時間周波数構造に基づく深層パーミュテーション解決法の実験的評価
局所時間周波数構造に基づく深層パーミュテーション解決法の実験的評価局所時間周波数構造に基づく深層パーミュテーション解決法の実験的評価
局所時間周波数構造に基づく深層パーミュテーション解決法の実験的評価
 

Recently uploaded

CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfAsst.prof M.Gokilavani
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONjhunlian
 
US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionMebane Rash
 
Virtual memory management in Operating System
Virtual memory management in Operating SystemVirtual memory management in Operating System
Virtual memory management in Operating SystemRashmi Bhat
 
Mine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxMine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxRomil Mishra
 
multiple access in wireless communication
multiple access in wireless communicationmultiple access in wireless communication
multiple access in wireless communicationpanditadesh123
 
System Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingSystem Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingBootNeck1
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptSAURABHKUMAR892774
 
Katarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School CourseKatarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School Coursebim.edu.pl
 
welding defects observed during the welding
welding defects observed during the weldingwelding defects observed during the welding
welding defects observed during the weldingMuhammadUzairLiaqat
 
Research Methodology for Engineering pdf
Research Methodology for Engineering pdfResearch Methodology for Engineering pdf
Research Methodology for Engineering pdfCaalaaAbdulkerim
 
Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating SystemRashmi Bhat
 
Transport layer issues and challenges - Guide
Transport layer issues and challenges - GuideTransport layer issues and challenges - Guide
Transport layer issues and challenges - GuideGOPINATHS437943
 
Industrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptIndustrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptNarmatha D
 
Configuration of IoT devices - Systems managament
Configuration of IoT devices - Systems managamentConfiguration of IoT devices - Systems managament
Configuration of IoT devices - Systems managamentBharaniDharan195623
 
Risk Management in Engineering Construction Project
Risk Management in Engineering Construction ProjectRisk Management in Engineering Construction Project
Risk Management in Engineering Construction ProjectErbil Polytechnic University
 
"Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ..."Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ...Erbil Polytechnic University
 
Indian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.pptIndian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.pptMadan Karki
 

Recently uploaded (20)

CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
 
US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of Action
 
Virtual memory management in Operating System
Virtual memory management in Operating SystemVirtual memory management in Operating System
Virtual memory management in Operating System
 
Mine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxMine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptx
 
multiple access in wireless communication
multiple access in wireless communicationmultiple access in wireless communication
multiple access in wireless communication
 
Designing pile caps according to ACI 318-19.pptx
Designing pile caps according to ACI 318-19.pptxDesigning pile caps according to ACI 318-19.pptx
Designing pile caps according to ACI 318-19.pptx
 
System Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingSystem Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event Scheduling
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.ppt
 
Katarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School CourseKatarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School Course
 
welding defects observed during the welding
welding defects observed during the weldingwelding defects observed during the welding
welding defects observed during the welding
 
Research Methodology for Engineering pdf
Research Methodology for Engineering pdfResearch Methodology for Engineering pdf
Research Methodology for Engineering pdf
 
Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating System
 
Transport layer issues and challenges - Guide
Transport layer issues and challenges - GuideTransport layer issues and challenges - Guide
Transport layer issues and challenges - Guide
 
Industrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptIndustrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.ppt
 
Configuration of IoT devices - Systems managament
Configuration of IoT devices - Systems managamentConfiguration of IoT devices - Systems managament
Configuration of IoT devices - Systems managament
 
Risk Management in Engineering Construction Project
Risk Management in Engineering Construction ProjectRisk Management in Engineering Construction Project
Risk Management in Engineering Construction Project
 
"Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ..."Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ...
 
Indian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.pptIndian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.ppt
 

DNN-based frequency component prediction for frequency-domain audio source separation

  • 1. DNN-based frequency component prediction for frequency-domain audio source separation Rui Watanabe, Daichi Kitamura (National Institute of Technology, Japan) Hiroshi Saruwatari (The University of Tokyo, Japan) Yu Takahashi, Kazunobu Kondo (Yamaha Corporation, Japan) 28th European Signal Processing Conference (EUSIPCO) SS-2.4
  • 2. Background  Audio source separation – aims to separate audio sources such as speech, singing voice, and musical instruments.  Products with audio source separation – Intelligent speaker – Hearing-aid system – Music editing by users etc. 1
  • 3. Background  Multichannel audio source separation (MASS) – estimates separation system using multichannel observation without knowing mixing system  Popular methods for each condition – Underdetermined (number of mics < number of sources) • Multichannel nonnegative matrix factorization (MNMF) [Sawada+, 2013] • Approaches based on deep neural networks (DNNs) – Overdetermined (number of mics ≥ number of sources) • Frequency-domain Independent component analysis [Smaragdis, 1998] • Independent vector analysis [Kim+, 2007] • Independent low-rank matrix (ILRMA) [Kitamura+, 2016] 2
  • 4. Background  Frequency-domain MASS – performs a short-time Fourier transform to the observed time-domain signal to obtain the spectrograms – estimates frequency-wise separation filter 3
  • 5. Conventional frequency-domain MASS  Multichannel nonnegative matrix factorization (MNMF) [Sawada+, 2013] – Unsupervised source separation algorithm without any prior information or training – High quality MASS can be achieved – Huge computational cost for estimating the parameters 4
  • 6. Proposed method: motivation  High-quality MASS with low computational cost  A new framework combining frequency-domain MASS and DNN – Separate specific frequencies via MNMF and obtain separated source components – The estimated source components of the other frequencies will be predicted by DNN 5
  • 7. Proposed method: interpretation of DNN  DNN in proposed framework can be interpreted in two ways 1. Audio source separation of specific frequencies (high- frequency band) • Low-frequency bands can be used for predicting high-frequency separated components 2. Audio bandwidth expansion of each source • High-frequency band of the mixture is a strong cue for expanding bandwidth 6
  • 8. Proposed method: details of framework  Observed multichannel spectrograms and are divided into low- and high-frequency bands  Apply MNMF to the low frequency band and to obtain the separated source components and – High-frequency band and are not separated in this step 7
  • 9. Proposed method: details of framework  Input , , and to DNN – DNN outputs softmasks and such that the high- frequency bands and are estimated from 8 Apply softmasks
  • 10. Proposed method: input vector of DNN  DNN prediction is performed for each time frame (each column of spectrograms) – Input vector is a concatenation of several time frames around th frame in , , and – Normalize the concatenated vector 9
  • 11. Proposed method: DNN architecture  Simple full-connected networks – Four hidden layers with Swish or Softmax functions 10
  • 12. Experiment 1: bandwidth expansion  Validation of the proposed framework – Evaluate bandwidth expansion performance from the low- frequency band of true sources with/without mixture – Confirm validity of the proposed framework that utilizes mixture components for predicting the separated sources – Use sources-to-artifact ratio (SAR) [Vincent+, 2006] 11
  • 13. Experiment 1: bandwidth expansion  Training conditions of DNN  Test dataset (SiSEC2011) [Araki+, 2012] for evaluation 12 Training dataset 100 drums (Dr.) and vocal (Vo.) songs in SiSEC2016 Database [Liutkus+, 2016] FFT length/Shift length 128 ms/64 ms Boundary frequency 4 kHz (Half of Nyquist frequency) Epochs/batch size 1000/128 Optimizer Adam (learning rate=0.001) Song ID Song name Signal length [s] 1 dev1__bearlin-roads (Dr. & Vo.) 14.0 2 dev2__another_dreamer-the_ones_we_love (Dr. & Vo.) 25.0 3 dev2__fort_minor-remember_the_name (Dr. & Vo.) 24.0 4 dev2_ultimate_nz_tour (Dr. & Vo.) 18.0
  • 14. Experiment 1: bandwidth expansion  Mixture components help to predict the high- frequency band of the separated sources 13 Song ID DNN w/o mixture DNN w/ mixture 1 Dr. : 21.1 dB Dr. : 28.0 dB Vo. : 21.8 dB Vo. : 31.5 dB 2 Dr. : 22.0 dB Dr. : 21.8 dB Vo. : 12.7 dB Vo. : 19.6 dB 3 Dr. : 15.0 dB Dr. : 20.4 dB Vo. : 11.2 dB Vo. : 18.5 dB 4 Dr. : 11.0 dB Dr. : 18.2 dB Vo. : 10.4 dB Vo. : 15.3 dB
  • 15. Experiment 2: evaluate proposed MASS framework  Compare conventional fullband MNMF and the proposed framework – In terms of separation accuracy (source-to-distortion ratio: SDR [Vincent+, 2006]) and computational efficiency 14
  • 16. Experiment 2: evaluate proposed MASS framework  Experimental conditions of MNMF 15 Multichannel observed signal Produce two-channel mixture by convoluting E2A impulse responses to the sources of the test dataset Boundary frequency 4 kHz Number of bases in MNMF 13
  • 17. 16 Experiment 2: evaluate proposed MASS framework
  • 18.  Song ID 4 – Since the number of frequencies is reduced by half, the proposed method is twice faster – In Fullband MNMF, 13dB was achieved in 120s – Proposed method achieved 13 dB in less than 50s 17 Experiment 2: evaluate proposed MASS framework
  • 19. 18 Experiment 2: evaluate proposed MASS framework
  • 20. Conclusion  In this paper – We proposed a computationally efficient audio source separation framework combined frequency-domain MASS and frequency component prediction based on DNN – In the proposed framework, MASS is applied to only the limited frequencies, and DNN predicts the other frequency components of the sources – By comparing fullband MNMF, the proposed method can achieve almost the same quality with the half- reduced computational cost 19 Thank you for your attention!

Editor's Notes

  1. Hi everyone, I’m Rui Watanabe / from National Institute of Technology(テクナーロジィ), / Kagawa College, / Japan. I’m gonna talk about / DNN-based frequency component prediction / for frequency-domain audio source separation.
  2. Audio source separation / is a technique(テクニーク)to separate audio sources / such as speech,↑ / singing voice,↑ / musical instruments↑, and so on↓. This technology(テクナーロジィ)can be used for many products / including an intelligent speakers,↑/ hearing-aid systems,↑ / and music editing by users↓.
  3. In particular, / multichannel(マーチチャネル)audio source separation, / MASS(エムエイエスエス)in short, / estimates a separation system W / using multichannel(マーチチャネル) observation / without knowing the mixing system A.(WとAは指し示しながら) This technique(テクニーク)can be divided into two categories(キャテゴリーズ), / for underdetermined / and overdetermined situations(スィテュエイションズ). The underdetermined situation(スィテュエイション) is that / the number of microphones / is less than the number of sources in the mixture. For this case, multichannel(マーチチャネル)nonnegative matrix(メイトリクス)factorization, / MNMF in short, / is a popular(パピュラー)algorithm. Also, / many DNN-based approaches / have been proposed so far in this case. On the other hand, / in the overdetermined situation(スィテュエイション), / the number of microphones is equal to / or larger than the number of sources. In this case, / frequency-domain independent component analysis / and independent low-rank matrix(メイトリクス)analysis / are the most reliable approaches.
  4. In this presentation,↑/ we only treat “frequency-domain MASS”. In this algorithm↑, / we perform / a short-time Fourier transform to the observed time-domain signal / and obtain the multichannel(マーチチャネル)spectrograms.(図の紫部分を指しながら) Then, / we estimate a frequency-wise separation filter, / which is applied to each frequency like this(図の中央を指しながら)/ to estimate the separated source signals.
  5. Let me introduce the conventional frequency-domain MASS called “multichannel(マーチチャネル)nonnegative matrix(メイトリクス)factorization,” / MNMF in short. This is an unsupervised source separation algorithm / and does not require any prior(プライォア)information or training(テュレイニン). As an unsupervised technique(テクニーク), MNMF tends to provide high quality separation performance. In MNMF, / the observed multichannel signal / is represented by the time-frequency-wise channel correlation matrices / denoted by X. Since X is a frequency-by-time matrix whose element is a channel-by-channel matrix↑, / this is a matrix of matrices, / which is a fourth-order tensor(テンサー). (frequency-by-timeのところは指し示しながら) MNMF decomposes X / into the source-wise spatial model(マドー)/ and the low-rank spectral model(マドー)of all the sources. Thus, / by clustering the spectral model(マドー)into each source using the estimated spatial model(マドー), the source separation is achieved. However, it requires a huge computational cost / for estimating the parameters(パラミターズ)/ because there are so many parameters(パラミターズ)in this model(マドー).
  6. In this presentation, / our motivation(モーティベイシュン) is that / we want to achieve a high-quality MASS / with a low computational cost. And we propose / a new source separation framework / combining frequency-domain MASS / and deep neural networks. In this framework, / as an initial process, / the mixture signal in specific frequencies are separated by MNMF, / and we obtain the separated source components in that frequencies. In this figure, / since only the low-frequency band of the mixture is input to MNMF, we can get the separated components in the low-frequency band. Of course, / the high-frequency bands of the separated sources / are missing. (しっかり間を開ける) As a post process, / we apply DNN-based frequency component prediction, / namely, / the missing high-frequency bands of the separated sources are predicted by DNN, / where we input not only the separated low-frequency bands / but also the mixture of the high-frequency band.(inputの矢印をそれぞれ指し示しながら) Since the DNN prediction process is much faster than MNMF process, / we can reduce / the total computational cost in this framework. For example, if we divide the frequency bands in half like figure,↑/ we can reduce the computational time / almost half.
  7. In our framework, / the post DNN process can be interpreted in two ways. First, / the DNN is an audio source separation of specific frequencies, / high-frequency band in this figure. Please note that / the low-frequency bands can be used for predicting high-frequency separated components in our DNN model(マドー). Second, / the DNN seems to be a bandwidth expansion of each source / because the high-frequency bands are predicted. In general, a bandwidth expansion is a hard task / even for DNN. However, / in our model,(マドー)/ the high-frequency band of the mixture / becomes a strong cue / to achieve the bandwidth expansion.
  8. The details of the proposed method is as follows. First, / the observed multichannel spectrograms / M1 and M2 / are divided into low- and high-frequency bands. Then, / we apply MNMF to only the low-frequency band / M1(L) and M2(L) / to obtain the separated source components / Y1(L) and Y2(L). The high-frequency band / M1(H) and M2(H) / are not separated in this step.
  9. Next, / we input the high-frequency band of the mixture / and the low-frequency bands of the separated sources / like this figure. The DNN / outputs two soft-masks / W1 and W2 / such that the high-frequency bands of the separated sources are calculated from M1(H) / by multiplying them. Of course, / the masks are the matrices with the elements between zero(ジロー)and one, / and the sum of each element in W1 and W2 is always unity.
  10. The DNN prediction is performed for each time frame j, / which is / each column of spectrograms. To utilize the information along time in the prediction, / the input vector for DNN is a concatenation of several time frames around j in the mixture and the separated sources. Also, / before we input the vector to the DNN, we normalize it to stabilize the model(マドー)training(テュレイニン), / where the normalization coefficient is added / to keep the information of the signal volume.
  11. The DNN model(マドー)in the proposed method is very simple. We have full-connected four hidden layers, / and we apply Swish function / to each hidden layer. Just before the output, / we apply frequency-wise Softmax function, / to ensure(インシュァー)that / the sum of the masks equals unity in each frequency. The mean squared error / between the separated source vector and the label(レイボーゥ)vector↑/ is used as a loss function of the DNN training(テュレイニン).
  12. To confirm the validity of the proposed method, / we have done two experiments(イクスペリメンツ). In the first experiment, / we evaluate the performance of the DNN model(マドー)/ as the bandwidth expansion. That is, / the DNN restores the high-frequency band from the low-frequency band of the completely separated sources, / where we confirm whether the high-frequency band of the mixture is effective / by comparing these two models(マドーズ).(図を指しながら) Therefore, / we can confirm the validity of the proposed framework / that utilizes mixture components / for predicting the separated sources. As an evaluation score, / we use sources-to-artifact ratio(レイシオ), / SAR, / which shows the absence of artificial distortions / in the estimated audio signals.
  13. This slide shows experimental conditions. For the training of DNN, / we used 100 songs with drums and vocals / in the SiSEC2016 database.(トゥエンティシックスティーンと発音) The boundary frequency between low- and high-frequency bands / was set to 4kHz, / which is a half of Nyquist frequency. As the test dataset, / we used four songs included in the SiSEC2011 database(トゥエンティイレブンと発音), / where these songs are the mixture of drums and vocals.
  14. This is the result of bandwidth expansion. For each song, / we showed the SAR values of Drums and Vocals. Higher SAR indicates better audio quality. Two columns show the results of DNN without mixture / and DNN with mixture. In almost all results, / the DNN with mixture outperforms the DNN without mixture. From this result, / we can confirm that / the mixture components help to predict the high-frequency band of the separated sources. Thus, / we can expect that / the proposed framework will perform effectively / in a source separation task.
  15. Next, / we conducted the MASS experiment. We compare the conventional MNMF and the proposed framework. The conventional method separates fullband mixture by MNMF / whereas the proposed framework separates only the low-frequency band by MNMF, / and the high-frequency band is predicted by DNN post process. We expect that / the computational time is reduced by skipping half number of frequencies in the MNMF process / while the separation performance is almost the same. As a source separation score, / we used source-to-distortion ratio(レイシオ), / SDR, / which represents the total performance of source separation including both “degree of separation” and the “quality of separated signals.”
  16. The other conditions are shown in this slide.(ここのtheは重要,otherとthe otherは違う) DNN is trained using the same dataset in the previous experiment. For the MASS test data, / we produced two-channel observed mixtures / by convoluting E2A impulse responses to the Drums and Vocals sources of the test dataset, / where the recording condition of E2A impulse response is depicted here.(図を指しながら) The reverberation time of E2A is 300 ms. The number of bases in MNMF was set to 13, / which provides the best result for both the conventional and proposed methods.
  17. This is the result for each song. The vertical axis indicates SDR score / averaged over 10 random initial values.(指しながら) The horizontal axis shows the average elapsed time.(指しながら) The black line is the conventional method, / fullband MNMF, / and the red circles are the results of the proposed framework. Since the elapsed time depends on the number of iterations of parameter update in MNMF,↑/ for the proposed framework,↑/ we plot the results with every 10 iterations in the MNMF process. Of course, / the computational time for the DNN prediction process / is included in each red circle, / although the DNN process requires less than 0.1 s(ジロポイントワンセカンズ). In all the results, / we can confirm the efficacy of the proposed method. In particular, / Song ID 4 shows the result just as we expected, / so let me explain the result of Song ID 4.
  18. In the case of Song ID 4, / the proposed method achieves 13 dB / in less than 50 s, / whereas fullband MNMF converged to 13 dB in 120 s. This is because / the number of frequencies in MNMF is reduced by half.
  19. In addition, / the proposed method outperforms fullband MNMF in Song IDs 1, 2, and 4. In particular, / the improvement in Song ID 1 is very large. The reason of these improvements might be that / the proposed method performed more accurate estimation of high-frequency band sources / based on the training with 100 songs. Also, / in the case of Song ID 1, / fullband MNMF might be trapped into a bad local minimum during the iterative optimization.