Stereophonic Music Separation
Based on Non-negative Tensor Factorization
with Cepstrum Regularization
Shogo Seki, Tomoki Toda, Kazuya Takeda
(Nagoya University, Japan)
AASP-L4
Background
■ Music signals on CDs or streaming media
‐ Composed of many source signals (e.g. bass, drum set, vocals)
‐ Represented as a two-channel (stereophonic) signal
■ Source separation for music signals
‐ Automatic music transcription [Smaragdis+03]
‐ Source localization [Ohtani+16]
‐ Vocal extraction [Vembu&Baumann05, Ikemiya+15]
→ Stereophonic music signal separation
[Figure: stereophonic music signal with left (L) and right (R) channels]
EUSIPCO 2017, Aug. 30, 14:30-16:10, AASP-L4
What is a stereophonic music signal?
■ Two-channel signal
→ Multichannel signal processing
■ Contains many source signals
(# of channel signals) < (# of source signals)
→ Underdetermined Blind Source Separation (BSS) problem
■ Usually synthesized manually (e.g. CD music)
‐ Individual source signals are recorded separately
→ Mixed with gain controls (i.e. panning)
‐ Pseudo spatial information:
✓ No valid phase information for the separation
→ Use only magnitude spectrum information
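Note: the panning model described above can be sketched as follows. This is a minimal illustration assuming NumPy; the names `gains` and `sources` and the gain values are hypothetical and not taken from the slides. Three synthetic sources are mixed into two channels with real, non-negative gains only, so the mixing matrix has more columns than rows and cannot be inverted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three hypothetical source signals (rows), 1 second at 16 kHz.
n_sources, n_samples = 3, 16000
sources = rng.standard_normal((n_sources, n_samples))

# Panning gains: rows = channels (L, R), columns = sources.
# The values are illustrative assumptions.
gains = np.array([[0.8, 0.2, 0.5],
                  [0.2, 0.8, 0.5]])

# Stereo mixture: each channel is a gain-weighted sum of the sources.
mixture = gains @ sources  # shape (2, n_samples)

n_channels = mixture.shape[0]
rank = np.linalg.matrix_rank(gains)

# Fewer channels than sources: the linear system is underdetermined.
print(f"channels={n_channels}, sources={n_sources}, rank(gains)={rank}")
print("underdetermined:", n_channels < n_sources)
```

Because the gains are real scalars, each source reaches both channels with the same phase, which is why only level differences (and hence magnitude spectra) carry usable information.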
Research purpose
■ Develop a stereophonic music signal separation method

BSS method            Signal(s)      Condition        Spatial cues
IVA [Kim06]           Multichannel   Overdetermined   Use
NMF [Lee&Seung99]     Single         Underdetermined  None
MNMF [Sawada+13]      Multichannel   Underdetermined  Use
ILRMA [Kitamura+16]   Multichannel   Overdetermined   Use
Proposed              Multichannel   Underdetermined  None
Modeling music generation process
1. Multi-channel signal: linear combinations of sources with mixing gains
→ ① Panning operation
2. Source signals: low-rank structures in the magnitude spectral domain
→ ② NMF decomposition, breaking each spectrogram into:
- Spectral patterns
- Time-varying gains
[Figure: the magnitude spectrograms of the multi-channel signal are modeled as Gain × Sources (① panning operation), and the sources as Basis × Activation (② NMF decomposition); together these form the NTF framework (tensor decomposition).]
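Note: the two-stage model above (panning gains on top of per-source NMF) can be sketched as a small tensor factorization. This is a minimal sketch assuming NumPy and standard KL-divergence multiplicative updates; the function names, array shapes, and problem sizes are assumptions for illustration, and the regularization used in the proposed method is not included.

```python
import numpy as np

def reconstruct(G, W, H):
    """X_hat[c,f,t] = sum_s G[c,s] * sum_k W[s,f,k] * H[s,k,t]."""
    S = np.einsum("sfk,skt->sft", W, H)  # per-source magnitude spectrograms
    return np.einsum("cs,sft->cft", G, S), S

def kl_divergence(X, X_hat, eps=1e-12):
    """Generalized KL divergence between observation and estimate."""
    return float(np.sum(X * np.log((X + eps) / (X_hat + eps)) - X + X_hat))

def ntf_step(X, G, W, H, eps=1e-12):
    """One round of multiplicative updates for gains, bases, activations."""
    X_hat, S = reconstruct(G, W, H)
    R = X / (X_hat + eps)
    G = G * np.einsum("cft,sft->cs", R, S) / (S.sum(axis=(1, 2))[None, :] + eps)

    X_hat, _ = reconstruct(G, W, H)
    R = X / (X_hat + eps)
    num = np.einsum("cft,cs,skt->sfk", R, G, H)
    den = np.einsum("cs,skt->sk", G, H)[:, None, :]
    W = W * num / (den + eps)

    X_hat, _ = reconstruct(G, W, H)
    R = X / (X_hat + eps)
    num = np.einsum("cft,cs,sfk->skt", R, G, W)
    den = np.einsum("cs,sfk->sk", G, W)[:, :, None]
    H = H * num / (den + eps)
    return G, W, H

# Toy problem: 2 channels, 3 sources, 20 bins, 30 frames, 4 bases per source.
rng = np.random.default_rng(0)
C, N, F, T, K = 2, 3, 20, 30, 4
X, _ = reconstruct(rng.random((C, N)), rng.random((N, F, K)), rng.random((N, K, T)))

G = rng.random((C, N)) + 0.1
W = rng.random((N, F, K)) + 0.1
H = rng.random((N, K, T)) + 0.1

kl_start = kl_divergence(X, reconstruct(G, W, H)[0])
for _ in range(50):
    G, W, H = ntf_step(X, G, W, H)
kl_end = kl_divergence(X, reconstruct(G, W, H)[0])
print(f"KL divergence: {kl_start:.2f} -> {kl_end:.2f}")
```

Each update holds the other two factors fixed, so every step is an ordinary KL-NMF update on a model that is linear in the factor being updated.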
Utilizing source information
■ Data accessibility
‐ Hard to prepare the actual source information of the target signals
‐ Possible to utilize similar source information
■ Supervised separation framework [Smaragdis+07]
‐ Learn basis spectra from training data
‐ Use them as:
• Fixed values (supervised)
• Initial values (lightly supervised)
[Figure: basis spectra learned from the training data are used to fix or initialize the Basis factor in the Gain × Basis × Activation model of the multi-channel signal.]
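Note: the difference between fixing and initializing the basis can be sketched with a single-channel KL-NMF. This is a minimal sketch assuming NumPy; `nmf_kl`, the `update_basis` flag, and the random toy data are hypothetical and only illustrate the two modes, not the authors' implementation.

```python
import numpy as np

def nmf_kl(V, W, n_iter=100, update_basis=True, seed=0, eps=1e-12):
    """KL-NMF V ~= W @ H with multiplicative updates.

    update_basis=False keeps W fixed (supervised);
    update_basis=True uses W only as an initial value (lightly supervised).
    """
    rng = np.random.default_rng(seed)
    W = W.copy()
    H = rng.random((W.shape[1], V.shape[1])) + 0.1
    for _ in range(n_iter):
        R = V / (W @ H + eps)
        H *= (W.T @ R) / (W.sum(axis=0)[:, None] + eps)
        if update_basis:
            R = V / (W @ H + eps)
            W *= (R @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H

rng = np.random.default_rng(1)
train_V = rng.random((16, 40))   # stand-in for training spectrograms
test_V = rng.random((16, 30))    # stand-in for the signal to separate

# Learn basis spectra from the training data.
W_trained, _ = nmf_kl(train_V, rng.random((16, 5)) + 0.1)

# Supervised: the trained basis stays fixed; only activations adapt.
W_s, H_s = nmf_kl(test_V, W_trained, update_basis=False)

# Lightly supervised: the trained basis is only the starting point.
W_ss, H_ss = nmf_kl(test_V, W_trained, update_basis=True)

print("basis unchanged (supervised):", np.array_equal(W_s, W_trained))
print("basis adapted (lightly supervised):", not np.allclose(W_ss, W_trained))
```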
Regularization for source timbre
■ Cepstral Distance Regularization (CDR) [Li+16]
‐ Used in semi-supervised speech enhancement
‐ Jointly enhances both the spectrogram and the features (MFCCs)
→ Constrains the spectral envelopes (timbre information) of the sources
■ Summary of the proposed method
[Figure: MFCCs are extracted from the training data and modeled with GMMs, which serve as prior information on the sources; the training data are also used to fix or initialize the Basis factor in the Gain × Basis × Activation model of the multi-channel signal.]
Formulation
■ Objective function (to be minimized)
‐ KL divergence between the observation and the estimate
‐ Regularization parameter
‐ Cepstral Distance Regularization term
(negative log-likelihood of GMMs for the MFCCs of the sources)
→ Can be optimized by the auxiliary function method
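Note: the symbols of the objective function did not survive in this transcript. Under assumed notation, the three ingredients listed above could be combined as sketched below. Every symbol here is an assumption for illustration: X is the observed magnitude tensor, X-hat the model estimate, λ the regularization parameter, c_{s,t} the MFCC vector of source s at frame t, and (π, μ, Σ) the GMM parameters of source s. The generalized KL form is the one commonly used with NMF and is likewise assumed.

```latex
\begin{aligned}
\mathcal{J}(\Theta)
  &= \mathcal{D}_{\mathrm{KL}}\bigl(X \,\|\, \hat{X}\bigr)
   + \lambda \, \mathcal{R}(\Theta), \\
\mathcal{D}_{\mathrm{KL}}\bigl(X \,\|\, \hat{X}\bigr)
  &= \sum_{c,f,t} \Bigl( x_{cft} \log \frac{x_{cft}}{\hat{x}_{cft}}
     - x_{cft} + \hat{x}_{cft} \Bigr), \\
\mathcal{R}(\Theta)
  &= - \sum_{s} \sum_{t} \log \sum_{m} \pi_{s,m}\,
     \mathcal{N}\bigl(\mathbf{c}_{s,t};\, \boldsymbol{\mu}_{s,m},
     \boldsymbol{\Sigma}_{s,m}\bigr)
\end{aligned}
```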
[Figure: MFCC sequences are extracted from the Sources factor of the Gain × Basis × Activation model of the multi-channel signal by applying the mel-filterbank, the logarithm, and the IDCT.]
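Note: the feature extraction chain (mel-filterbank, logarithm, cosine transform) can be sketched as follows, assuming NumPy. The helper names, the 13 output coefficients, and the FFT size of 512 are assumptions. The slide labels the last step IDCT; this sketch uses an orthonormal DCT-II, a common convention for MFCCs, which may differ from the authors' transform.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular mel filters, shape (n_filters, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2))
    fb = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def dct_matrix(n_out, n_in):
    """First n_out rows of the orthonormal DCT-II matrix of size n_in."""
    n = np.arange(n_in)
    k = np.arange(n_out)[:, None]
    D = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in)) * np.sqrt(2.0 / n_in)
    D[0] /= np.sqrt(2.0)
    return D

def mfcc_from_magnitude(S, fb, n_coeffs, eps=1e-12):
    """Magnitude spectrogram (n_bins, n_frames) -> MFCCs (n_coeffs, n_frames)."""
    log_mel = np.log(fb @ S + eps)  # mel-filterbank, then logarithm
    return dct_matrix(n_coeffs, fb.shape[0]) @ log_mel

rng = np.random.default_rng(0)
S = rng.random((257, 10))               # stand-in source spectrogram
fb = mel_filterbank(64, 512, 16000)     # 64 mel filters at 16 kHz
mfcc = mfcc_from_magnitude(S, fb, n_coeffs=13)
print(mfcc.shape)
```

Every step is differentiable with respect to the source spectrogram, which is what lets a penalty on the MFCCs feed back into the factorization.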
Experimental evaluation
■ Investigation
‐ Effect of regularization
‐ Separation performance of supervised vs. lightly supervised settings
• Updating the basis: lightly supervised, i.e. semi-supervised (SS)
• Fixing the basis: supervised (S)
■ Performance measurements (larger is better)
‐ SDR (Signal-to-Distortion Ratio): sound quality
‐ SIR (Signal-to-Interference Ratio): suppression of non-target sources
‐ SAR (Signal-to-Artifacts Ratio): distortion introduced by the process
■ 3 songs (by 1 artist) from Cambridge Music Technology
‐ 2 songs: training (dictionary/GMM for regularization)
‐ 1 song: evaluation (30 - 45 s) & development (GMM parameter)
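Note: to give a feel for what a "larger is better" ratio in dB measures, here is a deliberately simplified SDR, assuming NumPy. It treats everything that differs from the reference as distortion and does not perform the decomposition into interference and artifacts needed for SIR and SAR; the slides do not state which toolkit computed the reported metrics, and `simple_sdr` is a hypothetical helper.

```python
import numpy as np

def simple_sdr(reference, estimate, eps=1e-12):
    """10*log10(||s||^2 / ||s - s_hat||^2) in dB (simplified, not BSS Eval)."""
    error = reference - estimate
    return 10.0 * np.log10((np.sum(reference**2) + eps) / (np.sum(error**2) + eps))

rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
reference = np.sin(2 * np.pi * 440.0 * t)      # stand-in source signal
noise = rng.standard_normal(t.shape)

sdr_good = simple_sdr(reference, reference + 0.1 * noise)
sdr_poor = simple_sdr(reference, reference + 0.5 * noise)
print(f"mild distortion: {sdr_good:.1f} dB, strong distortion: {sdr_poor:.1f} dB")
```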
Experimental conditions
■ Mixing setting

# of sources                            3
Mixing gains (L:R)                      2:1 (Ba), 1:2 (Dr), 1:1 (Vo)
Sampling frequency                      16 kHz
Frame size                              32 ms
Shift size                              16 ms
# of basis vectors/source               50
# of iterations (parameter updating)    400
# of mel-filter banks                   64

[Figure: source positions across the stereo field: Ba (left), Vo (center), Dr (right)]
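Note: the analysis settings above translate into sample counts as follows. This is a small arithmetic sketch; it assumes the FFT size equals the frame length and that no padding is applied, neither of which is stated in the slides.

```python
sample_rate_hz = 16_000
frame_ms, shift_ms = 32, 16

frame_len = sample_rate_hz * frame_ms // 1000   # samples per frame
hop_len = sample_rate_hz * shift_ms // 1000     # samples per shift
n_bins = frame_len // 2 + 1                     # one-sided FFT bins (assumed FFT size = frame length)
overlap = 1 - hop_len / frame_len               # fraction of each frame shared with the next

def n_frames(duration_s):
    """Number of full frames in a clip, without padding."""
    n_samples = int(duration_s * sample_rate_hz)
    return 1 + (n_samples - frame_len) // hop_len

print(f"frame={frame_len} samples, shift={hop_len} samples, bins={n_bins}")
print(f"overlap={overlap:.0%}, frames in 30 s={n_frames(30)}, in 45 s={n_frames(45)}")
```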
Results
[Figure: separation performance (axis label: better performance) plotted against regularization strength (axis label: stronger regularization), with the result w/o regularization as a reference]
■ Comparison w/ or w/o regularization
‐ Better performance w/ regularization
→ Effective constraint on timbre
■ Comparison b/w semi-supervised and supervised
‐ Large improvement in semi-supervised w/ regularization
→ Effective mismatch compensation
■ Effect of hyperparameter setting
‐ Optimum setting shared across all sources
‐ Needs to be optimized manually
→ Hyperparameter setting remains to be tuned
Conclusion
■ Proposed a stereophonic music signal separation method
‐ NTF-based decomposition
• Panning operation for the observed multi-channel signal
• Low-rankness for the source spectrograms
‐ Supervised separation framework
‐ Regularization on the timbre of individual sources by CDR
■ Demonstrated effectiveness in the supervised framework
‐ SS w/o reg. < S w/o reg. < S w/ reg. < SS w/ reg.
‐ Better separation performance w/ the regularization
■ Future work
‐ Hyperparameter setting investigation
‐ Investigation of various other music sources
Thank you for listening!
