Presentation mml

Data augmentation for deep learning source separation of HipHop songs

Published in: Education

  1. Data augmentation for deep learning source separation of HipHop songs. Hector Martel, Marius Miron. Music Technology Group, Universitat Pompeu Fabra, Barcelona.
  2. Music source separation: deep learning methods. [Figure: score (metric = SDR, target_name = vocals) for systems DUR, GRA1, GRA2, GRA3, HUA, IBM, JEO1, JEO2, KON1, KAM1, KAM2, MRN, NUG1, NUG2, NUG3, NUG4, OZE, RAF1, RAF2, RAF3, STO1, STO2, UHL1, UHL2, UHL3.]
  3. HipHop: the voice is not sung; variability of timbres; variability of production styles. [YouTube video]
  4. HipHop: the voice is not sung; variability of timbres; variability of production styles; small datasets in the same production style.
  5. Source separation framework. Training: audio pieces → STFT → magnitude spectra → data processing → batches → CNN training → model. Separation: test piece → STFT → spectra → data processing → batches → separation with the trained model → separated spectra → ISTFT → separated sources.
  6. Source separation architecture. [Figure: a neural network takes a (T,F) input and produces (J,T,F) outputs; the accompanying equation relates (J,T,F) and (T,F) terms.]
  7. Source separation architecture: conv1, f(1,30), s(1,4) → conv2, f(20,1), s(1,1) → dense1 (256) → dense2 → inverse conv2 → inverse conv1. Input shape (1,T,F); output shape (J,T,F). Chandna et al. (2017). Monoaural audio source separation using deep convolutional neural networks. LVA/ICA, 258-266.
  8. Datasets: multi-track datasets with isolated instruments. DSD100 dataset: 100 songs with different production styles (50/50). HipHop dataset: 18 songs in the same production style (13/5); few, very similar songs.
  9. How do we avoid overfitting? Regularization (L2 norm on weights); dropout; training data augmentation.
  10. Instrument augmentation
  11. Mix augmentation
  12. Circular shift augmentation
  13. Experiments (DSD100, HHDS). DSD: trained with DSD. DSD HH: trained with DSD and then HH. HH: trained with HHDS. DSD HH 2: trained with DSD and then HH. HH COMBI: trained with HHDS, all augmentations. HH IA: trained with HHDS, instrument augmentation. HH MA: trained with HHDS, mix augmentation. HH CS: trained with HHDS, circular shift augmentation.
  14. Evaluation metrics. SDR: Signal to Distortion Ratio. SIR: Signal to Interference Ratio. SAR: Signal to Artefacts Ratio. ISR: Image to Spatial distortion Ratio.
  15. Results
  16. Results
  17. Results
  18. Results
  19. Results
  20. Use case
  21. Demo [YouTube video]
  22. Questions? Code: https://github.com/MTG/DeepConvSep ; Data: on Zenodo, DOI 10.5281/zenodo.823037 ; Demo: https://hiphopss.github.io/
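
The separation step on slides 5 to 7 can be illustrated with a short sketch. The transcript does not preserve the slide's equation, so the sketch assumes the time-frequency soft masking commonly used with this kind of architecture: each of the J magnitude estimates of shape (T,F) is normalized by the sum over sources, and the resulting mask is applied to the (T,F) mixture spectrogram. The function name `soft_mask_separate`, the `eps` constant, and the random stand-in for the network output are illustrative assumptions, not taken from the slides or from the DeepConvSep code.

```python
import numpy as np


def soft_mask_separate(estimates, mixture_stft, eps=1e-8):
    """Apply soft masks derived from per-source magnitude estimates.

    estimates:    array of shape (J, T, F), one magnitude estimate per source
                  (assumed to come from the neural network).
    mixture_stft: complex array of shape (T, F), the STFT of the mixture.
    Returns a complex array of shape (J, T, F) with the separated spectra.
    """
    estimates = np.abs(estimates)
    # Each mask is the share of one source in the sum of all estimates.
    masks = estimates / (estimates.sum(axis=0, keepdims=True) + eps)
    # Broadcast the (T, F) mixture over the source axis.
    return masks * mixture_stft[np.newaxis, :, :]


rng = np.random.default_rng(0)
J, T, F = 4, 30, 513  # hypothetical sizes: sources, time frames, frequency bins
network_output = rng.random((J, T, F))  # stand-in for real network estimates
mixture = rng.standard_normal((T, F)) + 1j * rng.standard_normal((T, F))

separated = soft_mask_separate(network_output, mixture)
print(separated.shape)
```

Because the masks sum to approximately one in every time-frequency bin, the separated spectra add back up to the mixture, and each one can be passed to an ISTFT to obtain a time-domain source.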

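
The slides name circular shift augmentation (slide 12) without describing it in the transcript. One plausible reading, given here only as an assumption, is that each isolated source is shifted circularly in time by a random offset before the sources are summed into a new training mixture. The function name `circular_shift_mix`, the `max_shift` parameter, and the choice to work on time-domain samples are hypothetical and may differ from the authors' implementation.

```python
import numpy as np


def circular_shift_mix(stems, rng, max_shift=None):
    """Create a new mixture by circularly shifting each source in time.

    stems:     array of shape (J, N), one time-domain signal per source.
    rng:       a numpy random Generator used to draw the shifts.
    max_shift: exclusive upper bound for the shift in samples
               (defaults to the signal length).
    Returns (shifted_stems, mixture).
    """
    n_sources, n_samples = stems.shape
    if max_shift is None:
        max_shift = n_samples
    shifted = np.empty_like(stems)
    for j in range(n_sources):
        shift = int(rng.integers(0, max_shift))
        # np.roll wraps samples that fall off the end back to the start.
        shifted[j] = np.roll(stems[j], shift)
    return shifted, shifted.sum(axis=0)


rng = np.random.default_rng(42)
stems = rng.standard_normal((4, 1000))  # stand-in for 4 isolated sources
shifted_stems, new_mixture = circular_shift_mix(stems, rng)
print(shifted_stems.shape, new_mixture.shape)
```

Each call yields a different alignment of the same sources, so a small multi-track dataset can produce many distinct mixtures while the isolated targets stay consistent with the mixture they belong to.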