Automatic Tagging using Deep Convolutional Neural Networks [1]
Keunwoo.Choi@qmul.ac.uk
Centre for Digital Music, Queen Mary University of London, UK
1 Introduction
2 CNNs and Music
    TF-representations
    Convolution Kernels and Axes
    Pooling
3 Problem definition
4 The proposed architecture
5 Experiments and discussions
    Overview
    MagnaTagATune
    Million Song Dataset
6 Conclusion
7 Reference
Introduction
Tagging
Tags
Descriptive keywords that people put on music
Multi-label nature
E.g. {rock, guitar, drive, 90’s}
Music tags include Genres (rock, pop, alternative, indie),
Instruments (vocalists, guitar, violin), Emotions (mellow,
chill), Activities (party, drive), Eras (00’s, 90’s, 80’s).
Collaboratively created (Last.fm) → noisy:
false negatives (relevant tags that were never added)
synonyms (vocal/vocals/vocalist/vocalists/voice/voices, guitar/guitars)
popularity bias
typos (harpsicord)
irrelevant tags (abcd, ilikeit, fav)
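Because tagging is multi-label, each track's target is usually a K-dimensional 0/1 vector over a fixed tag vocabulary. A minimal sketch, assuming a hypothetical synonym table `SYNONYMS` and a hypothetical helper `multi_hot` (neither comes from the talk): merging synonyms removes one kind of noise, but false negatives cannot be recovered this way, since a missing tag simply stays 0.

```python
# Hypothetical synonym table: maps noisy variants (as listed above) to one canonical tag
SYNONYMS = {
    "vocal": "vocals", "vocalist": "vocals", "vocalists": "vocals",
    "voice": "vocals", "voices": "vocals", "guitars": "guitar",
}


def multi_hot(track_tags, vocabulary):
    """Encode one track's tags as a K-dim 0/1 vector over a fixed vocabulary."""
    canonical = {SYNONYMS.get(t.lower(), t.lower()) for t in track_tags}
    return [1 if tag in canonical else 0 for tag in vocabulary]


vocab = ["rock", "guitar", "vocals", "drive", "90's", "mellow"]
target = multi_hot(["Rock", "guitars", "vocalist", "drive", "90's"], vocab)
print(target)  # [1, 1, 1, 1, 1, 0]
```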
Introduction
Tagging
Somewhat multi-task: genre, instrument, emotion and era could each be a separate task
Genres (rock, pop, alternative, indie), Instruments (vocalists, guitar, violin), Emotions (mellow, chill), Activities (party, drive), Eras (00’s, 90’s, 80’s).
(although many are missing)
Can they really be extracted from audio?
Introduction
Previous approaches
Conventional ML: Feature extraction + Classifier [9]
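The conventional two-stage pipeline (hand-crafted features, then a separate classifier) can be sketched in a few lines. This is a toy illustration: the features (frame-wise zero-crossing rate and RMS energy, summarised by mean and standard deviation) and the nearest-centroid classifier are simple stand-ins chosen for brevity, not the features or classifiers used in [9]. Signals are plain lists of samples, and the two "low"/"high" labels are hypothetical.

```python
import math
from statistics import mean, pstdev


def frame_features(signal, frame_len=256, hop=128):
    """Frame-wise zero-crossing rate and RMS energy (hand-crafted features)."""
    zcr, rms = [], []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        zcr.append(crossings / (frame_len - 1))
        rms.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return zcr, rms


def summarize(signal):
    """Stage 1: clip-level feature vector (mean and std of each frame feature)."""
    zcr, rms = frame_features(signal)
    return [mean(zcr), pstdev(zcr), mean(rms), pstdev(rms)]


def fit_centroids(features, labels):
    """Stage 2 (training): one mean feature vector per label."""
    groups = {}
    for f, y in zip(features, labels):
        groups.setdefault(y, []).append(f)
    return {y: [mean(col) for col in zip(*fs)] for y, fs in groups.items()}


def predict(centroids, feature):
    """Stage 2 (inference): label of the nearest centroid (Euclidean distance)."""
    return min(centroids, key=lambda y: math.dist(centroids[y], feature))


def sine(freq, sr=8000, seconds=0.5):
    return [math.sin(2 * math.pi * freq * n / sr) for n in range(int(sr * seconds))]


train = [sine(110), sine(220), sine(1760), sine(3000)]
labels = ["low", "low", "high", "high"]
centroids = fit_centroids([summarize(s) for s in train], labels)
print(predict(centroids, summarize(sine(2500))))  # high
```

The point of contrast with the deep approaches that follow: here the features are fixed by hand, and only the classifier is learned.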
Introduction
Previous deep approaches
Going deep (and automatic)!
Dieleman and Schrauwen, ICASSP 2014 [4], and Dieleman's work at Spotify
Nam et al., 2015 [7]
CNNs and Music
TF-representations
Options
STFT / Mel-spectrogram / CQT / raw-audio
STFT: okay, but why not a melgram?
Melgram: efficient
CQT: only if you are interested in fundamentals/pitches
Raw audio: end-to-end setup (the transformation is learned),
has not outperformed melgrams (yet) in speech/music,
perhaps the way to go in the future?
but we lose the frequency axis
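One way to see why the melgram is efficient: mel bands are narrow at low frequencies and wide at high frequencies, so a few dozen bands summarise the spectrum. The sketch below computes the band edges of a 96-band mel filterbank. It assumes the HTK-style mel formula (some libraries default to the Slaney-style variant, which differs slightly), f_min = 0 and f_max = 6 kHz, i.e. the Nyquist frequency of an assumed 12 kHz sample rate; none of these settings is taken from the talk.

```python
import math


def hz_to_mel(f):
    """HTK-style mel formula (an assumption; Slaney-style mels differ slightly)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(n_mels, f_min, f_max):
    """n_mels + 2 frequencies (Hz), equally spaced on the mel scale.

    Triangular filter i spans edges[i]..edges[i + 2] and peaks at edges[i + 1].
    """
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_mels + 2)]


# 96 bands up to 6 kHz (Nyquist of an assumed 12 kHz sample rate)
edges = mel_band_edges(96, 0.0, 6000.0)
widths = [b - a for a, b in zip(edges, edges[1:])]
print(f"lowest band step: {widths[0]:.1f} Hz, highest: {widths[-1]:.1f} Hz")
```

The steadily growing band widths are the "frequency aggregation" that a ConvNet on a linear STFT would otherwise have to learn from data.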
CNNs and Music
Convolution Kernels and Axes
Kernels
Rule of thumb: deeper > bigger, as in VGGNet [8] and residual networks [6]
Axes
For tagging, time-axis convolution seems essential
Dieleman's approach does not apply frequency-axis convolution
The proposed method uses 2-D convolution, i.e. along both the time and frequency axes
Pros: can see local frequency structure
Cons: we do not know whether it is really used (it seems to be [2])
More details in the paper [1]
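The "deeper > bigger" rule of thumb can be checked with a little arithmetic, following the argument popularised by VGGNet [8]: a stack of small kernels covers the same receptive field as one large kernel with fewer weights and more non-linearities in between. A sketch under simplifying assumptions: stride 1, no pooling, biases ignored, and a hypothetical constant channel count C = 128.

```python
def stacked_receptive_field(kernel, layers):
    """Receptive field (input bins per axis) of stacked stride-1 convolutions, no pooling."""
    return 1 + layers * (kernel - 1)


def conv2d_weights(kernel, c_in, c_out):
    """Weights in one kernel x kernel 2-D conv layer (biases ignored)."""
    return kernel * kernel * c_in * c_out


C = 128  # hypothetical channel count, kept constant across layers
for layers in (2, 3):
    rf = stacked_receptive_field(3, layers)
    deep = layers * conv2d_weights(3, C, C)
    wide = conv2d_weights(rf, C, C)
    print(f"{layers} x 3x3: {rf}x{rf} field, {deep} weights; one {rf}x{rf}: {wide} weights")
```

This prints 294912 vs 409600 weights for a 5x5 field and 442368 vs 802816 for a 7x7 field, so the gap widens as the equivalent kernel grows.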
CNNs and Music
Pooling
Source of invariances
Max-pooling keeps the maximum but ignores where it came from = ignores small differences = invariance to small distortions
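A one-dimensional toy example shows both the invariance and its limit. Pooling here runs along time only with a window of 4 (echoing the time factor of the MP (2, 4) layers in the proposed network), and the activation values are hypothetical.

```python
def max_pool_1d(xs, size):
    """Non-overlapping max-pooling; a trailing partial window is dropped."""
    return [max(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]


# Hypothetical activations along time: one strong response (9) from a single event
onset = [0, 0, 0, 9, 0, 0, 0, 0]
nudged = [0, 0, 9, 0, 0, 0, 0, 0]   # same event, one frame earlier
crossed = [0, 0, 0, 0, 9, 0, 0, 0]  # one frame later, now in the next window

print(max_pool_1d(onset, 4))    # [9, 0]
print(max_pool_1d(nudged, 4))   # [9, 0]  small shift: identical output
print(max_pool_1d(crossed, 4))  # [0, 9]  shift across a window boundary: output changes
```

Stacking several pooling layers enlarges the range of shifts that are absorbed, which is one reason pooling is repeated after every convolution in deep networks.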
Problem definition
Automatic tagging
Automatic tagging is a multi-label classification task
K-dim binary vector: up to 2^K possible label combinations
Most tags are False for any given track (whether or not that is actually correct)
Measured by AUC-ROC
Area Under the Curve of the Receiver Operating Characteristic
[Figure: example ROC curve (image from Kaggle)]
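AUC-ROC has a convenient probabilistic reading: it is the chance that a randomly chosen positive track receives a higher score than a randomly chosen negative one. That makes it robust to the imbalance above, since predicting the same score for every track gives 0.5 no matter how many tags are False. A minimal sketch, assuming the reported figure is the per-tag AUC averaged over tags (a common convention, assumed here); `roc_auc`, `mean_tag_auc` and the toy data are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC-ROC for one tag: probability that a random positive track outscores
    a random negative one (ties count 1/2). O(P*N), fine for a sketch."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("AUC is undefined without both positive and negative tracks")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_tag_auc(label_rows, score_rows):
    """Average the per-tag AUC over the columns of track x tag matrices."""
    n_tags = len(label_rows[0])
    aucs = [
        roc_auc([row[k] for row in label_rows], [row[k] for row in score_rows])
        for k in range(n_tags)
    ]
    return sum(aucs) / n_tags


# Toy data: 4 tracks x 2 tags (multi-hot targets and sigmoid outputs)
labels = [[1, 0], [0, 1], [1, 0], [0, 0]]
scores = [[0.9, 0.2], [0.3, 0.8], [0.35, 0.1], [0.4, 0.3]]
print(mean_tag_auc(labels, scores))  # 0.875 (tag 0: 0.75, tag 1: 1.0)

# A constant score for every track gives AUC 0.5, however many tags are False
print(roc_auc([1, 0, 0, 0], [0.0, 0.0, 0.0, 0.0]))  # 0.5
```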
The proposed architecture
4-layer fully convolutional network, FCN-4
The proposed architecture
Layers shared by FCN-5, FCN-6 and FCN-7:
Mel-spectrogram (input: 96×1366×1)
Conv 3×3×128, MP (2, 4) (output: 48×341×128)
Conv 3×3×256, MP (2, 4) (output: 24×85×256)
Conv 3×3×512, MP (2, 4) (output: 12×21×512)
Conv 3×3×1024, MP (3, 5) (output: 4×4×1024)
Conv 3×3×2048, MP (4, 4) (output: 1×1×2048)
Additional layers: none (FCN-5); Conv 1×1×1024 (FCN-6); Conv 1×1×1024, Conv 1×1×1024 (FCN-7)
Output 50×1 (sigmoid)
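The output sizes in this architecture follow from flooring each pooling step (e.g. 1366 / 4 → 341). A small sketch in plain Python reproduces them for the shared FCN-5 trunk and counts its convolutional parameters. Assumptions, not stated in the talk: "same"-padded 3×3 convolutions, pooling that drops partial windows, biases counted, batch normalisation and the 50-unit output layer excluded; `trace_fcn5` is a hypothetical name.

```python
# (output channels, max-pooling (freq, time)) per block, as listed in the architecture
BLOCKS = [(128, (2, 4)), (256, (2, 4)), (512, (2, 4)), (1024, (3, 5)), (2048, (4, 4))]


def trace_fcn5(freq=96, time=1366, channels=1):
    """Return the (freq, time, channels) shape after each Conv 3x3 + MP block,
    plus the total number of conv weights and biases.

    Assumes 'same'-padded 3x3 convolutions (spatial size unchanged) and
    non-overlapping pooling that drops partial windows (floor division).
    """
    shapes, params, c_in = [], 0, channels
    for c_out, (pool_f, pool_t) in BLOCKS:
        params += 3 * 3 * c_in * c_out + c_out  # weights + biases
        freq, time = freq // pool_f, time // pool_t
        shapes.append((freq, time, c_out))
        c_in = c_out
    return shapes, params


shapes, params = trace_fcn5()
for s in shapes:
    print(s)
print(f"conv parameters: {params:,}")  # 25,072,640 under these assumptions
```

Roughly 25 million parameters in the convolutional trunk alone is worth keeping in mind when reading the MagnaTagATune results that follow.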
Experiments and discussions
Overview
              MTT            MSD
# tracks      25k            1M
# songs       5-6k           1M
Length        29.1s          30-60s
Benchmarks    10+            0
Labels        Tags, genres   Tags, genres, EchoNest features, bag-of-words lyrics, ...
Experiments and discussions
MagnaTagATune
The Hopes
To show that the proposed algorithm outperforms the state of the art
To find the best setup among FCN-3, 4, 5, 6 and 7
The Reality
To verify that the proposed algorithm is comparable to the state of the art
To compare melgram vs. STFT vs. MFCC
Experiments and discussions
MagnaTagATune
Same depth (l=4), melgram>MFCC>STFT
melgram: 96 mel-frequency bins
STFT: 128 frequency bins
MFCC: 90 (30 MFCC, 30 MFCCd, 30 MFCCdd)
Method                    AUC
FCN-3, mel-spectrogram    .852
FCN-4, mel-spectrogram    .894
FCN-5, mel-spectrogram    .890
FCN-4, STFT               .846
FCN-4, MFCC               .862
With more data, a ConvNet might still learn a better frequency aggregation than the mel scale, but not here.
A ConvNet on the mel-spectrogram outperformed one on MFCCs (.894 vs .862).
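The 90-dimensional MFCC input stacks 30 coefficients with their first- and second-order deltas. A minimal sketch of that stacking, assuming the standard regression formula d_t = Σ n (c_{t+n} − c_{t−n}) / (2 Σ n²) with a window of 2 frames and edge padding (the exact delta settings used in [1] are not given here). Computing the MFCCs themselves is out of scope, so the input is a toy T × 30 matrix.

```python
def deltas(frames, width=2):
    """Regression-based delta (time-derivative) features.

    frames: T x D list of per-frame coefficients. Edges are padded by
    repeating the first/last frame; width=2 is a common default, assumed here.
    """
    denom = 2 * sum(n * n for n in range(1, width + 1))
    padded = [frames[0]] * width + list(frames) + [frames[-1]] * width
    out = []
    for t in range(len(frames)):
        c = t + width  # position of frame t inside `padded`
        out.append([
            sum(n * (padded[c + n][d] - padded[c - n][d]) for n in range(1, width + 1)) / denom
            for d in range(len(frames[0]))
        ])
    return out


def stack_with_deltas(mfcc):
    """Concatenate coefficients, deltas and delta-deltas per frame (D -> 3D)."""
    d1 = deltas(mfcc)
    d2 = deltas(d1)
    return [a + b + c for a, b, c in zip(mfcc, d1, d2)]


# Toy input: 10 frames x 30 "MFCCs" that rise linearly over time
mfcc = [[float(t)] * 30 for t in range(10)]
features = stack_with_deltas(mfcc)
print(len(features), len(features[0]))  # 10 90
```

For a linearly rising input, interior frames get a delta of 1.0 (the slope) and a delta-delta of 0.0, which is a quick sanity check on the formula.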
Experiments and discussions
MagnaTagATune
Method                    AUC
FCN-3, mel-spectrogram    .852
FCN-4, mel-spectrogram    .894
FCN-5, mel-spectrogram    .890
FCN-4, STFT               .846
FCN-4, MFCC               .862
FCN-4 > FCN-3: depth worked!
FCN-4 > FCN-5 by .004
The deeper model might catch up after much longer training
Deeper models require more data
Deeper models take more time to train (cf. deep residual networks [6])
Are 4 layers enough, or is it a matter of dataset size?
Experiments and discussions
MagnaTagATune
Method                                 AUC
The proposed system, FCN-4             .894
2015, Bag of features and RBM [7]      .888
2014, 1-D convolutions [4]             .882
2014, Transfer learning [10]           .88
2012, Multi-scale approach [3]         .898
2011, Pooling MFCC [5]                 .861
All deep and NN approaches score around .88-.89
Are we hitting a glass ceiling?
Perhaps due to the noise in MTT, but that is tricky to prove
26K tracks are not enough for millions of parameters
Experiments and discussions
MagnaTagATune
Summary
Melgram over STFT, MFCC
A 2-D ConvNet is at least not worse than the previous approaches
Keunwoo: (Hesitating) MTT, it has been a great journey with you. I think we should move on.
MTT: (In tears) No! Are you... are you going to be with MSD?
Keunwoo: ...
MTT: (Falls down)
Keunwoo: (Almost taps MTT, then gets a message from MSD that it has the crawled audio files.)
Experiments and discussions
Million Song Dataset
Method                    AUC
FCN-3, mel-spectrogram    .786
FCN-4, mel-spectrogram    .808
FCN-5, mel-spectrogram    .848
FCN-6, mel-spectrogram    .851
FCN-7, mel-spectrogram    .845
FCN-3 < 4 < 5 < 6!
Deeper models pay off
until 6 layers in this case.
Experiments and discussions
Million Song Dataset
Complex models take more time to train, but in the end they may outperform simpler ones
Conclusion
2D fully convolutional networks work well
Mel-spectrograms can be preferred to STFT
until we have a HUGE dataset, so that mel-frequency aggregation can be replaced
Bye bye, MFCC? In the near future, I guess
MIR can go deeper than it does now
if we have bigger, better, stronger datasets
Q. How do ConvNets actually deal with spectrograms?
A. Stay tuned to this year’s MLSP paper!
Q&A
Reference
[1] Choi, K., Fazekas, G., Sandler, M.: Automatic tagging using deep convolutional neural networks. In: Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR 2016), New York, USA (2016)
[2] Choi, K., Fazekas, G., Sandler, M.: Explaining convolutional neural networks on music classification (submitted). In: IEEE International Workshop on Machine Learning for Signal Processing, Salerno, Italy. IEEE (2016)
[3] Dieleman, S., Schrauwen, B.: Multiscale approaches to music audio feature learning. In: ISMIR. pp. 3–8 (2013)
[4] Dieleman, S., Schrauwen, B.: End-to-end learning for music audio. In: Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. pp. 6964–6968. IEEE (2014)
[5] Hamel, P., Lemieux, S., Bengio, Y., Eck, D.: Temporal pooling and multiscale learning for automatic annotation and ranking of music audio. In: ISMIR. pp. 729–734 (2011)
[6] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385 (2015)
[7] Nam, J., Herrera, J., Lee, K.: A deep bag-of-features model for music auto-tagging. arXiv preprint arXiv:1508.04999 (2015)
[8] Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
[9] Tzanetakis, G., Cook, P.: Musical genre classification of audio signals. Speech and Audio Processing, IEEE transactions on 10(5), 293–302 (2002)
[10] Van Den Oord, A., Dieleman, S., Schrauwen, B.: Transfer learning by supervised pre-training for audio-based music classification. In: Conference of the International Society for Music Information Retrieval (ISMIR 2014) (2014)
22/22

More Related Content

Recently uploaded

SENTIMENT ANALYSIS ON PPT AND Project template_.pptx
SENTIMENT ANALYSIS ON PPT AND Project template_.pptxSENTIMENT ANALYSIS ON PPT AND Project template_.pptx
SENTIMENT ANALYSIS ON PPT AND Project template_.pptx
b0754201
 
Object Oriented Analysis and Design - OOAD
Object Oriented Analysis and Design - OOADObject Oriented Analysis and Design - OOAD
Object Oriented Analysis and Design - OOAD
PreethaV16
 
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELDEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
ijaia
 
Open Channel Flow: fluid flow with a free surface
Open Channel Flow: fluid flow with a free surfaceOpen Channel Flow: fluid flow with a free surface
Open Channel Flow: fluid flow with a free surface
Indrajeet sahu
 
Levelised Cost of Hydrogen (LCOH) Calculator Manual
Levelised Cost of Hydrogen  (LCOH) Calculator ManualLevelised Cost of Hydrogen  (LCOH) Calculator Manual
Levelised Cost of Hydrogen (LCOH) Calculator Manual
Massimo Talia
 
smart pill dispenser is designed to improve medication adherence and safety f...
smart pill dispenser is designed to improve medication adherence and safety f...smart pill dispenser is designed to improve medication adherence and safety f...
smart pill dispenser is designed to improve medication adherence and safety f...
um7474492
 
Applications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdfApplications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdf
Atif Razi
 
Unit -II Spectroscopy - EC I B.Tech.pdf
Unit -II Spectroscopy - EC  I B.Tech.pdfUnit -II Spectroscopy - EC  I B.Tech.pdf
Unit -II Spectroscopy - EC I B.Tech.pdf
TeluguBadi
 
OOPS_Lab_Manual - programs using C++ programming language
OOPS_Lab_Manual - programs using C++ programming languageOOPS_Lab_Manual - programs using C++ programming language
OOPS_Lab_Manual - programs using C++ programming language
PreethaV16
 
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
upoux
 
Generative AI Use cases applications solutions and implementation.pdf
Generative AI Use cases applications solutions and implementation.pdfGenerative AI Use cases applications solutions and implementation.pdf
Generative AI Use cases applications solutions and implementation.pdf
mahaffeycheryld
 
Supermarket Management System Project Report.pdf
Supermarket Management System Project Report.pdfSupermarket Management System Project Report.pdf
Supermarket Management System Project Report.pdf
Kamal Acharya
 
This study Examines the Effectiveness of Talent Procurement through the Imple...
This study Examines the Effectiveness of Talent Procurement through the Imple...This study Examines the Effectiveness of Talent Procurement through the Imple...
This study Examines the Effectiveness of Talent Procurement through the Imple...
DharmaBanothu
 
openshift technical overview - Flow of openshift containerisatoin
openshift technical overview - Flow of openshift containerisatoinopenshift technical overview - Flow of openshift containerisatoin
openshift technical overview - Flow of openshift containerisatoin
snaprevwdev
 
SCALING OF MOS CIRCUITS m .pptx
SCALING OF MOS CIRCUITS m                 .pptxSCALING OF MOS CIRCUITS m                 .pptx
SCALING OF MOS CIRCUITS m .pptx
harshapolam10
 
Transformers design and coooling methods
Transformers design and coooling methodsTransformers design and coooling methods
Transformers design and coooling methods
Roger Rozario
 
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
ydzowc
 
Assistant Engineer (Chemical) Interview Questions.pdf
Assistant Engineer (Chemical) Interview Questions.pdfAssistant Engineer (Chemical) Interview Questions.pdf
Assistant Engineer (Chemical) Interview Questions.pdf
Seetal Daas
 
Accident detection system project report.pdf
Accident detection system project report.pdfAccident detection system project report.pdf
Accident detection system project report.pdf
Kamal Acharya
 
Determination of Equivalent Circuit parameters and performance characteristic...
Determination of Equivalent Circuit parameters and performance characteristic...Determination of Equivalent Circuit parameters and performance characteristic...
Determination of Equivalent Circuit parameters and performance characteristic...
pvpriya2
 

Recently uploaded (20)

SENTIMENT ANALYSIS ON PPT AND Project template_.pptx
SENTIMENT ANALYSIS ON PPT AND Project template_.pptxSENTIMENT ANALYSIS ON PPT AND Project template_.pptx
SENTIMENT ANALYSIS ON PPT AND Project template_.pptx
 
Object Oriented Analysis and Design - OOAD
Object Oriented Analysis and Design - OOADObject Oriented Analysis and Design - OOAD
Object Oriented Analysis and Design - OOAD
 
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELDEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
 
Open Channel Flow: fluid flow with a free surface
Open Channel Flow: fluid flow with a free surfaceOpen Channel Flow: fluid flow with a free surface
Open Channel Flow: fluid flow with a free surface
 
Levelised Cost of Hydrogen (LCOH) Calculator Manual
Levelised Cost of Hydrogen  (LCOH) Calculator ManualLevelised Cost of Hydrogen  (LCOH) Calculator Manual
Levelised Cost of Hydrogen (LCOH) Calculator Manual
 
smart pill dispenser is designed to improve medication adherence and safety f...
smart pill dispenser is designed to improve medication adherence and safety f...smart pill dispenser is designed to improve medication adherence and safety f...
smart pill dispenser is designed to improve medication adherence and safety f...
 
Applications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdfApplications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdf
 
Unit -II Spectroscopy - EC I B.Tech.pdf
Unit -II Spectroscopy - EC  I B.Tech.pdfUnit -II Spectroscopy - EC  I B.Tech.pdf
Unit -II Spectroscopy - EC I B.Tech.pdf
 
OOPS_Lab_Manual - programs using C++ programming language
OOPS_Lab_Manual - programs using C++ programming languageOOPS_Lab_Manual - programs using C++ programming language
OOPS_Lab_Manual - programs using C++ programming language
 
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
 
Generative AI Use cases applications solutions and implementation.pdf
Generative AI Use cases applications solutions and implementation.pdfGenerative AI Use cases applications solutions and implementation.pdf
Generative AI Use cases applications solutions and implementation.pdf
 
Supermarket Management System Project Report.pdf
Supermarket Management System Project Report.pdfSupermarket Management System Project Report.pdf
Supermarket Management System Project Report.pdf
 
This study Examines the Effectiveness of Talent Procurement through the Imple...
This study Examines the Effectiveness of Talent Procurement through the Imple...This study Examines the Effectiveness of Talent Procurement through the Imple...
This study Examines the Effectiveness of Talent Procurement through the Imple...
 
openshift technical overview - Flow of openshift containerisatoin
openshift technical overview - Flow of openshift containerisatoinopenshift technical overview - Flow of openshift containerisatoin
openshift technical overview - Flow of openshift containerisatoin
 
SCALING OF MOS CIRCUITS m .pptx
SCALING OF MOS CIRCUITS m                 .pptxSCALING OF MOS CIRCUITS m                 .pptx
SCALING OF MOS CIRCUITS m .pptx
 
Transformers design and coooling methods
Transformers design and coooling methodsTransformers design and coooling methods
Transformers design and coooling methods
 
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
 
Assistant Engineer (Chemical) Interview Questions.pdf
Assistant Engineer (Chemical) Interview Questions.pdfAssistant Engineer (Chemical) Interview Questions.pdf
Assistant Engineer (Chemical) Interview Questions.pdf
 
Accident detection system project report.pdf
Accident detection system project report.pdfAccident detection system project report.pdf
Accident detection system project report.pdf
 
Determination of Equivalent Circuit parameters and performance characteristic...
Determination of Equivalent Circuit parameters and performance characteristic...Determination of Equivalent Circuit parameters and performance characteristic...
Determination of Equivalent Circuit parameters and performance characteristic...
 

Featured

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot
Marius Sescu
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
Expeed Software
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
Pixeldarts
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
ThinkNow
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
marketingartwork
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
Skeleton Technologies
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
Kurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
SpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Lily Ray
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
Rajiv Jayarajah, MAppComm, ACC
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
Christy Abraham Joy
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
Vit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
MindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
RachelPearson36
 

Featured (20)

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

Automatic Tagging using Deep Convolutional Neural Networks

  • 1. Automatic Tagging using Deep Convolutional Neural Networks [1] Keunwoo.Choi @qmul.ac.uk Introduction CNNs and Music Problem definition The proposed architecture Experiments and discussions Conclusion Reference Automatic Tagging using Deep Convolutional Neural Networks [1] Keunwoo.Choi @qmul.ac.uk Centre for Digital Music, Queen Mary University of London, UK 1/22
  • 2. Automatic Tagging using Deep Convolutional Neural Networks [1] Keunwoo.Choi @qmul.ac.uk Introduction CNNs and Music Problem definition The proposed architecture Experiments and discussions Conclusion Reference 1 Introduction 2 CNNs and Music TF-representations Convolution Kernels and Axes Pooling 3 Problem definition 4 The proposed architecture 5 Experiments and discussions Overview MagnaTagATune Million Song Dataset 6 Conclusion 7 Reference 2/22
  • 3. Introduction: Tagging. Tags are descriptive keywords that people put on music. They are multi-label by nature, e.g. {rock, guitar, drive, 90’s}. Music tags include genres (rock, pop, alternative, indie), instruments (vocalists, guitar, violin), emotions (mellow, chill), activities (party, drive) and eras (00’s, 90’s, 80’s). Tags are collaboratively created (Last.fm) and therefore noisy: false negatives; synonyms (vocal/vocals/vocalist/vocalists/voice/voices, guitar/guitars); popularity bias; typos (harpsicord); irrelevant tags (abcd, ilikeit, fav).
  • 4. Introduction: Tagging. Tagging is somewhat multi-task: genre, instrument, emotion and era could each be a separate task. Genres (rock, pop, alternative, indie), instruments (vocalists, guitar, violin), emotions (mellow, chill), activities (party, drive), eras (00’s, 90’s, 80’s). Many are missing, though. And are they really extractable from audio?
  • 5. Introduction: Previous approaches. Conventional ML: feature extraction + classifier [9].
  • 6. Introduction: Previous deep approaches. Going deep (and automatic)! Dieleman and Schrauwen, IEEE ICASSP 2014 [4], and Dieleman's work at Spotify; Nam et al., 2015 [7].
  • 7. CNNs and Music: TF-representations. Options: STFT / mel-spectrogram (melgram) / CQT / raw audio. STFT: okay, but why not melgram? Melgram: efficient. CQT: only if you are interested in fundamentals/pitches. Raw audio: an end-to-end setup (the transformation is learned); it has not (yet) outperformed melgram in speech/music. Perhaps the way to go in the future? We lose the frequency axis, though.
  • 8. CNNs and Music: Convolution Kernels and Axes. Kernels: rule of thumb, deeper > bigger, as in VGGNet [8] and residual nets [6]. Axes: for tagging, time-axis convolution seems essential. Dieleman's approach does not apply frequency-axis convolution. The proposed method uses 2-D convolution, i.e. along both the time and frequency axes. Pros: it can see local frequency structure. Cons: we do not know whether that structure is really used (it seems to be [2]). More details in the paper [1].
  • 9. CNNs and Music: Pooling. Pooling is a source of invariance. Max-pooling ignores where within its window the maximum came from, i.e. it ignores small differences, which gives invariance to small distortions.
  • 10. Problem definition: Automatic tagging. Automatic tagging is a multi-label classification task: the target is a K-dim binary vector, so there are up to 2^K possible label combinations. The majority of tags are False (regardless of whether a given label is actually correct). Performance is measured by AUC-ROC, the Area Under the Curve of the Receiver Operating Characteristic. [Figure illustrating AUC-ROC; image from Kaggle]
  • 11. The proposed architecture: 4-layer fully convolutional network, FCN-4
  • 12. The proposed architecture: FCN-5, FCN-6 and FCN-7
      Shared by all three:
        Mel-spectrogram (input: 96×1366×1)
        Conv 3×3×128, MP (2, 4) (output: 48×341×128)
        Conv 3×3×256, MP (2, 4) (output: 24×85×256)
        Conv 3×3×512, MP (2, 4) (output: 12×21×512)
        Conv 3×3×1024, MP (3, 5) (output: 4×4×1024)
        Conv 3×3×2048, MP (4, 4) (output: 1×1×2048)
      FCN-6 then adds one Conv 1×1×1024 layer; FCN-7 adds two.
      Output 50×1 (sigmoid)
  • 13. Experiments and discussions: Overview, MagnaTagATune (MTT) vs. Million Song Dataset (MSD)
      # tracks: 25k vs. 1M
      # songs: 5-6k vs. 1M
      Length: 29.1 s vs. 30-60 s
      Benchmarks: 10+ vs. 0
      Labels: tags, genres vs. tags, genres, EchoNest features, bag-of-words lyrics, ...
  • 14. Experiments and discussions: MagnaTagATune. The hopes: to validate the proposed algorithm against the state of the art, and to find the best setup among FCN-3, 4, 5, 6 and 7. The reality: to verify that the proposed algorithm is comparable to the state of the art, and to compare melgram vs. STFT vs. MFCC.
  • 15. Experiments and discussions: MagnaTagATune. At the same depth (l = 4), melgram > MFCC > STFT.
      Inputs: melgram, 96 mel-frequency bins; STFT, 128 frequency bins; MFCC, 90 (30 MFCCs, 30 MFCC deltas, 30 MFCC delta-deltas).
      Methods: AUC
        FCN-3, mel-spectrogram: .852
        FCN-4, mel-spectrogram: .894
        FCN-5, mel-spectrogram: .890
        FCN-4, STFT: .846
        FCN-4, MFCC: .862
      Still, with more data a ConvNet might learn a frequency aggregation that outperforms the mel-frequency one. But not here.
      ConvNet outperformed MFCC.
  • 16. Experiments and discussions: MagnaTagATune. FCN-4 > FCN-3: depth worked! FCN-4 > FCN-5 by .004. A deeper model might catch up after much longer training. Deeper models require more data. Deeper models take more time to train (cf. deep residual networks [6]). Are 4 layers enough, or is it a matter of dataset size?
  • 17. Experiments and discussions: MagnaTagATune
      Methods: AUC
        The proposed system, FCN-4: .894
        2015, Bag of features and RBM [7]: .888
        2014, 1-D convolutions [4]: .882
        2014, Transferred learning [10]: .88
        2012, Multi-scale approach [3]: .898
        2011, Pooling MFCC [5]: .861
      All deep and NN approaches score around .88-.89. Are we touching a glass ceiling? Perhaps because of the noise in MTT, but that is tricky to prove. 26K tracks are not enough for millions of parameters.
  • 18. Experiments and discussions: MagnaTagATune. Summary: melgram over STFT and MFCC; the 2-D convnet is at least not worse than the previous approaches.
      Keunwoo: (Hesitating) MTT, it has been a great journey with you. I think we should move on.
      MTT: (With tears) No! Are you... are you gonna be with MSD?
      Keunwoo: ...
      MTT: (Falls down)
      Keunwoo: (Almost taps MTT, then gets a message from MSD that it has the crawled audio files.)
  • 19. Experiments and discussions: Million Song Dataset
      Methods: AUC
        FCN-3, mel-spectrogram: .786
        FCN-4, mel-spectrogram: .808
        FCN-5, mel-spectrogram: .848
        FCN-6, mel-spectrogram: .851
        FCN-7, mel-spectrogram: .845
      FCN-3 < FCN-4 < FCN-5 < FCN-6! Adding layers pays off up to 6 layers in this case.
  • 20. Experiments and discussions: Million Song Dataset. Complex models take more time to train, but in the end they may outperform simpler ones.
  • 21. Conclusion. 2-D fully convolutional networks work well. Mel-spectrograms can be preferred to STFT until we have a HUGE dataset, large enough for a learned frequency aggregation to replace the mel-frequency one. Bye bye, MFCC? In the near future, I guess MIR can go deeper than it does now, if we have bigger, better, stronger datasets. Q. How do ConvNets actually deal with spectrograms? A. Stay tuned for this year’s MLSP paper!
  • 22. Q&A
  • 23. References
      [1] Choi, K., Fazekas, G., Sandler, M.: Automatic tagging using deep convolutional neural networks. In: Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR 2016), New York, USA (2016)
      [2] Choi, K., Fazekas, G., Sandler, M.: Explaining convolutional neural networks on music classification (submitted). In: IEEE International Workshop on Machine Learning for Signal Processing (MLSP), Salerno, Italy. IEEE (2016)
      [3] Dieleman, S., Schrauwen, B.: Multiscale approaches to music audio feature learning. In: ISMIR, pp. 3–8 (2013)
      [4] Dieleman, S., Schrauwen, B.: End-to-end learning for music audio. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6964–6968. IEEE (2014)
  • 24. References (continued)
      [5] Hamel, P., Lemieux, S., Bengio, Y., Eck, D.: Temporal pooling and multiscale learning for automatic annotation and ranking of music audio. In: ISMIR, pp. 729–734 (2011)
      [6] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385 (2015)
      [7] Nam, J., Herrera, J., Lee, K.: A deep bag-of-features model for music auto-tagging. arXiv preprint arXiv:1508.04999 (2015)
      [8] Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
  • 25. References (continued)
      [9] Tzanetakis, G., Cook, P.: Musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing 10(5), 293–302 (2002)
      [10] van den Oord, A., Dieleman, S., Schrauwen, B.: Transfer learning by supervised pre-training for audio-based music classification. In: Conference of the International Society for Music Information Retrieval (ISMIR 2014) (2014)
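Slide 7 calls the mel-spectrogram (melgram) efficient. The NumPy sketch below shows where that comes from: a bank of triangular filters, equally spaced on the mel scale, folds the 257 bins of a 512-point STFT into 96 bands. The sample rate (12 kHz), FFT size (512), hop (256), HTK-style mel formula and log compression are assumptions for illustration, not settings stated on the slides. In practice a library such as librosa (`librosa.feature.melspectrogram`) would normally do this.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (an assumption; the Slaney variant is also common)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters equally spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    bin_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    # n_mels + 2 edge frequencies from 0 Hz to Nyquist; filter i spans edges i..i+2
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, center, right = edges[i], edges[i + 1], edges[i + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def power_stft(y, n_fft, hop):
    """Hann-windowed power spectrogram, shape (n_fft // 2 + 1, n_frames); no centering/padding."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[t * hop:t * hop + n_fft] * window for t in range(n_frames)], axis=1)
    return np.abs(np.fft.rfft(frames, axis=0)) ** 2

def log_melgram(y, sr=12000, n_fft=512, hop=256, n_mels=96):
    """Log-compressed mel-spectrogram, shape (n_mels, n_frames); all defaults are assumptions."""
    mel = mel_filterbank(sr, n_fft, n_mels) @ power_stft(y, n_fft, hop)
    return np.log10(mel + 1e-10)
```

For one second of audio at 12 kHz these settings give 96 × 45 melgram values, against 257 × 45 for the raw STFT power, which is the compression that makes the melgram the cheaper input.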
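Slide 8 contrasts Dieleman's time-axis (1-D) convolution with the proposed 2-D convolution, and gives the rule of thumb "deeper > bigger". The sketch below uses a hypothetical toy melgram of random values to show what each choice does to the frequency axis, plus the receptive-field arithmetic behind stacking small kernels (the argument made for VGGNet [8]). A single channel, stride 1, no padding and no pooling are simplifying assumptions.

```python
import numpy as np

def conv2d_valid(x, kernel):
    """Single-channel 'valid' 2-D cross-correlation (what CNN conv layers compute)."""
    kh, kw = kernel.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def stacked_receptive_field(n_layers, k=3):
    """Receptive field (per axis) of n stacked k x k convs, stride 1, no pooling."""
    return 1 + n_layers * (k - 1)

def weights_per_channel_pair(n_layers, k=3):
    """Kernel weights per input/output channel pair, summed over the stacked layers."""
    return n_layers * k * k

rng = np.random.default_rng(0)
melgram = rng.standard_normal((96, 200))  # hypothetical toy input: (mel bins, frames)

# 1-D, time-axis convolution: the kernel covers all 96 mel bins at once,
# so the frequency axis collapses to size 1 after one layer.
out_1d = conv2d_valid(melgram, rng.standard_normal((96, 4)))

# 2-D convolution: a 3x3 kernel slides along frequency and time,
# so the output keeps a (slightly shrunken) frequency axis.
out_2d = conv2d_valid(melgram, rng.standard_normal((3, 3)))
```

Here `out_1d` has shape (1, 197) while `out_2d` has shape (94, 198), so only the 2-D version leaves later layers a frequency axis on which to find local frequency structure. On kernel size: two stacked 3×3 layers see a 5×5 patch with 18 weights per channel pair, against 25 for one 5×5 kernel, and add an extra nonlinearity in between.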
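A toy illustration of slide 9: a feature that moves by one frame inside a max-pooling window produces the same pooled output, while a larger shift that crosses a window boundary does not. The window size of 4 and the 16-frame activation are arbitrary choices for the example.

```python
import numpy as np

def max_pool_1d(x, size):
    """Non-overlapping max-pooling; trailing values that don't fill a window are dropped."""
    n = len(x) // size
    return x[:n * size].reshape(n, size).max(axis=1)

activation = np.zeros(16)
activation[5] = 1.0  # a feature detected at frame 5

pooled = max_pool_1d(activation, 4)                          # windows: 0-3, 4-7, 8-11, 12-15
pooled_small_shift = max_pool_1d(np.roll(activation, 1), 4)  # feature at frame 6, same window
pooled_large_shift = max_pool_1d(np.roll(activation, 3), 4)  # feature at frame 8, next window
```

Only the window index survives pooling, so the invariance covers shifts that stay within a window. With slide 12's MP (2, 4), for example, the windows are 2 bins wide along frequency and 4 frames wide along time.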
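Slide 10's setup, sketched in code: each clip's tags become a K-dim 0/1 vector, and each tag's scores are evaluated with ROC-AUC, computed here through its rank interpretation (the probability that a random positive clip outscores a random negative one, ties counting half). The 5-tag vocabulary is hypothetical, and averaging per-tag AUCs is one common convention; the slides do not say how the reported AUCs were averaged.

```python
import numpy as np

# Hypothetical 5-tag vocabulary (the experiments on the slides use K = 50 tags)
VOCAB = ["rock", "guitar", "drive", "90's", "mellow"]

def multi_hot(tags, vocab=VOCAB):
    """Encode a set of tags as a K-dim 0/1 vector, one slot per vocabulary tag."""
    return np.array([1 if t in tags else 0 for t in vocab])

def roc_auc(y_true, scores):
    """ROC-AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("AUC needs at least one positive and one negative example")
    diff = pos[:, None] - neg[None, :]  # every positive/negative pair
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

def mean_tag_auc(Y, S):
    """Average per-tag AUC over the K columns of (n_clips, K) label and score matrices."""
    return float(np.mean([roc_auc(Y[:, k], S[:, k]) for k in range(Y.shape[1])]))
```

Because most labels are False, a model that predicts False everywhere looks good on accuracy, whereas giving every clip the same score yields an AUC of exactly 0.5 however imbalanced the tag is.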
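A quick consistency check of the layer list on slide 12: tracking the feature-map size through each conv + max-pooling block reproduces the stated output shapes, assuming "same"-padded 3×3 convolutions (which keep the frequency × time size) and pooling that drops incomplete windows. The parameter count covers only the five 3×3 conv layers (weights plus biases) and is derived from the slide's layer list under these assumptions; it is not a figure reported in the paper.

```python
# (n_filters, pool_freq, pool_time) for the five 3x3 conv blocks listed on slide 12
FCN5_BLOCKS = [(128, 2, 4), (256, 2, 4), (512, 2, 4), (1024, 3, 5), (2048, 4, 4)]

def trace_shapes(freq=96, time=1366, channels=1, blocks=FCN5_BLOCKS):
    """Return (freq, time, channels) after each conv + max-pool block.

    Assumes 'same'-padded convs and pooling that floors partial windows.
    """
    shapes = []
    for n_filters, pool_f, pool_t in blocks:
        freq, time, channels = freq // pool_f, time // pool_t, n_filters
        shapes.append((freq, time, channels))
    return shapes

def conv_params(blocks=FCN5_BLOCKS, in_channels=1, k=3):
    """Weights + biases of the k x k conv layers only (no 1x1 layers, no output layer)."""
    total = 0
    for n_filters, _, _ in blocks:
        total += k * k * in_channels * n_filters + n_filters
        in_channels = n_filters
    return total
```

`trace_shapes()` returns [(48, 341, 128), (24, 85, 256), (12, 21, 512), (4, 4, 1024), (1, 1, 2048)], matching the slide, and `conv_params()` comes to about 25 million. The 1×1×1024 layers of FCN-6 and FCN-7 leave the 1×1 map unchanged and would add their own weights on top.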
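Slide 15's 90-dimensional MFCC input stacks 30 MFCCs with their first (delta) and second (delta-delta) time derivatives. The slides do not say how the deltas were computed; this sketch assumes the common regression formula over ±2 frames with edge padding, and takes an already computed 30×T MFCC matrix as input.

```python
import numpy as np

def deltas(feat, n=2):
    """Regression-style deltas along time (axis 1), edges padded by repetition.

    d_t = sum_{i=1..n} i * (c_{t+i} - c_{t-i}) / (2 * sum_{i=1..n} i^2)
    """
    T = feat.shape[1]
    padded = np.pad(feat, ((0, 0), (n, n)), mode="edge")
    denom = 2 * sum(i * i for i in range(1, n + 1))
    out = np.zeros(feat.shape)
    for i in range(1, n + 1):
        # padded column t + n holds original frame t
        out += i * (padded[:, n + i:n + i + T] - padded[:, n - i:n - i + T])
    return out / denom

def stack_mfcc_features(mfcc):
    """Stack MFCCs with first and second deltas: (30, T) -> (90, T)."""
    d1 = deltas(mfcc)
    d2 = deltas(d1)
    return np.vstack([mfcc, d1, d2])
```

On a coefficient that rises by one per frame, the interior deltas come out as exactly 1, which is a simple way to sanity-check the formula.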