Transfer learning for music classification and regression tasks
Keunwoo.Choi@qmul.ac.uk
The 18th International Society of Music Information Retrieval Conference, Suzhou, China, 2017
1. INTRODUCTION
transferring “knowledge”
# One day
model = build_your_convnet()
xs, ys = get_large_dataset()  # aka source task
model.train(xs, ys)  # get your knowledge
model.save_weights('knowledge.h5')  # and save it

# Later
new_xs, new_ys = get_small_dataset()  # aka target task
model.load_weights('knowledge.h5')
model.become_feature_extractor('in some sense')
features = model.get_feature(new_xs)
pred_ys = classifier.train_predict(features, new_ys)
Not every task
[Figure: dataset sizes on a log-scale axis from 10 to 1,000,000 for MSD, FMA, Cal10K, Ballroom, GTZAN, emoMusic, Jamendo]
Not everyone
2. THE PROPOSED METHOD
Train a convnet
• 5 layers, 32 channels per layer
• Tags: Rock, Jazz, Pop, Metal, chillout, beautiful, mellow, male vocal, female vocal, guitar, 80s, 90s, 00s, ...
Transferring
Each combination of layers is named by its layer indices:
aka ‘12345’, ‘1234’, ‘2345’, ‘1345’, ‘135’, ‘12’, ‘1’
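The naming scheme above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the activations are random stand-ins rather than real convnet output, global average pooling is assumed as the way each layer is reduced to one value per channel, and `pool_layer` and `select_features` are hypothetical helper names. The only figures taken from the slides are the 5 layers and 32 channels per layer.

```python
import numpy as np

N_LAYERS = 5
CHANNELS_PER_LAYER = 32  # from the slides: 5 layers, 32 channels per layer


def pool_layer(activation):
    """Average a (channels, freq, time) activation map into a (channels,) vector.

    Global average pooling is an assumption made for this sketch.
    """
    return activation.mean(axis=(1, 2))


def select_features(layer_activations, config):
    """Concatenate pooled features of the layers named in config, e.g. '135'."""
    layer_ids = [int(ch) for ch in config]
    if any(i < 1 or i > len(layer_activations) for i in layer_ids):
        raise ValueError(f"config {config!r} names a layer outside 1..{len(layer_activations)}")
    return np.concatenate([pool_layer(layer_activations[i - 1]) for i in layer_ids])


# Hypothetical activations for one clip: random stand-ins with shrinking maps.
rng = np.random.default_rng(0)
map_sizes = [(16, 16), (8, 8), (4, 4), (2, 2), (1, 1)]
activations = [rng.standard_normal((CHANNELS_PER_LAYER, *size)) for size in map_sizes]

for config in ["12345", "1234", "2345", "1345", "135", "12", "1"]:
    feature = select_features(activations, config)
    print(config, feature.shape[0])
```

With 32 values per layer, a configuration that names k layers yields a 32k-dimensional vector, which matches the 32 to 160 dims range quoted for the convnet features.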
Knowledge = convnet structure + weights
convnet structure + trained weights vs. convnet structure + random weights
Baselines
• convnet structure + random weights (32 - 160 dims)
• mean and standard deviation (m, σ) of {MFCC + dMFCC + ddMFCC} (120 dims)
• ‘12345’ + all MFCCs (160 + 120 = 280 dims)
• Many other reported methods (usually larger dims + selection): rhythm, timbre, chords, spectral, ...
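As a rough illustration of the dimension counts in the baselines, the sketch below builds a 120-dim MFCC summary and concatenates it with a 160-dim convnet vector. Several things here are assumptions, not taken from the slides: 20 MFCC coefficients (so that 20 x 3 streams x 2 statistics = 120), a simple `np.gradient` difference as the delta operator, and random arrays standing in for real MFCCs and convnet features. `mfcc_summary` is a hypothetical helper name.

```python
import numpy as np

N_MFCC = 20  # assumption: 20 coefficients, so 20 x 3 streams x 2 statistics = 120


def delta(x):
    """Difference along the time axis; a simple stand-in for a delta filter."""
    return np.gradient(x, axis=1)


def mfcc_summary(mfcc):
    """Map a (n_mfcc, n_frames) matrix to the mean and std of MFCC, dMFCC, ddMFCC."""
    d1 = delta(mfcc)
    d2 = delta(d1)
    stacked = np.concatenate([mfcc, d1, d2], axis=0)  # (3 * n_mfcc, n_frames)
    return np.concatenate([stacked.mean(axis=1), stacked.std(axis=1)])


rng = np.random.default_rng(0)
fake_mfcc = rng.standard_normal((N_MFCC, 500))  # stand-in for real MFCC frames
baseline = mfcc_summary(fake_mfcc)

convnet_feature = rng.standard_normal(160)  # stand-in for the '12345' feature
combined = np.concatenate([convnet_feature, baseline])

print(baseline.shape, combined.shape)
```

In practice the MFCC matrix would come from an audio feature library instead of a random generator; the summary and concatenation steps stay the same.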
3. EXPERIMENTS
The target tasks
Task                                  Dataset               #clips
Ballroom dance genre classification   Extended ballroom     4,180
Genre classification                  Gtzan genre           1,000
Speech/music classification           Gtzan speech/music    128
Emotion prediction                    EmoMusic              744
Vocal/non-vocal classification        Jamendo               4,086
Audio event classification            Urbansound8K          8,732
Configurations
• 10-fold cross-validation
• Support Vector Machines and grid search over the kernel (linear or radial basis), the bandwidth of the radial kernel, and the penalty parameter
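A configuration of this kind could be set up with scikit-learn roughly as follows. This is a sketch under assumptions, not the setup used for the reported results: the grid values, the feature standardisation step, the stratified shuffling, and the synthetic two-class data are all placeholders chosen for illustration.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for (n_clips, 160) convnet features with binary labels.
rng = np.random.default_rng(0)
n_clips, n_dims = 200, 160
labels = rng.integers(0, 2, size=n_clips)
features = rng.standard_normal((n_clips, n_dims)) + 0.5 * labels[:, None]

# Grid over kernel type, penalty parameter C, and RBF bandwidth gamma.
# The candidate values are assumptions for this sketch.
param_grid = [
    {"svc__kernel": ["linear"], "svc__C": [0.1, 1.0, 10.0]},
    {"svc__kernel": ["rbf"], "svc__C": [0.1, 1.0, 10.0], "svc__gamma": [1e-3, 1e-2, 1e-1]},
]

# Standardising features before the SVM is an added assumption.
pipeline = make_pipeline(StandardScaler(), SVC())
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)  # 10-fold CV

search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")
search.fit(features, labels)

print(search.best_params_)
print(round(search.best_score_, 3))
```

For the regression task (emotion prediction), `SVC` and the accuracy score would be swapped for a regressor such as `SVR` and a regression metric.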
Task 1. Ballroom genre classification
Task 2. Gtzan music genre classification
[Figure: reported methods (SVM Classifier +): Arabi, Beniya, Huang, Alexnet; values shown: 0.78, 0.84, 0.88, 0.91]
• Arabi: spectral + beat + chord features
• Beniya: extensive stats of spectral feats
• Huang: spectral + pitch
• Alexnet: transferred from visual obj recognition (final layer only, 4096-dim)
Task 3. Gtzan speech/music classification
Task 4. Music emotion prediction
[Figure: [56], 4777-dim features, compared by which classifier is used (RNN, SVM); values shown for arousal: 0.541, 0.704; for valence: 0.320, 0.500]
Task 5. Vocal/non-vocal classification
Task 6. Audio event classification
[Figure: in [43], data augmentation? (Yes, No); values shown: 0.714, 0.790]
Overall results
How musical are MFCCs?
“To some extent.”
[Figure: compared against the convnet feature]
• For musical tasks, adding MFCCs to the convnet feature hurts
• For the non-musical task, it was better!
• ...which might mean...
[Figure: MFCC’s musical aspect (“to this extent”, “maybe no more than convnet feature”) vs. MFCC’s non-musical aspect]
BONUS: genre classification (FMA dataset)
https://github.com/keunwoochoi/FMA_convnet_features
[Figure: bar chart, axis from 0 to 80; classifiers: LR, kNN, SVCrbf, DT, AdaBoost; features: MFCC (140 dims), M/cont/ce (196 dims), non-EN (518 dims), Convnet (160 dims)]
Go get it
$ git clone https://github.com/keunwoochoi/transfer_learning_music.git
$ head audio_paths.txt
srcs/music_item_3.mp3
srcs/music_item_2.mp3
srcs/music_item_3.mp3
$ python extract_feat.py audio_paths.txt features_out.npy
 --- running feature extraction... ---
 --- feature extraction is done!   ---
$ ls
extract_feat.py    audio_paths.txt
model.h5           features_out.npy
4. CONCLUSIONS
• Multi-layer convnet features: exhaustive, yet still efficient
• Different layer, different meaning
• Music tagging as a source task seems versatile
• Code and weights are out there!
György Fazekas, Mark Sandler, Kyunghyun Cho
The 18th International Society of Music Information Retrieval Conference, Suzhou, China, 2017
Links
My blog | A blog post on this | Paper! | Code and weights

@keunwoochoi
