The Effects of Noisy Labels on deep convolutional neural networks for music tagging
Keunwoo.Choi@qmul.ac.uk
arXiv:1706.02361
@KeunwooChoi
2014--present: PhD, Queen Mary University of London
2016--present: Buzzmusiq Inc.
2016/06--12: Visiting PhD, NYU
2015/06--09: Intern, Naver Labs
2011--2014: Audio research team, ETRI
2009--2011: Applied Acoustic Lab, EECS, SNU
2005--2009: EECS, SNU
Papers at ISMIR/ICASSP/IEEE Trans./etc.
Python/Keras/PyTorch
Co-authors: György Fazekas, Kyunghyun Cho, Mark Sandler
1. INTRODUCTION
Tagging
• Anyone can attach any word (or non-word) to any song as a tag
• The quality is ****.
• Poor, innocent, (financially) poor researchers have to use it anyway
Tagging
(Tag, count)
rock 101071
pop 69159
alternative 55777
indie 48175
electronic 46270
female vocalists 42565
favorites 39921
00s 31432
Awesome 26248
american 22694
seen live 20705
cool 19581
Favorite 18864
Favourites 17722
female vocalist 17328
guitar 17302
loved 12483
favorite songs 12392
heard on Pandora 10470
USA 8725
2000s 8671
Favourite Songs 8661
drjazzmrfunkmusic 8364
77davez-all-tracks 7278
fav 6155
bass 3364
songs I absolutely love 3293
vocals 2369
drums 2281
🤔
[Figure: share of True vs. False labels, on a 0% to 100% axis, for the tags Female vocalists, Male vocalist, Guitar, Bass, Vocals, Drums]
Questions
• How noisy?
• Is training alright?
• How about evaluation?
• What are they learning?
2. HOW NOISY? IS TRAINING OK?
Measuring the noise
• We need strongly-labelled re-annotations
• Instrumentation labels are (sort of) objective (instrumental, female vocal, male vocal, guitar)
• 242K songs are still a lot → select a subset (or two)!
“I can do it! ..but not all of them”
Strongly labelling: Subset100
• Subset100: random 50 from ‘True’ + random 50 from ‘False’ (for each label)
Label              True       False
Instrumental       50 songs   50 songs
Female vocalists   50         50
Male vocalist      50         50
Guitar             50         50
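The Subset100 construction above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `sample_subset100`, the `labels` dictionary layout, and the toy data are hypothetical and are not taken from the slides or the paper.

```python
import random


def sample_subset100(labels, per_class=50, seed=0):
    """Pick `per_class` songs tagged True and `per_class` tagged False.

    `labels` maps a song id to its weak (groundtruth) boolean for ONE tag.
    The layout of `labels` is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    positives = sorted(s for s, tagged in labels.items() if tagged)
    negatives = sorted(s for s, tagged in labels.items() if not tagged)
    # random.Random.sample draws without replacement.
    return rng.sample(positives, per_class), rng.sample(negatives, per_class)


# Toy data (hypothetical): 1000 songs, every 10th one carries the tag.
toy_labels = {song_id: song_id % 10 == 0 for song_id in range(1000)}
pos, neg = sample_subset100(toy_labels)
```

Running this once per label gives the balanced 50 + 50 subset for each of the four tags. Sampling both classes equally makes it cheap to check the positive and the negative labels separately, at the cost of no longer matching the tag's real frequency in the dataset.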
Strongly labelling: Subset400
• Subset400: Just random 400 items
242K songs × 50 tags → 400 songs × 4 tags (Subset400)
Evaluating groundtruth on Subset100
[Figure: two bar charts on a 0 to 100 axis, each over Instrumental, female vocal, male vocal, guitar. First panel: "+ Error rate" and "Precision". Second panel: "- Error rate" and "Recall".]
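As a rough sketch of what these four quantities mean, the weak tags can be compared against the strong re-annotations, treating the re-annotation as the truth. The function name, the dictionary keys, and the toy lists are assumptions made for illustration; the slides do not show how the numbers were computed.

```python
def groundtruth_quality(weak, strong):
    """Compare weak tags with strong re-annotations for one label.

    Both arguments are equal-length lists of booleans, one entry per song.
    `strong` (the re-annotation) is treated as the truth.
    No guard against empty classes: this is a sketch, not a library.
    """
    pairs = list(zip(weak, strong))
    tp = sum(w and s for w, s in pairs)          # tagged, really there
    fp = sum(w and not s for w, s in pairs)      # tagged, not there
    fn = sum(not w and s for w, s in pairs)      # untagged, but there
    tn = sum(not w and not s for w, s in pairs)  # untagged, not there
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "pos_error_rate": fp / (tp + fp),  # share of wrong positive labels
        "neg_error_rate": fn / (fn + tn),  # share of wrong negative labels
    }


# Toy example (hypothetical values, not results from the talk).
weak = [True, True, True, True, False, False, False, False]
strong = [True, True, True, False, True, False, False, False]
quality = groundtruth_quality(weak, strong)
```

Note that recall computed directly on a balanced subset such as Subset100 depends on the 50/50 split, so it generally has to be re-weighted by how often the tag occurs in the full dataset before it says anything about the whole groundtruth.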
#Occurrences estimation
[Figure: bar chart on a 0 to 80 axis over Instrumental, female vocal, male vocal, guitar. Series: "In all, by GT", "My estimation using S100", "My re-annotation on S400".]
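The slide does not give the formula behind "My estimation using S100". One natural way to get such an estimate is the law of total probability: combine how often the tag is applied with how often positive and negative labels turn out to be right. The sketch below assumes that approach; the function names and the input numbers are hypothetical.

```python
def estimate_true_rate(tagged_rate, precision, neg_error_rate):
    """Estimate how often a label is really present in the dataset.

    tagged_rate:    fraction of songs carrying the tag in the groundtruth
    precision:      fraction of tagged songs where the label is really there
    neg_error_rate: fraction of untagged songs where the label is there anyway
    """
    return tagged_rate * precision + (1.0 - tagged_rate) * neg_error_rate


def estimate_recall(tagged_rate, precision, neg_error_rate):
    """Dataset-level recall of the groundtruth implied by the same quantities."""
    true_rate = estimate_true_rate(tagged_rate, precision, neg_error_rate)
    return tagged_rate * precision / true_rate


# Hypothetical inputs, chosen only to show the arithmetic.
true_rate = estimate_true_rate(0.1, 0.8, 0.2)  # 0.08 + 0.18
recall = estimate_recall(0.1, 0.8, 0.2)        # 0.08 / 0.26
```

With these made-up inputs, a tag applied to 10% of songs would really be present in about 26% of them, which illustrates how a modest error rate on the many negative labels can dominate the estimate.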
Again, with box plots
{Instrumental, female vocalists}
vs.
{male vocalists, guitar}
Group A vs B, but why?
• Tagging ‘vocals’, ‘drums’, ‘bass’ is like.. “What’s on the desk?” “***?”
→ They’re not tag-worthy
→ Let’s call it ‘taggability’
[Figure: share of True vs. False labels, on a 0% to 100% axis, for the tags Female vocalists, Male vocalist, Guitar, Bass, Vocals, Drums]
The hypothesis
• If unusual → high taggability.
• If high taggability → fewer false negatives = higher recall (of GT).
• If higher recall (= more reliable GT) → better classification.
[Figure: per-tag Performance (AUC), from [33] Choi et al. 2017, Convolutional recu...]
Instrumental, female vocal: high taggability, fewer false negatives, higher recall, better classification
Male vocal, guitar: low taggability, more false negatives, lower recall, worse classification
3. IS EVALUATION OK?
Really?
So, we evaluate the classifier based on..
🤔
I need a noise-free groundtruth...
Evaluate the evaluation
242K songs × 50 tags → 400 songs × 4 tags (Subset400)
HAHAHAH! Subset400!
Results
Evaluate the evaluation
Interesting! With such noise, the results are still okay.
It’s not perfect though.
HAHAHA!
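"Evaluating the evaluation" amounts to scoring the same predictions twice: once against the noisy tags and once against the re-annotated labels, then comparing the two numbers. Below is a minimal sketch using AUC, the metric named earlier in the talk. The pairwise AUC implementation and all toy values are assumptions for illustration, not the author's evaluation code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve by pairwise comparison.

    Equals the probability that a random positive item is scored above a
    random negative item; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy predictions for six songs and one tag (hypothetical values).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
noisy = [True, True, False, False, False, False]  # weak groundtruth
clean = [True, True, True, False, True, False]    # re-annotation

auc_noisy = roc_auc(noisy, scores)
auc_clean = roc_auc(clean, scores)
```

In this toy case the two scores differ (1.0 against the noisy labels, 0.875 against the clean ones). The gap between the two is the quantity of interest: a small gap suggests the noisy groundtruth is still usable for evaluation.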
4. LABEL VECTOR ANALYSIS
Label vector
(50, 50)
Label vector similarity
• Similarity between labels according to the trained convnet.
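One common way to turn per-label weight vectors into a similarity score is cosine similarity. The sketch below assumes that each label has one vector (for example, the weights feeding its output unit) and that cosine similarity is the measure; the slides do not state either detail, and the three-dimensional toy vectors and tag names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def label_vector_similarity(label_vectors):
    """Pairwise similarity for a {label name: vector} mapping."""
    names = list(label_vectors)
    return {
        (a, b): cosine_similarity(label_vectors[a], label_vectors[b])
        for a in names
        for b in names
    }


# Toy vectors (hypothetical); a real label vector would be much longer.
toy_vectors = {
    "tag_a": [1.0, 0.0, 1.0],
    "tag_b": [1.0, 0.0, 0.5],
    "tag_c": [0.0, 1.0, 0.0],
}
similarity = label_vector_similarity(toy_vectors)
```

Here `tag_a` and `tag_b` point in nearly the same direction and score close to 1, while `tag_c` is orthogonal to both and scores 0.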
Label vector vs co-occurrence (GT)
• Mostly, LV reproduces the groundtruth.
• Exception: pairs that are similar only according to the label vectors: (sad, beautiful), (happy, catchy), (rnb, sexy)
‘Sad songs are beautiful.’
‘Catchy songs are often happy songs.’
‘R&B claims to be sexy.’
🤔 Makes sense..
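The groundtruth side of this comparison is a tag co-occurrence table. A minimal sketch follows, assuming the counts are normalised by how often the first tag occurs; the slides only say "co-occurrence (GT)", so the normalisation, the function name, and the toy songs are all assumptions.

```python
def normalised_cooccurrence(tag_sets, tags):
    """C[(i, j)] = (#songs with both i and j) / (#songs with i).

    `tag_sets` is a list with one set of tags per song.
    The result is not symmetric, because each row is divided by the
    count of its own tag.
    """
    counts = {t: sum(t in s for s in tag_sets) for t in tags}
    matrix = {}
    for i in tags:
        for j in tags:
            both = sum(i in s and j in s for s in tag_sets)
            matrix[(i, j)] = both / counts[i] if counts[i] else 0.0
    return matrix


# Toy groundtruth for four songs (hypothetical tags).
toy_songs = [
    {"tag_a", "tag_b"},
    {"tag_a"},
    {"tag_b", "tag_c"},
    {"tag_a", "tag_b"},
]
co = normalised_cooccurrence(toy_songs, ["tag_a", "tag_b", "tag_c"])
```

Comparing such a table with the label vector similarities, pair by pair, is one way to find pairs that the network treats as similar even though they rarely share songs in the groundtruth.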
5. CONCLUSIONS
Conclusions
• We quantified how noisy weakly-labelled groundtruth is.
• We conjectured why some labels are noisier.
• We showed what happens to the noisier labels in training and evaluation.
• We investigated what a convnet learns.
Links
My blog | blog post 1, blog post 2 | Paper!
