Deep learning for music recommendation
Aloïs Gruson
@nilandmusic @aloisgr niland.io
Who we are
• Founded in 2013 by 2 PhDs who worked at IRCAM
• Won MIREX 2011 in Music Similarity Estimation and Music Classification
• We sell our technology through our API
• A team of 9 today
What we want to do
• Create a high-dimensional space where every song is a vector
• Use this space to find similar songs and to classify songs
• Each query must take < 50 ms across millions of tracks
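The deck does not say how Niland's engine answers a query. As a minimal sketch, assuming tracks are already embedded as fixed-length vectors and that similarity means cosine similarity (both assumptions), a brute-force lookup could look like this; the 128-dimensional random vectors are hypothetical.

```python
import numpy as np


def build_index(vectors: np.ndarray) -> np.ndarray:
    """L2-normalise each track vector so that a dot product equals cosine similarity."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.maximum(norms, 1e-12)


def most_similar(index: np.ndarray, query_id: int, k: int = 5) -> np.ndarray:
    """Return the ids of the k tracks most similar to `query_id`, best first."""
    scores = index @ index[query_id]        # cosine similarity to every track
    scores[query_id] = -np.inf              # never return the query itself
    top = np.argpartition(-scores, k)[:k]   # the k best ids, in arbitrary order
    return top[np.argsort(-scores[top])]    # sort just those k by score


rng = np.random.default_rng(0)
tracks = rng.normal(size=(10_000, 128)).astype(np.float32)  # hypothetical embeddings
index = build_index(tracks)
print(most_similar(index, query_id=42, k=5))
```

Brute force costs one dot product per track per query; meeting a < 50 ms budget over millions of tracks would likely call for an approximate nearest-neighbour index, which the deck does not describe.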
How music information retrieval worked in 2011
• Short-term descriptors: MFCCs, Fluctuation Patterns ("Block-level audio features for music genres classification", Seyerlehner et al.) and many more!
• Pooling techniques: VQ, GMM-SV ("GMM Supervector for content based music similarity", Charbuillet et al.), VLAD ("Aggregating local descriptors into a compact image representation", Jégou et al.)...
[Figure: Audio → short-term descriptors (MFCCs, FP) → pooling (VLAD, GMM-SV)]
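To make the pooling step concrete, here is a minimal NumPy sketch of VLAD as described by Jégou et al.: each frame is assigned to its nearest codebook centroid and the residuals are summed per centroid. The signed square-root normalisation is a common later refinement, and the frame count, MFCC size and codebook size are hypothetical; this is not Niland's implementation.

```python
import numpy as np


def vlad(descriptors: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Pool a (n_frames, d) matrix of short-term descriptors into one VLAD vector.

    centroids: (k, d) codebook, typically learned beforehand with k-means.
    Returns a unit-norm vector of length k * d.
    """
    # Squared distance from every frame to every centroid: (n_frames, k)
    d2 = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    v = np.zeros_like(centroids, dtype=float)
    for c in range(len(centroids)):
        members = descriptors[nearest == c]
        if len(members):
            # Sum of residuals between the frames and their assigned centroid
            v[c] = (members - centroids[c]).sum(axis=0)
    v = v.ravel()
    v = np.sign(v) * np.sqrt(np.abs(v))        # signed square-root ("power") normalisation
    return v / max(np.linalg.norm(v), 1e-12)   # global L2 normalisation


rng = np.random.default_rng(0)
mfccs = rng.normal(size=(1200, 20))    # hypothetical: 1200 frames of 20 MFCCs
codebook = rng.normal(size=(16, 20))   # hypothetical: 16 k-means centroids
track_vector = vlad(mfccs, codebook)   # shape (320,)
```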
One of our evaluation datasets
• Evaluation metrics for a search engine: precision at k (P@k) or mean average precision (mAP)
• Evaluation set presented here: 8,500 mainstream-music tracks in 141 playlists
P@k (%)       k=1    k=5    k=10   k=20   k=50
mirex2011     17.48  15.39  13.87  12.23  10.00
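A minimal sketch of both metrics, assuming (as the slide suggests) that a result counts as relevant when it shares a playlist with the query track. The table values would then be P@k averaged over all query tracks and expressed in percent; mAP is the mean of the per-query average precision. The track ids are hypothetical.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant to the query."""
    return sum(1 for track in ranked[:k] if track in relevant) / k


def average_precision(ranked, relevant):
    """Average of P@i over every rank i at which a relevant track appears."""
    hits, total = 0, 0.0
    for i, track in enumerate(ranked, start=1):
        if track in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0


# Hypothetical query: the tracks sharing its playlist are "a", "b" and "c".
ranked = ["a", "x", "b", "y", "c"]
relevant = {"a", "b", "c"}
print(precision_at_k(ranked, relevant, 1))   # 1.0
print(precision_at_k(ranked, relevant, 5))   # 0.6
print(average_precision(ranked, relevant))
```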
From 2013 to 2014 at Niland
• How to turn research work into a product!
• And a lot of work on short-term descriptors and pooling techniques
• But still completely unsupervised: no real way to match the outputs with human perception!
P@k (%)       k=1    k=5    k=10   k=20   k=50
mirex2011     17.48  15.39  13.87  12.23  10.00
2014          19.70  16.81  15.37  13.57  11.01
% gain        +12.70 +9.23  +10.81 +10.96 +10.10
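The "% gain" row is the relative improvement over the previous row. A quick check with the scores from the table:

```python
def relative_gain(new, old):
    """Relative improvement of each `new` score over the matching `old` one, in percent."""
    return [round(100 * (n - o) / o, 2) for n, o in zip(new, old)]


mirex2011 = [17.48, 15.39, 13.87, 12.23, 10.00]
niland2014 = [19.70, 16.81, 15.37, 13.57, 11.01]
print(relative_gain(niland2014, mirex2011))  # [12.7, 9.23, 10.81, 10.96, 10.1]
```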
Matching algorithm outputs with human perception
• Learn the outputs of a collaborative filtering model
"Deep content-based music recommendation", van den Oord et al.
• Or use a network trained to classify tracks into groups of similar tracks
Integrating the human idea of similarity
• 150k tracks in 3,500 theme-based albums from one of our clients
• Each album represents a genre, a mood or a usage
• Each album gathers socially similar tracks
• We feed the outputs of our previous system to a network
• We train it with a classification cost
• Then we remove the classification layer!
P@k (%)       k=1    k=5    k=10   k=20   k=50
2014          19.70  16.81  15.37  13.57  11.01
+deep         23.40  21.09  19.68  18.07  15.19
% gain        +18.78 +25.46 +28.04 +33.16 +37.97
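The slide describes a network that takes the previous system's outputs, is trained to predict theme-based albums, and then has its classification layer removed so that the layer below serves as the track vector. A minimal NumPy sketch of that idea; the single hidden layer, its size, ReLU, plain full-batch gradient descent and the synthetic data are all assumptions, not the architecture Niland used.

```python
import numpy as np

rng = np.random.default_rng(0)


def train_album_classifier(x, labels, n_classes, hidden=64, lr=0.1, epochs=200):
    """Train a one-hidden-layer network to predict each track's album.

    Returns the hidden layer's parameters and the loss history; the softmax
    (classification) layer is discarded after training.
    """
    n, d = x.shape
    w1 = rng.normal(scale=0.1, size=(d, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(scale=0.1, size=(hidden, n_classes))
    b2 = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    losses = []
    for _ in range(epochs):
        h = np.maximum(x @ w1 + b1, 0.0)                # ReLU hidden layer
        logits = h @ w2 + b2
        logits -= logits.max(axis=1, keepdims=True)     # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)               # softmax over albums
        losses.append(-np.mean(np.log(p[np.arange(n), labels] + 1e-12)))
        # Gradients of the mean cross-entropy (the classification cost)
        g_logits = (p - onehot) / n
        g_h = (g_logits @ w2.T) * (h > 0)
        w2 -= lr * (h.T @ g_logits)
        b2 -= lr * g_logits.sum(axis=0)
        w1 -= lr * (x.T @ g_h)
        b1 -= lr * g_h.sum(axis=0)
    return (w1, b1), losses


def embed(x, hidden_layer):
    """Track vectors = activations of the layer just below the removed softmax."""
    w1, b1 = hidden_layer
    return np.maximum(x @ w1 + b1, 0.0)


# Hypothetical data: 20 "albums" of 30 tracks, each track a 50-d vector
# standing in for the outputs of the previous (2014) system.
centers = rng.normal(size=(20, 50))
labels = np.repeat(np.arange(20), 30)
x = centers[labels] + 0.5 * rng.normal(size=(600, 50))
hidden_layer, losses = train_album_classifier(x, labels, n_classes=20)
vectors = embed(x, hidden_layer)   # (600, 64): one vector per track
```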
Learning with theme-based albums
What if we want to remove the highly engineered features and pooling techniques?
Convolutional Neural Networks for image recognition:
Source: http://www.clarifai.com/technology
And for music?
• Mel-spectrogram (a time-frequency representation) as input: the two axes have different meanings!
Should we really use square filters?
• Labels apply to the whole track (>= 30 seconds): the input is 128×1200 for a 30-second song!
We have to pool along the time axis!
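One answer to the square-filter question, in the spirit of the Spotify CNN post cited on the next slide, is to let the first filters span the whole frequency axis and slide only along time, then pool over time so that tracks of any length give a fixed-size vector. A minimal NumPy sketch under those assumptions; the filter count, filter width and random inputs are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def temporal_conv(spec, filters):
    """Convolve along time only, with filters that span every mel band.

    spec:    (n_mels, n_frames) mel-spectrogram, e.g. (128, 1200) for 30 s
    filters: (n_filters, n_mels, width)
    Returns a ReLU feature map of shape (n_filters, n_frames - width + 1).
    """
    width = filters.shape[2]
    windows = sliding_window_view(spec, width, axis=1)   # (n_mels, n_out, width)
    out = np.einsum("fmw,mtw->ft", filters, windows)     # sum over bands and frames
    return np.maximum(out, 0.0)


rng = np.random.default_rng(0)
spec = rng.random((128, 1200))                        # stand-in for a 30 s mel-spectrogram
filters = rng.normal(scale=0.01, size=(32, 128, 4))   # hypothetical: 32 filters, 4 frames wide
fmap = temporal_conv(spec, filters)                   # (32, 1197)
track_vector = fmap.max(axis=1)                       # global max pool over time: (32,)
```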
And for music?
Source: Sander Dieleman, http://benanne.github.io/2014/08/05/spotify-cnns.html
And for music?
Some ideas to improve it slightly:
• Multi-scale pooling
• Reduce max pooling
• Add batch normalization
P@k (%)       k=1    k=5    k=10   k=20   k=50
2014+deep     23.40  21.09  19.68  18.07  15.19
CNN           23.85  21.31  19.81  18.06  15.18
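The slide lists batch normalization without details. A minimal sketch of its standard training-time forward pass (Ioffe and Szegedy) on a (batch, features) array; where it sits in Niland's network is not stated, and the data are hypothetical.

```python
import numpy as np


def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Batch-normalisation forward pass (training mode) for a (batch, features) array.

    Each feature is standardised with the mean and variance of the current
    mini-batch, then rescaled by the learned `gamma` and shifted by `beta`.
    """
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta


rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=5.0, size=(64, 10))   # hypothetical activations
y = batch_norm_train(x, gamma=np.ones(10), beta=np.zeros(10))
```

At inference time, running averages of the mean and variance collected during training replace the mini-batch statistics; that bookkeeping is omitted here.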
Okay, so?
• Our 2014 system is a mix of 6 different short-term descriptors and 6 different "smart" pooling functions: 10 years of research!
• Has the engineering problem become a data problem?
From Fisher Vectors to simple pooling functions?
• A very simple pooling function can give great results!
P@k (%)       k=1    k=5    k=10   k=20   k=50
Mean          20.94  19.04  17.69  16.17  13.74
Max           22.21  19.90  18.58  17.07  14.61
Var           21.66  19.46  18.14  16.58  14.13
Mean+Max+Var  23.85  21.31  19.81  18.06  15.18
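The "Mean+Max+Var" row matches the CNN row of the previous tables, which suggests the CNN pools its feature maps over time with these three statistics combined. A minimal sketch of that pooling step, assuming it is applied per feature over the time axis and the results are concatenated:

```python
import numpy as np


def mean_max_var_pool(fmap):
    """Summarise a (n_features, n_frames) feature map over the time axis.

    Concatenates the per-feature mean, max and variance, so the result has
    3 * n_features values whatever the track length.
    """
    return np.concatenate([fmap.mean(axis=1), fmap.max(axis=1), fmap.var(axis=1)])


fmap = np.array([[1.0, 2.0, 3.0],
                 [0.0, 0.0, 6.0]])   # 2 features over 3 frames
print(mean_max_var_pool(fmap))       # mean [2, 2], max [3, 6], var [0.667, 8]
```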
And with square filters?
• Square filters also seem to work!
P@k (%)       k=1    k=5    k=10   k=20   k=50
CNN           23.85  21.31  19.81  18.06  15.18
CNNsq         22.94  20.84  19.79  18.15  15.52
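For contrast with filters that span the whole frequency axis, a square filter slides along both axes of the mel-spectrogram. A minimal NumPy sketch of one 3×3 filter; the kernel size and inputs are hypothetical, and the deck does not describe the CNNsq architecture.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def square_conv(spec, kernel):
    """'Valid' 2-D convolution (cross-correlation) of a spectrogram with a square kernel.

    Unlike a frequency-spanning filter, a small square kernel slides along
    both the frequency axis and the time axis.
    """
    windows = sliding_window_view(spec, kernel.shape)   # (F - k + 1, T - k + 1, k, k)
    return np.einsum("ftij,ij->ft", windows, kernel)


rng = np.random.default_rng(0)
spec = rng.random((128, 1200))     # stand-in for a 30 s mel-spectrogram
kernel = rng.normal(size=(3, 3))   # hypothetical 3x3 filter
fmap = square_conv(spec, kernel)   # (126, 1198): still a 2-D time-frequency map
```

The output keeps a frequency axis, so a square-filter network needs further layers or pooling to collapse frequency as well as time.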
A transferable model for music
• Also works for world music, library music…
• This dataset: 10k library-music tracks in 300 groups
P@k (%)       k=1    k=5    k=10   k=20   k=50
2014+deep     30.66  19.99  15.57  11.81  7.93
CNN           29.76  19.82  15.55  11.85  7.80
The spectrogram is still an engineered feature…
Could we learn a better temporal filter bank to replace the FFT and mel filtering?
"End-to-end learning for music audio", Dieleman et al.
"Learning the Speech Front-end with raw waveform CLDNNs", Sainath et al.
Source: "Learning the Speech Front-end with raw waveform CLDNNs", Sainath et al.
P@k (%)       k=1    k=5    k=10   k=20   k=50
Raw           20.11  18.95  17.23  15.91  14.26
Spectro       23.85  21.31  19.81  18.06  15.18
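A minimal NumPy sketch of a learnable front-end on raw audio, loosely modelled on the time-convolution layer described by Sainath et al.: convolve the waveform with a bank of FIR filters, max-pool within each frame, then log-compress, giving a spectrogram-like array. In a real model the filters are trained jointly with the rest of the network; the sample rate, filter count, 25 ms filter length and 10 ms hop here are assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def raw_frontend(wave, filters, hop):
    """Turn a raw waveform into a (n_filters, n_frames) spectrogram-like array.

    wave:    (n_samples,) raw audio
    filters: (n_filters, width) FIR filters; learned by backpropagation in a
             real model, random here
    hop:     frame step in samples
    """
    windows = sliding_window_view(wave, filters.shape[1])    # stride-1 windows
    responses = np.maximum(windows @ filters.T, 0.0)         # time convolution + ReLU
    n_frames = responses.shape[0] // hop
    frames = responses[: n_frames * hop].reshape(n_frames, hop, -1)
    pooled = frames.max(axis=1)                              # max-pool within each frame
    return np.log1p(pooled).T                                # log compression


sr = 16_000                                        # hypothetical sample rate
rng = np.random.default_rng(0)
wave = rng.normal(size=sr // 2)                    # 0.5 s of stand-in audio
filters = rng.normal(scale=0.05, size=(40, 400))   # 40 filters, 25 ms long at 16 kHz
features = raw_frontend(wave, filters, hop=160)    # 10 ms frames -> (40, 47)
```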
The spectrogram is still an engineered feature…
Maybe we need more data?
We can improve!
• Add more albums!
• With 500k tracks? 1M?
P@k (%)       k=1    k=5    k=10   k=20   k=50
25k tracks    19.84  17.98  15.21  14.06  13.41
150k tracks   23.85  21.31  19.81  18.06  15.18
And…
• Add more layers!
"Deep Residual Learning for Image Recognition", He et al.
P@k (%)       k=1    k=5    k=10   k=20   k=50
PlainNet9     23.85  21.31  19.81  18.06  15.18
ResNet78      23.87  22.17  20.98  19.38  16.68
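A minimal sketch of the identity-shortcut block from He et al., written with fully connected layers instead of convolutions for brevity; the block width, depth and weights are hypothetical, and the deck does not describe how ResNet78 is built.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def residual_block(x, w1, w2):
    """Compute relu(x + F(x)), with F = two weight layers and a ReLU in between.

    Thanks to the identity shortcut, a block whose residual F is zero reduces
    to relu(x), i.e. it passes non-negative inputs through unchanged.
    """
    return relu(x + relu(x @ w1) @ w2)


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 64))   # hypothetical batch of 8 feature vectors
blocks = [(rng.normal(scale=0.01, size=(64, 64)),
           rng.normal(scale=0.01, size=(64, 64))) for _ in range(20)]
h = x
for w1, w2 in blocks:          # a deep stack is just many blocks in sequence
    h = residual_block(h, w1, w2)
```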
And?
• Data augmentation?
"Exploring data augmentation for improved singing voice detection with neural networks", Schlüter and Grill
• Recurrent neural networks?
• Siamese networks?
"An exploration of deep learning in music informatics", Humphrey et al.
• More data! Or a semi-supervised approach?
"Semi-supervised learning with ladder networks", Rasmus et al.
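The deck only lists siamese networks as an idea. One common way to train one is the contrastive loss of Hadsell, Chopra and LeCun: two tracks go through the same encoder and the loss acts on the distance between their embeddings. A minimal sketch of the loss alone, assuming, hypothetically, that pairs from the same theme-based album count as similar; the encoder is omitted.

```python
import numpy as np


def contrastive_loss(emb_a, emb_b, same, margin=1.0):
    """Contrastive loss for pairs of tracks encoded by the same (siamese) network.

    emb_a, emb_b: (batch, dim) embeddings of the two tracks of each pair
    same:         (batch,) 1.0 if the pair should be close (e.g. same album), else 0.0
    Similar pairs are pulled together; dissimilar pairs are pushed apart until
    they are at least `margin` away from each other.
    """
    d = np.linalg.norm(emb_a - emb_b, axis=1)
    pull = same * d ** 2
    push = (1.0 - same) * np.maximum(margin - d, 0.0) ** 2
    return 0.5 * np.mean(pull + push)


a = np.array([[0.0, 0.0], [0.0, 0.0]])
b = np.array([[0.5, 0.0], [3.0, 4.0]])
print(contrastive_loss(a, b, same=np.array([1.0, 0.0])))  # 0.0625
```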
Questions?
@aloisgr @nilandmusic niland.io
Try it for yourself: http://demo.niland.io

FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 

Deep Learning Meetup #5

  • 1. Deep learning for music recommendation. Aloïs Gruson, @nilandmusic @aloisgr, niland.io
  • 2. Who we are
    • Founded in 2013 by 2 PhDs who worked at IRCAM
    • Won MIREX 2011 in Music Similarity Estimation and Music Classification
    • We sell our technology through our API
    • A team of 9 today
  • 3. What we want to do
    • Create a high-dimensional space where every song is a vector
    • Use this space to find similar songs and to classify songs
    • Each query must take less than 50 ms over millions of tracks
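Slide 3 describes retrieval as nearest-neighbour search in an embedding space. A minimal sketch of that query step, assuming cosine similarity as the similarity measure and plain Python lists as vectors (the catalog contents and all names are hypothetical). It performs an exact brute-force scan; meeting a 50 ms budget over millions of tracks would likely require an approximate nearest-neighbour index, which is not shown.

```python
import heapq
import math


def normalize(vec):
    """Scale a vector to unit L2 norm so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def top_k_similar(query, catalog, k=5):
    """Return the k (track_id, cosine similarity) pairs closest to `query`.

    `catalog` maps a hypothetical track id to its embedding vector. This is an
    exact, brute-force scan over the whole catalog (demo only).
    """
    q = normalize(query)
    scores = (
        (track_id, sum(a * b for a, b in zip(q, normalize(vec))))
        for track_id, vec in catalog.items()
    )
    return heapq.nlargest(k, scores, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical 3-dimensional embeddings; real ones would be much larger.
    catalog = {
        "track_a": [0.9, 0.1, 0.0],
        "track_b": [0.8, 0.2, 0.1],
        "track_c": [0.0, 0.1, 0.9],
    }
    for track_id, score in top_k_similar([1.0, 0.0, 0.0], catalog, k=2):
        print(track_id, round(score, 3))
```

In practice the catalog vectors would be normalized once at indexing time rather than on every query.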
  • 4. How music information retrieval worked in 2011
    • Short-term descriptors: MFCCs, Fluctuation Patterns ("Block-level audio features for music genres classification", Seyerlehner et al.) and many more!
    • Pooling techniques: VQ, GMM-SV ("GMM Supervector for content based music similarity", Charbuillet et al.), VLAD ("Aggregating local descriptors into a compact image representation", Jégou et al.)...
    [Diagram: audio → short-term descriptors (MFCCs, FP) → pooling (VLAD, GMM-SV)]
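The pooling stage turns a variable number of short-term frame descriptors (for example MFCC frames) into one fixed-size track vector. A minimal sketch of VLAD-style pooling, following the general recipe described by Jégou et al.: assign each descriptor to its nearest codebook centroid, accumulate the residuals per centroid, concatenate, then normalize. The codebook is assumed to be given (typically learned with k-means), the signed square-root step is a common refinement rather than something the slide specifies, and the toy data are hypothetical.

```python
import math


def nearest(centroids, x):
    """Index of the centroid closest to descriptor x (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(x, centroids[k])),
    )


def vlad(descriptors, centroids):
    """Pool a variable-length list of frame descriptors into one K * D vector.

    For each centroid, sum the residuals (x - c) of the descriptors assigned to it,
    concatenate the K sums, then apply signed square-root and L2 normalization.
    """
    dim = len(centroids[0])
    residuals = [[0.0] * dim for _ in centroids]
    for x in descriptors:
        k = nearest(centroids, x)
        for d in range(dim):
            residuals[k][d] += x[d] - centroids[k][d]
    v = [value for block in residuals for value in block]
    # Power (signed square-root) normalization dampens bursty components.
    v = [math.copysign(math.sqrt(abs(value)), value) for value in v]
    norm = math.sqrt(sum(value * value for value in v))
    return [value / norm for value in v] if norm > 0 else v


if __name__ == "__main__":
    # Hypothetical 2-D "frame descriptors" and a 2-centroid codebook.
    frames = [[1.0, 0.0], [0.0, 1.0], [11.0, 10.0]]
    codebook = [[0.0, 0.0], [10.0, 10.0]]
    print(vlad(frames, codebook))
```

The output length depends only on the codebook size and descriptor dimension, never on the track duration.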
  • 5. One of our evaluation datasets
    • Evaluation metrics for search engines: precision at k (P@k) or mean average precision (mAP)
    • Evaluation set presented here: 8,500 tracks in 141 playlists of mainstream music
    P@k (%)             1       5       10      20      50
    mirex2011           17.48   15.39   13.87   12.23   10.00
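The two metrics can be computed as below. This sketch rests on an assumption the slide does not spell out: for a query track, the relevant set is taken to be the other tracks from the same playlist, and scores are averaged over all query tracks (the table reports them as percentages). Function names are hypothetical.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k retrieved tracks that are relevant."""
    return sum(1 for track in ranked[:k] if track in relevant) / k


def average_precision(ranked, relevant):
    """Mean of P@i over the ranks i at which a relevant track appears."""
    hits, total = 0, 0.0
    for i, track in enumerate(ranked, start=1):
        if track in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(queries):
    """`queries` is a list of (ranked_results, relevant_set) pairs, one per query track."""
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)


if __name__ == "__main__":
    ranked = ["b", "x", "c", "y"]  # hypothetical results for one query track
    relevant = {"b", "c"}          # hypothetical playlist neighbours of that track
    print(precision_at_k(ranked, relevant, 2), average_precision(ranked, relevant))
```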
  • 6. From 2013 to 2014 @niland
    • How to make a product from research work!
    • And a lot of work on short-term descriptors and pooling techniques
    • But still completely unsupervised: no real way to match outputs with human perception!
    P@k (%)             1       5       10      20      50
    mirex2011           17.48   15.39   13.87   12.23   10.00
    2014                19.70   16.81   15.37   13.57   11.01
    Relative gain (%)   +12.70  +9.23   +10.81  +10.96  +10.10
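The gain row is a relative improvement over the MIREX 2011 baseline, not a difference in percentage points. A quick check of that arithmetic against the table values (the helper name is hypothetical):

```python
def relative_gain(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100


mirex2011 = [17.48, 15.39, 13.87, 12.23, 10.00]
system_2014 = [19.70, 16.81, 15.37, 13.57, 11.01]
gains = [round(relative_gain(n, o), 2) for n, o in zip(system_2014, mirex2011)]
print(gains)
```

`gains` comes out as [12.7, 9.23, 10.81, 10.96, 10.1], i.e. the gain row of the table.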
  • 7. Matching algorithm outputs with human perception
    • Learn the outputs of a collaborative filtering model ("Deep content-based music recommendation", van den Oord et al.)
    • Or use a network trained to classify tracks into groups of similar tracks
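The first option regresses collaborative-filtering latent factors from audio content. In van den Oord et al. the regressor is a CNN on spectrograms; the sketch below swaps in a plain linear model trained by SGD on the squared error, purely to show the shape of the objective. The feature vectors, factor vectors and all function names are hypothetical.

```python
import random


def predict(weights, x):
    """Linear map from an audio feature vector to a predicted latent-factor vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def mse(weights, features, targets):
    """Mean squared error between predicted and target latent factors."""
    errors = [(p - t) ** 2
              for x, y in zip(features, targets)
              for p, t in zip(predict(weights, x), y)]
    return sum(errors) / len(errors)


def train_regressor(features, targets, epochs=200, lr=0.05, seed=0):
    """Fit the weights by per-example SGD on the squared error."""
    rng = random.Random(seed)
    d_in, d_out = len(features[0]), len(targets[0])
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(d_in)] for _ in range(d_out)]
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = features[i], targets[i]
            pred = predict(weights, x)
            for j in range(d_out):
                err = pred[j] - y[j]  # derivative of 0.5 * err**2 w.r.t. pred[j]
                for k in range(d_in):
                    weights[j][k] -= lr * err * x[k]
    return weights


if __name__ == "__main__":
    # Toy data: the "CF factors" are an exact linear function of the "audio features".
    true_map = [[1.0, 2.0], [-1.0, 0.5]]
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
    factors = [predict(true_map, x) for x in feats]
    learned = train_regressor(feats, factors)
    print(mse(learned, feats, factors))
```

Once trained, the predicted factors can place tracks with no listening data in the same space as tracks that have it.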
  • 8. Integrating the human idea of similarity
    • 150k tracks in 3,500 theme-based albums from one of our clients
    • Each album represents a genre, a mood or a usage
    • Each album gathers socially similar tracks
  • 9. Learning with theme-based albums
    • We use the outputs of our previous system
    • We train it with a classification cost
    • And remove the classification layer!
    P@k (%)             1       5       10      20      50
    2014                19.70   16.81   15.37   13.57   11.01
    2014+deep           23.40   21.09   19.68   18.07   15.19
    Relative gain (%)   +18.78  +25.46  +28.04  +33.16  +37.97
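The recipe on this slide (train with a classification cost over albums, then drop the classification layer and keep the penultimate activations as the track vector) can be sketched with a toy two-layer network in plain Python. Everything here is an illustrative assumption: the layer sizes, the tanh activation, the learning rate, and the 2-D inputs standing in for the previous system's outputs. The slide does not describe the actual architecture.

```python
import math
import random


class AlbumClassifier:
    """Toy network: a tanh embedding layer followed by a softmax over albums.

    Inputs stand in for the previous system's track descriptors and classes for
    theme-based albums (both hypothetical). After training, only `embed` is kept.
    """

    def __init__(self, d_in, d_emb, n_classes, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0, 0.5) for _ in range(d_in)] for _ in range(d_emb)]
        self.b1 = [0.0] * d_emb
        self.w2 = [[rng.gauss(0, 0.5) for _ in range(d_emb)] for _ in range(n_classes)]
        self.b2 = [0.0] * n_classes

    def embed(self, x):
        """Hidden activations: the track vector kept for similarity search."""
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def probs(self, h):
        """Softmax over albums (the classification layer that gets discarded)."""
        logits = [sum(w * hi for w, hi in zip(row, h)) + b
                  for row, b in zip(self.w2, self.b2)]
        top = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def predict(self, x):
        p = self.probs(self.embed(x))
        return max(range(len(p)), key=p.__getitem__)

    def train_step(self, x, label, lr):
        """One SGD step on the cross-entropy loss; returns the loss before the update."""
        h = self.embed(x)
        p = self.probs(h)
        loss = -math.log(p[label])
        d_logits = [pc - (1.0 if c == label else 0.0) for c, pc in enumerate(p)]
        # Gradient w.r.t. the hidden pre-activations, computed before w2 changes.
        d_pre = [sum(d_logits[c] * self.w2[c][j] for c in range(len(p))) * (1.0 - h[j] ** 2)
                 for j in range(len(h))]
        for c, g in enumerate(d_logits):
            for j in range(len(h)):
                self.w2[c][j] -= lr * g * h[j]
            self.b2[c] -= lr * g
        for j, g in enumerate(d_pre):
            for k in range(len(x)):
                self.w1[j][k] -= lr * g * x[k]
            self.b1[j] -= lr * g
        return loss


def train(model, data, epochs=500, lr=0.1):
    """Full passes over (features, album_label) pairs; returns the mean loss per epoch."""
    return [sum(model.train_step(x, y, lr) for x, y in data) / len(data) for _ in range(epochs)]


if __name__ == "__main__":
    data = [([1.0, 0.1], 0), ([0.9, 0.0], 0), ([0.1, 1.0], 1), ([0.0, 0.9], 1)]
    model = AlbumClassifier(d_in=2, d_emb=8, n_classes=2)
    history = train(model, data)
    # Drop the classifier: only the embedding is used to compare tracks.
    track_vectors = [model.embed(x) for x, _ in data]
    print(round(history[0], 3), round(history[-1], 3), len(track_vectors[0]))
```

The album labels only shape the embedding during training; at query time, similarity is computed between `embed` outputs, so new tracks need not belong to any training album.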
  • 10. What if we want to remove the highly engineered features and pooling techniques?
    • Convolutional Neural Networks for image recognition (source: http://www.clarifai.com/technology)
  • 11. And for music?
    • Mel-spectrogram (time-frequency representation) as input: the two axes have different meanings! Should we really use square filters?
    • Labels apply to the whole track (>= 30 seconds): the input is 128x1200 for a 30-second song! We have to pool along the time axis!
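The 128x1200 input size is consistent with roughly 40 frames per second. A small sketch of the shape arithmetic with hypothetical parameters (16 kHz audio, 25 ms window and hop, 128 mel bands, no edge padding); the parameters actually used are not stated on the slide.

```python
def n_frames(n_samples, hop_length, win_length):
    """Number of analysis frames that fit entirely inside the signal (no edge padding)."""
    return 1 + (n_samples - win_length) // hop_length


def mel_spectrogram_shape(duration_s, sample_rate, hop_length, win_length, n_mels=128):
    """(mel bands, frames) for a clip of the given duration."""
    n_samples = int(duration_s * sample_rate)
    return (n_mels, n_frames(n_samples, hop_length, win_length))


if __name__ == "__main__":
    # Hypothetical parameters: 16 kHz audio, 25 ms window, 25 ms hop.
    print(mel_spectrogram_shape(30, 16000, hop_length=400, win_length=400))
```

With edge padding, as some libraries apply by default, the frame count differs slightly.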
  • 12. And for music?
    • Source: Sander Dieleman, http://benanne.github.io/2014/08/05/spotify-cnns.html
  • 13. And for music?
    • Some ideas to slightly improve it:
      • Multi-scale pooling
      • Reduce max pooling
      • Add batch normalization
    P@k (%)             1       5       10      20      50
    2014+deep           23.40   21.09   19.68   18.07   15.19
    CNN                 23.85   21.31   19.81   18.06   15.18
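Of the three ideas, batch normalization is the most standard. A minimal sketch of its training-mode forward pass over a batch of feature vectors; in a CNN the statistics would be computed per channel across the batch (and time), and inference would use running averages. Both refinements are omitted here.

```python
import math


def batch_norm(batch, gamma, beta, eps=1e-5):
    """Training-mode batch normalization over a batch of feature vectors.

    Each feature is standardized with the batch mean and (biased) variance, then
    scaled by gamma and shifted by beta, both of which are learned in a real network.
    """
    n, dim = len(batch), len(batch[0])
    means = [sum(row[d] for row in batch) / n for d in range(dim)]
    variances = [sum((row[d] - means[d]) ** 2 for row in batch) / n for d in range(dim)]
    return [
        [gamma[d] * (row[d] - means[d]) / math.sqrt(variances[d] + eps) + beta[d]
         for d in range(dim)]
        for row in batch
    ]


if __name__ == "__main__":
    # Two features on very different scales end up on the same scale.
    print(batch_norm([[1.0, 10.0], [3.0, 30.0]], gamma=[1.0, 1.0], beta=[0.0, 0.0]))
```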
  • 14. Okay, so?
    • Our 2014 system is a mix of 6 different short-term descriptors + 6 different "smart" pooling functions: 10 years of research!
    • Has the engineering problem become a data problem?
    P@k (%)             1       5       10      20      50
    2014+deep           23.40   21.09   19.68   18.07   15.19
    CNN                 23.85   21.31   19.81   18.06   15.18
  • 15. From Fisher Vectors to simple pooling functions?
    • A very simple pooling function can give great results!
    P@k (%)             1       5       10      20      50
    Mean                20.94   19.04   17.69   16.17   13.74
    Max                 22.21   19.90   18.58   17.07   14.61
    Var                 21.66   19.46   18.14   16.58   14.13
    Mean+Max+Var        23.85   21.31   19.81   18.06   15.18
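The Mean+Max+Var row corresponds to concatenating three global statistics computed along the time axis. A minimal sketch, assuming a (channels x time) feature map stored as nested lists and an arbitrary concatenation order (both assumptions, not details from the slide):

```python
def temporal_pooling(feature_map):
    """Summarize a (channels x time) feature map with per-channel mean, max and variance.

    The result has 3 * channels values regardless of track length, which is what
    lets a whole track map to one fixed-size vector.
    """
    pooled_mean, pooled_max, pooled_var = [], [], []
    for channel in feature_map:
        t = len(channel)
        mu = sum(channel) / t
        pooled_mean.append(mu)
        pooled_max.append(max(channel))
        pooled_var.append(sum((v - mu) ** 2 for v in channel) / t)
    return pooled_mean + pooled_max + pooled_var


if __name__ == "__main__":
    # Hypothetical feature map: 2 channels over 3 time steps.
    print(temporal_pooling([[1.0, 2.0, 3.0], [0.0, 0.0, 6.0]]))
```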
  • 16. And with square filters?
    • Square filters also seem to work!
    P@k (%)             1       5       10      20      50
    CNN                 23.85   21.31   19.81   18.06   15.18
    CNNsq               22.94   20.84   19.79   18.15   15.52
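The two filter shapes differ in how they slide over a (frequency x time) spectrogram. A square kernel slides along both axes, while a kernel spanning every frequency band slides only along time, which amounts to a 1-D convolution with the bands as input channels (the approach described in the Dieleman blog post cited on slide 12, as we read it). A toy sketch comparing output shapes; the spectrogram size and kernel sizes are hypothetical.

```python
def conv2d_valid(image, kernel):
    """Plain 2-D 'valid' cross-correlation (what deep-learning libraries call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(kernel[i][j] * image[r + i][c + j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


if __name__ == "__main__":
    # Hypothetical spectrogram: 8 frequency bands (rows) x 10 time frames (columns).
    spec = [[float(f * 10 + t) for t in range(10)] for f in range(8)]
    square = conv2d_valid(spec, [[1.0] * 3 for _ in range(3)])       # 3x3 kernel
    full_height = conv2d_valid(spec, [[1.0] * 4 for _ in range(8)])  # all bands x 4 frames
    print((len(square), len(square[0])), (len(full_height), len(full_height[0])))
```

The square kernel yields a 6x8 map that keeps a frequency axis; the full-height kernel collapses frequency to a single row, leaving a sequence of 7 time steps to pool.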
  • 17. A transferable model for music
    • Also works for world music, library music…
    • This dataset: 10k tracks of library music in 300 groups
    P@k (%)             1       5       10      20      50
    2014+deep           30.66   19.99   15.57   11.81   7.93
    CNN                 29.76   19.82   15.55   11.85   7.80
  • 18. The spectrogram is still an engineered feature…
    • Could we learn a better temporal filter bank to replace the FFT and mel filtering?
    • "End-to-end learning for music audio", Dieleman and Schrauwen; "Learning the Speech Front-end with raw waveform CLDNNs", Sainath et al.
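A learned front-end replaces the fixed FFT and mel filters with convolution filters applied directly to the waveform. The sketch below loosely follows the time-convolution layer described by Sainath et al. (convolve each frame with every filter, max-pool over the frame, rectify, apply a stabilized log). The random filters stand in for weights that would be learned by backpropagation, and the frame length, hop, filter count and filter length are hypothetical.

```python
import math
import random


def conv1d_valid(signal, taps):
    """'Valid' 1-D cross-correlation of a signal with one filter."""
    n = len(taps)
    return [sum(taps[k] * signal[i + k] for k in range(n))
            for i in range(len(signal) - n + 1)]


def frame_features(frame, filters, floor=0.01):
    """One value per filter: convolve, max-pool over the frame, rectify, log-compress."""
    out = []
    for taps in filters:
        pooled = max(conv1d_valid(frame, taps))
        out.append(math.log(max(0.0, pooled) + floor))
    return out


def raw_frontend(signal, filters, frame_len, hop):
    """Apply the filter bank to successive frames, giving a (frames x filters) matrix."""
    return [frame_features(signal[start:start + frame_len], filters)
            for start in range(0, len(signal) - frame_len + 1, hop)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Random filters stand in for weights that would be learned by backpropagation.
    filters = [[rng.gauss(0, 0.1) for _ in range(25)] for _ in range(4)]
    sr = 16000  # hypothetical sample rate
    signal = [math.sin(2 * math.pi * 440 * n / sr) for n in range(1600)]
    features = raw_frontend(signal, filters, frame_len=400, hop=160)
    print(len(features), len(features[0]))
```

The output has the same layout as a (time x bands) spectrogram, so the rest of the network can stay unchanged while the filters are trained jointly with it.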
  • 19. Source: "Learning the Speech Front-end with raw waveform CLDNNs", Sainath et al.
  • 20. The spectrogram is still an engineered feature…
    • Maybe we need more data?
    P@k (%)             1       5       10      20      50
    Raw                 20.11   18.95   17.23   15.91   14.26
    Spectro             23.85   21.31   19.81   18.06   15.18
  • 21. We can improve!
    • Add more albums!
    • With 500k tracks? 1M?
    P@k (%)             1       5       10      20      50
    25k tracks          19.84   17.98   15.21   14.06   13.41
    150k tracks         23.85   21.31   19.81   18.06   15.18
  • 22. And…
    • Add more layers! ("Deep Residual Learning for Image Recognition", He et al.)
    P@k (%)             1       5       10      20      50
    PlainNet9           23.85   21.31   19.81   18.06   15.18
    ResNet78            23.87   22.17   20.98   19.38   16.68
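Residual learning adds an identity shortcut around each block, so a block only has to learn a correction to its input. A minimal sketch with dense layers in place of the convolutions and batch normalization that He et al. use; the configuration of the ResNet78 in the table is not given, and the weights below are hypothetical.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def dense(weights, v):
    """Matrix-vector product (bias omitted for brevity)."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_block(x, w1, w2):
    """y = relu(x + F(x)) where F is dense -> relu -> dense; x is the identity shortcut."""
    f = dense(w2, relu(dense(w1, x)))
    return relu([a + b for a, b in zip(x, f)])


def residual_stack(x, blocks):
    """Apply a sequence of residual blocks, each given as a (w1, w2) pair."""
    for w1, w2 in blocks:
        x = residual_block(x, w1, w2)
    return x


if __name__ == "__main__":
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    # Even a deep stack of zero-weight blocks leaves a non-negative input unchanged.
    print(residual_stack([1.0, 2.0], [(zeros, zeros)] * 30))
```

With all-zero weights every block passes its non-negative input through unchanged, which is one intuition for why very deep residual stacks remain trainable where a plain stack of the same depth may not.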
  • 23. And?
    • Data augmentation? ("Exploring data augmentation for improved singing voice detection with neural networks", Schlüter and Grill)
    • Recurrent Neural Networks?
    • Siamese Networks? ("An exploration of deep learning in music informatics", Humphrey et al.)
    • More data! Or a semi-supervised approach? ("Semi-supervised learning with ladder networks", Rasmus et al.)
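For the Siamese option, two tracks pass through the same network and a pairwise loss compares their embeddings. A sketch of one common choice, the margin-based contrastive loss; treating "same theme-based album" as the definition of a similar pair is an assumption here, not something the slide specifies, and the embeddings are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, similar, margin=1.0):
    """Contrastive loss for one pair of embeddings produced by the same (shared) network.

    Similar pairs are pulled together (squared distance); dissimilar pairs are pushed
    apart until they are at least `margin` away, after which they cost nothing.
    """
    d = euclidean(emb_a, emb_b)
    if similar:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


if __name__ == "__main__":
    print(contrastive_loss([0.0, 0.0], [0.3, 0.4], similar=False))
```

Unlike the classification cost of slide 9, this objective trains the distance directly, so no classification layer has to be removed afterwards.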
  • 24. Questions? @aloisgr @nilandmusic niland.io
    • Try it for yourself: http://demo.niland.io