Igor Kostiuk | 2016
Tags: #music, #recommender_systems, #deep_learning, #neural_networks, #mel_spectrograms
How to train your music recommender system
Recommender systems are a family of methods that seek to
predict the rating or preference that a user would give to an item
(Wikipedia)
Is there something similar to something else?
There are two common ways to make recommendations.
Collaborative filtering
- cold-start problem (requires a large amount of information about a user in order to make accurate recommendations)
- will not recommend rare or new songs, games, etc. (popular items are much easier to recommend than unpopular ones)
- poor scalability
+ content-agnostic
Example: Last.fm recommends music based on a comparison of
the listening habits of similar users.
http://ru.anime-characters-fight.wikia.com/wiki/%D0%A4%D0%B0%D0%B9%D0%BB:Anime-heroes-wallpaper-hd-cool-7.jpg
Popularity
Content-based filtering
- can only make recommendations that are similar to the original seed
- semantic gap between the audio or video and the various aspects of the music / movie that affect user preferences (genre, mood)
- obvious recommendations (Doom → Doom 4, etc.)
http://static.giantbomb.com/uploads/original/13/137381/2846580-doom.jpg
There is nothing more similar to a tea kettle than another tea kettle
Approaches
1. Automatic generation of social tags
Social tags are user-generated keywords associated with a song.
Predicting these social tags directly from MP3 files avoids the "cold-start problem".
Using a set of one-vs-all classifiers, one per tag, we can map audio features onto social tags collected from the Web.
2. Music genre classification
Attempt to classify songs into a set of genre classes. Clustering – each cluster represents a specific genre.
Each cluster is labelled by a "majority vote" – whichever genre is most common in that cluster (toy sketches of both approaches follow below).
https://en.wikipedia.org/wiki/Mel-frequency_cepstrum
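Toy sketches of both approaches, assuming precomputed per-song audio features and using scikit-learn; all variable names, array shapes, and hyperparameters are illustrative, not taken from the talk.

1. One-vs-all tag classifiers (one binary classifier per social tag):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X = np.random.rand(1000, 64)              # hypothetical audio features per song
Y = np.random.rand(1000, 50) > 0.9        # hypothetical binary tag indicators

clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X, Y)                             # fits one binary classifier per tag
predicted_tags = clf.predict(X[:5])       # tag predictions for unseen songs

2. Clustering with majority-vote genre labels:

from sklearn.cluster import KMeans

features = np.random.rand(500, 20)        # hypothetical per-song features
genres = np.random.randint(0, 5, 500)     # known genre ids for a labelled subset

clusters = KMeans(n_clusters=5, n_init=10).fit_predict(features)

# Label each cluster with the most common genre among its members.
cluster_to_genre = {
    c: np.bincount(genres[clusters == c]).argmax()
    for c in np.unique(clusters)
}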
Deep Learning approach
Predicting listening preferences from audio signals by training a
regression model to predict the latent representations of songs
that were obtained from a collaborative filtering model.
[Pipeline diagram]
- Data from a collaborative filtering model → latent factor vectors extracted via matrix factorization (the regression targets / output)
- Data from raw MP3s → mel-spectrogram extraction → deep neural network (input) → prediction of the latent factors
Advantages
+ Effectiveness in recommending new and unpopular songs
+ Good recommendations despite the semantic gap
Development stages
Data retrieval
The Echo Nest Taste Profile Subset
http://labrosa.ee.columbia.edu/millionsong/tasteprofile
b80344d063b5ccb3212f76538f3d9e43d87dca9e SOBSUJE12A6D4F8CF5 2
b80344d063b5ccb3212f76538f3d9e43d87dca9e SOBVFZR12A6D4F8AE3 1
b80344d063b5ccb3212f76538f3d9e43d87dca9e SOBXALG12A8C13C108 1
b80344d063b5ccb3212f76538f3d9e43d87dca9e SOBXHDL12A81C204C0 1
b80344d063b5ccb3212f76538f3d9e43d87dca9e SOBYHAJ12A6701BF1D 1
b80344d063b5ccb3212f76538f3d9e43d87dca9e SOCNMUH12A6D4F6E6D 1
b80344d063b5ccb3212f76538f3d9e43d87dca9e SODACBL12A8C13C273 1
b80344d063b5ccb3212f76538f3d9e43d87dca9e SODDNQT12A6D4F5F7E 5
The Taste Profile subset is big. Some numbers:
1,019,318 unique users
384,546 unique MSD songs
48,373,586 user - song - play count triplets
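Before factorization, the triplets can be loaded into a sparse user × song matrix. A minimal sketch assuming pandas and scipy; the file name refers to the triplet file from the archive above, adjust the path as needed:

import pandas as pd
from scipy.sparse import csr_matrix

# Tab-separated user_id, song_id, play_count triplets.
triplets = pd.read_csv("train_triplets.txt", sep="\t",
                       names=["user", "song", "plays"])
users = triplets["user"].astype("category")
songs = triplets["song"].astype("category")

# Sparse play-count matrix R (users x songs).
R = csr_matrix((triplets["plays"],
                (users.cat.codes, songs.cat.codes)))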
Data retrieval
https://www.7digital.com/
We were able to obtain 29-second audio clips for over 99% of the dataset.
The original dataset has no raw audio, only precomputed, poorly documented features.
Weighted matrix factorization
https://youtu.be/o8PiWO8C3zs
[Illustration: sparse user_id × song_id interaction matrix]
Weighted matrix factorization
R ≈ P * Q
R – rating matrix, m users * n songs
P – user matrix, m * f
Q – song matrix, f * n
f – number of features
Weighted matrix factorization
Alternating Least Squares
http://mendeley.github.io/mrec/
https://github.com/benanne/wmf
https://github.com/benanne/theano_wmf
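A toy, dense numpy version of weighted matrix factorization with alternating least squares, in the spirit of "Collaborative Filtering for Implicit Feedback Datasets" (reference 3 below); the libraries linked above are the real, sparse implementations, and the hyperparameters here are arbitrary:

import numpy as np

def als_implicit(R, f=40, alpha=40.0, reg=0.1, iters=10):
    # R: dense (m users x n songs) play-count matrix; returns user and
    # song factor matrices X (m x f) and Y (n x f) with R ~ X @ Y.T.
    m, n = R.shape
    C = 1.0 + alpha * R                 # confidence weights
    P = (R > 0).astype(float)           # binary preferences
    X = 0.01 * np.random.rand(m, f)     # user factors
    Y = 0.01 * np.random.rand(n, f)     # song factors
    I = reg * np.eye(f)
    for _ in range(iters):
        for u in range(m):              # solve for users, song factors fixed
            Cu = np.diag(C[u])
            X[u] = np.linalg.solve(Y.T @ Cu @ Y + I, Y.T @ Cu @ P[u])
        for i in range(n):              # solve for songs, user factors fixed
            Ci = np.diag(C[:, i])
            Y[i] = np.linalg.solve(X.T @ Ci @ X + I, X.T @ Ci @ P[:, i])
    return X, Y

With f = 40 this yields the 40-dimensional song vectors that the network later regresses onto.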
Weighted matrix factorization
[Plot: error vs. iteration]
http://benanne.github.io/2014/08/05/spotify-cnns.html
Mel-spectrograms
A mel-spectrogram is a kind of time-frequency representation.
It is obtained from an audio signal by computing Fourier transforms of short, overlapping windows.
Finally, the frequency axis is converted from a linear scale to the mel scale.
https://en.wikipedia.org/wiki/Mel_scale
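For reference, a common (HTK-style) form of the hertz-to-mel conversion; librosa's default uses the slightly different Slaney variant:

import numpy as np

def hz_to_mel(f_hz):
    # Maps 0 Hz to 0 mel; roughly linear below ~1 kHz, logarithmic above.
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)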
Mel-spectrograms
import numpy as np
time = np.linspace(0, 10, 22050)  # placeholder time axis for a toy signal
series = np.sin(time)
# filename = "The Prodigy - Invaders Must Die.mp3"
# filename = "Lady GaGa - Poker Face.mp3"
Mel-spectrograms
Used log-compressed mel-spectrograms with 128 components, a window size of 1024 audio frames, and a hop size of 512 audio frames.
https://github.com/librosa/librosa
http://librosa.github.io/librosa/generated/librosa.feature.melspectrogram.html#librosa.feature.melspectrogram
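A minimal librosa sketch with those parameters (128 mel bands, a 1024-sample window, a 512-sample hop); the file name is a placeholder, and power_to_db is the current name of the log-compression helper (very old librosa versions call it logamplitude):

import numpy as np
import librosa

y, sr = librosa.load("track.mp3", sr=22050)   # placeholder file name
S = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024,
                                   hop_length=512, n_mels=128)
log_S = librosa.power_to_db(S, ref=np.max)    # log compression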
T-SNE
https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding
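A small sketch of projecting spectrogram frames to 2-D with scikit-learn's t-SNE; the random array is a stand-in for real mel-spectrogram frames:

import numpy as np
from sklearn.manifold import TSNE

frames = np.random.rand(1024, 128)   # stand-in for mel-spectrogram frames
embedding = TSNE(n_components=2, perplexity=30).fit_transform(frames)
print(embedding.shape)               # (1024, 2)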
Mel-spectrograms
[Figures: 1024, 1024 * 2, 1024 * 4]
Convolutional neural network
The baseline deep neural network architecture consists of two convolutional layers and two fully connected layers.
http://benanne.github.io/2014/08/05/spotify-cnns.html
Convolutional neural network
http://benanne.github.io/2014/08/05/spotify-cnns.html
[Architecture figure: 259 x 128 x 1, 4 x 128 x 1, 259 x 4 x 32; 4 s; 0.0029 s]
[Figure: learned filters]
Convolutional neural network
The network can be trained on windows of 3 seconds sampled
randomly from the audio clips.
The last layer of the network is the output layer, which predicts the 40 latent factors obtained from collaborative filtering.
http://www.slideshare.net/erikbern/music-recommendations-mlconf-2014
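A hedged Keras sketch of such a baseline: two convolutional layers and two fully connected layers regressing a ~3-second mel-spectrogram window onto the 40 latent factors with an MSE loss. The window length, filter counts, and layer sizes are illustrative, not the exact values from the talk:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

n_frames, n_mels = 130, 128                 # ~3 s at hop 512, sr 22050

model = Sequential([
    Conv2D(32, (4, n_mels), activation="relu",
           input_shape=(n_frames, n_mels, 1)),   # convolve over time
    MaxPooling2D(pool_size=(2, 1)),
    Conv2D(32, (4, 1), activation="relu"),
    Flatten(),
    Dense(256, activation="relu"),
    Dense(40),                              # 40 latent factors from WMF
])
model.compile(optimizer="adam", loss="mse")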
Album cover based models
1) series = (np.sin(time) - np.sin(time / np.pi))
https://www.google.com.ua/#q=y+%3D+sin%28x%29+-+sin%28x+%2F+pi%29
2) Deep content-based music recommendation
http://papers.nips.cc/paper/5004-deep-content-based-music-recommendation.pdf
3) Collaborative Filtering for Implicit Feedback Datasets
http://yifanhu.net/PUB/cf.pdf
4) Alternating Least Squares Method for Collaborative Filtering
http://bugra.github.io/work/notes/2014-04-19/alternating-least-squares-method-for-collaborative-filtering/
5) Recommending music on Spotify with deep learning
http://benanne.github.io/2014/08/05/spotify-cnns.html
6) *
http://papers.nips.cc/paper/3370-automatic-generation-of-social-tags-for-music-recommendation.pdf
http://cs229.stanford.edu/proj2013/FauciCastSchulze-MusicGenreClassification.pdf
http://ismir2011.ismir.net/papers/PS6-10.pdf
http://erikbern.com/2013/12/20/more-insight-into-recommender-algorithms/
http://www.slideshare.net/irecsys/matrix-factorization-in-recommender-systems
Let’s stay in touch:
Facebook
https://www.facebook.com/neverdraw
LinkedIn
https://www.linkedin.com/in/awesomengineer
Github
https://github.com/spaceuniverse
Thanks
http://cdn.gymnasticstracks.com/wp-content/uploads/2015/09/httyd.jpg
