Igor Kostiuk | 2016
Tags: #music, #recommender_systems, #deep_learning, #neural_networks, #mel_spectrograms
How to train your music recommender system
Recommender systems are a family of methods that seek to
predict the rating or preference that a user would give to an item
(Wikipedia)
Is there something similar to something else?
There are two common ways to make recommendations.
Collaborative filtering
- cold-start problem (requires a large amount of information about a user in order to make accurate recommendations)
- will not recommend rare or new songs, games, etc. (popular items are much easier to recommend than unpopular ones)
- poor scalability
+ content-agnostic
Example: Last.fm recommends music based on a comparison of
the listening habits of similar users.
http://ru.anime-characters-fight.wikia.com/wiki/%D0%A4%D0%B0%D0%B9%D0%BB:Anime-heroes-wallpaper-hd-cool-7.jpg
Popularity
Content-based filtering
- can only make recommendations that are similar to the original seed
- semantic gap between the audio or video and the various aspects of the music / movie that affect user preferences (genre, mood)
- obvious recommendations (Doom → Doom 4, etc.)
http://static.giantbomb.com/uploads/original/13/137381/2846580-doom.jpg
There is nothing more similar to a tea kettle than another tea kettle
Approaches
1. Automatic generation of social tags
Social tags are user-generated keywords associated with a song.
Predicting these social tags directly from MP3 files avoids the "cold-start problem".
Using a set of one-vs-all classifiers, one per tag, we can map audio features onto social tags collected from the Web.
2. Music genre classification
Attempt to classify songs into a set of genre classes. Clustering – each cluster represents a specific genre.
Each cluster is labelled by a "majority vote" – whichever genre is most common in that cluster (toy sketches of both approaches follow below).
https://en.wikipedia.org/wiki/Mel-frequency_cepstrum
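Toy sketches of both approaches, assuming precomputed per-song audio features and using scikit-learn; all variable names, array shapes, and hyperparameters are illustrative, not taken from the talk.

1. One-vs-all tag classifiers (one binary classifier per social tag):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X = np.random.rand(1000, 64)              # hypothetical audio features per song
Y = np.random.rand(1000, 50) > 0.9        # hypothetical binary tag indicators

clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X, Y)                             # fits one binary classifier per tag
predicted_tags = clf.predict(X[:5])       # tag predictions for unseen songs

2. Clustering with majority-vote genre labels:

from sklearn.cluster import KMeans

features = np.random.rand(500, 20)        # hypothetical per-song features
genres = np.random.randint(0, 5, 500)     # known genre ids for a labelled subset

clusters = KMeans(n_clusters=5, n_init=10).fit_predict(features)

# Label each cluster with the most common genre among its members.
cluster_to_genre = {
    c: np.bincount(genres[clusters == c]).argmax()
    for c in np.unique(clusters)
}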
Deep Learning approach
Predicting listening preferences from audio signals by training a
regression model to predict the latent representations of songs
that were obtained from a collaborative filtering model.
[Pipeline diagram]
- Data from a collaborative filtering model → latent factor vectors extracted via matrix factorization (the regression targets / output)
- Data from raw MP3s → mel-spectrogram extraction → deep neural network (input) → prediction of the latent factors
Advantages
+ Effectiveness in recommending new and unpopular songs
+ Good recommendations despite the semantic gap
Development stages
Data retrieval
The Echo Nest Taste Profile Subset
http://labrosa.ee.columbia.edu/millionsong/tasteprofile
b80344d063b5ccb3212f76538f3d9e43d87dca9e SOBSUJE12A6D4F8CF5 2
b80344d063b5ccb3212f76538f3d9e43d87dca9e SOBVFZR12A6D4F8AE3 1
b80344d063b5ccb3212f76538f3d9e43d87dca9e SOBXALG12A8C13C108 1
b80344d063b5ccb3212f76538f3d9e43d87dca9e SOBXHDL12A81C204C0 1
b80344d063b5ccb3212f76538f3d9e43d87dca9e SOBYHAJ12A6701BF1D 1
b80344d063b5ccb3212f76538f3d9e43d87dca9e SOCNMUH12A6D4F6E6D 1
b80344d063b5ccb3212f76538f3d9e43d87dca9e SODACBL12A8C13C273 1
b80344d063b5ccb3212f76538f3d9e43d87dca9e SODDNQT12A6D4F5F7E 5
The Taste Profile subset is big. Some numbers:
1,019,318 unique users
384,546 unique MSD songs
48,373,586 user - song - play count triplets
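Before factorization, the triplets can be loaded into a sparse user × song matrix. A minimal sketch assuming pandas and scipy; the file name refers to the triplet file from the archive above, adjust the path as needed:

import pandas as pd
from scipy.sparse import csr_matrix

# Tab-separated user_id, song_id, play_count triplets.
triplets = pd.read_csv("train_triplets.txt", sep="\t",
                       names=["user", "song", "plays"])
users = triplets["user"].astype("category")
songs = triplets["song"].astype("category")

# Sparse play-count matrix R (users x songs).
R = csr_matrix((triplets["plays"],
                (users.cat.codes, songs.cat.codes)))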
Data retrieval
https://www.7digital.com/
We were able to obtain 29-second audio clips for over 99% of the dataset.
The original dataset has no raw audio, only precomputed, poorly documented features.
Weighted matrix factorization
https://youtu.be/o8PiWO8C3zs
[Illustration: sparse user_id × song_id interaction matrix]
Weighted matrix factorization
R ≈ P * Q
R – rating matrix, m users * n songs
P – user matrix, m * f
Q – song matrix, f * n
f – number of features
Weighted matrix factorization
Alternating Least Squares
http://mendeley.github.io/mrec/
https://github.com/benanne/wmf
https://github.com/benanne/theano_wmf
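A toy, dense numpy version of weighted matrix factorization with alternating least squares, in the spirit of "Collaborative Filtering for Implicit Feedback Datasets" (reference 3 below); the libraries linked above are the real, sparse implementations, and the hyperparameters here are arbitrary:

import numpy as np

def als_implicit(R, f=40, alpha=40.0, reg=0.1, iters=10):
    # R: dense (m users x n songs) play-count matrix; returns user and
    # song factor matrices X (m x f) and Y (n x f) with R ~ X @ Y.T.
    m, n = R.shape
    C = 1.0 + alpha * R                 # confidence weights
    P = (R > 0).astype(float)           # binary preferences
    X = 0.01 * np.random.rand(m, f)     # user factors
    Y = 0.01 * np.random.rand(n, f)     # song factors
    I = reg * np.eye(f)
    for _ in range(iters):
        for u in range(m):              # solve for users, song factors fixed
            Cu = np.diag(C[u])
            X[u] = np.linalg.solve(Y.T @ Cu @ Y + I, Y.T @ Cu @ P[u])
        for i in range(n):              # solve for songs, user factors fixed
            Ci = np.diag(C[:, i])
            Y[i] = np.linalg.solve(X.T @ Ci @ X + I, X.T @ Ci @ P[:, i])
    return X, Y

With f = 40 this yields the 40-dimensional song vectors that the network later regresses onto.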
Weighted matrix factorization
[Plot: error vs. iteration]
http://benanne.github.io/2014/08/05/spotify-cnns.html
Mel-spectrograms
A mel-spectrogram is a kind of time-frequency representation.
It is obtained from an audio signal by computing Fourier transforms of short, overlapping windows.
Finally, the frequency axis is converted from a linear scale to the mel scale.
https://en.wikipedia.org/wiki/Mel_scale
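For reference, a common (HTK-style) form of the hertz-to-mel conversion; librosa's default uses the slightly different Slaney variant:

import numpy as np

def hz_to_mel(f_hz):
    # Maps 0 Hz to 0 mel; roughly linear below ~1 kHz, logarithmic above.
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)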
Mel-spectrograms
import numpy as np
time = np.linspace(0, 10, 22050)  # placeholder time axis for a toy signal
series = np.sin(time)
# filename = "The Prodigy - Invaders Must Die.mp3"
# filename = "Lady GaGa - Poker Face.mp3"
Mel-spectrograms
Used log-compressed mel-spectrograms with 128 components, a window size of 1024 audio frames, and a hop size of 512 audio frames.
https://github.com/librosa/librosa
http://librosa.github.io/librosa/generated/librosa.feature.melspectrogram.html#librosa.feature.melspectrogram
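A minimal librosa sketch with those parameters (128 mel bands, a 1024-sample window, a 512-sample hop); the file name is a placeholder, and power_to_db is the current name of the log-compression helper (very old librosa versions call it logamplitude):

import numpy as np
import librosa

y, sr = librosa.load("track.mp3", sr=22050)   # placeholder file name
S = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024,
                                   hop_length=512, n_mels=128)
log_S = librosa.power_to_db(S, ref=np.max)    # log compression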
T-SNE
https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding
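A small sketch of projecting spectrogram frames to 2-D with scikit-learn's t-SNE; the random array is a stand-in for real mel-spectrogram frames:

import numpy as np
from sklearn.manifold import TSNE

frames = np.random.rand(1024, 128)   # stand-in for mel-spectrogram frames
embedding = TSNE(n_components=2, perplexity=30).fit_transform(frames)
print(embedding.shape)               # (1024, 2)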
Mel-spectrograms
[Figures: 1024, 1024 * 2, 1024 * 4]
Convolutional neural network
The baseline deep neural network architecture consists of two convolutional layers and two fully connected layers.
http://benanne.github.io/2014/08/05/spotify-cnns.html
Convolutional neural network
http://benanne.github.io/2014/08/05/spotify-cnns.html
[Architecture figure: 259 x 128 x 1, 4 x 128 x 1, 259 x 4 x 32; 4 s; 0.0029 s]
[Figure: learned filters]
Convolutional neural network
The network can be trained on windows of 3 seconds sampled
randomly from the audio clips.
The last layer of the network is the output layer, which predicts the 40 latent factors obtained from collaborative filtering.
http://www.slideshare.net/erikbern/music-recommendations-mlconf-2014
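A hedged Keras sketch of such a baseline: two convolutional layers and two fully connected layers regressing a ~3-second mel-spectrogram window onto the 40 latent factors with an MSE loss. The window length, filter counts, and layer sizes are illustrative, not the exact values from the talk:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

n_frames, n_mels = 130, 128                 # ~3 s at hop 512, sr 22050

model = Sequential([
    Conv2D(32, (4, n_mels), activation="relu",
           input_shape=(n_frames, n_mels, 1)),   # convolve over time
    MaxPooling2D(pool_size=(2, 1)),
    Conv2D(32, (4, 1), activation="relu"),
    Flatten(),
    Dense(256, activation="relu"),
    Dense(40),                              # 40 latent factors from WMF
])
model.compile(optimizer="adam", loss="mse")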
Album cover based models
1) series = (np.sin(time) - np.sin(time / np.pi))
https://www.google.com.ua/#q=y+%3D+sin%28x%29+-+sin%28x+%2F+pi%29
2) Deep content-based music recommendation
http://papers.nips.cc/paper/5004-deep-content-based-music-recommendation.pdf
3) Collaborative Filtering for Implicit Feedback Datasets
http://yifanhu.net/PUB/cf.pdf
4) Alternating Least Squares Method for Collaborative Filtering
http://bugra.github.io/work/notes/2014-04-19/alternating-least-squares-method-for-collaborative-filtering/
5) Recommending music on Spotify with deep learning
http://benanne.github.io/2014/08/05/spotify-cnns.html
6) *
http://papers.nips.cc/paper/3370-automatic-generation-of-social-tags-for-music-recommendation.pdf
http://cs229.stanford.edu/proj2013/FauciCastSchulze-MusicGenreClassification.pdf
http://ismir2011.ismir.net/papers/PS6-10.pdf
http://erikbern.com/2013/12/20/more-insight-into-recommender-algorithms/
http://www.slideshare.net/irecsys/matrix-factorization-in-recommender-systems
Let’s stay in touch:
Facebook
https://www.facebook.com/neverdraw
LinkedIn
https://www.linkedin.com/in/awesomengineer
Github
https://github.com/spaceuniverse
Thanks
http://cdn.gymnasticstracks.com/wp-content/uploads/2015/09/httyd.jpg
