This is a recommender system. Seen from the outside, it is firmly stuck between collaborative filtering and content-based filtering. Recommender systems have been in use for a long time, but the recommendations are still not perfect. Usually the problems are the choice of technologies or frameworks... In our case they are the cold-start problem, the semantic gap, and more!
AI&BigData Lab 2016. Igor Kostiuk: How to train a music recommender system
1. Igor Kostiuk | 2016
Tags: #music, #recommender_systems, #deep_learning, #neural_networks, #mel_spectrograms
How to train your music recommender system
3. Collaborative filtering
- cold-start problem (requires a large amount of information about a user in order to make accurate recommendations)
- will not recommend rare or new songs, games, etc. (popular items are much easier to recommend than unpopular ones)
- poor scalability
+ content-agnostic
Example: Last.fm recommends music based on a comparison of the listening habits of similar users.
5. Content-based filtering
- can only make recommendations that are similar to the original seed
- semantic gap between the audio or video and the various aspects of a song or movie that affect user preferences (genre, mood)
- obvious recommendations (Doom → Doom 4, etc.)
http://static.giantbomb.com/uploads/original/13/137381/2846580-doom.jpg
6. There is nothing more similar to a tea kettle than another tea kettle.
7. Approaches
1. Automatic generation of social tags
Social tags are user-generated keywords associated with a song. Predicting these social tags directly from MP3 files avoids the "cold-start problem". Using a set of one-vs-all classifiers, one per tag, we can map audio features onto social tags collected from the Web (see the sketch after this slide).
2. Music genre classification
Attempt to classify songs into a set of genre classes. Clustering: each cluster represents a specific genre. A label is assigned to each cluster by a "majority vote", i.e. the genre that is most common in that cluster (also shown in the sketch below).
https://en.wikipedia.org/wiki/Mel-frequency_cepstrum
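A minimal scikit-learn sketch of both approaches, not from the talk: the random feature matrix, the tag vocabulary, and the genre labels are placeholders standing in for real audio features (e.g. MFCC statistics), Web-collected tags, and a labeled subset.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

rng = np.random.default_rng(0)
X = rng.random((1000, 20))  # placeholder audio features, one row per track

# Approach 1: one one-vs-all classifier per social tag.
vocab = ["rock", "jazz", "loud", "mellow", "electronic"]  # placeholder tags
tags = [list(rng.choice(vocab, size=2, replace=False)) for _ in range(1000)]
Y = MultiLabelBinarizer(classes=vocab).fit_transform(tags)
tagger = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)

# Approach 2: cluster tracks, then label each cluster by majority vote.
genres = rng.integers(0, 5, size=1000)  # placeholder known genres
clusters = KMeans(n_clusters=5, n_init=10).fit_predict(X)
cluster_genre = {c: np.bincount(genres[clusters == c]).argmax()
                 for c in range(5)}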
8. Deep Learning approach
Predicting listening preferences from audio signals by training a
regression model to predict the latent representations of songs
that were obtained from a collaborative filtering model.
[Pipeline diagram: data from a collaborative filtering model → matrix factorization → latent factor vector extraction (the regression targets); raw MP3 data → mel-spectrogram extraction → input to a deep neural network whose output is the predicted latent factor vector.]
10. Development stages
Data retrieval
The Echo Nest Taste Profile Subset
http://labrosa.ee.columbia.edu/millionsong/tasteprofile
b80344d063b5ccb3212f76538f3d9e43d87dca9e SOBSUJE12A6D4F8CF5 2
b80344d063b5ccb3212f76538f3d9e43d87dca9e SOBVFZR12A6D4F8AE3 1
b80344d063b5ccb3212f76538f3d9e43d87dca9e SOBXALG12A8C13C108 1
b80344d063b5ccb3212f76538f3d9e43d87dca9e SOBXHDL12A81C204C0 1
b80344d063b5ccb3212f76538f3d9e43d87dca9e SOBYHAJ12A6701BF1D 1
b80344d063b5ccb3212f76538f3d9e43d87dca9e SOCNMUH12A6D4F6E6D 1
b80344d063b5ccb3212f76538f3d9e43d87dca9e SODACBL12A8C13C273 1
b80344d063b5ccb3212f76538f3d9e43d87dca9e SODDNQT12A6D4F5F7E 5
The Taste Profile subset is big. Some numbers:
1,019,318 unique users
384,546 unique MSD songs
48,373,586 user - song - play count triplets
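As a sketch, the tab-separated triplets shown above can be loaded into a sparse user-song play-count matrix; the file name follows the dataset's distribution but is an assumption here.

import scipy.sparse as sp

users, songs = {}, {}
rows, cols, plays = [], [], []
with open("train_triplets.txt") as f:  # the Taste Profile triplets file
    for line in f:
        user, song, count = line.split("\t")
        rows.append(users.setdefault(user, len(users)))
        cols.append(songs.setdefault(song, len(songs)))
        plays.append(int(count))

# m users x n songs play-count matrix R
R = sp.csr_matrix((plays, (rows, cols)), dtype="float32")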
11. Data retrieval
https://www.7digital.com/
We were able to obtain 29-second audio clips for over 99% of the dataset.
The original dataset has no raw audio, only precomputed, poorly documented features.
13. Weighted matrix factorization
R ≈ P * Q
R – rating matrix (m users × n songs)
P – user matrix (m × f)
Q – song matrix (f × n)
f – number of latent features
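A minimal numpy sketch of weighted matrix factorization via alternating least squares. The confidence weighting c = 1 + alpha*r follows Hu, Koren & Volinsky's implicit-feedback formulation, which is an assumption here (the talk only names weighted matrix factorization), and this dense toy version is for illustration; real implementations exploit sparsity.

import numpy as np

def wmf_als(R, f=40, alpha=40.0, reg=0.1, iters=10):
    # R: m x n play-count matrix; returns P (m x f) and Q (f x n).
    m, n = R.shape
    rng = np.random.default_rng(0)
    P = rng.normal(scale=0.01, size=(m, f))
    Q = rng.normal(scale=0.01, size=(f, n))
    C = 1.0 + alpha * R            # confidence weights
    pref = (R > 0).astype(float)   # binarized preferences
    I = reg * np.eye(f)
    for _ in range(iters):
        for u in range(m):         # re-solve each user vector
            W = np.diag(C[u])
            P[u] = np.linalg.solve(Q @ W @ Q.T + I, Q @ W @ pref[u])
        for i in range(n):         # re-solve each song vector
            W = np.diag(C[:, i])
            Q[:, i] = np.linalg.solve(P.T @ W @ P + I, P.T @ W @ pref[:, i])
    return P, Q

The f-dimensional columns of Q are the per-song latent factor vectors that the neural network is later trained to predict.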
18. Mel-spectrograms
A mel-spectrogram is a kind of time-frequency representation. It is obtained from an audio signal by computing the Fourier transforms of short, overlapping windows. The frequency axis is then changed from a linear scale to the mel scale.
https://en.wikipedia.org/wiki/Mel_scale
22. Mel-spectrograms
We used log-compressed mel-spectrograms with 128 components, a window size of 1024 and a hop size of 512 audio frames.
https://github.com/librosa/librosa
http://librosa.github.io/librosa/generated/librosa.feature.melspectrogram.html#librosa.feature.melspectrogram
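A sketch with librosa matching those parameters; the clip path is a placeholder, and power_to_db is used as one common form of log compression.

import librosa

y, sr = librosa.load("clip.mp3", sr=22050)  # one 29-second audio clip
S = librosa.feature.melspectrogram(y=y, sr=sr,
                                   n_fft=1024,      # window size
                                   hop_length=512,  # hop size
                                   n_mels=128)      # 128 mel components
log_S = librosa.power_to_db(S)                      # log compression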
24. Convolutional neural network
The baseline deep neural network architecture consists of two convolutional layers and two fully connected layers.
http://benanne.github.io/2014/08/05/spotify-cnns.html
26. Convolutional neural network
The network can be trained on 3-second windows sampled randomly from the audio clips.
The last layer of the network is the output layer, which predicts the 40 latent factors obtained from the collaborative filtering model.
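A minimal PyTorch sketch of such a network; the layer widths and kernel sizes are assumptions, and, following the referenced blog post, the convolution runs over time with the 128 mel bins as input channels. At a 22,050 Hz sample rate and a hop size of 512, a 3-second window is roughly 130 spectrogram frames.

import torch
import torch.nn as nn

class LatentFactorCNN(nn.Module):
    # Input: (batch, 128 mel bins, ~130 frames) of log-mel spectrogram.
    def __init__(self, n_factors=40):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(128, 256, kernel_size=4), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(256, 256, kernel_size=4), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),  # global max pooling over time
        )
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256, 512), nn.ReLU(),
            nn.Linear(512, n_factors),  # regresses the 40 latent factors
        )

    def forward(self, x):
        return self.fc(self.conv(x))

model = LatentFactorCNN()
windows = torch.randn(8, 128, 130)  # a batch of 3-second windows
targets = torch.randn(8, 40)        # latent factors from the WMF step
loss = nn.functional.mse_loss(model(windows), targets)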