Aaa ped-20-Recommender Systems: Model-based collaborative filtering

Recommender Systems:
Model-based collaborative filtering
AAA-Python Edition

Plan
● 1- SVD filtering: With Surprise
● 2- SVD Filtering: More details
● 3- Filtering with SVM Classification
● 4- Some Tests
● 5- Predictions with Custom Data: Preparation
● 6- Predictions with Custom Data: Prediction

1- SVD filtering: With Surprise
[By Amina Delali]
Prediction
● The estimated rating (review) is equal to 4.16.
● Slightly better performance compared with neighborhood filtering.

Concept
● Assume that there are factors (characteristics) related to each item. Each item can be described by the degree to which each characteristic is present in that item. At the same time, each user can have different degrees of interest in each of those characteristics.
● These two relationships can be modeled by two matrices:
➢ P(m,f): models the interests of each user u in the f characteristics as a row vector $p_u$
➢ Q(n,f): models the extent to which each characteristic is present in an item i as a row vector $q_i$
● The interaction between each user and item is computed by:
➢ $q_i^T \cdot p_u$, which could estimate the rating of the user u for the item i
➢ The estimation is enhanced by other parameters to explain the bias in ratings:
$\hat{r}_{ui} = \mu + b_u + b_i + q_i^T \cdot p_u$
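As a minimal sketch of the biased prediction formula, the function below adds the global mean, the two biases and the dot product of the factor vectors. The function name and the numeric values are hypothetical and chosen only for illustration; they do not come from the slides.

```python
def predict_rating(mu, b_u, b_i, p_u, q_i):
    """Return mu + b_u + b_i + q_i . p_u for one (user, item) pair."""
    if len(p_u) != len(q_i):
        raise ValueError("p_u and q_i must have the same number of factors")
    # Dot product between the item factors and the user factors.
    interaction = sum(q * p for q, p in zip(q_i, p_u))
    return mu + b_u + b_i + interaction


# Hypothetical values with f = 2 factors.
example = predict_rating(mu=3.5, b_u=0.2, b_i=-0.1, p_u=[0.5, 1.0], q_i=[0.4, 0.3])
print(round(example, 2))
```

With these values the interaction term is 0.4·0.5 + 0.3·1.0 = 0.5, so the estimate is 3.5 + 0.2 − 0.1 + 0.5 = 4.1.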

Computation
● Singular Value Decomposition (SVD) could be used to extract the matrices P and Q. The bias values could also be estimated from the ratings: with the mean of all the ratings, the mean of the ratings of each user, and the mean of the ratings of each item.
● The problem is that not all the ratings of all the users for all the items are available. This is why we have to find another way to estimate these values.
● The estimated values should minimize the following expression:
$\sum_{r_{ui} \in R_{train}} (r_{ui} - \hat{r}_{ui})^2 + \lambda (b_i^2 + b_u^2 + \|q_i\|^2 + \|p_u\|^2)$
➢ $r_{ui} \in R_{train}$: consider only the available ratings
➢ $\lambda$: a regularization parameter (a constant value)
➢ $\|q_i\|^2$: the square of the norm of the vector $q_i$. The norm of $q_i$ is the square root of the sum of the squares of the values of $q_i$.
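The expression to minimize can be sketched in a few lines of plain Python. This is only an illustration: the function name, the dictionary layout and the toy values are assumptions, and the penalty is applied once per available rating, which is one common reading of the formula.

```python
def regularized_squared_error(ratings, mu, b_user, b_item, p, q, lam):
    """Sum, over the available ratings only, of squared error plus L2 penalty."""
    total = 0.0
    for user, item, r_ui in ratings:
        p_u, q_i = p[user], q[item]
        r_hat = mu + b_user[user] + b_item[item] + sum(a * b for a, b in zip(q_i, p_u))
        # Squared norms: sum of the squares of the vector values.
        squared_norms = sum(v * v for v in q_i) + sum(v * v for v in p_u)
        penalty = lam * (b_item[item] ** 2 + b_user[user] ** 2 + squared_norms)
        total += (r_ui - r_hat) ** 2 + penalty
    return total


# Hypothetical toy data: two users, one item, two factors.
ratings = [("u1", "i1", 4.0), ("u2", "i1", 3.0)]
b_user = {"u1": 0.0, "u2": 0.0}
b_item = {"i1": 0.0}
p = {"u1": [1.0, 0.0], "u2": [0.0, 1.0]}
q = {"i1": [0.5, 0.0]}
loss = regularized_squared_error(ratings, 3.5, b_user, b_item, p, q, lam=0.02)
print(loss)
```

Unrated (user, item) pairs never enter the sum, which is what makes this objective usable on an incomplete rating matrix.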

2- SVD Filtering: More details
Stochastic Gradient Descent
● Gradient descent is an iterative algorithm that tries to find the minimum (or a local minimum) of a function. In machine learning, variants of gradient descent are used to estimate a model's parameters by minimizing a cost function, iteratively updating these parameters.
● SGD (stochastic gradient descent) is a variant in which, in one iteration (epoch), the parameters are updated for each sample (in our case, for each rating). So in one epoch the parameters could be updated several times:
➢ The 4 parameters are initialized.
➢ For each rating $r_{ui}$, a prediction $\hat{r}_{ui}$ is made and the difference $e_{ui} = r_{ui} - \hat{r}_{ui}$ is computed.
➔ Then, the difference $e_{ui}$ is used to update the parameter values this way:
$b_u \leftarrow b_u + \gamma (e_{ui} - \lambda b_u)$
$b_i \leftarrow b_i + \gamma (e_{ui} - \lambda b_i)$
$p_u \leftarrow p_u + \gamma (e_{ui} \cdot q_i - \lambda p_u)$
$q_i \leftarrow q_i + \gamma (e_{ui} \cdot p_u - \lambda q_i)$
➢ $\gamma$ is the learning rate: another constant, which defines the size of each update step.
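The four update rules can be sketched as one function that processes a single rating. The function name and the numeric values are hypothetical, and the sketch assumes that both factor updates use the values from before the step.

```python
def sgd_step(r_ui, mu, b_u, b_i, p_u, q_i, gamma, lam):
    """Apply the four update rules once for a single available rating r_ui."""
    r_hat = mu + b_u + b_i + sum(q * p for q, p in zip(q_i, p_u))
    e_ui = r_ui - r_hat  # prediction error for this rating
    new_b_u = b_u + gamma * (e_ui - lam * b_u)
    new_b_i = b_i + gamma * (e_ui - lam * b_i)
    # Assumption: both factor vectors are updated from their old values.
    new_p_u = [p + gamma * (e_ui * q - lam * p) for p, q in zip(p_u, q_i)]
    new_q_i = [q + gamma * (e_ui * p - lam * q) for p, q in zip(p_u, q_i)]
    return new_b_u, new_b_i, new_p_u, new_q_i, e_ui


# Hypothetical values for one rating of 4 with two factors.
b_u, b_i, p_u, q_i, e_ui = sgd_step(
    r_ui=4.0, mu=3.5, b_u=0.0, b_i=0.0,
    p_u=[0.1, 0.1], q_i=[0.1, 0.1], gamma=0.005, lam=0.02,
)
print(e_ui, b_u, p_u)
```

Here the prediction is 3.52, so the error is 0.48 and each bias moves by 0.005 · 0.48 = 0.0024 in the direction that reduces the error.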

Stochastic Gradient Descent (continued)
➢ The process is repeated for a certain number of iterations in order to find a local minimum of the previous expression.
● In the Surprise library, the parameters are as follows:
➢ The parameters $b_u$ and $b_i$ (also called baselines) are initialized to 0.
➢ User and item factors $p_u$ and $q_i$ are randomly initialized according to a normal distribution defined by the mean init_mean and the standard deviation init_std_dev parameters.
➢ The learning rate $\gamma$ (lr_all) is set by default to 0.005, and the regularization parameter $\lambda$ (reg_all) to 0.02.
➢ By default, the number of factors (n_factors) is 100.
➢ The number of iterations (n_epochs) is set by default to 20.
➢ To use the biases (baselines), the biased parameter is set by default to True.
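To make the roles of these parameters concrete, here is a plain-Python sketch of the whole training loop. It is not Surprise's implementation: the function names and the toy ratings are hypothetical, and the keyword names only mirror the Surprise parameters discussed above (with lr_all=0.005, reg_all=0.02, init_mean=0 and init_std_dev=0.1 as defaults).

```python
import random


def train_svd_sgd(ratings, n_factors=100, n_epochs=20, lr_all=0.005,
                  reg_all=0.02, init_mean=0.0, init_std_dev=0.1, seed=0):
    """Fit biases and factors on (user, item, rating) triples with plain SGD."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    # Baselines start at 0; factors are drawn from a normal distribution.
    b_u = {u: 0.0 for u in users}
    b_i = {i: 0.0 for i in items}
    p = {u: [rng.gauss(init_mean, init_std_dev) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(init_mean, init_std_dev) for _ in range(n_factors)] for i in items}
    for _ in range(n_epochs):
        for u, i, r in ratings:  # one update per available rating
            err = r - (mu + b_u[u] + b_i[i] + sum(a * b for a, b in zip(q[i], p[u])))
            b_u[u] += lr_all * (err - reg_all * b_u[u])
            b_i[i] += lr_all * (err - reg_all * b_i[i])
            for f in range(n_factors):
                puf, qif = p[u][f], q[i][f]
                p[u][f] += lr_all * (err * qif - reg_all * puf)
                q[i][f] += lr_all * (err * puf - reg_all * qif)
    return mu, b_u, b_i, p, q


def sum_squared_error(ratings, model):
    """Squared prediction error of a fitted model on the given ratings."""
    mu, b_u, b_i, p, q = model
    return sum(
        (r - (mu + b_u[u] + b_i[i] + sum(a * b for a, b in zip(q[i], p[u])))) ** 2
        for u, i, r in ratings
    )


# Hypothetical toy ratings, not taken from the slides' dataset.
toy = [("a", "x", 5), ("a", "y", 3), ("b", "x", 4),
       ("b", "y", 1), ("c", "x", 5), ("c", "y", 2)]
before = sum_squared_error(toy, train_svd_sgd(toy, n_factors=5, n_epochs=0))
after = sum_squared_error(toy, train_svd_sgd(toy, n_factors=5, n_epochs=20))
print(before, after)
```

Calling the function with n_epochs=0 returns the initial parameters, so comparing the two errors shows how much the epochs reduce the training error on this toy data.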

Another example with GridSearchCV
● Root Mean Square Error (RMSE)
● Mean Absolute Error (MAE)
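The two measures can be written out directly. This is a small stdlib sketch with hypothetical ratings, meant only to show what each number summarizes; it is not the grid search shown on the slide.

```python
import math


def rmse(true_ratings, predicted_ratings):
    """Root Mean Square Error: penalizes large errors more strongly."""
    squared = [(t - p) ** 2 for t, p in zip(true_ratings, predicted_ratings)]
    return math.sqrt(sum(squared) / len(squared))


def mae(true_ratings, predicted_ratings):
    """Mean Absolute Error: average size of the errors."""
    absolute = [abs(t - p) for t, p in zip(true_ratings, predicted_ratings)]
    return sum(absolute) / len(absolute)


# Hypothetical true and predicted ratings.
true_r = [4, 3, 5, 2]
pred_r = [3.5, 3, 4, 3]
print(rmse(true_r, pred_r), mae(true_r, pred_r))
```

For these values the errors are 0.5, 0, 1 and −1, which gives an RMSE of 0.75 and an MAE of 0.625. A grid search keeps the parameter combination with the lowest value of the chosen measure.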

3- Filtering with SVM Classification
Concept
● The other way to perform model-based collaborative filtering is to train a model on users' reviews, and then to use that model to predict new ones for new items.
● In this lesson we will present an implementation using an SVM (Support Vector Machine). Precisely, we will use a linear SVM classifier to predict the new reviews.
● As described in [Xia et al., 2006], there are two ways to consider the problem:
➢ Each item represents a class, and the training set is the users' ratings for each item other than that item.
➢ Each user represents a class, and the training set is the items' ratings according to each user other than that user.
● But the problem here is that the matrices representing the ratings will not be complete. So we will use default values for missing ratings.

The original data
● We will use the data we already downloaded using the Dataset module from Surprise. But first, we will directly access the downloaded dataset file to see its content.

The features and labels
● We will apply an SVC classifier for one user, and the classes will be the different ratings.
● We have to construct the features matrix from the ratings given to each item rated by the user "226", and construct the corresponding label vector using the ratings of that user.
● It is more convenient to use the data built by the Surprise library than the original file.
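One way to picture this construction is the sketch below. It is an assumption-laden illustration rather than the slides' code: the function name, the toy ratings and the default value 0 for missing ratings are all hypothetical. Rows are the items rated by the active user, columns are the other users, and the labels are the active user's own ratings.

```python
def build_features_and_labels(ratings, active_user, default_rating):
    """Rows: items rated by active_user. Columns: all the other users."""
    by_item = {}
    for user, item, rating in ratings:
        by_item.setdefault(item, {})[user] = rating
    other_users = sorted({u for u, _, _ in ratings if u != active_user})
    items = sorted(i for i, row in by_item.items() if active_user in row)
    # Missing ratings are replaced by the default value.
    features = [[by_item[i].get(u, default_rating) for u in other_users] for i in items]
    labels = [by_item[i][active_user] for i in items]
    return items, other_users, features, labels


# Hypothetical (user, item, rating) triples; "226" is the active user.
toy_ratings = [("226", "m1", 4), ("226", "m2", 3), ("7", "m1", 5),
               ("8", "m2", 2), ("8", "m3", 1)]
items, users, X, y = build_features_and_labels(toy_ratings, "226", default_rating=0)
print(items, users, X, y)
```

The resulting X and y have the shape a classifier expects: one row per rated item and one class label (the rating) per row. The item "m3" is left out because the active user did not rate it.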

The features and labels (continued)
All these values are unavailable ratings, which means that the corresponding users didn't rate the corresponding items.

Prediction for one item
● A linear SVM classifier
● After dropping the column corresponding to the user 218 (raw id “226”)
● All the models we used to predict the user's rating for that item gave predicted values either approaching 4 or slightly bigger than 4.

4- Some Tests
Splitting the data
● We will just split the data that we have already created, using 2 methods:
➢ split into test and training sets
➢ split into folds (cross-validation)
● We will not run our tests on all the data as in the previous examples.
● We will use only the 50 items related to the (active) user “226”.
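Both splitting methods can be sketched with the standard library alone. The helper names, the 20% test ratio and the choice of 5 folds are assumptions made for illustration; only the sample count of 50 comes from the text above.

```python
import random


def train_test_split_indices(n_samples, test_ratio, seed):
    """Shuffle the sample indices and hold out a fraction for testing."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(n_samples * test_ratio))
    return indices[n_test:], indices[:n_test]


def k_fold_indices(n_samples, n_folds, seed):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[k::n_folds] for k in range(n_folds)]
    for k in range(n_folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, folds[k]


# 50 samples, as for the 50 items of the active user.
train_idx, test_idx = train_test_split_indices(50, test_ratio=0.2, seed=0)
folds = list(k_fold_indices(50, n_folds=5, seed=0))
print(len(train_idx), len(test_idx), len(folds))
```

The single split gives one estimate from 10 held-out items, while cross-validation averages over 5 different held-out groups, which matters with so few samples.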

The prediction with the test/train split
● The missing label is not represented

Prediction with cross-validation
● To see the available measures (scoring)
● Same results as with KNN collaborative filtering

5- Predictions with Custom Data: Preparation
The data
● We will use the data available at: Artificial Intelligence with Python GitHub Repository
● No rating is available for the movie “Raging Bull” by “Bill Duffy”.
● The way the data is organized is not convenient for Surprise, so we will have to rearrange the data.
● A user's name: later it will be the user's raw_id
● Movie names

Prepare the data
● To be used with Surprise, the DataFrame must have its columns organized this way: user_id, item_id and ratings, which is not the case in our DataFrame.
● Now, the movie names are in a column.
● All the users and the corresponding ratings are in 2 columns (wide-to-long conversion).

Prepare the data (continued)
● Reorder the columns
● Drop the rows corresponding to the missing user-item ratings
● The rating scale will be from 1 to 5
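The reshaping steps can be illustrated without pandas. The sketch below swaps the DataFrame for a plain dictionary, so it is not the slides' code: the function name, the user and movie names, and the use of None for a missing rating are all hypothetical.

```python
def wide_to_long(wide_table):
    """Turn {user: {movie: rating}} into (user_id, item_id, rating) rows."""
    rows = []
    for user_id, column in wide_table.items():
        for item_id, rating in column.items():
            if rating is None:  # missing user-item rating: drop the row
                continue
            rows.append((user_id, item_id, float(rating)))
    return rows


# Hypothetical wide table: one entry per user, one rating per movie.
wide = {
    "user_a": {"movie_x": 4, "movie_y": None},
    "user_b": {"movie_x": 2, "movie_y": 5},
}
long_rows = wide_to_long(wide)
print(long_rows)
```

Each output row already follows the user_id, item_id, rating order, so no further column reordering is needed in this form. With pandas, the same result is usually reached by melting the DataFrame and dropping the rows with missing values.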

6- Predictions with Custom Data: Prediction
Predict a review for one item
● We will use the SVD technique to predict the review of the user Adam Cohen for the movie Raging Bull.
● If we wanted to use an SVM classifier, we would:
➢ Use the original DataFrame, and select only the rows corresponding to the movies rated by “Adam”
➢ Use the Raging Bull row values for prediction
➢ Replace the NaN values with a default value
● Load the data from the DataFrame we already prepared.

Make a list of recommendations
● The user Chris Duncan rated only 2 movies. We will make a list of recommendations of movies he didn't rate by:
➢ predicting his reviews on these movies
➢ ordering the predicted reviews
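The two steps can be sketched as a small ranking helper. Everything here is hypothetical: the function name, the movie names, and the lookup table that stands in for a trained model's predictions.

```python
def recommend(predict, user, all_items, rated_items, top_n=3):
    """Predict a rating for every unrated item, then sort best first."""
    candidates = [item for item in all_items if item not in rated_items]
    scored = [(item, predict(user, item)) for item in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Stand-in predictor: a fixed lookup instead of a trained model.
fake_estimates = {"movie_c": 3.2, "movie_d": 4.5, "movie_e": 2.1}


def fake_predict(user, item):
    return fake_estimates[item]


all_movies = ["movie_a", "movie_b", "movie_c", "movie_d", "movie_e"]
already_rated = {"movie_a", "movie_b"}
top = recommend(fake_predict, "some_user", all_movies, already_rated, top_n=2)
print(top)
```

Passing the predictor as an argument keeps the ranking logic independent of the model, so the same helper could wrap an SVD or an SVM based estimate.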

References
● [Buitinck et al., 2013] Buitinck, L., Louppe, G., Blondel, M., Pedregosa, F., Mueller, A., Grisel, O., Niculae, V., Prettenhofer, P., Gramfort, A., Grobler, J., Layton, R., VanderPlas, J., Joly, A., Holt, B., and Varoquaux, G. (2013). API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122.
● [Francesco et al., 2011] Francesco, R., Lior, R., Bracha, S., and Paul B., K., editors (2011). Recommender Systems Handbook. Springer Science+Business Media.
● [Hug, 2017] Hug, N. (2017). Surprise, a Python library for recommender systems. http://surpriselib.com.
● [Xia et al., 2006] Xia, Z., Dong, Y., and Xing, G. (2006). Support vector machines for collaborative filtering. In Proceedings of the 44th annual Southeast regional conference, pages 169–174. ACM.
