Study of Diversity, Novelty, and Relevancy in Recommender Systems
Chems Eddine BERBAGUE
Thesis supervisor: Pr. Hassina SERIDI
Thesis co-supervisor: Dr. Karabadji Nour El-Islem
November 2021
Table of Contents
1. Introduction
2. State of the Art
3. Contributions: Users Clustering and Pairwise Similarity
4. Final Conclusion
Introduction
Recommender systems
Figure 1: General scheme of RS algorithms and evaluation.
Research questions
What makes collaborative filtering a good research choice?
Why is clustering one of the best techniques for dealing with recommendation issues?
How well do bio-inspired clustering techniques perform?
Aims and objectives
Improve the scalability of the memory-based collaborative filtering
algorithm.
Improve the recommendation quality.
Thesis research axes
Memory-based collaborative filtering algorithms
Figure 2: General scheme of user-based collaborative filtering (a target user connected to its neighbors by similarity scores).
Thesis research axes
Dimensionality reduction
Figure 3: Dimensionality reduction using clustering.
Thesis research axes
Recommendation quality improvement
Yes, we care about quality!
Figure 4: Different recommendation quality metrics: diversity, relevancy, and novelty.
State of the Art
Memory user-based collaborative filtering (CF).
Figure 5: Flow of memory user-based collaborative filtering.
Comparison of collaborative filtering to other approaches
Table 1: Qualitative comparison (star ratings) of the recommendation techniques ICF, UCF, SOC/DEM, CBF, HYB, and MF on diversity, relevancy, sparsity, cold start, scalability, and simplicity.
Limits of memory-based collaborative filtering
Complexity
Figure 6: Different complexity examples.
Limits of memory-based collaborative filtering
Sensitivity to data quality
Sparsity, cold start, etc.

     i1  i2  i3  i4  i5  i6  i7  i8
u1   ?   ?   ?   ?   ?   ?   5   1
u2   1   4   ?   ?   ?   ?   4   2
u3   ?   ?   4   2   ?   ?   5   4
u4   ?   ?   3   4   4   5   5   4
u5   ?   ?   ?   ?   ?   ?   ?   ?

Table 2: Example of a user-item rating matrix (? = missing rating).
How were evolutionary algorithms adapted to RS?
Inter-algorithmic use:
similarity calculation, recommendation ranking, clustering, latent factor models, etc.
Intra-algorithmic use:
hybridization, etc.
Bio-inspired algorithms
Use                   GA   ACO  ANN  ABC  BAT  FSS  PSO
Similarity            X    X    X    -    -    -    -
Weighting             X    -    X    -    X    X    -
Clustering            X    -    -    X    X    X    -
Re-ranking            X    -    -    -    -    -    X
Latent factor models  -    -    -    -    -    -    X
Graph-based models    -    X    -    X    -    -    X

Table 3: Summary of the different uses of bio-inspired algorithms in RS.
Genetic-based multi-objective optimization
Genetic algorithms (GA).
A GA is a bio-inspired machine learning tool for optimization: it explores a large search space to select, among the possible solutions, the most suitable (fittest) one.
Figure 7: General scheme of a genetic-based multi-objective optimization algorithm.
Questions about using GAs in RSs
Which problem to target?
How to formally describe the problem?
How to define the quality of a solution?
Contributions: Users Clustering and Pairwise Similarity
Outline
Datasets & experimental configurations
1. GA-CLUS: An Evolutionary Scheme to Improve Scalability
2. TS-IKNN: A Two-Stage Improved KNN
3. GA-DCLUS: A Multi-Objective Clustering-Based Recommendation Approach
Datasets & experimental configurations
Dataset          #Users  #Items  #Ratings   Density
Movielens 100K      943    1682    100,000     6%
Movielens 1M       6040    3952  1,000,000     4%

Table 4: Statistics of the MovieLens 100K and MovieLens 1M datasets (density = fraction of observed ratings).
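The density column is simply the ratio of observed ratings to all user-item pairs; a minimal sanity-check sketch in Python, using only the numbers from Table 4:

```python
def rating_density(n_users: int, n_items: int, n_ratings: int) -> float:
    """Fraction of the user-item matrix that actually holds a rating."""
    return n_ratings / (n_users * n_items)

print(f"{rating_density(943, 1682, 100_000):.1%}")     # ~6.3% (MovieLens 100K)
print(f"{rating_density(6040, 3952, 1_000_000):.1%}")  # ~4.2% (MovieLens 1M)
```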
I. GA-CLUS: Users Clustering Using a Genetic Algorithm
Neighborhood selection
Figure 8: Illustration of neighborhood selection with a small candidate set.
Figure 9: Illustration of neighborhood selection with a large candidate set.
Clustering is a solution!
Figure 10: Simple clustering example.
We propose the use of a genetic algorithm to:
• Find the best number of clusters.
• Explore the search space and select among its members the best cluster representatives.
• Control the border-overlap problem by specifying the cluster sizes.
Encoding scheme of GA-CLUS
k | C1 | α1 | … | Cx | αx | … | Ck | αk
k is an integer representing the number of clusters.
Ci is an integer representing a profile chosen as the centre of cluster i.
αi is an integer representing the number of profiles in cluster i.
Figure 11: Encoding scheme of GA-CLUS.
Figure 12: Some samples of the GA-CLUS encoding scheme: four binary chromosomes (a)-(d), each encoding k followed by (Ci, αi) pairs; e.g. sample (b) encodes k = 4 with centres (10, 5, 15, 1) and sizes (4, 4, 4, 4).
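To make the encoding concrete, here is a minimal decoding sketch for such fixed-width binary chromosomes. The function name and field widths are illustrative assumptions, not the thesis implementation:

```python
def decode_chromosome(bits: str, k_width: int, c_width: int, a_width: int):
    """Decode a GA-CLUS-style chromosome: k, then k (centre, size) pairs."""
    k = int(bits[:k_width], 2)                           # number of clusters
    pos, centres, sizes = k_width, [], []
    for _ in range(k):
        centres.append(int(bits[pos:pos + c_width], 2))  # C_i: centre profile id
        pos += c_width
        sizes.append(int(bits[pos:pos + a_width], 2))    # alpha_i: cluster size
        pos += a_width
    return k, centres, sizes

# Sample (b) of Figure 12: 3-bit k, 5-bit centres, 4-bit sizes.
bits = "100" "01010" "0100" "00101" "0100" "01111" "0100" "00001" "0100"
print(decode_chromosome(bits, 3, 5, 4))  # (4, [10, 5, 15, 1], [4, 4, 4, 4])
```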
Fitness function
Minimizing the MAE of each cluster:
$$\text{group\_precision}(ch) = (\max(r) - \min(r)) - \frac{1}{k} \sum_{i=1}^{k} \text{MAE}(G_i) \qquad (1)$$
Diversifying the clusters' centers:
$$\text{center\_diversity}(ch) = \frac{1}{k(k-1)} \sum_{i=1}^{k} \sum_{j=i+1}^{k-1} \left(1 - \text{sim}(C_i, C_{j+1})\right) \qquad (2)$$
Combination of the centers' diversity and the clusters' precision:
$$\text{fitness}(ch) = \text{group\_precision}(ch) + \text{center\_diversity}(ch) \qquad (3)$$
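A minimal sketch of Eqs. (1)-(3), assuming the per-cluster MAE values are already computed and sim is any user-profile similarity in [0, 1] (both assumed inputs); the pair enumeration follows the double sum of Eq. (2):

```python
from itertools import combinations

def ga_clus_fitness(cluster_maes, centres, sim, r_max=5.0, r_min=1.0):
    """Eqs. (1)-(3): reward low per-cluster MAE and mutually diverse centres."""
    k = len(cluster_maes)
    group_precision = (r_max - r_min) - sum(cluster_maes) / k      # Eq. (1)
    center_diversity = sum(1 - sim(ci, cj)                         # Eq. (2)
                           for ci, cj in combinations(centres, 2)) / (k * (k - 1))
    return group_precision + center_diversity                      # Eq. (3)
```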
Configuration
• Baseline algorithms:
◦ Memory user-based collaborative filtering (KNN).
◦ K-means clustering algorithm.
◦ PCA-GAKM clustering algorithm.
• Experimental scenarios:
◦ Analyse the neighborhood size effect.
◦ Analyse the recommendation length effect.
• Evaluation metrics:
◦ Rating prediction accuracy: mean absolute error (MAE).
◦ Recommendation set accuracy: recall, precision.
Rating prediction comparison of GA-CLUS in terms
of the neighborhood size
Figure 13: Comparison of GA-CLUS to
KNN.
Figure 14: Comparison of GA-CLUS to
Kmeans and PCA-GAKM.
Relevancy comparison of GA-CLUS in terms of
the neighborhood size
Figure 15: Precision comparison of
GA-CLUS to other methods.
Figure 16: Recall comparison of
GA-CLUS to other methods.
Relevancy comparison of GA-CLUS in terms of
Top-N recommendations
Figure 17: Precision comparison of
GA-CLUS with different recommendation
length.
Figure 18: Recall comparison of
GA-CLUS with different recommendation
length.
Partial conclusions
+ We encoded the clustering problem so as to reduce the search space.
+ We optimized the quality of the clustering so as to achieve more accurate results.
- Accuracy alone is insufficient to evaluate the satisfaction of users.
Discussion
Users' similarity vs. recommendation diversity.
Figure 19: Conflicting views of recommendation diversity: low diversity on the users' side, high popularity on the system's side.
II. TS-IKNN: Two-Stage Improved KNN Algorithm
TS-IKNN:
First stage:
We sought to reduce the extent of the search space by running an adapted KNN algorithm. In this stage, we modified a similarity measure to combine a pairwise user diversity measure and a similarity-based rating measure.
Second stage:
We employed a genetic algorithm to improve the neighborhood selection.
First stage:
Considering diversity within a similarity measure may give dual control over the user set, allowing the selection of users that are similar and diverse at the same time:
• The similarity definition:
$$\text{new\_sim}(u_1, u_2) = \alpha \times \text{sim}(u_1, u_2) + (1 - \alpha) \times \text{div}(u_1, u_2) \qquad (4)$$
• The diversity definition:
$$\text{div}(u_1, u_2) = \sum_{i \in I_2 - I_1} \left(1 - \frac{P(i)}{|U|}\right) \qquad (5)$$
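A minimal sketch of Eqs. (4)-(5), assuming items[u] is the item set of user u, popularity[i] is P(i), the number of users who rated item i, and sim is a standard rating-based similarity (all assumed names):

```python
def pairwise_div(items_u1: set, items_u2: set, popularity: dict, n_users: int) -> float:
    """Eq. (5): items of u2 unseen by u1 count more when they are unpopular."""
    return sum(1 - popularity[i] / n_users for i in items_u2 - items_u1)

def new_sim(u1, u2, items, sim, popularity, n_users, alpha: float) -> float:
    """Eq. (4): convex blend of rating similarity and pairwise diversity."""
    d = pairwise_div(items[u1], items[u2], popularity, n_users)
    return alpha * sim(u1, u2) + (1 - alpha) * d
```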
Second stage:
Figure 20: Chromosome encoding of the neighborhood optimization.
Combination of the diversity and the relevancy:
$$\text{fitness}(ch) = \beta \times (1 - \text{precision}) + (1 - \beta) \times (1 - \text{diversity}) \qquad (6)$$
The similarity calculation algorithm of the second stage
Results of TS-IKNN
Figure 21: Coverage comparison of TS-IKNN to other methods (KNN with normal similarity, KNN with modified similarity, K-means best precision, K-means best coverage, the proposed algorithm, and LDA).

            Normal similarity  Adjusted similarity  K-means best precision  K-means best coverage  Proposed approach  LDA
Precision   0.6106             0.6130               0.6379                  0.6321                 0.6524             0.3368

Table 5: Precision comparison of TS-IKNN to other methods.
Partial conclusions
We presented an evolutionary algorithm that acts in two stages with the aim of balancing coverage and precision. However, this method suffers from some limitations which we tried to address:
• A neighborhood is assigned to a user according to binary weights, which may exclude explorer users.
• The size of the first candidate-neighbors set is hard to fix: large values allow a better exploration of the possibilities, but they increase complexity.
• The correlation between the novelty of recommendations and their coverage is not clarified.
Perspectives and future work
A:
• Statistics could be misleading.
• Clustering alleviates the curse of data dimensionality.
Q:
• How to improve the similarity calculation?
• How to use clustering to target the novelty/diversity metrics?
• Can we benefit from more information to improve the clustering?
III. GA-DCLUS: Diversified Users' Clustering Using GA
Analyzing the popularity of the items
Figure 22: Popularity of items (number of ratings per item) in MovieLens 100K.
Analyzing the users' tendencies toward popularity
Item rarity:
$$\text{rarity}(i) = \frac{|U_i| - freq_{\min}}{freq_{\max} - freq_{\min}} \qquad (7)$$
User tendency:
$$\text{tendency}(u) = \frac{1}{|I_u|} \sum_{i \in I_u} \left(1 - \text{rarity}(i)\right) \qquad (8)$$
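A direct transcription of Eqs. (7)-(8) into Python, assuming raters maps each item to the set of users who rated it and items_of maps each user to their rated items (hypothetical structures):

```python
def rarity(i, raters, freq_min, freq_max):
    """Eq. (7): min-max normalized rating count of item i."""
    return (len(raters[i]) - freq_min) / (freq_max - freq_min)

def tendency(u, items_of, raters, freq_min, freq_max):
    """Eq. (8): mean of (1 - rarity) over the items rated by user u."""
    items = items_of[u]
    return sum(1 - rarity(i, raters, freq_min, freq_max) for i in items) / len(items)
```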
Analyzing the users' tendencies toward popularity
Figure 23: Users' tendencies toward popularity (number of users per item-rarity level).
Principles of GA-DCLUS
We denote by U the set of users and by C the set of clusters. Our clustering scheme consists of creating a matrix W = {w1, w2, ..., w|U|} of |U| rows and |C| columns. For each user u ∈ U we define a belonging weight vector wu ∈ W of |C| values, denoted wu = {w(u,1), w(u,2), ..., w(u,|C|)}, so as to assign to u and a given cluster j ∈ C a belonging weight value w(u,j). Furthermore, we assign each user u to one main cluster and to zero or more secondary clusters, with respect to the belonging weight vector wu. The next slides explain how we proceed to assign users.
Principles of GA-DCLUS
The main clusters set, denoted C^M, consists of |U| values, C^M = {c^M_1, c^M_2, ..., c^M_|U|}, where each user u has a main cluster c^M_u in which we apply the UCF algorithm to generate their recommendations. The cluster c^M_u is selected by identifying, from wu, the cluster with the highest belonging weight.
Principles of GA-DCLUS
The secondary clusters set, denoted C^S, consists of |U| sub-vectors, C^S = {c^S_1, c^S_2, ..., c^S_|U|}. Each user u has a set c^S_u of n_u secondary clusters, denoted c^S_u = {c_1, c_2, ..., c_{n_u}}. User u can participate in the clusters of c^S_u as a candidate neighbor only, and does not receive any recommendations there. In order to improve scalability, a minimum belonging threshold θ is added to the chromosome encoding.
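One plausible reading of the assignment rule, sketched below: the main cluster takes the highest weight, and every other cluster whose weight reaches the threshold θ becomes a secondary, candidate-neighbor-only membership. The thresholding rule here is our assumption, not a detail stated on the slide:

```python
import numpy as np

def assign_clusters(W: np.ndarray, theta: float):
    """Split each user's belonging weights into one main cluster (c^M_u)
    and zero or more secondary clusters (c^S_u) with weight >= theta."""
    main = W.argmax(axis=1)                        # highest belonging weight
    secondary = []
    for u, row in enumerate(W):
        cands = set(np.flatnonzero(row >= theta)) - {main[u]}
        secondary.append(sorted(cands))            # candidate-neighbor roles only
    return main, secondary

# Belonging weights of users U1 and U2 from Figure 25, theta = 0.8:
W = np.array([[0.95, 0.85, 0.70],
              [0.55, 0.92, 0.88]])
print(assign_clusters(W, theta=0.8))  # mains [0, 1], secondaries [[1], [2]]
```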
Clustering encoding scheme of GA-DCLUS
Figure 24: The chromosome encoding scheme: clustering parameters (CLUS, NEI, TH) followed, for each user U_i, by its belonging weights W_{i,1}, W_{i,2}, ..., W_{i,|C|}.
Example of the clustering encoding scheme

CLUS  NEI  TH   | W1,1  W1,2  W1,3 | W2,1  W2,2  W2,3 | W3,1  W3,2  W3,3 | W4,1  W4,2  W4,3
2     48   0.8  | 0.95  0.85  0.7  | 0.55  0.92  0.88 | 0.90  0.81  0.91 | 0.7   0.94  0.74

Figure 25: Example of the proposed genetic encoding for users U1 to U4 over three clusters.
Fitness function
Pairwise user diversity:
$$\text{div}(u_1, u_2) = \frac{1}{|I_2 - I_1|} \sum_{i \in I_2 - I_1} \text{rarity}(i) \qquad (9)$$
Cluster diversity:
$$\text{cluster\_content}(c) = \sum_{(u_1, u_2) \in c} \alpha \cdot \text{sim}(u_1, u_2) + (1 - \alpha) \cdot \text{div}(u_1, u_2) \qquad (10)$$
Fitness function:
$$\text{fitness\_function}(ch) = \beta \cdot (1 - \text{coverage}) + (1 - \beta) \cdot \left(\gamma \cdot (1 - \text{precision}) + (1 - \gamma) \cdot \frac{1}{|C|} \sum_{c \in C} \text{cluster\_content}(c)\right) \qquad (11)$$
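A minimal sketch of Eqs. (9)-(11), assuming rarity, sim, and div follow the definitions above and that coverage and precision have already been evaluated for the chromosome (assumed inputs):

```python
from itertools import combinations

def pairwise_div(items_u1, items_u2, rarity):
    """Eq. (9): mean rarity of the items rated by u2 but not by u1."""
    diff = items_u2 - items_u1
    return sum(rarity(i) for i in diff) / len(diff) if diff else 0.0

def cluster_content(members, sim, div, alpha):
    """Eq. (10): similarity/diversity blend over the user pairs of a cluster."""
    return sum(alpha * sim(u, v) + (1 - alpha) * div(u, v)
               for u, v in combinations(members, 2))

def ga_dclus_fitness(coverage, precision, contents, beta, gamma):
    """Eq. (11): weighted mix of coverage, precision, and mean cluster content."""
    return beta * (1 - coverage) + (1 - beta) * (
        gamma * (1 - precision) + (1 - gamma) * sum(contents) / len(contents))
```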
Improved similarity measure
Prediction formula:
$$\text{Pr}(u, i) = \bar{r}_u + \frac{\sum_{v \in N(u)} \text{sim}(u, v) \cdot R(v, i)}{\sum_{v \in N(u)} \text{sim}(u, v)} \qquad (12)$$
Improved similarity measure:
$$\text{new\_sim}(u_1, u_2) = \frac{w_{1, c^M_1} + w_{2, c^M_1}}{2} \cdot \text{sim}(u_1, u_2) \qquad (13)$$
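A minimal sketch of Eqs. (12)-(13), assuming W is the belonging-weight matrix, main[u] gives a user's main cluster, and sim, R, and mean_rating are the usual collaborative-filtering pieces (all assumed names):

```python
def weighted_sim(u1, u2, W, main, sim):
    """Eq. (13): scale sim(u1, u2) by the mean belonging weight of both
    users in u1's main cluster."""
    c = main[u1]
    return (W[u1][c] + W[u2][c]) / 2 * sim(u1, u2)

def predict(u, i, neighbors, sim, R, mean_rating):
    """Eq. (12): u's mean rating plus a similarity-weighted neighborhood vote."""
    den = sum(sim(u, v) for v in neighbors)
    if den == 0:
        return mean_rating(u)
    return mean_rating(u) + sum(sim(u, v) * R(v, i) for v in neighbors) / den
```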
Convergence speed using different similarity measures
Figure 26: Convergence speed (fitness value per generation) of GA-DCLUS with the normal and the weighted similarity measures.
Performance difference using different similarity measures

Configuration  Precision  Recall  F1      Coverage  Novelty
GA(sim)        0.5560     0.6831  0.6130  0.4713    0.8165
GA(wsim)       0.5625     0.6830  0.6269  0.4733    0.8165

Table 6: Comparison of GA-DCLUS results using the normal (sim) and the weighted (wsim) similarity measures.
Improved prediction formula
Figure 27: The chromosome encoding scheme extended with per-user novelty weights $W^N_1, W^N_2, \ldots, W^N_{|U|}$.
The item novelty:
$$\text{Nov}(i) = -\log\left(\frac{|U_i|}{|U|}\right) \cdot \log(|U|) \qquad (14)$$
The improved prediction formula:
$$\text{Pr}_{imp}(u, i) = reg \cdot w^N_u \cdot \text{Nov}(i) + \text{Pr}(u, i) \qquad (15)$$
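A direct transcription of Eqs. (14)-(15), assuming w_nov holds the per-user novelty weights taken from the chromosome and pred is the base prediction of Eq. (12) (hypothetical names); Eq. (14) is implemented exactly as written on the slide:

```python
import math

def novelty(n_raters: int, n_users: int) -> float:
    """Eq. (14): -log(|U_i| / |U|) * log(|U|); rarely rated items score highest."""
    return -math.log(n_raters / n_users) * math.log(n_users)

def predict_improved(u, i, reg, w_nov, nov, pred):
    """Eq. (15): base prediction plus a regulated per-user novelty boost."""
    return reg * w_nov[u] * nov(i) + pred(u, i)
```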
Steps of GA-DCLUS clustering algorithm
Experimental configurations
Mutation Pr  Crossover Pr  Population  Selection strategy
0.02         0.9           50          binary tournament
Table 7: GA configuration.

reg    maxN  minN  maxC  minC  α    β    γ
[0-5]  50    10    50    4     0.2  0.9  0.1
Table 8: GA-DCLUS parameters.
Relevancy performance of GA-DCLUS
Figure 28: GA-DCLUS performance on the relevancy metrics (precision, recall, F1) with different configurations, GA(wsim, reg=0) through GA(wsim, reg=5), on the MovieLens 100K dataset.
Diversity performance of GA-DCLUS
Figure 29: Comparison of GA(wsim, reg=0) and GA(wsim, reg=5) to KNN, Kmeans, and PCA-GAKM on the relevancy metrics (precision, recall, F1) on the MovieLens 100K dataset.
Diversity comparison of GA-DCLUS to other methods
Figure 30: Comparison of GA(wsim, reg=0) and GA(wsim, reg=5) to KNN, Kmeans, and PCA-GAKM on the diversity metrics (coverage, novelty) on the MovieLens 100K dataset.
Results on the 1M MovieLens

Algorithm         Precision  Recall  F1      Coverage  Novelty
GA(wsim,reg=0)    0.6355     0.5889  0.6113  0.5349    0.8434
GA(wsim,reg=5)    0.6093     0.5784  0.5934  0.6307    0.8719
KNN               0.4582     0.3832  0.4173  0.2948    0.7808
Kmeans            0.5756     0.4683  0.5166  0.4608    0.7983
PCA-GAKM          0.6041     0.5184  0.5580  0.4794    0.8200

Table 9: Comparison of GA-DCLUS to other methods on the 1M MovieLens dataset.
Comparison to re-ranking methods
MMR is a re-ranking algorithm that promotes diversity using a controlling parameter. We used the following objective function:
$$i^* = \operatorname{argmax}_{i \in (R_u - S_u)} \left[(1 - \lambda) \times \text{rel}(i, S_u) + \lambda \times \text{gnov}(i, S_u)\right] \qquad (16)$$
Relevancy:
$$\text{rel}(i, S_u) = \frac{\text{Pr}(u, i) + \sum_{j \in S_u} \text{Pr}(u, j)}{1 + |S_u|} \qquad (17)$$
Novelty:
$$\text{gnov}(i, S_u) = \frac{\text{Nov}(i) + \sum_{j \in S_u} \text{Nov}(j)}{1 + |S_u|} \qquad (18)$$
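A greedy sketch of the MMR loop of Eqs. (16)-(18), assuming pred(i) and nov(i) score the candidate items of one fixed target user (assumed callables):

```python
def mmr_rerank(candidates, pred, nov, lam, top_n):
    """Greedily build S by picking, at each step, the item of R_u - S_u that
    maximizes (1 - lambda) * rel + lambda * gnov, per Eqs. (16)-(18)."""
    S, pool = [], set(candidates)
    while pool and len(S) < top_n:
        def score(i):
            rel = (pred(i) + sum(pred(j) for j in S)) / (1 + len(S))  # Eq. (17)
            gnv = (nov(i) + sum(nov(j) for j in S)) / (1 + len(S))    # Eq. (18)
            return (1 - lam) * rel + lam * gnv                        # Eq. (16)
        best = max(pool, key=score)
        S.append(best)
        pool.remove(best)
    return S
```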
Comparison to re-ranking methods

Algorithm         Precision  Recall  F1      Coverage  Novelty
GA(wsim,reg=3)    0.5490     0.6760  0.6059  0.5214    0.8310
Kmeans(MMR)       0.5248     0.6184  0.5696  0.4423    0.8040
PCA-GAKM(MMR)     0.5461     0.6648  0.5996  0.4840    0.8201

Table 10: Comparison of GA-DCLUS to MMR on the 100K MovieLens dataset.
Comparison to re-ranking methods

Algorithm         Precision  Recall  F1      Coverage  Novelty
GA(wsim,reg=5)    0.6093     0.5784  0.5934  0.6307    0.8719
Kmeans(MMR)       0.5585     0.4354  0.4893  0.4752    0.8038
PCA-GAKM(MMR)     0.5988     0.5372  0.5663  0.4727    0.8349

Table 11: Comparison of GA-DCLUS to MMR on the 1M MovieLens dataset.
Final conclusions
General conclusions
Genetic algorithms are good optimization tools: they allow us to integrate more data sources, to hybridize more than one recommender, and to improve similarity measures.
Recommendation quality can be driven by real indicators during the optimization, through a fitness function built on metrics such as MAE or precision.
A GA can handle more than one problem at once by adjusting the encoding of the solutions and the fitness function; the latter can be parametrized using controlling parameters.
Issues
GAs take time!
Many critical parameters, hard decoding, hard evaluation, and the GA's behaviour is unpredictable, etc.
The fitness function is hard to fix!
Which features, which metrics, which formulas, etc.
Future works
Divide the clustering task into sub-tasks!
Map-reduce paradigm, parallelization, better fitness functions, combinations of algorithms, etc.
Combine more data sources!
Feature weighting, feature extraction, etc.
Thank you for your attention.