In the following slides, we present our approach to tackling conflicting recommendation quality objectives in recommender systems using a genetic-based clustering algorithm. We studied users' tendencies toward diversity and proposed a pairwise similarity measure to quantify them. We then used the new similarity within a fitness function to create overlapping clusters and to produce recommendations balanced in terms of diversity and relevancy.
Study of relevancy, diversity, and novelty in recommender systems
1. Study of Diversity, Novelty, and Relevancy
in Recommender Systems
Chems Eddine BERBAGUE
Thesis supervisor: Pr. Hassina SERIDI
Thesis co-supervisor: Dr. Karabadji Nour El-Islem
Novembre, 2021
2. Table of Contents
1. Introduction
2. State of the Art
3. Contributions: user clustering and pairwise similarity.
4. Final Conclusion
5. Research questions
What makes collaborative filtering a good research choice?
Why is clustering one of the best techniques for dealing with recommendation issues?
How effective are bio-inspired clustering techniques?
6. Aims and objectives
Improve the scalability of the memory-based collaborative filtering
algorithm.
Improve the recommendation quality.
7. Thesis research axes
Memory-based collaborative filtering algorithms
Neighbor 2
Neighbor 1
Neighbor 3
Target User
Similarity 3
Similarity 1
Similarity 2
Figure 2: General scheme of user-based collaborative filtering.
9. Thesis research axes
Recommendation quality improvement
Yes, we care about the quality !
Diversity
Relevancy
Novelty
Figure 4: Different recommendation quality metrics.
15. How were evolutionary algorithms adapted to RS?
Inter-algorithmic use:
similarity calculation, recommendation ranking, clustering, latent factor models, etc.
Intra-algorithmic use:
hybridization, etc.
16. Bio-inspired algorithms
Algorithm             GA   ACO  ANN  ABC  BAT  FSS  PSO
Similarity            X    X    X    -    -    -    -
Weighting             X    -    X    -    X    X    -
Clustering            X    -    -    X    X    X    -
Re-ranking            X    -    -    -    -    -    X
Latent factor models  -    -    -    -    -    -    X
Graph-based models    -    X    -    X    -    -    X
Table 3: Summary of different uses of bio-inspired algorithms in RS.
17. Genetic-based multi-objective optimization
Genetic algorithms (GA):
A GA is a bio-inspired optimization technique. It explores a large search space to select, among the possible solutions, the most suitable (fittest) one.
Figure 7: General scheme of a genetic-based multi-objective optimization algorithm.
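The explore-and-select loop just described can be sketched as a minimal single-objective GA. The population size, mutation rate, and the toy OneMax fitness below are illustrative assumptions, not values from the thesis:

```python
import random

random.seed(0)  # illustrative seed, for reproducibility only

def genetic_optimize(fitness, n_genes, pop_size=30, generations=50,
                     mutation_rate=0.1):
    """Minimal GA sketch: binary chromosomes, elitism, one-point
    crossover, and bit-flip mutation."""
    pop = [[random.randint(0, 1) for _ in range(n_genes)]
           for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        next_pop = scored[:2]  # elitism: keep the two fittest
        while len(next_pop) < pop_size:
            # mate two distinct parents drawn from the fitter half
            p1, p2 = random.sample(scored[:pop_size // 2], 2)
            cut = random.randrange(1, n_genes)       # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if random.random() < mutation_rate else g
                     for g in child]                 # bit-flip mutation
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)

# Toy usage: maximize the number of 1-bits (OneMax).
best = genetic_optimize(sum, n_genes=20)
```

In the thesis's setting, the chromosome would instead encode a clustering and the fitness would combine the quality metrics introduced later.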
18. Questions about using GAs in RSs
Which problem to target?
How to formally describe the problem?
How to define the quality of the solution?
24. Clustering is a solution!
Figure 10: Simple clustering example.
We propose the use of a genetic algorithm to:
• Find the best number of clusters.
• Explore the search space and select the best cluster representatives among its members.
• Control the border-overlap problem by specifying the cluster sizes.
25. Encoding scheme of GA-CLUS
k | C1 | α1 | ... | Cx | αx | ... | Ck | αk
k is an integer representing the number of clusters
Ci is an integer representing a profile as the center of cluster i
αi is an integer representing the number of profiles in cluster i
Figure 11: Encoding scheme of GA-CLUS
Figure 12: Samples of the GA-CLUS encoding scheme — panels (a)–(d) show example bit strings encoding k, followed by alternating cluster centers Ci and cluster sizes αi.
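Under this encoding, a chromosome is a bit string holding k followed by k (center, size) pairs. The field widths below are illustrative assumptions, not the thesis's exact values:

```python
def decode_chromosome(bits, k_bits=3, center_bits=5, size_bits=4):
    """Decode a GA-CLUS-style bit string into (k, [(center_i, size_i), ...]).

    Field widths (k_bits, center_bits, size_bits) are example choices.
    """
    k = int(bits[:k_bits], 2)          # number of clusters
    clusters, pos = [], k_bits
    for _ in range(k):
        center = int(bits[pos:pos + center_bits], 2)          # profile index C_i
        size = int(bits[pos + center_bits:
                        pos + center_bits + size_bits], 2)    # cluster size α_i
        clusters.append((center, size))
        pos += center_bits + size_bits
    return k, clusters

# '010' -> k = 2, then two (center, size) pairs follow.
k, clusters = decode_chromosome('010' + '00011' + '0100' + '01111' + '0110')
# k == 2; clusters == [(3, 4), (15, 6)]
```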
26. Fitness function
Minimizing the MAE of each cluster:

group_precision(ch) = (max(r) − min(r)) − ((1/k) × Σ_{i=1}^{k} MAE(G_i))    (1)

Diversifying the clusters' centers:

center_diversity(ch) = (1/(k × (k − 1))) × Σ_{i=1}^{k−1} Σ_{j=i+1}^{k} (1 − sim(C_i, C_j))    (2)

Combination of centers' diversity and clusters' precision:

fitness(ch) = group_precision(ch) + center_diversity(ch)    (3)
28. Rating prediction comparison of GA-CLUS in terms
of the neighborhood size
Figure 13: Comparison of GA-CLUS to
KNN.
Figure 14: Comparison of GA-CLUS to
Kmeans and PCA-GAKM.
29. Relevancy comparison of GA-CLUS in terms of
the neighborhood size
Figure 15: Precision comparison of
GA-CLUS to other methods.
Figure 16: Recall comparison of
GA-CLUS to other methods.
30. Relevancy comparison of GA-CLUS in terms of
Top-N recommendations
Figure 17: Precision comparison of
GA-CLUS with different recommendation
length.
Figure 18: Recall comparison of
GA-CLUS with different recommendation
length.
31. Partial conclusions
+ We encoded the clustering problem so as to reduce the search space.
+ We optimized the quality of the clustering in a way that achieves more accurate results.
- Accuracy is an insufficient measure of user satisfaction.
32. Discussion
Users' similarity vs. recommendation diversity.
Figure 19: Conflicting recommendation diversity (users: low diversity; system: high popularity).
34. TS-IKNN:
First stage:
We sought to reduce the extent of the search space by applying an adapted KNN algorithm. In this stage, we modified a similarity measure to combine a pairwise user diversity measure and a similarity-based rating measure.
Second stage:
We employed a genetic algorithm to improve the neighborhood selection.
35. First stage:
Incorporating diversity into a similarity measure allows dual control over the user set, selecting users who are similar and diverse at the same time:
• The similarity definition:

new_sim(u1, u2) = α × sim(u1, u2) + (1 − α) × div(u1, u2),    (4)

• The diversity definition:

div(u1, u2) = Σ_{i ∈ I2 − I1} (1 − P(i)/|U|),    (5)
36. Second stage:
Figure 20: Chromosome encoding of the neighborhood optimization.
Combination of the diversity and the relevancy:
fitness(ch) = β × (1 − precision) + (1 − β) × (1 − diversity), (6)
38. Results of TS-IKNN
Figure 21: Coverage comparison of TS-IKNN to other methods (KNN with normal similarity, KNN with modified similarity, K-means best precision, K-means best coverage, the proposed algorithm, and LDA).
Method                   Precision
Normal similarity        0.6106
Adjusted similarity      0.6130
K-means best precision   0.6379
K-means best coverage    0.6321
Proposed approach        0.6524
LDA                      0.3368
Table 5: Precision comparison of TS-IKNN to other methods.
39. Partial conclusions
We presented an evolutionary algorithm that acts in two stages with the aim of striking a balance between coverage and precision. However, this method suffers from some limitations, which we tried to address:
• A neighborhood is assigned to a user according to binary weights, which may exclude explorer users.
• The size of the initial neighbor-candidate set is hard to fix: large sizes allow a better exploration of possibilities but increase complexity.
• The correlation between the novelty of recommendations and their coverage is not clarified.
40. Perspectives and future work
A:
• Statistics can be misleading.
• Clustering alleviates the curse of data dimensionality.
Q:
• How can the similarity calculation be improved?
• How can clustering target the novelty/diversity metrics?
• Can we benefit from more information to improve the clustering?
45. Principles of GA-DCLUS
We denote by U the set of users and by C the set of clusters. Our clustering scheme consists of creating a matrix W = {w_1, w_2, ..., w_|U|} of |U| rows and |C| columns. For each user u ∈ U, we define a belonging-weight vector w_u ∈ W of |C| values, denoted w_u = {w_(u,1), w_(u,2), ..., w_(u,|C|)}, so as to assign to u and a given cluster j ∈ C a belonging weight w_(u,j). Furthermore, we assign each user u to one main cluster and to zero or more secondary clusters, with respect to the belonging-weight vector w_u. The next slides explain how users are assigned.
46. Principles of GA-DCLUS
Main clusters set, denoted C^M: it consists of |U| values, C^M = {c^M_1, c^M_2, ..., c^M_|U|}, where each user u has a main cluster c^M_u in which the UCF algorithm is applied to generate their recommendations. The cluster c^M_u is selected by identifying, from w_u, the cluster with the highest belonging weight.
47. Principles of GA-DCLUS
Secondary clusters set, denoted C^S: it consists of |U| sub-vectors, C^S = {c^S_1, c^S_2, ..., c^S_|U|}. Each user u has a set c^S_u of n_u secondary clusters, denoted c^S_u = {c_1, c_2, ..., c_{n_u}}. User u participates in the clusters of c^S_u only as a candidate neighbor and does not receive any recommendations from them. To improve scalability, a minimum belonging threshold θ is added to the chromosome encoding.
48. Clustering encoding scheme of GA-DCLUS
... ...
CLUS NEI TH W1,1 W1,2 ... W1,|C|
Clustering
parameters
U1
W2,1 W2,2 ... W2,|C| W|U|,1 W|U|,2 ... W|U|,|C|
U2 U|U|
Figure 24: The chromosome encoding scheme.
49. Example of the clustering encoding scheme
CLUS  NEI  TH   | U1: W1,1 W1,2 W1,3 | U2: W2,1 W2,2 W2,3 | U3: W3,1 W3,2 W3,3 | U4: W4,1 W4,2 W4,3
2     48   0.8  | 0.95  0.85  0.70   | 0.55  0.92  0.88   | 0.90  0.81  0.91   | 0.70  0.94  0.74
Figure 25: Example of the proposed genetic encoding.
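Applying the assignment rules of the previous slides to this example (threshold TH = 0.8), each user's main cluster is the one with the highest belonging weight, and the secondary clusters are the remaining ones whose weight reaches the threshold. This is a sketch of the decoding step, not the thesis's code:

```python
def assign_clusters(weights, theta):
    """weights: one belonging-weight vector per user over |C| clusters.
    Returns a (main_cluster, secondary_clusters) pair per user, 1-indexed."""
    assignments = []
    for w in weights:
        main = max(range(len(w)), key=lambda j: w[j])       # highest weight
        secondary = [j + 1 for j in range(len(w))
                     if j != main and w[j] >= theta]        # above threshold
        assignments.append((main + 1, secondary))
    return assignments

# Weight matrix from Figure 25, TH = 0.8:
W = [[0.95, 0.85, 0.70],   # U1
     [0.55, 0.92, 0.88],   # U2
     [0.90, 0.81, 0.91],   # U3
     [0.70, 0.94, 0.74]]   # U4
assignments = assign_clusters(W, 0.8)
# U1 -> main cluster 1, secondary [2]; U4 -> main cluster 2, no secondaries.
```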
65. General conclusions
Genetic algorithms are good optimization tools that allow us to integrate more data sources, to hybridize more than one recommender, and to improve the similarity measures.
Recommendation quality can be driven, during the optimization, by real indicators within a fitness function, such as MAE or Precision.
GAs can handle more than one problem at once by adjusting the encoding of the solutions and the fitness function; the latter can be parametrized using controlling parameters.
66. Issues
GAs take time!
Many critical parameters, hard decoding, hard evaluation, unpredictable GA behaviour, etc.
The fitness function is hard to fix!
Which features, which metrics, which formulas, etc.
67. Future works
Divide the clustering task into subtasks!
Map-reduce paradigm, parallelization, better fitness functions, algorithm combination, etc.
Combine more data sources!
Feature weighting, feature extraction, etc.