@cataldomusto cataldo.musto@uniba.it
Fairness and Popularity Bias
in Recommender Systems:
an Empirical Evaluation
CATALDO MUSTO, PASQUALE LOPS, GIOVANNI SEMERARO
UNIVERSITÀ DEGLI STUDI DI BARI ‘ALDO MORO’ - ITALY
Recommender Systems
Technology able to push
relevant items (movies, news,
books, etc.) to the users
based on their preferences.
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
How to Evaluate a RecSys?
Recommender Systems are
typically evaluated by using
accuracy metrics (Precision,
Recall, NDCG, HitRate, etc.)
Is there anything else we can
evaluate?
Popular Non-Accuracy Metrics
Novelty Serendipity Fairness
Fairness in AI
By definition, fairness refers to the
non-discrimination of certain groups
based on particular protected
attributes (gender, race, etc.)
Fair AI algorithms do not suffer from
these biases (or implement specific
strategies to mitigate them).
Fairness in RecSys
As regards RecSys, the concept of fairness
is multi-sided.
Items Fairness: all the items have an equal
probability of being recommended.
Users Fairness: the recommendation list
reflects the actual preferences of the user
with respect to a target feature (e.g., genre).
Problem: Popularity Bias
Fairness of RecSys (and quality of recommendations) is
strongly affected by popularity bias.
What does it mean?
Users tend to express preferences on popular
items, and this causes popular items to be
recommended more frequently than niche
ones [*].
[*] H. Abdollahpouri, M. Mansoury, R. Burke, B. Mobasher,
The unfairness of popularity bias in recommendation, arXiv
preprint arXiv:1907.13286 (2019).
Contribution
The goal of our study is to evaluate the
fairness-by-design of the most popular
recommendation paradigms.
Recommendation Paradigms
Collaborative Filtering
Content-based RecSys
Graph-based RecSys
Paradigms: Collaborative Filtering
Exploits the preferences of the
community to generate
recommendations.
Intuition: to suggest items liked
by users similar to the target
one
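A minimal sketch of this intuition, assuming a toy `ratings` dictionary (user to {item: rating}). The function names, the cosine similarity and the neighborhood size `k` are illustrative assumptions, not the implementation evaluated in this study:

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse rating dictionaries."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend_u2u(ratings, target, k=2, n=10):
    """Suggest items liked by the k users most similar to `target`."""
    # Rank all other users by similarity to the target user.
    neighbours = sorted(
        ((cosine(ratings[target], prefs), user)
         for user, prefs in ratings.items() if user != target),
        reverse=True,
    )[:k]
    scores = {}
    for sim, user in neighbours:
        if sim <= 0:
            continue  # no overlap with the target: not a useful neighbor
        for item, rating in ratings[user].items():
            if item not in ratings[target]:
                # Similarity-weighted vote for items the target has not rated.
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=lambda i: (-scores[i], i))[:n]


# Hypothetical toy data, for illustration only.
toy_ratings = {
    "alice": {"m1": 5, "m2": 4},
    "bob": {"m1": 5, "m2": 4, "m3": 5},
    "carol": {"m4": 5},
}
print(recommend_u2u(toy_ratings, "alice", k=1))
```

In this toy case only "bob" shares ratings with "alice", so the item he liked and she has not seen is the one suggested. A user with no overlapping ratings gets no usable neighbor.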
[Figure: a target user and a neighbor]
Problems: sparsity (cold start)
[Figure: a target user whose neighbors cannot be identified ("Neighbor ?")]
Paradigms: Collaborative Filtering (advanced)
Paradigms: Content-based RecSys
Exploit descriptive features
of the items (e.g. genre of a
book, director of a movie) to
generate recommendations.
Insight: to suggest items
similar to those the user
already liked
Paradigms: Content-based RecSys
[Figure: user profile and items]
Recommendations
are generated by
matching the
features stored in
the user profile
with those
describing the
items to be
recommended.
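The matching step can be sketched as follows, assuming each item is described by a bag of features and the user profile is the sum of the feature vectors of the liked items. The function names and the genre features are hypothetical and used only for illustration (the study itself uses plot text as content):

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two feature-count vectors."""
    dot = sum(a[f] * b[f] for f in a if f in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_profile(liked, item_features):
    """User profile: accumulated features of the items the user liked."""
    profile = Counter()
    for item in liked:
        profile.update(item_features[item])
    return profile


def recommend_content(liked, item_features, n=10):
    """Rank unseen items by similarity between their features and the profile."""
    profile = build_profile(liked, item_features)
    scores = {
        item: cosine(profile, Counter(features))
        for item, features in item_features.items()
        if item not in liked
    }
    return sorted(scores, key=lambda i: (-scores[i], i))[:n]


# Toy feature sets, assumed for illustration only.
toy_items = {
    "The Matrix": ["sci-fi", "action"],
    "Matrix Revolutions": ["sci-fi", "action"],
    "Donnie Darko": ["sci-fi", "drama"],
    "Grandi Magazzini": ["comedy"],
}
print(recommend_content(["The Matrix"], toy_items))
```

Because the score depends only on item descriptions, an item can be recommended even if no other user has rated it.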
Paradigms: (Recent) Content RecSys
Recent methods exploit word
embedding techniques, which are based
on the distributional hypothesis.
[Figure: word embedding space with the words beer, wine, glass, spoon]
[Figure: item embedding space with the movies The Matrix, Matrix Revolutions, Donnie Darko, Grandi Magazzini]
Paradigms: Graph-based RecSys
Exploit a graph-based data
model connecting users,
items and descriptive
properties of the items (e.g.,
gathered from a knowledge
graph)
Insight: recommendation
seen as a path ranking
problem
Typically, the
relevance score for all
the item nodes is
calculated and the
top-N are returned as
recommendations
PageRank
Spreading Activation
Personalized PR
…
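As a sketch of how such a relevance score can be computed, the snippet below runs Personalized PageRank by power iteration on a toy graph. The graph layout (an adjacency dictionary in which every neighbor is also a key), the damping factor 0.85, the iteration count and the function names are assumptions made for illustration:

```python
def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Power iteration with restarts concentrated on the `seeds` nodes."""
    nodes = list(graph)
    restart = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        nxt = {v: (1 - damping) * restart[v] for v in nodes}
        for v in nodes:
            out = graph[v]
            if not out:
                # Dangling node: send its mass back to the restart distribution.
                for u in nodes:
                    nxt[u] += damping * rank[v] * restart[u]
                continue
            share = damping * rank[v] / len(out)
            for u in out:
                nxt[u] += share
        rank = nxt
    return rank


def recommend_ppr(graph, user, items, n=10):
    """Return the top-n item nodes not already connected to the user."""
    rank = personalized_pagerank(graph, {user})
    candidates = [i for i in items if i not in graph[user]]
    return sorted(candidates, key=lambda i: (-rank[i], i))[:n]


# Toy graph: users (u*), items (i*) and item properties (p*); edges are symmetric.
toy_graph = {
    "u1": ["i1"],
    "u2": ["i1", "i2"],
    "i1": ["u1", "u2", "p1"],
    "i2": ["u2", "p1"],
    "i3": ["p2"],
    "p1": ["i1", "i2"],
    "p2": ["i3"],
}
print(recommend_ppr(toy_graph, "u1", ["i1", "i2", "i3"]))
```

With a uniform restart distribution instead of the user-specific one, the same procedure gives plain PageRank, whose ranking is identical for every user.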
Research Questions
1. Which recommendation paradigm is more
prone by design to popularity bias?
2. Which recommendation paradigm
generates fairer recommendation
lists?
Experimental Protocol
[Diagram: Ratings, Content → Algorithm → Top-10 Recommendations → Evaluation Metrics]
Data are typically split into
training and test sets.
Experimental Evaluation
Two datasets
            GoodBooks    MovieLens1M
Users       53,424       6,040
Items       10,000       3,883
Ratings     6,000,000    1,000,209
%Positive   68.97%       57.91%
Sparsity    99.82%       96.42%
Content     Plot         Plot
Algorithms: Collaborative Filtering
User-to-User CF
Item-to-Item CF
FunkSVD
Biased Matrix Factorization
Algorithms: Content-based RecSys
Vector Space Model + TF/IDF
Word2Vec
Doc2Vec
Latent Semantic Indexing
Algorithms: Graph-based RecSys
PageRank
Personalized PageRank
Algorithms: Baselines
Random (upper bound)
Popularity-based (lower bound)
Evaluation Metrics
Items Fairness: Catalogue Coverage
Percentage of items in the catalogue recommended to at least one
user (the higher, the better).
Items Fairness: Gini Index
Measures how unbalanced (in terms of frequency) the distribution of
the recommendations across all the users is. Values in the
range [0,1]; the lower, the better. A Gini Index close to 1 means that
just a few items appear in the recommendation lists.
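Both metrics can be sketched in a few lines, assuming `rec_lists` maps each user to a recommendation list. The Gini computation below uses one common formulation over the recommendation frequencies of all catalogue items; the exact variant used in the study is not stated on the slide, and the function names are illustrative:

```python
from collections import Counter


def catalogue_coverage(rec_lists, catalogue):
    """Fraction of catalogue items recommended to at least one user."""
    recommended = {item for items in rec_lists.values() for item in items}
    return len(recommended & set(catalogue)) / len(catalogue)


def gini_index(rec_lists, catalogue):
    """Gini index of how often each catalogue item is recommended.

    0 means every item is recommended equally often; values close to 1
    mean that a few items collect almost all the recommendations.
    """
    counts = Counter(item for items in rec_lists.values() for item in items)
    # Items that are never recommended count with frequency 0.
    freqs = sorted(counts.get(item, 0) for item in catalogue)
    n = len(freqs)
    total = sum(freqs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((2 * rank - n - 1) * f for rank, f in enumerate(freqs, start=1))
    return weighted / (n * total)


# Hypothetical toy data, for illustration only.
toy_catalogue = ["a", "b", "c", "d"]
toy_recs = {"u1": ["a", "b"], "u2": ["a", "c"]}
print(catalogue_coverage(toy_recs, toy_catalogue))  # 0.75
print(gini_index(toy_recs, toy_catalogue))          # 0.375
```

In terms of the numbers reported later in the deck, the U2U-CF coverage on MovieLens 1M (296 of 3,883 items) corresponds to 296 / 3,883 ≈ 7.62%.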
Users Fairness: ΔGAP
Gap between the average popularity of the items in the user profile and the average popularity
of the items in the recommendation list. A fair recommendation list reflects the average
popularity of the items in the profile.
Users are first split into groups (blockbuster, diverse, niche users) and the average
popularity for each group is calculated.
ΔGAP = 0: fair recommendation list
ΔGAP < 0: under-estimation of the popularity
ΔGAP > 0: over-estimation of the popularity
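A small sketch of the computation for one user group. The slide does not spell out the formula, so two assumptions are made here: ΔGAP is taken as the relative difference (GAP_r - GAP_p) / GAP_p, as in the formulation of Abdollahpouri et al. (2019) cited earlier in this deck, and item popularity is the fraction of users whose profile contains the item. All names are illustrative:

```python
def item_popularity(profiles):
    """Popularity of an item: fraction of users whose profile contains it."""
    n_users = len(profiles)
    counts = {}
    for items in profiles.values():
        for item in set(items):
            counts[item] = counts.get(item, 0) + 1
    return {item: c / n_users for item, c in counts.items()}


def group_average_popularity(group, lists, popularity):
    """GAP: mean, over the users of the group, of the mean popularity of their list."""
    per_user = []
    for user in group:
        items = lists[user]
        per_user.append(sum(popularity.get(i, 0.0) for i in items) / len(items))
    return sum(per_user) / len(per_user)


def delta_gap(group, profiles, rec_lists, popularity):
    """Relative gap between recommended and profile popularity for a group.

    Assumes the group's profile popularity (GAP_p) is non-zero.
    """
    gap_p = group_average_popularity(group, profiles, popularity)
    gap_r = group_average_popularity(group, rec_lists, popularity)
    return (gap_r - gap_p) / gap_p


# Hypothetical toy data: "u4" is a niche user who receives a popular item.
toy_profiles = {"u1": ["a", "b"], "u2": ["a"], "u3": ["a", "c"], "u4": ["d"]}
toy_recs = {"u4": ["a"]}
pop = item_popularity(toy_profiles)
print(delta_gap(["u4"], toy_profiles, toy_recs, pop))  # 2.0
```

The positive value in the toy case indicates that the recommended item is more popular than what the niche user's profile would suggest.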
Results
Catalogue Coverage
                                      MovieLens 1M         GoodBooks
Paradigm                 Algorithm    Items   Coverage     Items    Coverage
Baselines                Random       3,688   94.98%       10,000   100.00%
                         Popular         67    1.63%          243     2.43%
Collaborative Filtering  U2U-CF         296    7.62%        2,210    22.10%
                         I2I-CF         471   12.13%        2,819    28.19%
                         Biased-MF      547   14.09%        1,830    18.30%
Content-based RecSys     VSM/TF-IDF     444   11.43%        2,622    26.22%
                         Word2Vec       492   12.67%        3,081    30.81%
                         Doc2Vec        476   12.26%        2,987    29.87%
Graph-based RecSys       PR              15    0.38%           11     0.11%
                         PPR             36    0.92%           22     0.22%
Take-Home Messages
1. Collaborative Filtering algorithms cover a larger
number of items when the dataset is less sparse.
2. When the sparsity is higher, content-based
recommendations are the best option.
3. Graph-based recommender systems behave worse
than the popularity-based RecSys.
Results
Gini Index
Paradigm                 Algorithm    MovieLens1M   GoodBooks
Baselines                Random       0.185         0.334
                         Popular      0.995         0.998
Collaborative Filtering  U2U-CF       0.989         0.973
                         I2I-CF       0.990         0.986
                         Biased-MF    0.984         0.987
Content-based RecSys     VSM/TF-IDF   0.990         0.961
                         Word2Vec     0.985         0.956
                         Doc2Vec      0.987         0.959
Graph-based RecSys       PR           0.996         0.999
                         PPR          0.995         0.998
Take-Home Messages
4. All the algorithms are very prone to popularity bias (Gini
Index values very close to 1).
Results
[Figure: ΔGAP – MovieLens Data]
[Figure: ΔGAP – GoodBooks Data]
Take-Home Messages
1. Collaborative Filtering algorithms cover a larger number
of items when the dataset is less sparse.
2. When the sparsity is higher, content-based
recommendations are the best option.
3. Graph-based recommender systems behave worse than
the popularity-based RecSys.
4. All the algorithms are very prone to popularity bias (Gini
Index values very close to 1).
5. Recommendation algorithms typically under-estimate
the average popularity of the recommendations.
6. ΔGAP is close to zero. Good fairness for the users, on
average.
Conclusions
This paper provides a benchmark of the fairness-by-design of some
popular recommendation paradigms
• Results showed that algorithms are strongly affected by popularity bias.
• Item coverage is not satisfactory and recommendation lists are not fair (as
confirmed by the Gini Index). Mitigation strategies are needed.
• ΔGAP values are more satisfactory when more advanced techniques are
exploited.
• Future Work. Evaluation of hybrid recommendation models and more
recent content-based recommendation techniques. Introduction of other
non-accuracy metrics (novelty, serendipity, etc.)
Contacts
Thank you!
cataldo.musto@uniba.it
@cataldomusto

Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation

  • 1.
    @cataldomusto cataldo.musto@uniba.it Fairness andPopularity Bias in Recommender Systems: an Empirical Evaluation CATALDO MUSTO, PASQUALE LOPS, GIOVANNI SEMERARO UNIVERSITÀ DEGLI STUDI DI BARI ‘ALDO MORO’ - ITALY
  • 2.
    Recommender Systems Technology ableto push relevant items (movies, news, books, etc.) to the users based on their preferences. 2 Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation. AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
  • 3.
    How to Evaluatea RecSys? Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation. AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021 Recommender Systems are typically evaluated by using accuracy metrics (Precision, Recall, NDCG, HitRate, etc.) Is there anything else we can evaluate? 3
  • 4.
    Popular Non-Accuracy Metrics CataldoMusto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation. AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021 Novelty Serendipity Fairness 4
  • 5.
    Popular Non-Accuracy Metrics CataldoMusto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation. AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021 Novelty Serendipity Fairness 5
  • 6.
    Fairness in AI CataldoMusto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation. AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021 By definition, fairness refers to the non-discrimination of certain groups based on particular protected attributes (gender, race, etc.) Fair Al algorithms do not suffer of these bias (or implement particular strategies to mitigate them) 6
  • 7.
    Fairness in RecSys CataldoMusto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation. AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021 As regards RecSys, the concept of fairness is multi-sided Items Fairness: if all the items have equal probability of being recommended Users Fairness: if the recommendation list reflects the actual preferences of the users based on a target features (e.g., genre) 7
  • 8.
    Problem: Popularity Bias 8 CataldoMusto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation. AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021 Fairness of RecSys (and quality of recommendations) is strongly affected by popularity bias What does it mean? Users tend to express preferences on popular items, and this make popular items to be recommended more frequently w.r.t. niche ones [*]. [*] H. Abdollahpouri, M. Mansoury, R. Burke, B. Mobasher, The unfairness of popularity biasin recommendation, arXiv preprint arXiv:1907.13286 (2019).
  • 9.
    Contribution 9 Cataldo Musto, PasqualeLops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation. AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021 The goal of our study is to evaluate the fairness-by-design of the most popular recommendation paradigms.
  • 10.
    Recommendation Paradigms Collaborative Filtering Content-based RecSys Graph-based RecSys Cataldo Musto,Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation. AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021 10
  • 11.
    Collaborative Filtering Paradigms: Collaborative Filtering Exploitsthe preferences of the community to generate recommendations. Intuition: to suggest items liked by users similar to the target one Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation. AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021 11
  • 12.
    Paradigms: Collaborative Filtering target user neighbor CataldoMusto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation. AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021 12
  • 13.
    Paradigms: Collaborative Filtering target user neighbor CataldoMusto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation. AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021 13
  • 14.
    Paradigms: Collaborative Filtering target userProblems: sparsity (cold start) Neighbor ? Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation. AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021 14 Neighbor ? Neighbor ? Neighbor ?
  • 15.
    Paradigms: Collabor. Filtering(advanced) 15 Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation. AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
  • 16.
    Paradigms: Content-based RecSys Exploitdescriptive features of the items (e.g. genre of a book, director of a movie) to generate recommendations. Insight: to suggest items similar to those the user already liked 16 Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation. AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
  • 17.
    Paradigms: Content-based RecSys userprofile items Recommendations are generated by matching the features stored in the user profile with those describing the items to be recommended. 17 Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation. AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
  • 18.
    Paradigms: (Recent) ContentRecSys 18 Reecent methods exploit word embedding techniques, which are based on the distributional hypothesis beer wine glass spoon Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation. AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
  • 19.
    19 Matrix Revolutions Donnie Darko GrandiMagazzini The Matrix Paradigms: (Recent) Content RecSys Reecent methods exploit word embedding techniques, which are based on the distributional hypothesis Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation. AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
  • 20.
    Paradigms: Graph-based RecSys Exploita graph-based data model connecting users, items and descriptive properties of the items (e.g., gathered from a knowledge graph) Insight: recommendation seen as a path ranking problem Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation. AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021 20
  • 21.
    Typically, the relevance scorefor all the item nodes is calculated and the top-N are returned as recommendations PageRank Spreading Activation Personalized PR … Paradigms: Graph-based RecSys Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation. AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021 21
  • 22.
    Research Questions 22 Cataldo Musto,Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation. AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021 1. Which recommendation paradigm is more prone by design to popularity bias? 2. Which recommendation paradigm generates more fair recommendation lists?
  • 23.
    Experimental Protocol 23 Cataldo Musto,Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation. AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021 Algorithm Top-10 Recommmendations Evaluation Metrics Ratings Content Data are typically split into training and test
  • 24.
    Experimental Evaluation 24 Cataldo Musto,Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation. AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021 Two datasets GoodBooks MovieLens1M Users 53,424 6,040 Items 10,000 3,883 Ratings 6,000,000 1,000,209 %Positive 68.97% 57.91% Sparsity 99.82% 96.42% Content Plot Plot
  • 26.
    Experimental Evaluation. Algorithms, Collaborative Filtering: User-to-User CF, Item-to-Item CF, FunkSVD Biased Matrix Factorization.
  • 27.
    Experimental Evaluation. Algorithms, Content-based RecSys: Vector Space Model + TF/IDF, Word2Vec, Doc2Vec, Latent Semantic Indexing.
  • 28.
    Experimental Evaluation. Algorithms, Graph-based RecSys: PageRank, Personalized PageRank.
  • 29.
    Experimental Evaluation. Algorithms, Baselines: Random (upper bound), Popularity-based (lower bound).
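The two baselines can be sketched as follows. This is only an illustration: it assumes that popularity is the number of training interactions per item and that items already in the user's profile are excluded, neither of which is stated in the slides. The function names are hypothetical.

```python
import random
from collections import Counter


def popularity_recommender(train, user, n=10):
    """Recommend the most-interacted items the user has not seen yet.

    train: list of (user, item) pairs.
    """
    counts = Counter(item for _, item in train)
    seen = {item for u, item in train if u == user}
    ranked = [item for item, _ in counts.most_common() if item not in seen]
    return ranked[:n]


def random_recommender(train, user, catalogue, n=10, seed=0):
    """Recommend unseen items drawn uniformly at random from the catalogue."""
    rng = random.Random(seed)
    seen = {item for u, item in train if u == user}
    candidates = sorted(set(catalogue) - seen)
    return rng.sample(candidates, min(n, len(candidates)))


# Hypothetical interactions: item "a" is the most popular one.
toy_train = [("u1", "a"), ("u2", "a"), ("u3", "a"),
             ("u2", "b"), ("u3", "b"), ("u3", "c")]
toy_popular = popularity_recommender(toy_train, "u1", n=2)
toy_random = random_recommender(toy_train, "u1", ["a", "b", "c", "d"], n=2)
```

Because the random baseline spreads recommendations evenly over the catalogue, it bounds item fairness from above, while the popularity baseline concentrates on a handful of items and bounds it from below.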
  • 30.
    Evaluation Metrics. Items Fairness: Catalogue Coverage. Percentage of items in the catalogue recommended to at least one user (the higher, the better). Items Fairness: Gini Index. Measures how unbalanced (in terms of frequency) the distribution of the recommendations across all the users is. Values lie in the range [0,1]; the lower, the better. A Gini Index close to 1 means that just a few items appear in the recommendation lists.
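Both item-fairness metrics can be computed from the recommendation lists alone. The sketch below uses the standard rank-based Gini formula over per-item recommendation frequencies and assumes that items never recommended count with frequency zero; the slides do not give the exact formula, and the function names are hypothetical.

```python
from collections import Counter


def catalogue_coverage(rec_lists, catalogue):
    """Fraction of catalogue items recommended to at least one user."""
    recommended = {item for recs in rec_lists.values() for item in recs}
    return len(recommended & set(catalogue)) / len(catalogue)


def gini_index(rec_lists, catalogue):
    """Gini index of how often each catalogue item is recommended.

    0 means every item is recommended equally often; values near 1 mean
    that a few items collect almost all the recommendations.
    """
    counts = Counter(item for recs in rec_lists.values() for item in recs)
    freqs = sorted(counts.get(item, 0) for item in catalogue)
    n, total = len(freqs), sum(freqs)
    if total == 0:
        return 0.0
    # Rank-weighted sum over the frequencies sorted in ascending order.
    weighted = sum((2 * rank - n - 1) * f for rank, f in enumerate(freqs, start=1))
    return weighted / (n * total)


# Hypothetical top-2 lists over a four-item catalogue.
toy_catalogue = ["a", "b", "c", "d"]
toy_recs = {"u1": ["a", "b"], "u2": ["a", "b"]}
toy_coverage = catalogue_coverage(toy_recs, toy_catalogue)
toy_gini = gini_index(toy_recs, toy_catalogue)
```

With this formula the maximum value for a catalogue of n items is (n - 1)/n, reached when a single item receives every recommendation, so it approaches 1 on large catalogues.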
  • 32.
    Evaluation Metrics. Users Fairness: ΔGAP. Gap between the average popularity of the items in the user profile and the average popularity of the items in the recommendation list. A fair recommendation list reflects the average popularity of the items in the profile. Users are first split into groups (blockbuster, diverse and niche users) and the average popularity for each group is calculated. ΔGAP = 0: fair recommendation list. ΔGAP < 0: under-estimation of the popularity. ΔGAP > 0: over-estimation of the popularity.
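A minimal sketch of ΔGAP for one user group follows. It assumes that item popularity is the fraction of users whose profile contains the item and that the gap is normalised by the profile-side average, as is common in the popularity-bias literature; the slides fix only the sign convention. The group is passed in explicitly, since the rule for splitting users into blockbuster, diverse and niche groups is not given. All function names are hypothetical.

```python
def item_popularity(profiles):
    """Popularity of an item: fraction of users whose profile contains it."""
    n_users = len(profiles)
    counts = {}
    for items in profiles.values():
        for item in set(items):
            counts[item] = counts.get(item, 0) + 1
    return {item: c / n_users for item, c in counts.items()}


def group_average_popularity(lists, group, popularity):
    """Mean, over the users of a group, of the mean popularity of their items."""
    per_user = []
    for user in group:
        items = lists[user]
        per_user.append(sum(popularity.get(i, 0.0) for i in items) / len(items))
    return sum(per_user) / len(per_user)


def delta_gap(profiles, rec_lists, group, popularity):
    """Relative gap between recommended and profile popularity for a group.

    Negative: recommendations are less popular than the profiles.
    Positive: recommendations are more popular than the profiles.
    """
    gap_profile = group_average_popularity(profiles, group, popularity)
    gap_recs = group_average_popularity(rec_lists, group, popularity)
    return (gap_recs - gap_profile) / gap_profile


# Hypothetical profiles: "a" is a blockbuster, "c" is a niche item.
toy_profiles = {"u1": ["a", "b"], "u2": ["a"], "u3": ["a", "c"], "u4": ["a", "b"]}
toy_popularity = item_popularity(toy_profiles)
toy_delta = delta_gap(toy_profiles, {"u3": ["a"]}, ["u3"], toy_popularity)
```

In the toy data, recommending only the blockbuster "a" to the niche-leaning user u3 yields a positive value, i.e. the list over-estimates the popularity the user is used to.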
  • 33.
    Results: Catalogue Coverage (number of items recommended and share of the catalogue)
                              MovieLens 1M        GoodBooks
    Baselines
      Random                  3,688   94.98%      10,000  100.00%
      Popular                    67    1.63%         243    2.43%
    Collaborative Filtering
      U2U-CF                    296    7.62%       2,210   22.10%
      I2I-CF                    471   12.13%       2,819   28.19%
      Biased-MF                 547   14.09%       1,830   18.30%
    Content-based RecSys
      VSM/TF-IDF                444   11.43%       2,622   26.22%
      Word2Vec                  492   12.67%       3,081   30.81%
      Doc2Vec                   476   12.26%       2,987   29.87%
    Graph-based RecSys
      PR                         15    0.38%          11    0.11%
      PPR                        36    0.92%          22    0.22%
  • 36.
    Take-Home Messages
    1. Collaborative Filtering algorithms cover a larger number of items when the dataset is less sparse.
    2. When the sparsity is higher, content-based recommendations are the best option.
    3. Graph-based recommender systems behave worse than the popularity-based baseline.
    • All the algorithms are very prone to popularity bias (Gini Index values very close to 1).
    • Recommendation algorithms typically under-estimate the average popularity of the recommendations.
    • ΔGAP values are close to zero: good fairness for the users, on average.
  • 37.
    Results: Gini Index
                              MovieLens1M    GoodBooks
    Baselines
      Random                  0.185          0.334
      Popular                 0.995          0.998
    Collaborative Filtering
      U2U-CF                  0.989          0.973
      I2I-CF                  0.990          0.986
      Biased-MF               0.984          0.987
    Content-based RecSys
      VSM/TF-IDF              0.990          0.961
      Word2Vec                0.985          0.956
      Doc2Vec                 0.987          0.959
    Graph-based RecSys
      PR                      0.996          0.999
      PPR                     0.995          0.998
  • 39.
    Take-Home Messages
    1. Collaborative Filtering algorithms cover a larger number of items when the dataset is less sparse.
    2. When the sparsity is higher, content-based recommendations are the best option.
    3. Graph-based recommender systems behave worse than the popularity-based baseline.
    4. All the algorithms are very prone to popularity bias (Gini Index values very close to 1).
    • Recommendation algorithms typically under-estimate the average popularity of the recommendations.
    • ΔGAP values are close to zero: good fairness for the users, on average.
  • 40.
    Results: ΔGAP on MovieLens data [figure not included in the transcript]
  • 42.
    Results: ΔGAP on GoodBooks data [figure not included in the transcript]
  • 44.
    Take-Home Messages
    1. Collaborative Filtering algorithms cover a larger number of items when the dataset is less sparse.
    2. When the sparsity is higher, content-based recommendations are the best option.
    3. Graph-based recommender systems behave worse than the popularity-based baseline.
    4. All the algorithms are very prone to popularity bias (Gini Index values very close to 1).
    5. Recommendation algorithms typically under-estimate the average popularity of the recommendations.
    6. ΔGAP is close to zero: good fairness for the users, on average.
  • 46.
    Conclusions. This paper provides a benchmark of the fairness-by-design of some popular recommendation paradigms.
    • Results showed that algorithms are strongly affected by popularity bias.
    • Item coverage is not satisfactory and recommendation lists are not fair (as confirmed by the Gini Index). Mitigation strategies are needed.
    • ΔGAP values are more satisfactory when more advanced techniques are exploited.
    • Future work: evaluation of hybrid recommendation models and more recent content-based recommendation techniques; introduction of other non-accuracy metrics (novelty, serendipity, etc.).
  • 47.
    Thank you! Contacts: cataldo.musto@uniba.it, @cataldomusto