Presentation for the paper published @ CIKM'22: 31st ACM International Conference on Information & Knowledge Management (October 17-21, 2022 | Atlanta, GA, USA)
Do Graph Neural Networks Build Fair User Models? Assessing Disparate Impact and Mistreatment in Behavioural User Profiling
1. Do Graph Neural Networks
Build Fair User Models?
Assessing Disparate Impact and Mistreatment
in Behavioural User Profiling
Erasmo Purificato, Ludovico Boratto, and Ernesto William De Luca
DEFirst Reading Group
23rd February 2023
3. About me
2011-2017: Master's Degree in Computer Engineering, University of Naples Federico II
2017-2019: Technical Leader of the Cognitive Business Unit, Blue Reply, Turin
2018-2019: 2nd-level Master's Degree in Artificial Intelligence, University of Turin
2020-Today: Research Assistant and PhD Student on “ML Techniques for User Modeling and User-Adaptive Interaction”, Otto von Guericke University Magdeburg
2020-Today: Research Fellow, Leibniz Institute for Educational Media | Georg Eckert Institute, Brunswick
2021-Today: Adjunct Professor in Software Engineering, University of Rome Guglielmo Marconi
6. Background
User profiling
Inferring an individual’s interests, personality traits
or behaviours from the data they generate, in order to
create an efficient representation, i.e. a user model.
7. Background
Explicit user profiling
Early profiling approaches considered only the
analysis of static characteristics, with data
often coming from online forms and surveys.
Implicit user profiling
Modern systems focus on profiling users’ data
based on individuals’ actions and interactions
(also known as behavioural user profiling).
8. Background
Graphs are a natural way to model user behaviours.
Graph Neural Networks (GNNs) are therefore a natural fit for learning user models from such behavioural data.
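To make the idea concrete, here is a minimal sketch of behavioural user profiling framed as binary node classification with a generic two-layer GCN. It assumes PyTorch Geometric and hypothetical inputs (node features x, interaction edges edge_index); it is not the architecture of the models analysed in the paper.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv  # assumes PyTorch Geometric is installed

class UserProfilingGNN(torch.nn.Module):
    """Generic two-layer GCN classifying user nodes into two profile classes."""
    def __init__(self, num_features, hidden_dim=64, num_classes=2):
        super().__init__()
        self.conv1 = GCNConv(num_features, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, num_classes)

    def forward(self, x, edge_index):
        # Each layer aggregates information from a node's neighbours, so a
        # user's representation reflects the interactions encoded in the graph.
        h = F.relu(self.conv1(x, edge_index))
        return self.conv2(h, edge_index)  # per-user class logits
```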
10. Motivation
Like any ML system trained on historical data, GNNs are prone to learning
the biases present in such data and revealing them in their outputs.
11. ◎ Performed two user profiling tasks by executing a binary classification on
two real-world datasets with the best-performing GNNs in this context;
◎ Assessed disparate impact and disparate mistreatment for GNNs designed
for behavioural user profiling, considering four algorithmic fairness metrics;
◎ From an extensive set of experiments, derived three observations about the
analysed models, correlating their different user profiling paradigms with the
fairness metric scores to create a baseline for future assessments of
GNN-based models for user profiling.
Our contributions
12. ◎ RQ1: How do the different input types of the analysed GNNs and the way the
user models are constructed affect fairness?
◎ RQ2: To what extent can the user models produced by the analysed
state-of-the-art GNNs be defined as fair?
◎ RQ3: Are disparate impact metrics enough to assess the fairness of GNN-based
models in behavioural user profiling tasks or is disparate mistreatment needed
to fully assess the presence of unfairness?
Research questions
13. Analysed models
CatGCN
Weijian Chen, Fuli Feng, Qifan Wang, Xiangnan He, Chonggang Song, Guohui Ling, and Yongdong Zhang. CatGCN: Graph
Convolutional Networks with Categorical Node Features. IEEE Transactions on Knowledge and Data Engineering (2021).
14. Analysed models
RHGN
Qilong Yan, Yufeng Zhang, Qiang Liu, Shu Wu, and Liang Wang. Relation-aware Heterogeneous Graph for User Profiling.
In Proceedings of the 30th ACM International Conference on Information & Knowledge Management (CIKM’21).
15. Algorithmic fairness metrics adopted
Disparate impact
Refers to a form of indirect and often unintentional discrimination that occurs
when practices or systems appear to treat people the same way.
◎ Statistical parity (SP): ΔSP = |P(ŷ = 1 | s = 0) − P(ŷ = 1 | s = 1)|
◎ Equal opportunity (EO): ΔEO = |P(ŷ = 1 | y = 1, s = 0) − P(ŷ = 1 | y = 1, s = 1)|
◎ Overall accuracy equality (OAE): ΔOAE = |P(ŷ = 0 | y = 0, s = 0) + P(ŷ = 1 | y = 1, s = 0) − P(ŷ = 0 | y = 0, s = 1) − P(ŷ = 1 | y = 1, s = 1)|
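As a rough illustration (not the paper's evaluation code), these disparate impact metrics can be computed from a model's binary test predictions. The sketch below assumes hypothetical NumPy arrays y_true, y_pred and a binary sensitive attribute s:

```python
import numpy as np

def rate(y_pred, cond, value=1):
    # P(prediction == value | cond): share of predictions equal to `value`
    # among the samples selected by the boolean mask `cond`.
    return np.mean(y_pred[cond] == value) if cond.any() else 0.0

def delta_sp(y_pred, s):
    # Statistical parity: |P(y_hat=1 | s=0) - P(y_hat=1 | s=1)|
    return abs(rate(y_pred, s == 0) - rate(y_pred, s == 1))

def delta_eo(y_true, y_pred, s):
    # Equal opportunity: |P(y_hat=1 | y=1, s=0) - P(y_hat=1 | y=1, s=1)|
    return abs(rate(y_pred, (y_true == 1) & (s == 0))
               - rate(y_pred, (y_true == 1) & (s == 1)))

def delta_oae(y_true, y_pred, s):
    # Overall accuracy equality: difference between the groups' summed
    # true-negative and true-positive rates.
    def group_terms(group):
        return (rate(y_pred, (y_true == 0) & group, value=0)
                + rate(y_pred, (y_true == 1) & group, value=1))
    return abs(group_terms(s == 0) - group_terms(s == 1))
```

For all three metrics, a value close to 0 indicates similar treatment of the two demographic groups.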
16. Algorithmic fairness metrics adopted
Disparate mistreatment
Considers the misclassification rates of user groups with different values of
the sensitive attribute, rather than their correct predictions.
◎ Treatment equality (TE): ΔTE = |P(ŷ = 1 | y = 0, s = 0) / P(ŷ = 0 | y = 1, s = 0) − P(ŷ = 1 | y = 0, s = 1) / P(ŷ = 0 | y = 1, s = 1)|
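Treatment equality can be sketched in the same spirit; the toy function below assumes the same hypothetical arrays and that each group has at least one false negative, so the ratios are defined:

```python
import numpy as np

def delta_te(y_true, y_pred, s):
    # Treatment equality: absolute difference between the groups'
    # false-positive-rate / false-negative-rate ratios.
    def fp_fn_ratio(group):
        fpr = np.mean(y_pred[(y_true == 0) & group] == 1)  # P(y_hat=1 | y=0, group)
        fnr = np.mean(y_pred[(y_true == 1) & group] == 0)  # P(y_hat=0 | y=1, group)
        return fpr / fnr
    return abs(fp_fn_ratio(s == 0) - fp_fn_ratio(s == 1))
```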
17. Datasets
Alibaba contains click-through-rate data about ads displayed on Alibaba's Taobao platform, and was used for evaluation in both the original CatGCN and RHGN papers.
JD consists of users and items from the retail company of the same name, connected by click and purchase relationships; it was already used in the original RHGN paper.
18. ◎ For RQ1: Considered disparate impact metrics and compared their scores
between the two GNNs to measure models’ discrimination level and evaluate
the impact of the different user profiling paradigms on fairness.
◎ For RQ2: Contextualised the values of SP and EO with the results of FairGNN, a
recent framework focusing on predicting fair results for user profiling.
◎ For RQ3: Extended our fairness evaluation to consider TE and determine to
what extent the GNN models discriminate against users from the perspective of
disparate mistreatment, in comparison to disparate impact.
Experimental setting
19. Results and discussion
Observation 1. RHGN's ability to represent users through multiple interaction modelling yields better fairness scores than a model relying only on binary associations between users and items, such as CatGCN.
Observation 2. Even though RHGN proves to be a fairer model than CatGCN, a debiasing process is needed for both GNNs.
20. Results and discussion
Observation 3. In scenarios where the
correctness of a decision on the target label
w.r.t. the sensitive attributes is not well
defined, or where there is a high cost for
misclassified instances, a complete fairness
assessment should always take into account
disparate mistreatment evaluation.
21. ◎ Conducted, for the first time in the literature, a fairness assessment of GNN-based
behavioural user profiling models.
◎ Our analysis of two state-of-the-art models on real-world datasets, covering
four algorithmic fairness metrics, showed that directly modelling raw user
interactions with a platform hurts a demographic group, which gets misclassified
more often than its counterpart.
◎ In future work, we will explore other aspects of fairness analysis in this context,
e.g. multi-class and multi-group perspectives, as well as introduce novel
procedures for bias mitigation.
Conclusion and future directions
22. Thanks!
I am Erasmo Purificato
You can find me at:
https://erasmopurif.com
@erasmopurif11
erasmo.purificato@ovgu.de
Editor's Notes
CatGCN is a Graph Convolutional Network (GCN) model tailored for graph learning on categorical node features. This model enhances the initial node representation by integrating two types of explicit interaction modelling into its learning process: a local multiplication-based interaction on each pair of node features and a global addition-based interaction on an artificial feature graph. The proposed method shows the effectiveness of performing feature interaction modelling before graph convolution.
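For intuition only, a multiplication-based pairwise interaction over a user's categorical feature embeddings can be sketched with the factorisation-machine identity; the snippet below is a generic illustration of that idea, not CatGCN's exact operator:

```python
import torch

def pairwise_multiplicative_interaction(field_embeddings):
    # field_embeddings: [num_fields, dim] tensor, one embedding per categorical feature.
    # FM identity: sum_{i<j} e_i * e_j = 0.5 * ((sum_i e_i)^2 - sum_i e_i^2), element-wise.
    summed_then_squared = field_embeddings.sum(dim=0) ** 2
    squared_then_summed = (field_embeddings ** 2).sum(dim=0)
    return 0.5 * (summed_then_squared - squared_then_summed)
```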
RHGN is a Relation-aware Heterogeneous Graph Network designed to model multiple relations on a heterogeneous graph between different kinds of entities. The core parts of this model are a transformer-like multi-relation attention, used to learn the node importance and uncover the meta-relation significance on the graph, and a heterogeneous graph propagation network employed to gather information from multiple sources. This approach outperforms several GNN-based models on user profiling tasks.
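Again purely as intuition, relation-aware aggregation on a heterogeneous graph can be sketched as one learned transformation plus an attention weight per relation type; the toy module below (plain PyTorch, hypothetical inputs) is not RHGN's actual architecture:

```python
import torch
import torch.nn.functional as F

class RelationAwareAggregation(torch.nn.Module):
    # Toy relation-aware aggregation: each relation type has its own linear
    # transformation, and attention scores weight the relations' contributions.
    def __init__(self, dim, num_relations):
        super().__init__()
        self.rel_transform = torch.nn.ModuleList(
            [torch.nn.Linear(dim, dim) for _ in range(num_relations)])
        self.attn = torch.nn.Linear(dim, 1)

    def forward(self, user_vec, msgs_per_relation):
        # msgs_per_relation: list of [dim] tensors, e.g. the mean embedding of
        # the neighbours reachable through each relation type.
        msgs = torch.stack([t(m) for t, m in zip(self.rel_transform, msgs_per_relation)])
        weights = F.softmax(self.attn(msgs), dim=0)      # relation importance
        return user_vec + (weights * msgs).sum(dim=0)    # updated user model
```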
Disparate impact concerns situations where the model disproportionately discriminates against certain groups, even if the model does not explicitly employ the sensitive attribute to make predictions but rather relies on some proxy attributes. This is exactly what happens in the analysed GNNs, where the user models are created by aggregating information from neighbours, and the sensitive attribute is not explicitly taken into consideration during classification. The notion of disparate impact is beneficial when there is not a clear linkage in the training data between the predicted label and the sensitive attribute, i.e. it is hard to define the validity of a decision for a group member based on the historical data.
Statistical parity (or demographic parity) defines fairness as an equal probability for each group of being assigned to the positive class, i.e. predictions are independent of the sensitive attribute.
Equal opportunity requires that the probability of a subject in the positive class being classified with the positive outcome is equal for each group.
Overall accuracy equality defines fairness as the equal probability of a subject from either the positive or negative class being assigned to its respective class, i.e. each group should have the same prediction accuracy.
In a scenario where it is hard to define the correctness of a prediction related to sensitive attribute values, we argue that a complete fairness assessment should always include the perspective of disparate mistreatment.
Furthermore, the notion of disparate mistreatment is significant in contexts where misclassification costs depend on the group affected by the error.
Treatment equality requires the ratio of errors made by the classifier to be equal across different groups, i.e. each group should have the same ratio between false negatives (FN) and false positives (FP).
Here you can find my contacts. Thanks for your attention.