Identifying and mitigating bias in machine learning, by Ruta Binkyte
Slide 1: FAIR AI
Image by James Sutherland from Pixabay
Slide 2
Comete Team: Privacy, Fairness, Causality
Catuscia Palamidessi: Director of the Comete team at Inria Saclay-Ile-de-France, Ecole Polytechnique
Sami Zhioua: Senior Researcher at Inria Saclay-Ile-de-France, Ecole Polytechnique
Ruta Binkyte: Ph.D. Student at Inria Saclay-Ile-de-France, Ecole Polytechnique, IP Paris
Slide 5
Racial bias in computer vision
What happened?
Face recognition algorithms have much lower accuracy on darker-skinned female faces.
Why?
Black women are underrepresented in the training data.
Slide 6
Racial bias in healthcare AI
What happened?
An algorithm built to predict the need for medical interventions (sickness) gave lower scores to Black patients who were equally or more sick than white patients.
Why?
The proxy used for "sickness" was healthcare spending, which correlates with race.
Obermeyer, Z., Powers, B., Vogeli, C., & Mullainathan, S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464), 447-453.
Slide 8
The Learning Process
Machine Learning Process: Raw Data → Features/Attributes → The Prediction → The Decision → Feedback Loop
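The pipeline stages on this slide can be sketched minimally in Python. The function names and the logistic model are illustrative assumptions, not from the talk:

```python
import numpy as np

def extract_features(raw):
    """Raw data -> features/attributes (here: simple standardization)."""
    x = np.asarray(raw, dtype=float)
    return (x - x.mean()) / (x.std() + 1e-9)

def predict(features, weights):
    """Features -> prediction score via a logistic model."""
    return 1.0 / (1.0 + np.exp(-features @ weights))

def decide(score, threshold=0.5):
    """Prediction -> decision; deployed decisions later feed back into new raw data."""
    return int(score >= threshold)

# One pass through the pipeline with neutral (zero) weights:
score = predict(extract_features([1.0, 2.0, 3.0]), np.zeros(3))
decision = decide(score)  # sigmoid(0) = 0.5, so the decision is 1
```

The feedback-loop stage is the part that has no code here: the decisions themselves shape the next round of raw data, which is how bias can compound over time.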
Slide 9
Types of Bias
Representation Bias: a group is underrepresented in the sample relative to the population.
Historical Bias: a historically disadvantaged group has a lower occurrence of positive labels.
https://www.britannica.com/topic/racial-segregation
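Both kinds of bias can be measured directly on a labeled sample. A minimal sketch, with toy data and function names of my own (not from the talk):

```python
import numpy as np

# Toy sample: sensitive attribute S (group 1 underrepresented) and label Y,
# where group 1 also has fewer positive labels (historical bias).
S = np.array([0] * 18 + [1] * 2)
Y = np.array([1] * 12 + [0] * 6 + [0, 1])

def group_share(S, group):
    """Representation bias check: fraction of the sample in `group`."""
    return np.mean(S == group)

def positive_rate(Y, S, group):
    """Historical bias check: P(Y = 1 | S = group)."""
    return Y[S == group].mean()

# Here group 1 is 10% of the sample and has a lower positive rate:
# group_share(S, 1) -> 0.1; positive_rate gap = 12/18 - 1/2.
```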
Slide 10
When can under-representation lead to discrimination? When the group makes up a very small share of the sample (<3%), which is common for intersectional sensitive attributes.
Zhioua, S., & Binkytė, R. (2023). Shedding light on underrepresentation and Sampling Bias in machine learning. arXiv preprint arXiv:2306.05068.
Slide 12
Representation Bias: does data augmentation always solve the problem?
"When representation bias is combined with historical bias, data augmentation can increase the disparity."
Zhioua, S., & Binkytė, R. (2023). Shedding light on underrepresentation and Sampling Bias in machine learning. arXiv preprint arXiv:2306.05068.
Slide 13
When representation bias is combined with historical bias, oversampling can increase the bias.
Zhioua, S., & Binkytė, R. (2023). Shedding light on underrepresentation and Sampling Bias in machine learning. arXiv preprint arXiv:2306.05068.
Random augmentation vs. positive-label augmentation:
Discrimination while augmenting the training set with female-group samples chosen randomly. The male group size is fixed at 100. Dataset: Dutch Census; training algorithm: logistic regression.
Discrimination while augmenting the training set with only positive-outcome female-group samples. The male group size is fixed at 100. Dataset: Dutch Census; training algorithm: logistic regression.
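The two augmentation strategies compared in these figures can be sketched as follows. This is a toy illustration with made-up data and function names, not the paper's experimental code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy minority group with historical bias: only 30% positive labels.
X_f = rng.normal(size=(20, 3))
y_f = np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0] * 2)

def augment_random(X, y, n, rng):
    """Random augmentation: duplicate rows drawn uniformly from the group,
    preserving the biased positive rate in expectation."""
    idx = rng.integers(0, len(y), size=n)
    return np.vstack([X, X[idx]]), np.concatenate([y, y[idx]])

def augment_positive(X, y, n, rng):
    """Positive-label augmentation: duplicate only rows with y == 1,
    raising the group's positive rate."""
    pos = np.flatnonzero(y == 1)
    idx = rng.choice(pos, size=n)
    return np.vstack([X, X[idx]]), np.concatenate([y, y[idx]])

X_r, y_r = augment_random(X_f, y_f, 10, rng)
X_p, y_p = augment_positive(X_f, y_f, 10, rng)  # positive rate rises from 0.3 to 16/30
```

As the slide's results show, which strategy helps depends on how the representation bias interacts with the historical bias in the labels; random duplication alone can leave, or even amplify, the disparity.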
Slide 14
Most bias mitigation algorithms aim to satisfy Statistical Parity.
Photo by Possessed Photography on Unsplash
Statistical Parity:
P(Ŷ = 1 | S = 0) = P(Ŷ = 1 | S = 1)
where Ŷ is the prediction and S is the sensitive attribute.
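Statistical parity can be checked by estimating the two conditional probabilities from predictions. A minimal sketch (the function name and toy data are mine):

```python
import numpy as np

def statistical_parity_diff(y_pred, s):
    """P(Yhat = 1 | S = 0) - P(Yhat = 1 | S = 1); zero means statistical parity holds."""
    return y_pred[s == 0].mean() - y_pred[s == 1].mean()

# Toy predictions where group 0 is favored:
s = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0])
gap = statistical_parity_diff(y_pred, s)  # 0.75 - 0.25 = 0.5
```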
Slide 15
Historical Bias: does "equal" always mean "fair"?
"Black patients on average have 26.3% more chronic diseases than white patients."
Obermeyer, Z., Powers, B., Vogeli, C., & Mullainathan, S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464), 447-453.
Slide 16
More nuanced fairness notions:
Conditional Statistical Parity:
P(Ŷ = 1 | S = 0, E) = P(Ŷ = 1 | S = 1, E)
where Ŷ is the prediction, S is the sensitive attribute, E is an explanatory attribute, and Y is the label.
Equal Opportunity:
P(Ŷ = 1 | S = 0, Y = 1) = P(Ŷ = 1 | S = 1, Y = 1)
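Both notions condition the parity requirement on an extra variable, which translates directly into estimators. A sketch with my own function names and toy data:

```python
import numpy as np

def equal_opportunity_diff(y_pred, y_true, s):
    """P(Yhat = 1 | S = 0, Y = 1) - P(Yhat = 1 | S = 1, Y = 1)."""
    return (y_pred[(s == 0) & (y_true == 1)].mean()
            - y_pred[(s == 1) & (y_true == 1)].mean())

def conditional_sp_diff(y_pred, s, e):
    """Largest gap |P(Yhat=1 | S=0, E=v) - P(Yhat=1 | S=1, E=v)| over the strata of E."""
    gaps = []
    for v in np.unique(e):
        m = e == v
        gaps.append(abs(y_pred[m & (s == 0)].mean() - y_pred[m & (s == 1)].mean()))
    return max(gaps)

s      = np.array([0, 0, 1, 1, 0, 0, 1, 1])
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])
e      = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([1, 0, 1, 1, 0, 0, 1, 0])
# equal_opportunity_diff -> 0.5 - 1.0 = -0.5; conditional_sp_diff -> 0.5
```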
Slide 17
BaBE:
Bayesian Bias
Elimination
Binkytė, R., Gorla, D., Palamidessi, C. (2023). BaBE: Enhancing Fairness via Estimation of Latent Explaining Variables
a) Causal relation between the variables S, E, Z and the decision Y_Z based on Z.
b) Derivation of E | S, Z. The decision Y_E is then based directly on E.
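BaBE's core idea is to recover the latent explanatory variable E from the observed, biased proxy Z. The Bayes-rule step this builds on can be illustrated for discrete variables; this is only an illustration with made-up numbers, not the paper's algorithm, and conditioning on S is omitted for brevity:

```python
import numpy as np

def posterior_E(prior_E, lik_Z_given_E, z):
    """P(E = e | Z = z) is proportional to P(Z = z | E = e) * P(E = e)."""
    unnorm = lik_Z_given_E[:, z] * prior_E
    return unnorm / unnorm.sum()

prior = np.array([0.5, 0.5])       # assumed prior over two latent levels of E
lik = np.array([[0.8, 0.2],        # assumed noise model P(Z | E = 0)
                [0.3, 0.7]])       # assumed noise model P(Z | E = 1)
post = posterior_E(prior, lik, z=1)  # observing Z = 1 shifts belief toward E = 1
```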
Slide 18
Results for Disparate Impact Remover and BaBE
The original distributions P[E|S=0] (green) and P[E|S=1] (orange), together with the distributions of E computed by Disparate Impact Remover (blue and magenta) for S=0. The probability of higher values for S=0 is increased only minimally to match the distribution for S=1.
The original distributions P[E|S=0] (green) and P[E|S=1] (orange), together with the distributions of E estimated by BaBE (blue and magenta) for S=0. BaBE accurately matches the true distributions for both S=0 and S=1.
The distributions of E (green) and biased Z (blue) for S=0, and of E (orange) and biased Z (magenta) for S=1.