Identifying and mitigating bias in machine learning, by Ruta Binkyte

FAIR AI
Image by James Sutherland from Pixabay
Identifying and mitigating
bias in machine learning

2
Comete Team
Privacy Fairness Causality
Catuscia Palamidessi
Director of Comete team at
Inria Saclay-Ile-de-France,
Ecole Polytechnique
Sami Zhioua
Senior Researcher at Inria
Saclay-Ile-de-France, Ecole
Polytechnique
Ruta Binkyte
Ph.D. Student at Inria
Saclay-Ile-de-France, Ecole
Polytechnique, IP Paris

3
Table of Contents
01 Sources and types of bias
02 Bias in AI case studies
03 Bias mitigation
Photo by Maximalfocus on Unsplash

4
The Case Studies
Photo by ThisisEngineering RAEng on Unsplash

5
Racial bias in computer vision
What happened?
The algorithms for face recognition have much lower accuracy for darker-skinned female faces.
Why?
Black women are underrepresented in the training data.
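A gap like this only becomes visible when accuracy is reported per subgroup rather than overall. The following is a minimal sketch of that check; the function name, the group labels, and the toy values are illustrative assumptions, not data from any real face recognition benchmark.

```python
from collections import defaultdict

def accuracy_by_group(y_true, y_pred, groups):
    """Return {group: accuracy}, computed separately for each subgroup."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {g: correct[g] / total[g] for g in total}

# Toy, made-up labels purely for illustration (hypothetical group names).
y_true = [1, 1, 0, 1, 0, 1, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 1, 1]
groups = ["lighter_male"] * 5 + ["darker_female"] * 3

per_group = accuracy_by_group(y_true, y_pred, groups)
overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

In this toy example the overall accuracy hides the fact that almost all errors fall on the smaller group.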

6
Racial bias in healthcare AI
What happened?
The algorithm built to predict the need for medical interventions (sickness) gave lower scores to Black patients who were as sick as, or sicker than, white patients.
Why?
The proxy used for “sickness” was healthcare spending, which correlated with race.
Obermeyer, Z., Powers, B., Vogeli, C., & Mullainathan, S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464), 447-453.

7
Sources and
Types of Bias
Photo by Possessed Photography on Unsplash

8
The Learning Process
Machine Learning Process
Raw Data
Features/Attributes
The Prediction
The Decision
Feedback Loop

9
Types of Bias
Representation Bias
A group is underrepresented in a sample (figure: sample vs. population).
Historical Bias
A historically disadvantaged group has a lower occurrence of positive labels.
https://www.britannica.com/topic/racial-segregation
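Both types of bias can be read off a labeled dataset with two simple statistics: the share of each group in the sample (representation) and the rate of positive labels within each group (historical). The sketch below assumes binary labels and uses hypothetical group names and made-up values.

```python
def group_share(groups, group):
    """Fraction of the sample that belongs to `group` (representation)."""
    return sum(1 for g in groups if g == group) / len(groups)

def positive_label_rate(labels, groups, group):
    """Fraction of positive labels (1) inside `group` (historical bias signal)."""
    member_labels = [y for y, g in zip(labels, groups) if g == group]
    return sum(member_labels) / len(member_labels)

# Toy data: group "b" is both smaller and has fewer positive labels.
groups = ["a"] * 8 + ["b"] * 2
labels = [1, 1, 1, 1, 1, 1, 0, 0] + [1, 0]

share_b = group_share(groups, "b")
rate_a = positive_label_rate(labels, groups, "a")
rate_b = positive_label_rate(labels, groups, "b")
```

Comparing `share_b` with the group's share in the target population (not available in this toy example) would indicate representation bias; the gap between `rate_a` and `rate_b` is one possible signal of historical bias.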

10
When can under-representation affect discrimination?
<3%
Common for intersectional sensitive attributes
Zhioua, S., & Binkytė, R. (2023). Shedding light on underrepresentation and Sampling Bias in machine learning. arXiv preprint arXiv:2306.05068.

11
Bias Mitigation
Photo by Aideal Hwa on Unsplash

12
Representation Bias:
Does data augmentation
always solve the problem?
“When representation bias is combined with historical bias, data augmentation can increase the disparity.”

13
Random augmentation vs. positive label augmentation
When representation bias is combined with historical bias, oversampling can increase the bias.
Figure (random augmentation): Discrimination while augmenting the training set with female group samples randomly. The male group size is fixed at 100. The dataset is Dutch Census and the training algorithm is logistic regression.
Figure (positive label augmentation): Discrimination while augmenting the training set with only positive outcome female group samples. The male group size is fixed at 100. The dataset is Dutch Census and the training algorithm is logistic regression.
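The two augmentation strategies can be sketched as follows. This is an illustration only: it resamples existing rows with replacement, which is an assumption (the slides do not say how the added samples were drawn), and the row format, function names, and toy data are hypothetical.

```python
import random

def augment(rows, group, n_extra, positive_only=False, seed=0):
    """Add `n_extra` rows of `group`, drawn with replacement from `rows`.

    rows: list of dicts with keys "s" (group) and "y" (label, 0 or 1).
    positive_only=True draws only from rows with a positive label.
    """
    rng = random.Random(seed)
    pool = [
        r for r in rows
        if r["s"] == group and (not positive_only or r["y"] == 1)
    ]
    if not pool:
        raise ValueError("no matching rows to resample")
    extra = [dict(rng.choice(pool)) for _ in range(n_extra)]
    return rows + extra

def positive_rate(rows, group):
    """Share of positive labels within `group`."""
    labels = [r["y"] for r in rows if r["s"] == group]
    return sum(labels) / len(labels)

# Toy training set: the male group is larger and has more positive labels.
train = (
    [{"s": "male", "y": 1}] * 3 + [{"s": "male", "y": 0}]
    + [{"s": "female", "y": 1}, {"s": "female", "y": 0}]
)
random_aug = augment(train, "female", n_extra=10)
positive_aug = augment(train, "female", n_extra=10, positive_only=True)
```

Random augmentation enlarges the group but, in expectation, keeps its low positive rate, so the historical bias stays in the data; positive label augmentation also shifts the group's positive rate upward.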

14
Most bias mitigation algorithms aim to satisfy Statistical Parity
Statistical Parity
$P(\hat{Y} = 1 \mid S = 0) = P(\hat{Y} = 1 \mid S = 1)$
where $\hat{Y}$ is the prediction and $S$ is the sensitive attribute.
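As a worked illustration, the gap between the two sides of the statistical parity equation can be computed directly from predictions. The sketch assumes binary predictions and a binary sensitive attribute coded 0/1; the function name and toy values are illustrative.

```python
def statistical_parity_difference(y_pred, s):
    """Return P(Y_hat = 1 | S = 1) - P(Y_hat = 1 | S = 0).

    A value of 0 means statistical parity holds exactly on this sample.
    """
    rate = {}
    for value in (0, 1):
        preds = [p for p, g in zip(y_pred, s) if g == value]
        rate[value] = sum(preds) / len(preds)
    return rate[1] - rate[0]

# Toy predictions: group S=0 receives the positive outcome less often.
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
s      = [0, 0, 0, 0, 1, 1, 1, 1]

spd = statistical_parity_difference(y_pred, s)
```

Here the positive prediction rate is 0.25 for S=0 and 0.75 for S=1, so the difference is 0.5.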

15
Historical Bias:
Does “equal” always
mean “fair”?
“Black patients on average have 26.3%
more chronic diseases than the white”
Obermeyer, Z., Powers, B., Vogeli, C., & Mullainathan, S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464), 447-453.

16
More nuanced fairness notions:
Conditional Statistical Parity
$P(\hat{Y} = 1 \mid S = 0, E) = P(\hat{Y} = 1 \mid S = 1, E)$
Equal Opportunity
$P(\hat{Y} = 1 \mid S = 0, Y = 1) = P(\hat{Y} = 1 \mid S = 1, Y = 1)$
where $\hat{Y}$ is the prediction, $S$ is the sensitive attribute, $E$ is the explanatory attribute and $Y$ is the label.
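Both notions compare positive prediction rates between groups, but only within a subset of the data: each level of E for conditional statistical parity, and the truly positive cases (Y = 1) for equal opportunity. A minimal sketch, assuming binary S coded 0/1, a discrete E, and that every stratum contains members of both groups (otherwise the rate is undefined); all names and values are illustrative.

```python
def _rate(y_pred, mask):
    """Positive prediction rate over the rows selected by `mask`."""
    selected = [p for p, keep in zip(y_pred, mask) if keep]
    return sum(selected) / len(selected)

def conditional_parity_differences(y_pred, s, e):
    """Per level of E: P(Y_hat=1 | S=1, E) - P(Y_hat=1 | S=0, E)."""
    out = {}
    for level in sorted(set(e)):
        rates = [
            _rate(y_pred, [g == v and x == level for g, x in zip(s, e)])
            for v in (0, 1)
        ]
        out[level] = rates[1] - rates[0]
    return out

def equal_opportunity_difference(y_true, y_pred, s):
    """P(Y_hat=1 | S=1, Y=1) - P(Y_hat=1 | S=0, Y=1)."""
    rates = [
        _rate(y_pred, [g == v and y == 1 for g, y in zip(s, y_true)])
        for v in (0, 1)
    ]
    return rates[1] - rates[0]

s      = [0, 0, 0, 0, 1, 1, 1, 1]
e      = ["low", "low", "high", "high", "low", "low", "high", "high"]
y_true = [0, 1, 1, 1, 0, 0, 1, 1]
y_pred = [0, 0, 1, 1, 0, 0, 1, 1]

csp = conditional_parity_differences(y_pred, s, e)
eod = equal_opportunity_difference(y_true, y_pred, s)
```

In this toy data the predictions satisfy conditional statistical parity at every level of E, yet equal opportunity is violated, which shows that the notions can disagree.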

17
BaBE: Bayesian Bias Elimination
Binkytė, R., Gorla, D., Palamidessi, C. (2023). BaBE: Enhancing Fairness via Estimation of Latent Explaining Variables
a) Causal relation between the variables $S$, $E$, $Z$, and the decision $Y_Z$ based on $Z$.
b) Derivation of $E \mid S, Z$. The decision $Y_E$ is then based directly on $E$.
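The core Bayesian step behind deriving E from S and Z can be illustrated with Bayes' rule. This is not the BaBE algorithm itself: it is a sketch that assumes the prior P(E | S) and the bias mechanism P(Z | E, S) are already known for one group, whereas in practice they would have to be estimated. All names and numbers are hypothetical.

```python
def posterior_e_given_z(prior_e, likelihood_z_given_e, z):
    """Bayes' rule for one fixed group S = s.

    prior_e:              {e: P(E = e | S = s)}
    likelihood_z_given_e: {e: {z: P(Z = z | E = e, S = s)}}
    Returns               {e: P(E = e | S = s, Z = z)}
    """
    joint = {
        e: prior_e[e] * likelihood_z_given_e[e].get(z, 0.0)
        for e in prior_e
    }
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("observed z has zero probability under the model")
    return {e: p / evidence for e, p in joint.items()}

# Hypothetical group whose observed Z is biased downward:
# even when E = 1, the observed Z is 0 with probability 0.6.
prior_e = {0: 0.5, 1: 0.5}
likelihood = {
    0: {0: 0.9, 1: 0.1},
    1: {0: 0.6, 1: 0.4},
}

posterior = posterior_e_given_z(prior_e, likelihood, z=0)
```

For this group, an observed Z = 0 still leaves a 0.4 probability that the true E is 1, which is the kind of correction a decision based on E rather than Z would take into account.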

18
Results for Disparate Impact Remover and BaBE
The original distributions P[E|S=0] (green), P[E|S=1] (orange) and the distributions of the E computed by Disparate Impact Remover (blue and magenta) for S=0. The probability for higher values for S=0 is increased minimally to match the distribution of S=1.
The original distributions P[E|S=0] (green), P[E|S=1] (orange) and the distributions of the E estimated by BaBE (blue and magenta) for S=0. BaBE accurately matches the true distributions both for S=0 and S=1.
The distributions of E (green) and biased Z (blue) for S=0, and of E (orange) and biased Z (magenta) for S=1.

19
Questions?
Suggestions?
Collaboration?
http://www.lix.polytechnique.fr/Labo/Ruta.BINKYTE-SADAUSKIENE/
ruta.binkyte@gmail.com 0659274196
Please reach out!

20
Thank You!
Photo by Aideal Hwa on Unsplash
