Don't blindly trust your ML System, it may change your life (Azzurra Ragone, Independent consultant)
1. Fairness in Machine Learning: are you sure there is no bias in your predictions?
Azzurra Ragone - Innovation and Diversity Advisor
Slides will be shared, follow @azzurraragone
2. Me…
Innovation and Diversity Advisor
Previously on the @Google DevRel team
Before that, research fellow at:
➢ University of Milano-Bicocca
➢ University of Michigan
➢ Politecnico di Bari
➢ University of Trento
3. People worry that computers will get too
smart and take over the world, but the
real problem is that they’re too stupid and
they’ve already taken over the world
The Master Algorithm
Pedro Domingos, 2015
4. How to make my ML system fair?
...and why care?
7. Arbitrary, inconsistent, or faulty decision-making thus
raises serious concerns because it risks limiting our
ability to achieve the goals that we have set for ourselves
and access the opportunities for which we are qualified.
Fairness and Machine Learning
S. Barocas, M. Hardt, A. Narayanan
8. How do we ensure that these decisions are
made the right way and for the right reasons?
Fairness and Machine Learning
S. Barocas, M. Hardt, A. Narayanan
10. B. C. Russell, A. Torralba, C. Liu, R. Fergus, W. T. Freeman.
Object Recognition by Scene Alignment.
Advances in Neural Information Processing Systems, 2007.
13. Generalizing from examples
Provide good examples:
- a sufficiently large and diverse set
- well annotated
Quick, Draw!
Source: https://design.google/library/fair-not-default/
14. Historical examples may reflect:
- Prejudices against a social group
- Cultural stereotypes
- Demographic inequalities
Finding patterns in these data means replicating those same dynamics.
16. 45% of ImageNet data comes from the US (4% of the world population);
3% comes from China and India (together 36% of the world population)
Ref: Nature 559 and Shankar, S. et al. (2017)
Geo bias
19. Debiasing Word Embeddings
Bolukbasi, T., Chang, K.-W., Zou, J., Saligrama, V. & Kalai, A. Adv. Neural Inf. Proc. Syst. 2016, 4349–4357 (2016).
Credit: Pictures by Pixabay
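A minimal sketch of the kind of probe Bolukbasi et al. run, assuming the gensim library and a local copy of the pretrained GoogleNews word2vec vectors (the file path is illustrative):

```python
# Probe a pretrained embedding for gender-stereotyped associations,
# in the spirit of Bolukbasi et al. (2016).
from gensim.models import KeyedVectors

# Illustrative path to the public GoogleNews word2vec release.
vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

# "man is to computer_programmer as woman is to ?"
print(vectors.most_similar(
    positive=["woman", "computer_programmer"], negative=["man"], topn=3
))

# Rough bias score: projection of occupation words onto a
# she-minus-he "gender direction"; large |score| = gendered association.
gender_direction = vectors["she"] - vectors["he"]
for word in ["nurse", "engineer", "receptionist", "architect"]:
    print(f"{word:15s} {vectors[word] @ gender_direction:+.3f}")
```

A strongly signed projection onto the she–he direction is exactly the signal the paper's debiasing method then removes.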
20. The Machine Learning Loop
[Diagram: the loop connects the state of the world, data, individuals, and the model via measurement, learning, action, and feedback]
Source: Fairness and Machine Learning
S. Barocas, M. Hardt, A. Narayanan
21. The Machine Learning Loop
[Diagram detail: state of the world → (measurement) → data]
22. Provenance of data is crucial. Data cleaning is mandatory.
The world is “messy”
Photo by pasja1000 on Pixabay
23. Measurement defines:
- your variables of interest,
- the process for turning your observations into numbers,
- how you actually collect the data
[Fairness and Machine Learning, 2018]
Photo by Iker Urteaga on Unsplash
24. The target variable is the hardest to measure.
It is made up for the purpose of the problem: it is not a property that people possess or lack.
Ex. “creditworthiness”, “good employee”, “attractiveness”
[Fairness and Machine Learning, 2018]
Photo by David Paschke on Unsplash
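The point is easy to demonstrate in code. A hypothetical sketch (all records and column names invented): two equally plausible definitions of “creditworthiness” label the same borrower differently.

```python
import pandas as pd

# Hypothetical loan records; all values and column names are invented.
loans = pd.DataFrame({
    "amount":        [5000, 12000, 3000, 8000],
    "days_late_max": [0, 95, 20, 400],
    "defaulted":     [False, False, False, True],
})

# "Creditworthiness" is not observed in the world: we have to invent it.
# Two plausible proxy definitions that disagree on the second borrower:
loans["creditworthy_v1"] = ~loans["defaulted"]           # never defaulted
loans["creditworthy_v2"] = loans["days_late_max"] < 90   # never 90+ days late

print(loans[["creditworthy_v1", "creditworthy_v2"]])
# The choice of proxy, not the world, decides the labels the model learns.
```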
25. The Machine Learning Loop
[Diagram revisited]
30. The Machine Learning Loop
[Diagram revisited]
31. If you predict future prices (and publicize them), you create a self-fulfilling
feedback loop: houses with a lower predicted sale price deter buyers,
demand goes down, and the final price ends up even lower
House price prediction
Photo by Deva Darshan on Unsplash
32. Some communities may be disproportionately targeted, with people being
arrested for crimes that might be ignored in other communities.
Ref.: Saunders, J., Hunt, P. & Hollywood, J. S. J. Exp. Criminol. 12, 347–371 (2016).
Self-fulfilling predictions
Photo by Jacques Tiberi on Pixabay
33. “Feedback loops occur when data discovered on the
basis of predictions are used to update the model.”
Danielle Ensign et al.,
“Runaway Feedback Loops in Predictive Policing,” 2017
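A toy simulation of the effect Ensign et al. describe (all numbers invented): two districts with identical true incident rates, but patrols are allocated in proportion to previously discovered incidents, so whichever district gets an early lead absorbs ever more attention.

```python
import random

random.seed(0)
TRUE_RATE = 0.1        # identical true incident rate in both districts
discovered = [1, 1]    # symmetric prior counts

for day in range(10_000):
    # Patrol a district with probability proportional to incidents
    # discovered there so far: the model trains on data its own
    # predictions generated.
    district = 0 if random.random() < discovered[0] / sum(discovered) else 1
    # Incidents are only observed where we actually patrol.
    if random.random() < TRUE_RATE:
        discovered[district] += 1

# The final split is typically far from 50/50, even though both
# districts have exactly the same underlying rate.
print(discovered)
```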
34. The Machine Learning Loop
[Diagram revisited]
35. Training data encode the demographic disparities in our society, and
some stereotypes can be reinforced by ML (due to feedback loops)
The state of society
Photo by Cory Schadt on Unsplash
38. Analyze your data
Source: Google Machine Learning Crash Course
★ Are there missing feature values for a large number of observations?
★ Are there features that are missing that might affect other features?
★ Are there any unexpected feature values?
★ What signs of data skew do you see?
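A minimal pandas audit along the lines of the checklist above; column names assume the UCI Adult census data used in the Facets examples later, so adjust them to your copy:

```python
import pandas as pd

df = pd.read_csv("adult.csv")   # illustrative path to the UCI census data

# Missing feature values for a large number of observations?
print(df.isna().mean().sort_values(ascending=False).head())

# Unexpected feature values (placeholder "?" categories, outliers)?
print(df["workclass"].value_counts(dropna=False).head())
print(df["age"].describe())

# Signs of data skew across subgroups?
print(df["sex"].value_counts(normalize=True))
print(df.groupby("sex")["hours-per-week"].describe())
```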
40. Skewed data (geographical bias)
Source: California Housing dataset,
Google Machine Learning Crash Course
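One way to see the geographic skew yourself: a sketch using scikit-learn's copy of the California Housing data (the Crash Course uses its own CSV, but the columns are broadly equivalent):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing

housing = fetch_california_housing(as_frame=True).frame

# Sample density tracks 1990 census block groups, so urban coastal
# areas dominate and rural inland areas are sparsely represented.
plt.scatter(housing["Longitude"], housing["Latitude"],
            s=2, alpha=0.2, c=housing["MedHouseVal"])
plt.colorbar(label="median house value")
plt.xlabel("longitude")
plt.ylabel("latitude")
plt.show()
```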
41. Facets Overview
Source: Facets tool (https://pair-code.github.io/facets/)
Facets Overview is an interactive visualization tool to explore datasets: quickly analyze the distribution of values across the datasets.
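In a notebook, generating the Overview visualization takes a few lines; a sketch following the pattern in the Facets README (assumes the facets-overview package and the Jupyter nbextension are installed; file paths are illustrative):

```python
import base64
import pandas as pd
from facets_overview.generic_feature_statistics_generator import (
    GenericFeatureStatisticsGenerator,
)
from IPython.display import HTML, display

train = pd.read_csv("adult_train.csv")   # illustrative paths
test = pd.read_csv("adult_test.csv")

# Compute per-feature statistics for both splits in one proto.
proto = GenericFeatureStatisticsGenerator().ProtoFromDataFrames(
    [{"name": "train", "table": train}, {"name": "test", "table": test}]
)
protostr = base64.b64encode(proto.SerializeToString()).decode("utf-8")

# Hand the proto to the facets-overview web component in the notebook.
display(HTML(f"""
<link rel="import" href="/nbextensions/facets-dist/facets-jupyter.html">
<facets-overview id="ov"></facets-overview>
<script>document.querySelector("#ov").protoInput = "{protostr}";</script>
"""))
```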
42. Facets Overview
Source: Facets tool (https://pair-code.github.io/facets/)
⅔ of examples represent males, while we would expect the breakdown between genders to be closer to 50/50.
43. Facets Dive
Source: Facets tool (https://pair-code.github.io/facets/)
Data are faceted by the marital-status feature. Males outnumber females by more than 5:1; married women are underrepresented in our data.
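The same imbalance can be checked numerically without the widget; a quick cross-tabulation (again assuming Adult-style column names):

```python
import pandas as pd

df = pd.read_csv("adult.csv")   # illustrative path

# Facet by marital-status and sex, as in the Dive view:
print(pd.crosstab(df["marital-status"], df["sex"]))
```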
44. “What-if” tool
Analyze an ML model without writing code: given pointers to a TF model and a dataset, the What-If Tool offers an interactive visual interface for exploring model results.
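A sketch of embedding the tool in a notebook with the witwidget package; here `examples` is assumed to be a list of tf.train.Example protos and `predict_fn` a hypothetical function returning per-class scores:

```python
from witwidget.notebook.visualization import WitConfigBuilder, WitWidget

# `examples`: list of tf.train.Example protos (assumed already built).
# `predict_fn`: hypothetical function mapping examples to per-class scores.
config = (WitConfigBuilder(examples)
          .set_custom_predict_fn(predict_fn)
          .set_label_vocab(["<=50K", ">50K"]))
WitWidget(config, height=800)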
45. Counterfactuals
It is possible to compare a datapoint to the most similar point where your model predicts a different result.
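Under the hood the idea is simple enough to hand-roll; a sketch assuming a fitted sklearn-style classifier `clf` and a NumPy feature matrix `X` (both hypothetical):

```python
import numpy as np

def nearest_counterfactual(clf, X, i):
    """Return the row of X most similar (L1 distance) to X[i] whose
    predicted class differs from the prediction for X[i]."""
    preds = clf.predict(X)
    flipped = preds != preds[i]          # candidates with a different outcome
    if not flipped.any():
        return None
    distances = np.abs(X[flipped] - X[i]).sum(axis=1)
    return X[flipped][np.argmin(distances)]
```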
47. Edit a datapoint
Edit a datapoint and see how your model performs: edit, add, or remove features or feature values for any selected datapoint, then run inference to test model performance.
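The same probing can be scripted; a sketch with a hypothetical fitted classifier, flipping one binary feature on a copy of a datapoint and comparing predictions:

```python
import numpy as np

def flip_and_compare(clf, x, feature_idx):
    """Flip one binary feature on a copy of datapoint x and compare the
    model's positive-class probability before and after the edit."""
    edited = x.copy()
    edited[feature_idx] = 1 - edited[feature_idx]
    before, after = clf.predict_proba(np.vstack([x, edited]))[:, 1]
    return before, after
```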
48. ★ Measurement is crucial
★ Know your data (and how data were collected and annotated)
★ Try to discover hidden biases (missing values, data skew, subgroups, etc.)
★ Ask questions. Don’t train the model and then walk away
★ Avoid feedback loops
★ Use tools that allow you to do this kind of investigation
Key Takeaways
50. ❏ AI can be sexist and racist — it’s time to make it fair James Zou &
Londa Schiebinger - Nature 559, 324-326 (2018)
❏ The Master Algorithm Pedro Domingos, 2015
❏ Fairness and Machine Learning S. Barocas, M. Hardt, A. Narayanan
❏ No Classification without Representation: Assessing Geodiversity
Issues in Open Data Sets for the Developing World Shreya Shankar,
Yoni Halpern, Eric Breck, James Atwood, Jimbo Wilson, D. Sculley
❏ Man is to computer programmer as woman is to homemaker?
Debiasing word embeddings T. Bolukbasi, K.-W. Chang, J. Y. Zou, V.
Saligrama, A. T. Kalai. Adv. Neural Inf. Process. Syst. 2016,
4349–4357 (2016)
References
51. ❏ There is a blind spot in AI research, Kate Crawford & Ryan Calo, Nature
538, 311–313 (20 October 2016)
❏ Semantics Derived Automatically from Language Corpora Contain
Human-Like Biases, Aylin Caliskan, Joanna J. Bryson, and Arvind
Narayanan, Science 356, no. 6334 (2017): 183–86
❏ Predictions Put Into Practice: a Quasi-experimental Evaluation of
Chicago's Predictive Policing Pilot Saunders, J., Hunt, P. & Hollywood,
J. S. J. Exp. Criminol. 12, 347–371 (2016).
❏ Runaway Feedback Loops in Predictive Policing Danielle Ensign et al.
arXiv:1706.09847
References
52. ❏ Object Recognition by Scene Alignment. B. C. Russell, A. Torralba, C.
Liu, R. Fergus, W. T. Freeman. Advances in Neural Information
Processing Systems, 2007.
❏ Fair Is Not the Default (https://design.google/library/fair-not-default/)
❏ “Playing with fairness” - David Weinberger.
❏ Google Machine Learning Crash Course
❏ What-if tool: https://pair-code.github.io/what-if-tool/
❏ Facet tool https://pair-code.github.io/facets/
References
★ Group unaware: disregard the gender mix of the applicants; exclude gender and
gender-proxy information from the data set
★ Group thresholds: adjust the confidence thresholds for different groups
independently
★ Demographic parity: the composition of the selected set should reflect the
demographic composition of the applicant pool
★ Equal opportunity: individuals who qualify for a desirable outcome should have an
equal chance of being correctly classified for this outcome (= equal true positive rates)
★ Equal accuracy: the system ought to be tuned so that the percentage of wrong
decisions (false positives + false negatives) out of all approvals and denials is the
same for each group
Types of Fairness
“Playing with fairness”
by David Weinberger.
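All of these criteria are computable from per-group confusion-matrix quantities; a sketch (arrays `y_true`, `y_pred`, `group` are hypothetical 0/1 labels, predictions, and group identifiers):

```python
import numpy as np

def group_report(y_true, y_pred, group):
    """Per-group selection rate, true positive rate, and accuracy --
    the ingredients of demographic parity, equal opportunity,
    and equal accuracy."""
    for g in np.unique(group):
        m = group == g
        t, p = y_true[m].astype(bool), y_pred[m].astype(bool)
        sel = p.mean()                                   # demographic parity
        tpr = p[t].mean() if t.any() else float("nan")   # equal opportunity
        acc = (t == p).mean()                            # equal accuracy
        print(f"group {g}: selection={sel:.2f}  TPR={tpr:.2f}  acc={acc:.2f}")
```

Comparing these numbers across groups makes it concrete that the criteria can pull in different directions: equalizing one rate generally unbalances another.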
55. Computer scientist Arvind Narayanan gave a talk:
“21 fairness definitions and their politics”
Watch it on YouTube!