Don't blindly trust your ML System, it may change your life (Azzurra Ragone, Independent consultant)
1. Fairness in Machine Learning: are you sure there is no bias in your predictions?
Azzurra Ragone - Innovation and Diversity Advisor
Slides will be shared, follow @azzurraragone
2. Me…
Innovation and Diversity Advisor
Previously on the @Google DevRel team
Before that, research fellow at:
➢ University of Milano-Bicocca
➢ University of Michigan
➢ Politecnico di Bari
➢ University of Trento
3. People worry that computers will get too
smart and take over the world, but the
real problem is that they’re too stupid and
they’ve already taken over the world
The Master Algorithm
Pedro Domingos, 2015
4. How to make my ML system fair?
...and why care?
7. Arbitrary, inconsistent, or faulty decision-making thus
raises serious concerns because it risks limiting our
ability to achieve the goals that we have set for ourselves
and access the opportunities for which we are qualified.
Fairness and Machine Learning
S. Barocas, M. Hardt, A. Narayanan
8. How do we ensure that these decisions are
made the right way and for the right reasons?
Fairness and Machine Learning
S. Barocas, M. Hardt, A. Narayanan
10. B. C. Russell, A. Torralba, C. Liu, R. Fergus, W. T. Freeman.
Object Recognition by Scene Alignment.
Advances in Neural Information Processing Systems, 2007.
13. Generalizing from examples
Provide good examples:
- a sufficiently large and diverse set
- well annotated
Quick, Draw!
Source: https://design.google/library/fair-not-default/
14. Historical examples may reflect:
- Prejudices against a social group
- Cultural stereotypes
- Demographic inequalities
Finding patterns in these data means replicating those same dynamics.
16. 45% of ImageNet data comes from the US (4% of the world population);
3% comes from China and India (together 36% of the world population)
Ref: Nature 559 and Shankar, S. et al. (2017)
Geo bias
19. Debiasing Word Embeddings
Bolukbasi, T., Chang, K.-W., Zou, J., Saligrama, V. & Kalai, A. Adv. Neural Inf. Proc. Syst. 2016, 4349–4357 (2016).
Credit: Pictures by Pixabay
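A minimal sketch of the kind of probe Bolukbasi et al. run, assuming the gensim library and a local copy of the pretrained GoogleNews word2vec vectors (the file path is illustrative):

```python
# Probe a pretrained embedding for gender-stereotyped associations,
# in the spirit of Bolukbasi et al. (2016).
from gensim.models import KeyedVectors

# Illustrative path to the public GoogleNews word2vec release.
vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

# "man is to computer_programmer as woman is to ?"
print(vectors.most_similar(
    positive=["woman", "computer_programmer"], negative=["man"], topn=3
))

# Rough bias score: projection of occupation words onto a
# she-minus-he "gender direction"; large |score| = gendered association.
gender_direction = vectors["she"] - vectors["he"]
for word in ["nurse", "engineer", "receptionist", "architect"]:
    print(f"{word:15s} {vectors[word] @ gender_direction:+.3f}")
```

A strongly signed projection onto the she–he direction is exactly the signal the paper's debiasing method then removes.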
20. The Machine Learning Loop
[Diagram: the loop connects the state of the world, data, individuals, and the model via measurement, learning, action, and feedback]
Source: Fairness and Machine Learning
S. Barocas, M. Hardt, A. Narayanan
21. The Machine Learning Loop
[Diagram detail: state of the world → (measurement) → data]
22. Provenance of data is crucial. Data cleaning is mandatory.
The world is “messy”
Photo by pasja1000 on Pixabay
23. Measurement defines:
- your variables of interest,
- the process for turning your observations into numbers,
- how you actually collect the data
[Fairness and Machine Learning, 2018]
Photo by Iker Urteaga on Unsplash
24. The target variable is the hardest to measure.
It is made up for the purpose of the problem: it is not a property that people possess or lack.
Ex. “creditworthiness”, “good employee”, “attractiveness”
[Fairness and Machine Learning, 2018]
Photo by David Paschke on Unsplash
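The point is easy to demonstrate in code. A hypothetical sketch (all records and column names invented): two equally plausible definitions of “creditworthiness” label the same borrower differently.

```python
import pandas as pd

# Hypothetical loan records; all values and column names are invented.
loans = pd.DataFrame({
    "amount":        [5000, 12000, 3000, 8000],
    "days_late_max": [0, 95, 20, 400],
    "defaulted":     [False, False, False, True],
})

# "Creditworthiness" is not observed in the world: we have to invent it.
# Two plausible proxy definitions that disagree on the second borrower:
loans["creditworthy_v1"] = ~loans["defaulted"]           # never defaulted
loans["creditworthy_v2"] = loans["days_late_max"] < 90   # never 90+ days late

print(loans[["creditworthy_v1", "creditworthy_v2"]])
# The choice of proxy, not the world, decides the labels the model learns.
```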
25. The Machine Learning Loop
[Diagram revisited]
30. The Machine Learning Loop
[Diagram revisited]
31. If you predict future prices (and publicize them), you create a self-fulfilling
feedback loop: houses with a lower predicted sale price deter buyers,
demand goes down, and the final price ends up even lower
House price prediction
Photo by Deva Darshan on Unsplash
32. Some communities may be disproportionately targeted, with people being
arrested for crimes that might be ignored in other communities.
Ref.: Saunders, J., Hunt, P. & Hollywood, J. S. J. Exp. Criminol. 12, 347–371 (2016).
Self-fulfilling predictions
Photo by Jacques Tiberi on Pixabay
33. “Feedback loops occur when data discovered on the
basis of predictions are used to update the model.”
Danielle Ensign et al.,
“Runaway Feedback Loops in Predictive Policing,” 2017
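A toy simulation of the effect Ensign et al. describe (all numbers invented): two districts with identical true incident rates, but patrols are allocated in proportion to previously discovered incidents, so whichever district gets an early lead absorbs ever more attention.

```python
import random

random.seed(0)
TRUE_RATE = 0.1        # identical true incident rate in both districts
discovered = [1, 1]    # symmetric prior counts

for day in range(10_000):
    # Patrol a district with probability proportional to incidents
    # discovered there so far: the model trains on data its own
    # predictions generated.
    district = 0 if random.random() < discovered[0] / sum(discovered) else 1
    # Incidents are only observed where we actually patrol.
    if random.random() < TRUE_RATE:
        discovered[district] += 1

# The final split is typically far from 50/50, even though both
# districts have exactly the same underlying rate.
print(discovered)
```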
34. The Machine Learning Loop
[Diagram revisited]
35. Training data encode the demographic disparities in our society, and
some stereotypes can be reinforced by ML (due to feedback loops)
The state of society
Photo by Cory Schadt on Unsplash
38. Analyze your data
Source: Google Machine Learning Crash Course
★ Are there missing feature values for a large number of observations?
★ Are there features that are missing that might affect other features?
★ Are there any unexpected feature values?
★ What signs of data skew do you see?
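A minimal pandas audit along the lines of the checklist above; column names assume the UCI Adult census data used in the Facets examples later, so adjust them to your copy:

```python
import pandas as pd

df = pd.read_csv("adult.csv")   # illustrative path to the UCI census data

# Missing feature values for a large number of observations?
print(df.isna().mean().sort_values(ascending=False).head())

# Unexpected feature values (placeholder "?" categories, outliers)?
print(df["workclass"].value_counts(dropna=False).head())
print(df["age"].describe())

# Signs of data skew across subgroups?
print(df["sex"].value_counts(normalize=True))
print(df.groupby("sex")["hours-per-week"].describe())
```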
40. Skewed data (geographical bias)
Source: California Housing dataset,
Google Machine Learning Crash Course
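One way to see the geographic skew yourself: a sketch using scikit-learn's copy of the California Housing data (the Crash Course uses its own CSV, but the columns are broadly equivalent):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing

housing = fetch_california_housing(as_frame=True).frame

# Sample density tracks 1990 census block groups, so urban coastal
# areas dominate and rural inland areas are sparsely represented.
plt.scatter(housing["Longitude"], housing["Latitude"],
            s=2, alpha=0.2, c=housing["MedHouseVal"])
plt.colorbar(label="median house value")
plt.xlabel("longitude")
plt.ylabel("latitude")
plt.show()
```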
41. Facets Overview
Source: Facets tool (https://pair-code.github.io/facets/)
Facets Overview is an interactive visualization tool to explore datasets: quickly analyze the distribution of values across the datasets.
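In a notebook, generating the Overview visualization takes a few lines; a sketch following the pattern in the Facets README (assumes the facets-overview package and the Jupyter nbextension are installed; file paths are illustrative):

```python
import base64
import pandas as pd
from facets_overview.generic_feature_statistics_generator import (
    GenericFeatureStatisticsGenerator,
)
from IPython.display import HTML, display

train = pd.read_csv("adult_train.csv")   # illustrative paths
test = pd.read_csv("adult_test.csv")

# Compute per-feature statistics for both splits in one proto.
proto = GenericFeatureStatisticsGenerator().ProtoFromDataFrames(
    [{"name": "train", "table": train}, {"name": "test", "table": test}]
)
protostr = base64.b64encode(proto.SerializeToString()).decode("utf-8")

# Hand the proto to the facets-overview web component in the notebook.
display(HTML(f"""
<link rel="import" href="/nbextensions/facets-dist/facets-jupyter.html">
<facets-overview id="ov"></facets-overview>
<script>document.querySelector("#ov").protoInput = "{protostr}";</script>
"""))
```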
42. Facets Overview
Source: Facets tool (https://pair-code.github.io/facets/)
⅔ of examples represent males, while we would expect the breakdown between genders to be closer to 50/50.
43. Facets Dive
Source: Facets tool (https://pair-code.github.io/facets/)
Data are faceted by the marital-status feature. Males outnumber females by more than 5:1; married women are underrepresented in our data.
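The same imbalance can be checked numerically without the widget; a quick cross-tabulation (again assuming Adult-style column names):

```python
import pandas as pd

df = pd.read_csv("adult.csv")   # illustrative path

# Facet by marital-status and sex, as in the Dive view:
print(pd.crosstab(df["marital-status"], df["sex"]))
```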
44. “What-if” tool
Analyze an ML model without writing code: given pointers to a TF model and a dataset, the What-If Tool offers an interactive visual interface for exploring model results.
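A sketch of embedding the tool in a notebook with the witwidget package; here `examples` is assumed to be a list of tf.train.Example protos and `predict_fn` a hypothetical function returning per-class scores:

```python
from witwidget.notebook.visualization import WitConfigBuilder, WitWidget

# `examples`: list of tf.train.Example protos (assumed already built).
# `predict_fn`: hypothetical function mapping examples to per-class scores.
config = (WitConfigBuilder(examples)
          .set_custom_predict_fn(predict_fn)
          .set_label_vocab(["<=50K", ">50K"]))
WitWidget(config, height=800)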
45. Counterfactuals
It is possible to compare a datapoint to the most similar point where your model predicts a different result.
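Under the hood the idea is simple enough to hand-roll; a sketch assuming a fitted sklearn-style classifier `clf` and a NumPy feature matrix `X` (both hypothetical):

```python
import numpy as np

def nearest_counterfactual(clf, X, i):
    """Return the row of X most similar (L1 distance) to X[i] whose
    predicted class differs from the prediction for X[i]."""
    preds = clf.predict(X)
    flipped = preds != preds[i]          # candidates with a different outcome
    if not flipped.any():
        return None
    distances = np.abs(X[flipped] - X[i]).sum(axis=1)
    return X[flipped][np.argmin(distances)]
```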
47. Edit a datapoint
Edit a datapoint and see how your model performs: edit, add, or remove features or feature values for any selected datapoint, then run inference to test model performance.
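The same probing can be scripted; a sketch with a hypothetical fitted classifier, flipping one binary feature on a copy of a datapoint and comparing predictions:

```python
import numpy as np

def flip_and_compare(clf, x, feature_idx):
    """Flip one binary feature on a copy of datapoint x and compare the
    model's positive-class probability before and after the edit."""
    edited = x.copy()
    edited[feature_idx] = 1 - edited[feature_idx]
    before, after = clf.predict_proba(np.vstack([x, edited]))[:, 1]
    return before, after
```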
48. ★ Measurement is crucial
★ Know your data (and how data were collected and annotated)
★ Try to discover hidden biases (missing values, data skew, subgroups, etc.)
★ Ask questions. Don’t train the model and then walk away
★ Avoid feedback loops
★ Use tools that allow you to do this kind of investigation
Key Takeaways
50. ❏ AI can be sexist and racist — it’s time to make it fair James Zou &
Londa Schiebinger - Nature 559, 324-326 (2018)
❏ The Master Algorithm Pedro Domingos, 2015
❏ Fairness and Machine Learning S. Barocas, M. Hardt, A. Narayanan
❏ No Classification without Representation: Assessing Geodiversity
Issues in Open Data Sets for the Developing World Shreya Shankar,
Yoni Halpern, Eric Breck, James Atwood, Jimbo Wilson, D. Sculley
❏ Man is to computer programmer as woman is to homemaker?
Debiasing word embeddings T. Bolukbasi, K.-W. Chang, J. Y. Zou, V.
Saligrama, A. T. Kalai. Adv. Neural Inf. Process. Syst. 2016,
4349–4357 (2016)
References
51. ❏ There is a blind spot in AI research, Kate Crawford & Ryan Calo, Nature
538, 311–313 (20 October 2016)
❏ Semantics Derived Automatically from Language Corpora Contain
Human-Like Biases, Aylin Caliskan, Joanna J. Bryson, and Arvind
Narayanan, Science 356, no. 6334 (2017): 183–86
❏ Predictions Put Into Practice: a Quasi-experimental Evaluation of
Chicago's Predictive Policing Pilot Saunders, J., Hunt, P. & Hollywood,
J. S. J. Exp. Criminol. 12, 347–371 (2016).
❏ Runaway Feedback Loops in Predictive Policing Danielle Ensign et al.
arXiv:1706.09847
References
52. ❏ Object Recognition by Scene Alignment. B. C. Russell, A. Torralba, C.
Liu, R. Fergus, W. T. Freeman. Advances in Neural Information
Processing Systems, 2007.
❏ Fair Is Not the Default (https://design.google/library/fair-not-default/)
❏ “Playing with fairness” - David Weinberger.
❏ Google Machine Learning Crash Course
❏ What-if tool: https://pair-code.github.io/what-if-tool/
❏ Facet tool https://pair-code.github.io/facets/
References
★ Group unaware: disregard the gender mix of the applicants; exclude gender and
gender-proxy information from the data set
★ Group thresholds: adjust the confidence thresholds for different groups
independently
★ Demographic parity: the composition of the selected set should reflect the
demographic composition of the applicant pool
★ Equal opportunity: individuals who qualify for a desirable outcome should have an
equal chance of being correctly classified for this outcome (= equal true positive rates)
★ Equal accuracy: the system ought to be tuned so that the percentage of wrong
decisions (false positives + false negatives) out of all approvals and denials is the
same for each group
Types of Fairness
“Playing with fairness”
by David Weinberger.
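All of these criteria are computable from per-group confusion-matrix quantities; a sketch (arrays `y_true`, `y_pred`, `group` are hypothetical 0/1 labels, predictions, and group identifiers):

```python
import numpy as np

def group_report(y_true, y_pred, group):
    """Per-group selection rate, true positive rate, and accuracy --
    the ingredients of demographic parity, equal opportunity,
    and equal accuracy."""
    for g in np.unique(group):
        m = group == g
        t, p = y_true[m].astype(bool), y_pred[m].astype(bool)
        sel = p.mean()                                   # demographic parity
        tpr = p[t].mean() if t.any() else float("nan")   # equal opportunity
        acc = (t == p).mean()                            # equal accuracy
        print(f"group {g}: selection={sel:.2f}  TPR={tpr:.2f}  acc={acc:.2f}")
```

Comparing these numbers across groups makes it concrete that the criteria can pull in different directions: equalizing one rate generally unbalances another.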
55. Computer scientist Arvind Narayanan gave a talk:
“21 fairness definitions and their politics”
Watch it on YouTube!