ML practitioners and advocates are increasingly finding themselves becoming gatekeepers of the modern world. The models you create have the power to get people arrested or vindicated, to get loans approved or rejected and to set the interest rate charged on those loans, to determine who appears in your long list of matches on Tinder, what news you read, and who gets called for a job phone screen or even a college admission... the list goes on. My goal in this talk is to summarize the kinds of disparate outcomes caused by cargo-cult machine learning, and recent academic efforts to address some of them.
Fairness and Transparency in Machine Learning - Andreas Dewes
My presentation on fairness and transparency in machine learning, given at PyData Berlin. I investigated the "Stop and Frisk" dataset and tried to show how algorithms can pick up (or remove) biases from our data.
The impact of AI on society keeps growing - and not all of it is good. As data scientists, we have to put in real work to avoid ending up in ML hell.
This presentation was given at the Dutch Data Science Week.
Nick Schmidt of BLDS, LLC, presenting to the Maryland AI meetup, June 4, 2019 (https://www.meetup.com/Maryland-AI). Nick discusses ideas of fairness and how they apply to machine learning. He explores recent academic work on identifying and mitigating bias, and how his work in lending and employment can be applied to other industries. Nick explains how to measure whether an algorithm is fair and demonstrates the techniques that model builders can use to ameliorate bias when it is found.
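As a concrete illustration of one fairness measure commonly applied in lending and employment analysis, here is a minimal sketch of the adverse impact ratio (the basis of the "four-fifths rule"); the function name and data are hypothetical, not taken from Nick's talk.

```python
# Minimal sketch: adverse impact ratio, the basis of the "four-fifths rule".
# Hypothetical data and function name, for illustration only.
import numpy as np

def adverse_impact_ratio(outcomes, groups, protected, reference):
    """Approval rate of the protected group divided by that of the
    reference group; values below 0.8 are commonly flagged for review."""
    outcomes, groups = np.asarray(outcomes), np.asarray(groups)
    protected_rate = outcomes[groups == protected].mean()
    reference_rate = outcomes[groups == reference].mean()
    return protected_rate / reference_rate

# Hypothetical loan decisions: 1 = approved, 0 = denied.
decisions = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
group = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]
print(adverse_impact_ratio(decisions, group, protected="B", reference="A"))
```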
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (KD... - Krishnaram Kenthapadi
Researchers and practitioners from different disciplines have highlighted the ethical and legal challenges posed by the use of machine learned models and data-driven systems, and the potential for such systems to discriminate against certain population groups, due to biases in algorithmic decision-making systems. This tutorial presents an overview of algorithmic bias / discrimination issues observed over the last few years and the lessons learned, key regulations and laws, and evolution of techniques for achieving fairness in machine learning systems. We motivate the need for adopting a "fairness by design" approach (as opposed to viewing algorithmic bias / fairness considerations as an afterthought), when developing machine learning based models and systems for different consumer and enterprise applications. Then, we focus on the application of fairness-aware machine learning techniques in practice by presenting non-proprietary case studies from different technology companies. Finally, based on our experiences working on fairness in machine learning at companies such as Facebook, Google, LinkedIn, and Microsoft, we present open problems and research directions for the data mining / machine learning community.
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (WSDM 2019) - Krishnaram Kenthapadi
Researchers and practitioners from different disciplines have highlighted the ethical and legal challenges posed by the use of machine learned models and data-driven systems, and the potential for such systems to discriminate against certain population groups, due to biases in algorithmic decision-making systems. This tutorial presents an overview of algorithmic bias / discrimination issues observed over the last few years and the lessons learned, key regulations and laws, and evolution of techniques for achieving fairness in machine learning systems. We will motivate the need for adopting a "fairness by design" approach (as opposed to viewing algorithmic bias / fairness considerations as an afterthought), when developing machine learning based models and systems for different consumer and enterprise applications. Then, we will focus on the application of fairness-aware machine learning techniques in practice by presenting non-proprietary case studies from different technology companies. Finally, based on our experiences working on fairness in machine learning at companies such as Facebook, Google, LinkedIn, and Microsoft, we will present open problems and research directions for the data mining / machine learning community.
Please cite as:
Sarah Bird, Ben Hutchinson, Krishnaram Kenthapadi, Emre Kiciman, and Margaret Mitchell. Fairness-Aware Machine Learning: Practical Challenges and Lessons Learned. WSDM 2019.
[Video recording available at https://www.youtube.com/playlist?list=PLewjn-vrZ7d3x0M4Uu_57oaJPRXkiS221]
Artificial Intelligence is increasingly playing an integral role in determining our day-to-day experiences. Moreover, with the proliferation of AI-based solutions in areas such as hiring, lending, criminal justice, healthcare, and education, the resulting personal and professional implications of AI are far-reaching. The dominant role played by AI models in these domains has led to growing concern regarding potential bias in these models, and a demand for model transparency and interpretability. In addition, model explainability is a prerequisite for building trust in and adoption of AI systems in high-stakes domains requiring reliability and safety, such as healthcare and automated transportation, and in critical industrial applications with significant economic implications, such as predictive maintenance, exploration of natural resources, and climate change modeling.
As a consequence, AI researchers and practitioners have focused their attention on explainable AI to help them better trust and understand models at scale. The challenges for the research community include (i) defining model explainability, (ii) formulating explainability tasks for understanding model behavior and developing solutions for these tasks, and finally (iii) designing measures for evaluating the performance of models in explainability tasks.
In this tutorial, we present an overview of model interpretability and explainability in AI, key regulations / laws, and techniques / tools for providing explainability as part of AI/ML systems. Then, we focus on the application of explainability techniques in industry, wherein we present practical challenges / guidelines for effectively using explainability techniques and lessons learned from deploying explainable models for several web-scale machine learning and data mining applications. We present case studies across different companies, spanning application domains such as search & recommendation systems, hiring, sales, and lending. Finally, based on our experiences in industry, we identify open problems and research directions for the data mining / machine learning community.
Fairness, Transparency, and Privacy in AI @LinkedIn - C4Media
Video and slides synchronized; mp3 and slide download available at https://bit.ly/2V9zW73.
Krishnaram Kenthapadi talks about privacy breaches, algorithmic bias/discrimination issues observed in the Internet industry, regulations & laws, and techniques for achieving privacy and fairness in data-driven systems. He focuses on the application of privacy-preserving data mining and fairness-aware ML techniques in practice, presenting case studies spanning different LinkedIn applications. Filmed at qconsf.com.
Krishnaram Kenthapadi is part of the AI team at LinkedIn, where he leads the transparency and privacy modeling efforts across different applications. He is LinkedIn's representative in Microsoft's AI and Ethics in Engineering & Research Committee. He shaped the technical roadmap for LinkedIn Salary product, and served as the relevance lead for the LinkedIn Careers & Talent Solutions Relevance team.
How do we protect privacy of users when building large-scale AI based systems? How do we develop machine learned models and systems taking fairness, accountability, and transparency into account? With the ongoing explosive growth of AI/ML models and systems, these are some of the ethical, legal, and technical challenges encountered by researchers and practitioners alike. In this talk, we will first motivate the need for adopting a "fairness and privacy by design" approach when developing AI/ML models and systems for different consumer and enterprise applications. We will then focus on the application of fairness-aware machine learning and privacy-preserving data mining techniques in practice, by presenting case studies spanning different LinkedIn applications (such as fairness-aware talent search ranking, privacy-preserving analytics, and LinkedIn Salary privacy & security design), and conclude with the key takeaways and open challenges.
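As a hedged sketch of the kind of technique behind the privacy-preserving analytics mentioned here, the snippet below shows the classic Laplace mechanism from differential privacy; the function name and numbers are illustrative assumptions, not LinkedIn's actual implementation.

```python
# Minimal sketch of the Laplace mechanism for differentially private counts.
# Illustrative only; not LinkedIn's production system.
import numpy as np

def dp_count(true_count, epsilon, sensitivity=1.0, rng=None):
    """Release a count with epsilon-differential privacy: a single user
    changes the count by at most `sensitivity`, so we add noise drawn from
    Laplace(0, sensitivity/epsilon) before publishing."""
    if rng is None:
        rng = np.random.default_rng()
    return true_count + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Example: report an article-view count with a privacy budget of epsilon = 0.5.
print(dp_count(true_count=1234, epsilon=0.5))
```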
Responsible AI in Industry: Practical Challenges and Lessons Learned - Krishnaram Kenthapadi
How do we develop machine learning models and systems taking fairness, accuracy, explainability, and transparency into account? How do we protect the privacy of users when building large-scale AI based systems? Model fairness and explainability and protection of user privacy are considered prerequisites for building trust and adoption of AI systems in high stakes domains such as hiring, lending, and healthcare. We will first motivate the need for adopting a “fairness, explainability, and privacy by design” approach when developing AI/ML models and systems for different consumer and enterprise applications from the societal, regulatory, customer, end-user, and model developer perspectives. We will then focus on the application of responsible AI techniques in practice through industry case studies. We will discuss the sociotechnical dimensions and practical challenges, and conclude with the key takeaways and open challenges.
Algorithmic Impact Assessment: Fairness, Robustness and Explainability in Aut... - Adriano Soares Koshiyama
The workshop session focuses on the following topics:
Introduction to AI & Machine Learning (Algorithms)
Key Components of Algorithmic Impact Assessment
Algorithmic Explainability
Algorithmic Fairness
Algorithmic Robustness
This tutorial extensively covers the definitions, nuances, challenges, and requirements for the design of interpretable and explainable machine learning models and systems in healthcare. We discuss many use cases in which interpretable machine learning models are needed in healthcare and how they should be deployed. Additionally, we explore the landscape of recent advances to address the challenges of model interpretability in healthcare, and describe how one would go about choosing the right interpretable machine learning algorithm for a given problem in healthcare.
NYAI #24: Developing Trust in Artificial Intelligence and Machine Learning fo... - Maryam Farooq
NYAI #24: Developing Trust in Artificial Intelligence and Machine Learning for High-Stakes Applications with Dr. Kush Varshney (Principal Research Manager, IBM Research AI).
Check out the IBM AI Fairness 360 open source toolkit: https://www.ibm.com/blogs/research/2018/09/ai-fairness-360/
nyai.co
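For readers who want to try the AI Fairness 360 toolkit mentioned above, here is a minimal sketch computing two standard group-fairness metrics; it assumes `pip install aif360` and follows the pattern of the toolkit's own Adult-dataset tutorials (the exact dataset setup may vary by version).

```python
# Minimal AIF360 sketch: measure group fairness on the Adult income dataset.
# AdultDataset() expects the raw UCI Adult files to be present locally;
# aif360 prints download instructions if they are missing.
from aif360.datasets import AdultDataset
from aif360.metrics import BinaryLabelDatasetMetric

data = AdultDataset()
metric = BinaryLabelDatasetMetric(
    data,
    unprivileged_groups=[{"sex": 0}],  # in this dataset, 1 encodes the privileged group
    privileged_groups=[{"sex": 1}],
)
# Disparate impact: ratio of favorable-outcome rates between the groups.
print("Disparate impact:", metric.disparate_impact())
# Statistical parity difference: difference of those same rates.
print("Statistical parity difference:", metric.statistical_parity_difference())
```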
Scott Lundberg, Microsoft Research - Explainable Machine Learning with Shaple... - Sri Ambati
This session was recorded in NYC on October 22nd, 2019 and can be viewed here: https://youtu.be/ngOBhhINWb8
Explainable Machine Learning with Shapley Values
Shapley values are a popular approach for explaining predictions made by complex machine learning models. In this talk I will discuss what problems Shapley values solve, give an intuitive presentation of what they mean, and show examples of how they can be used through the 'shap' Python package.
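As a hedged sketch of the 'shap' workflow the talk describes (assuming `pip install shap xgboost scikit-learn`; exact API details may vary by version):

```python
# Minimal shap sketch: explain a gradient-boosted regressor's predictions.
import shap
import xgboost
from sklearn.datasets import fetch_california_housing

X, y = fetch_california_housing(return_X_y=True, as_frame=True)
model = xgboost.XGBRegressor(n_estimators=100).fit(X, y)

explainer = shap.TreeExplainer(model)   # fast, exact attributions for tree ensembles
shap_values = explainer.shap_values(X)  # one additive contribution per feature per row
shap.summary_plot(shap_values, X)       # global view: which features drive predictions
```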
Bio: I am a senior researcher at Microsoft Research. Before joining Microsoft, I did my Ph.D. studies at the Paul G. Allen School of Computer Science & Engineering of the University of Washington working with Su-In Lee. My work focuses on explainable artificial intelligence and its application to problems in medicine and healthcare. This has led to the development of broadly applicable methods and tools for interpreting complex machine learning models that are now used in banking, logistics, sports, manufacturing, cloud services, economics, and many other areas.
Responsible AI in Industry (Tutorials at AAAI 2021, FAccT 2021, and WWW 2021) - Krishnaram Kenthapadi
[Video available at https://sites.google.com/view/ResponsibleAITutorial]
Artificial Intelligence is increasingly being used in decisions and processes that are critical for individuals, businesses, and society, especially in areas such as hiring, lending, criminal justice, healthcare, and education. Recent ethical challenges and undesirable outcomes associated with AI systems have highlighted the need for regulations, best practices, and practical tools to help data scientists and ML developers build AI systems that are secure, privacy-preserving, transparent, explainable, fair, and accountable – to avoid unintended and potentially harmful consequences and compliance challenges.
In this tutorial, we will present an overview of responsible AI, highlighting model explainability, fairness, and privacy in AI, key regulations/laws, and techniques/tools for providing understanding around AI/ML systems. Then, we will focus on the application of explainability, fairness assessment/unfairness mitigation, and privacy techniques in industry, wherein we present practical challenges/guidelines for using such techniques effectively and lessons learned from deploying models for several web-scale machine learning and data mining applications. We will present case studies across different companies, spanning many industries and application domains. Finally, based on our experiences in industry, we will identify open problems and research directions for the AI community.
Talk presented at the Analytics Frontiers Conference in Charlotte on March 21. The presentation evaluates opportunities and risks of AI and how consumers, businesses, society and governments can mitigate some of the risks.
Privacy in AI/ML Systems: Practical Challenges and Lessons Learned - Krishnaram Kenthapadi
How do we protect the privacy of users when building large-scale AI based systems? How do we develop machine learning models and systems taking fairness, accuracy, explainability, and transparency into account? Model fairness and explainability and protection of user privacy are considered prerequisites for building trust and adoption of AI systems in high stakes domains. We will first motivate the need for adopting a “fairness, explainability, and privacy by design” approach when developing AI/ML models and systems for different consumer and enterprise applications from the societal, regulatory, customer, end-user, and model developer perspectives. We will then focus on the application of privacy-preserving AI techniques in practice through industry case studies. We will discuss the sociotechnical dimensions and practical challenges, and conclude with the key takeaways and open challenges.
Algorithmic Bias: Challenges and Opportunities for AI in Healthcare - Gregory Nelson
Gregory S. Nelson, VP, Analytics and Strategy – Vidant Health | Adjunct Faculty Duke University
The promise of AI is quickly becoming a reality for a number of industries, including healthcare. For example, we have seen early successes in augmenting clinical intelligence for diagnostic imaging and in the early detection of pneumonia and sepsis. But what happens when the algorithms are biased? In this presentation, we will outline a framework for AI governance and discuss ways in which we can address algorithmic bias in machine learning.
Objective 1: Illustrate the issues of bias in AI through examples specific to healthcare.
Objective 2: Summarize the growing body of work in the legal, regulatory, and ethical oversight of AI models and the implications for healthcare.
Objective 3: Outline steps that we can take to establish an AI governance strategy for our organizations.
Explainable AI with H2O Driverless AI's MLI module - Martin Dvorak
My H2O.ai Prague meetup presentation on explainable AI, model debugging, and strategies for identifying security vulnerabilities and undesired ML biases, plus an explanation of how the Driverless AI Machine Learning Interpretability module helps mitigate these problems.
Provides a basic overview of automated decision systems and recent attempts to guarantee that their allocations are fair. It then examines four key papers in the field suggesting that this cannot be guaranteed, and offers a brief sketch of a different direction.
AI should be Fair, Accountable and Transparent (FAT* AI), so it is crucial to raise awareness of these topics not only among machine learning practitioners but across the entire population, as ML systems can make life-changing decisions and influence our lives now more than ever.
Frankie Rybicki slide set for Deep Learning in Radiology / Medicine - Frank Rybicki
These are my #AI slides for medical deep learning, using #radiology and medical imaging examples. Please use and modify them to teach your own group about medical AI.
Coming to an Understanding: a Cross-institutional Examination of Assessments ... - Stephanie Wright
Data curation has emerged as a strategic growth area for academic libraries. Many libraries have conducted needs assessments as a precursor towards developing services; however there have been few comparisons of the findings across institutions. This panel brings together four librarians from different institutions to discuss both common and distinct findings from their respective needs assessments. The panelists will speculate on the application of these findings at their specific libraries and in academic libraries generally.
Predictive Analytics - How to get stuff out of your Crystal Ball - DATAVERSITY
Everyone wants to leverage data. The optimal implementation of analytics is an organization-wide set of capabilities, called advantageous organizational analytic capabilities because a clear ROI is demonstrable from these efforts. It turns out that there are a number of prerequisites to advantageous organizational analytics. These include:
Adopting a crawl, walk, run strategy
Understanding current and potential organizational maturity and corresponding capabilities
Achieving an appropriate technology/human capability balance
Implementing useful IT systems development practices
Installing necessary non-IT leadership
This webinar will explore these and other topics using examples drawn from DOD, healthcare researchers, and donation center operations.
Video at https://www.youtube.com/watch?v=kHZqgmrIg8k
Event page : https://www.meetup.com/Tech-Valley-Machine-Learning-and-AI/events/246094251/
Speaker: Dan Elton
Tech Valley Machine Learning Meetup, Troy New York, 12-28-17
Building off David Martuscello's intro talk at the last meetup, Dan Elton presented some pitfalls one can encounter with machine learning. It is important not to get caught up in the hype surrounding machine learning and to maintain a sense of what ML really is and what its limitations are. The pitfalls discussed included:
-- overfitting & underfitting
-- not cleaning & normalizing your data
-- trying to use machine learning for extrapolation
-- biased data sampling
-- not doing hyperparameter optimization carefully
-- not comparing your results to simple baselines
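As a minimal sketch of the baseline-comparison pitfall just listed, here is a scikit-learn DummyClassifier check on a stock dataset (an illustration, not code from the talk):

```python
# Minimal sketch: always compare a model against a trivial baseline.
from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

baseline = DummyClassifier(strategy="most_frequent")  # always predicts the majority class
model = RandomForestClassifier(random_state=0)

print("baseline accuracy:", cross_val_score(baseline, X, y, cv=5).mean())
print("model accuracy:   ", cross_val_score(model, X, y, cv=5).mean())
```

If the model barely beats the baseline, an impressive-looking accuracy number mostly reflects class imbalance, not learning.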
ODSC East 2017: Data Science Models For Good - Karry Lu
Abstract: The rise of data science has been largely fueled by the promise of changing the business landscape: enhancing one's competitive advantage, increasing business optimization and efficiency, and ultimately delivering a better bottom line. This promise reaches across sectors as machine learning methods get better, data access continues to grow, and computation power becomes easily accessible. However, because the practice of doing data science can be expensive, there is a danger that this so-called promise of data science may only be available to the most well-resourced organizations with sophisticated data capabilities and staff. For the past five years, DataKind has been working to ensure social change organizations also have access to data science, teaming them up with data scientists to build machine learning and artificial intelligence solutions that aim to reduce human suffering. In doing so, DataKind has learned what it takes to apply data science in the social sector and the many applications it has for creating positive change in the world. This session presents DataKind projects showcasing the wide range of applications of ML/AI for social good: using satellite imagery and remote sensing techniques to detect wheat farm boundaries and protect livelihoods in Ethiopia; leveraging NLP to automate the time-consuming process of synthesizing findings from academic studies to inform conservation efforts; classifying text records to better understand human rights conditions across the world; and using machine learning to reduce traffic fatalities in U.S. cities. Learn about some of the latest breakthroughs and findings in the data science for social good space, and how you can get involved.
General AI for Medical Educators April 2024 - Janet Corral
Learn how to consider Artificial Intelligence as augmentation, to enhance your work. In this presentation we cover augmentation and cyborgs, and critically appraise examples of #AI in #MedEd. We then discuss faculty development and whether #AI can be an #instructionaldesigner.
An invited talk in the Big Data session of the Industrial Research Institute meeting in Seattle Washington.
Some notes on how to train data science talent and exploit the fact that the membrane between academia and industry has become more permeable.
Delta Analytics is a 501(c)3 non-profit in the Bay Area. We believe that data is powerful, and that anybody should be able to harness it for change. Our teaching fellows partner with schools and organizations worldwide to work with students excited about the power of data to do good.
Welcome to the course! These modules will teach you the fundamental building blocks and the theory necessary to be a responsible machine learning practitioner in your own community. Each module focuses on accessible examples designed to teach you about good practices and the powerful (yet surprisingly simple) algorithms we use to model data.
To learn more about our mission or provide feedback, take a look at www.deltanalytics.org.
Essentials of Automations: Optimizing FME Workflows with Parameters - Safe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
GraphRAG is All You Need? LLM & Knowledge Graph - Guy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
DevOps and Testing slides at DASA Connect - Kari Kakkonen
Slides by me and Rik Marselis from the DASA Connect conference on 30.5.2024. We discuss what testing is, what agile testing is, and finally what testing in DevOps looks like. We also held a lovely workshop in which the participants explored different ways to think about quality and testing in different parts of the DevOps infinity loop.
Search and Society: Reimagining Information Access for Radical Futures - Bhaskar Mitra
The field of Information Retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build, inspired by diverse, explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies need to be explicitly articulated, and we need to develop theories of change in the context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... - UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... - DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
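As a hedged sketch of that Python binding (assuming `pip install pypowsybl`; exact attribute and column names may vary by version), the snippet below loads a bundled IEEE 14-bus test network and runs an AC power flow:

```python
# Minimal pypowsybl sketch: load a sample network and run an AC power flow.
import pypowsybl as pp

network = pp.network.create_ieee14()   # IEEE 14-bus test grid shipped with the library
results = pp.loadflow.run_ac(network)  # AC power flow over all connected components

print(results[0].status)                                 # convergence status of the main component
print(network.get_buses()[["v_mag", "v_angle"]].head())  # resulting bus voltage magnitudes and angles
```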
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Neuro-symbolic is not enough, we need neuro-*semantic* - Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... - James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Connector Corner: Automate dynamic content and events by pushing a button - DianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
22. Discrimination Based on Redundant Encoding
Features = {‘loc’, ‘income’, ...}
Polynomial kernel with degree 2
Feature 6578: Loc=’East Oakland’ ∧ Income=’<10k’
effectively encodes
Feature 4231: Race=’Black’
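To see how such a crossing arises, here is a small, purely synthetic sketch (all feature names, probabilities, and data are made up for illustration): a degree-2 interaction over location and income ends up strongly correlated with the withheld race attribute.

```python
# Hypothetical sketch: degree-2 feature crossings can act as a proxy
# for a sensitive attribute that was never given to the model.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
n = 10_000

# Synthetic, illustrative data: race is withheld from the model, but
# location and income are both correlated with it.
race = rng.binomial(1, 0.3, n)  # 1 = protected class
loc = (rng.random(n) < np.where(race == 1, 0.8, 0.1)).astype(float)         # e.g. Loc='East Oakland'
low_income = (rng.random(n) < np.where(race == 1, 0.7, 0.2)).astype(float)  # e.g. Income='<10k'

X = np.column_stack([loc, low_income])

# A degree-2 expansion with interactions creates the conjunction
# loc * low_income -- the "Feature 6578"-style crossing from the slide.
crossed = PolynomialFeatures(degree=2, interaction_only=True,
                             include_bias=False).fit_transform(X)

conj = crossed[:, -1]  # the loc * low_income interaction column
print("corr(conjunction, race) =", np.corrcoef(conj, race)[0, 1])
```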
33. How can we characterize fairness?
What does fairness even mean?
Group Fairness vs. Individual Fairness
34. How can we characterize fairness?
One way to characterize group fairness is to ensure that both the majority and the protected population have similar outcomes, i.e.,
P(FavorableOutcome | S) : P(FavorableOutcome | S^c) = 1 : 1
35. How can we characterize fairness?
One way to characterize group fairness is to ensure that both the majority and the protected population have similar outcomes, i.e.,
P(FavorableOutcome | S) : P(FavorableOutcome | S^c) = 1 : 1
In practice, exact parity is often hard to achieve.
For example, for hiring, the EEOC specifies this ratio should be no less than 0.8 : 1 (the "80% rule").
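To make the 80% rule concrete, here is a minimal sketch that computes the disparate impact ratio for a binary classifier (the array names and toy numbers are assumptions for illustration):

```python
# Minimal sketch: compute the disparate impact ratio for a binary
# classifier and check it against the EEOC's 80% rule.
import numpy as np

def disparate_impact(y_pred: np.ndarray, is_protected: np.ndarray) -> float:
    """P(favorable | protected) / P(favorable | majority)."""
    p_protected = y_pred[is_protected == 1].mean()
    p_majority = y_pred[is_protected == 0].mean()
    return p_protected / p_majority

# Toy example: 40% favorable outcomes for the protected group vs. 80%
# for the majority -> ratio 0.5, which fails the 80% rule.
y_pred       = np.array([1, 0, 0, 1, 0, 1, 1, 1, 1, 0])
is_protected = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])

ratio = disparate_impact(y_pred, is_protected)
print(f"disparate impact ratio = {ratio:.2f}")
print("passes 80% rule" if ratio >= 0.8 else "fails 80% rule")
```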
36. Characterizing Fairness of a black-box classifier
One way: is the classifier outcome correlated with membership in S?
37. Fairness as a constraint
Is the classifier outcome correlated with membership in S?
With sensitive attribute z and decision function d_θ(x), we want their covariance to vanish:
Cov(z, d_θ(x)) ≈ 0
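As a quick check of this constraint for a trained linear model, one can compute the empirical covariance directly. The sketch below follows the covariance formulation from Zafar et al. (cited in the reading list), with random data standing in for a real model:

```python
# Minimal sketch of the covariance fairness check from Zafar et al.:
# for a linear model, the decision function is the signed distance
# theta^T x, and we want Cov(z, theta^T x) to be close to zero.
import numpy as np

def fairness_covariance(z: np.ndarray, X: np.ndarray, theta: np.ndarray) -> float:
    """Empirical covariance between sensitive attribute z and theta^T x."""
    scores = X @ theta  # decision function d_theta(x) for each sample
    return float(np.mean((z - z.mean()) * (scores - scores.mean())))

# Illustrative usage with random data and weights.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
z = rng.binomial(1, 0.4, 1000)
theta = rng.normal(size=5)

print("Cov(z, d_theta(x)) =", fairness_covariance(z, X, theta))
```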
40. “If we allowed a model to be used for college admissions in 1870, we’d still have 0.7% of women going to college.”
Recommended Reading
41. Reading List
There’s much material on fairness in data-driven decision/policy making from the literature in:
- law
- sociology
- political science
- computer science/machine learning
- economics
(the machine learning literature is nascent, only from around 2009 onwards)
42. Reading List (Fairness in ML)
Pedreschi, Dino, Salvatore Ruggieri, and Franco Turini. "Measuring Discrimination in Socially-Sensitive Decision Records." SDM 2009.
Kamiran, Faisal, and Toon Calders. "Classifying without Discriminating." 2nd International Conference on Computer, Control and Communication (IC4), IEEE, 2009.
Dwork, Cynthia, et al. "Fairness through Awareness." Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, ACM, 2012.
Romei, Andrea, and Salvatore Ruggieri. "A Multidisciplinary Survey on Discrimination Analysis." The Knowledge Engineering Review 29.05 (2014).
43. Reading List (Fairness in ML)
Friedler, Sorelle, Carlos Scheidegger, and Suresh Venkatasubramanian. "Certifying and Removing Disparate Impact." CoRR (2014).
Barocas, Solon, and Andrew D. Selbst. "Big Data's Disparate Impact" (August 14, 2015). California Law Review, Vol. 104.
Zafar, Muhammad Bilal, et al. "Fairness Constraints: A Mechanism for Fair Classification." arXiv preprint arXiv:1507.05259 (2015).
Zliobaite, Indre. "On the Relation between Accuracy and Fairness in Binary Classification." arXiv preprint arXiv:1505.05723 (2015).
44. Other resources
NSF’s “Big Data Innovation Hubs” were created in part to address these challenges:
http://www.nsf.gov/news/news_summ.jsp?cntn_id=136784
The Stanford Law Review touches upon this topic regularly:
http://www.stanfordlawreview.org/online/privacy-and-big-data
Fairness blog: http://fairness.haverford.edu
Academic: FATML workshops (NIPS 2014, ICML 2015): www.fatml.org
45. Lessons
Discrimination is an emergent property of any learning algorithm.
Watch out for discrimination (implicitly) encoded in features.
Big Data can cause Big Problems.
Watch out for the proportion of the protected classes.
Always do error analysis with the protected classes in mind.
Notions of fairness are nascent at best; involve as many people as possible to improve understanding.
There is no single best notion of fairness.
Say you are building a dating app like Tinder. This involves solving a recommendation problem: recommend a set of profiles to be rated thumbs up/thumbs down. This is a very straightforward collaborative filtering problem, and we can build really performant models because, in addition to the historical rating data, we also have very rich profile data.
Note: I’m using Tinder as a convenient example. This work has nothing to do with the Match Group or their dating app Tinder.
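For concreteness, here is a minimal matrix-factorization sketch of such a recommender (the shapes, hyperparameters, and synthetic swipe data are all assumptions, not any real system):

```python
# Minimal sketch: matrix factorization for swipe prediction.
# Everything here (shapes, learning rate, data) is illustrative only.
import numpy as np

rng = np.random.default_rng(42)
n_users, n_profiles, k = 500, 500, 16

# Observed swipes: (user, profile, 1 = right swipe / 0 = left swipe).
observed = [(rng.integers(n_users), rng.integers(n_profiles),
             rng.integers(2)) for _ in range(20_000)]

U = rng.normal(0, 0.1, (n_users, k))      # user embeddings
P = rng.normal(0, 0.1, (n_profiles, k))   # profile embeddings
lr = 0.05

for epoch in range(5):
    for u, p, y in observed:
        pred = 1 / (1 + np.exp(-U[u] @ P[p]))   # predicted P(right swipe)
        grad = pred - y                          # logistic-loss gradient
        U[u], P[p] = U[u] - lr * grad * P[p], P[p] - lr * grad * U[u]

# Recommend: top profiles by predicted swipe probability for user 0.
scores = P @ U[0]
print("top-5 profile ids for user 0:", np.argsort(-scores)[:5])
```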
Let’s say our goal is to improve the following two business/engagement metrics:
% of right swipes
% of matches (a match happens when two people right-swipe each other)
A good question to ask yourself: “If this was a newspaper personals ad, would mentioning ‘looking for white guys/girls only’ be tolerated? Would a newspaper publish such an ad?” If not, how can we build apps that essentially enable the same thing?
“But that is what the people want! I am race blind, but I have to give my users what they want”
People wanted segregated bathrooms at some point. Perhaps some people still do. But as a society, as a collective conscience, we agreed that it is not in the interest of minorities and vulnerable classes, and that it promotes discrimination.
So why are we okay in building the online equivalent of segregated bathrooms?
This disparity can arise not just from machine learning, but in any kind of data-driven policy making, which is becoming the norm today. Consider a city fixing potholes.
Imagine if there was an app to report the potholes, and the city could send somebody to fix them. Crowdsourcing for efficient governance. Sounds like a good idea, right?
What if most of the complaints came from well-to-do neighborhoods, because they complain about every little thing, and most of the limited city road-repair resources were diverted to these well-to-do neighborhoods?
The disparity caused by the Street Bump app was actually noted in this report commissioned by the White House in early 2014.
From the report abstract
Among the many calls to action, expanding technical expertise was identified as a major one.
Speaking of expanding technical expertise, consider this piece on predictive policing by Gillian Tett, which appeared three months after the White House report came out.
Gillian is an experienced journalist who railed against Wall Street quants for thoughtlessly deploying models.
The same Gillian Tett now writes this about Chicago’s predictive policing: “The program has nothing to do with race but multi-variable equations.”
Today’s ML landscape is changing as we speak. Things are scaling in many dimensions.
Scaling dimension 1: Data collection and processing
It’s super easy to collect troves of data of all kinds, and very cheap and fast to store and process it.
Scaling dimension 2: Tools
Today, pretty much anyone with basic programming skills (not even algorithmic skills) can build a predictive application. With startups offering ML as a service, all you need is a REST endpoint and basic JavaScript to build a model and serve predictions.
Scaling dimension 3: Data Scientist Factories
These are filling a need, but I am not sure they are the right way to fill it.
Statistics used to be a rigorous discipline. That hasn’t stopped cargo-cult statisticians from popping up and doing bad statistics. Similarly, ML used to be a very academic discipline, with practitioners having insight into the models they were training. Today that’s no longer the case.
The curricula in most of these “data scientist factories” are questionable at best (one promises ML/NLP/DataViz in eight weeks), and none of them make time for the critical aspects of model thinking. What is worrisome is that graduates from these places go on to build applications affecting people’s lives. At the very least, an awareness of the ethical issues of machine learning should be incorporated into every ML curriculum.
Let’s say there’s a sensitive attribute, or set of attributes (like gender, race, etc.), that partitions a population into a protected class (or minority class) and a majority class.
Sometimes information about the sensitive attributes can “leak in” from other data sources, even if it is not explicitly encoded. For example, someone’s last name could divulge their ethnicity, or an individual’s location could correlate with race. Or, in this example, knowing who your Facebook friends are, and their sexual orientation, tells a lot about your own sexual orientation!
http://firstmonday.org/article/view/2611/2302
Often, sensitive attributes like race and gender are redundantly encoded in other variables. For example, the text in tweets can reveal many demographic variables:
http://www.fastcompany.com/1769217/you-cant-keep-your-secrets-twitter
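One pragmatic way to test for such redundant encoding (in the spirit of Feldman et al.’s disparate-impact certification, cited in the reading list) is to check how well the supposedly neutral features predict the sensitive attribute. A purely synthetic sketch, with all data assumed:

```python
# Minimal sketch: detect redundant encoding by trying to predict the
# sensitive attribute from the *other* features. High accuracy means
# the sensitive attribute has leaked into supposedly neutral features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
n = 5_000

# Synthetic data: `race` is withheld from the model, but
# `zip_code_bucket` and `income_bucket` are correlated with it.
race = rng.binomial(1, 0.3, n)
zip_code_bucket = np.where(rng.random(n) < 0.8, race, rng.integers(0, 2, n))
income_bucket = np.where(rng.random(n) < 0.6, race, rng.integers(0, 2, n))
X = np.column_stack([zip_code_bucket, income_bucket])

# If this score is well above chance, the features redundantly encode race.
score = cross_val_score(LogisticRegression(), X, race, cv=5).mean()
print(f"sensitive attribute predictable with accuracy ~{score:.2f}")
```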
Big data can cause big problems
The very thing that makes big data helpful in driving down error rates and producing better models also explains why models perform poorly on minority populations: the number of training samples for a minority population is dwarfed by the majority’s. One could try balancing the number of training samples, but that in turn creates an accuracy-fairness tradeoff.
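As a rough illustration of one side of that tradeoff, the sketch below oversamples the minority group until the training set is balanced (loosely in the spirit of the preprocessing ideas in Kamiran & Calders, cited above); the function and its inputs are hypothetical:

```python
# Minimal sketch: naively rebalance a training set by oversampling the
# minority group. This usually helps per-group error at some cost to
# overall accuracy -- the accuracy-fairness tradeoff mentioned above.
import numpy as np

def oversample_minority(X, y, group, rng=np.random.default_rng(0)):
    """Duplicate minority-group rows (group == 1, assumed the smaller
    group) until both groups have equal representation."""
    idx_min = np.flatnonzero(group == 1)
    idx_maj = np.flatnonzero(group == 0)
    extra = rng.choice(idx_min, size=len(idx_maj) - len(idx_min), replace=True)
    keep = np.concatenate([idx_maj, idx_min, extra])
    return X[keep], y[keep], group[keep]
```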
https://en.wikipedia.org/wiki/Nymwars
If your classifier has 95% accuracy and you deploy it in production, the 5% of errors might fall disproportionately on the minority population.
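Per-group error analysis makes this failure mode visible. A minimal sketch with made-up numbers matching the 95% figure:

```python
# Minimal sketch: overall accuracy can hide group-level failure.
# `y_true`, `y_pred`, `is_protected` are hypothetical 0/1 numpy arrays.
import numpy as np

def group_error_rates(y_true, y_pred, is_protected):
    errors = (y_true != y_pred)
    return {
        "overall": errors.mean(),
        "protected": errors[is_protected == 1].mean(),
        "majority": errors[is_protected == 0].mean(),
    }

# Toy numbers: 5% overall error, but the minority is 10% of the data
# and absorbs most of the mistakes -> a 30% error rate for them.
n = 10_000
is_protected = (np.arange(n) < 1_000).astype(int)
y_true = np.ones(n, dtype=int)
y_pred = y_true.copy()
y_pred[:300] = 0          # 300 errors on the 1,000 protected rows
y_pred[1_000:1_200] = 0   # 200 errors on the 9,000 majority rows
print(group_error_rates(y_true, y_pred, is_protected))
```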
Not exhaustive, but this should provide a good seed set. The ones in bold are must-reads.