This document discusses bias in artificial intelligence and algorithms. It begins with an introduction to the topic and why it is important. It then explores how to detect bias through various fairness metrics and how to mitigate bias through preprocessing, inprocessing, and postprocessing techniques. The document provides examples of different sources of bias and strategies to address them. It also recommends resources like the AI Fairness 360 toolkit to help evaluate models for fairness and identify potential biases.
11.
“Scores like this — known as risk assessments — are increasingly common in courtrooms across the nation.
They are used to inform decisions about who can be set free at every stage of the criminal justice system, from
assigning bond amounts — as is the case in Fort Lauderdale — to even more fundamental decisions about
defendants’ freedom. In Arizona, Colorado, Delaware, Kentucky, Louisiana, Oklahoma, Virginia, Washington
and Wisconsin, the results of such assessments are given to judges during criminal sentencing.”
– Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner, ProPublica, May 23, 2016
(https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing)
14. Analysis of COMPAS Recidivism Score
              ALL DEFENDANTS    BLACK DEFENDANTS    WHITE DEFENDANTS
              Low      High     Low      High       Low      High
Survived      2681     1282     990      805        1139     349
Recidivated   1216     2035     532      1369       461      505
15. Accurate for all groups
(COMPAS contingency table as on slide 14)
Accuracy: ALL 65.4%, BLACK 63.8%, WHITE 67.0%
“The company said it had devised the algorithm to achieve this goal. A test
that is correct in equal proportions for all groups cannot be biased, the
company said.”
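As a quick check on these numbers, here is a minimal plain-Python sketch (counts taken from the slide 14 table; a prediction counts as correct when a survived defendant was scored low risk or a recidivated defendant was scored high risk):

```python
# Counts from the COMPAS table on slide 14:
# (survived & low, survived & high, recidivated & low, recidivated & high)
groups = {
    "ALL":   (2681, 1282, 1216, 2035),
    "BLACK": (990, 805, 532, 1369),
    "WHITE": (1139, 349, 461, 505),
}

for name, (sl, sh, rl, rh) in groups.items():
    correct = sl + rh  # low risk & survived, plus high risk & recidivated
    total = sl + sh + rl + rh
    print(f"{name}: accuracy = {correct / total:.1%}")
# ALL: 65.4%, BLACK: 63.8%, WHITE: 67.0%
```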
16. Black defendants wrongly classified high risk more often
(COMPAS contingency table as on slide 14)
Error rate (survived, but classified high risk): ALL 32.3%, BLACK 44.8%, WHITE 23.5%
“Black defendants who do not recidivate were nearly twice as likely to be
classified by COMPAS as higher risk compared to their white counterparts.”
17. White defendants wrongly classified low risk more often
(COMPAS contingency table as on slide 14)
Error rate (recidivated, but classified low risk): ALL 37.4%, BLACK 28.0%, WHITE 47.7%
“The test tended to make the opposite mistake with whites, meaning that it was more likely to wrongly predict that white people would not commit additional crimes if released compared to black defendants.”
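The same counts yield both error rates; a minimal sketch (plain Python, counts from slide 14):

```python
# Counts from the COMPAS table on slide 14:
# (survived & low, survived & high, recidivated & low, recidivated & high)
groups = {
    "ALL":   (2681, 1282, 1216, 2035),
    "BLACK": (990, 805, 532, 1369),
    "WHITE": (1139, 349, 461, 505),
}

for name, (sl, sh, rl, rh) in groups.items():
    wrongly_high = sh / (sl + sh)  # survived but classified high risk (slide 16)
    wrongly_low = rl / (rl + rh)   # recidivated but classified low risk (slide 17)
    print(f"{name}: wrongly high = {wrongly_high:.1%}, wrongly low = {wrongly_low:.1%}")
# BLACK: wrongly high = 44.8%; WHITE: wrongly low = 47.7%
```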
18. Biased or Unbiased?
Predictive Parity (accuracy, %):  ALL 65.4   BLACK 63.8   WHITE 67.0
Disparate Impact (error rates, %):
  Wrongly classified high risk:   ALL 32.3   BLACK 44.8   WHITE 23.5
  Wrongly classified low risk:    ALL 37.4   BLACK 28.0   WHITE 47.7
19.
“Since blacks are re-arrested more often than
whites, is it possible to create a formula that is
equally predictive for all races without
disparities in who suffers the harm of incorrect
predictions?”
NO
20. Key finding
“If you have two populations
that have unequal base rates
then you can’t satisfy both
definitions of fairness at the
same time.”
(CC BY 2.0) Photo by Ivan Radic
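One way to see why, sketched numerically below: by Bayes' rule, FPR = p/(1-p) * (1-PPV)/PPV * TPR, where p is a group's base rate (the identity at the heart of Chouldechova's 2017 analysis). Hold PPV and TPR equal across groups, and unequal base rates force unequal false positive rates. The PPV and TPR values here are hypothetical; the base rates come from the slide 14 counts.

```python
# FPR = p / (1 - p) * (1 - PPV) / PPV * TPR, with base rate p.
def fpr(p, ppv, tpr):
    return p / (1 - p) * (1 - ppv) / ppv * tpr

ppv, tpr = 0.63, 0.68          # hypothetical values, held equal for both groups
p_black, p_white = 0.51, 0.39  # recidivism base rates from the slide 14 counts

print(f"FPR (black): {fpr(p_black, ppv, tpr):.1%}")  # ~41.6%
print(f"FPR (white): {fpr(p_white, ppv, tpr):.1%}")  # ~25.5%
# Equal PPV and TPR but unequal base rates -> the false positive rates must differ.
```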
23. Project Plan
• Human-centered approach
• Include multiple stakeholders
• Set fairness goals & metrics
• Audit the data for bias
• Consider potential sources of bias
• Consider primary, secondary, and tertiary layers of effect
• Determine additional data collection needs
• Plan mitigation strategies
• Plan monitoring strategies
24. Bias sneaks in…
“In 1952 the Boston Symphony Orchestra initiated blind auditions to help diversify its male-dominated roster, but trials still skewed heavily towards men.”
– Edd Gent, “Beware of the gaps in Big Data,” Sept. 12, 2016
26. Skewed sample
Detect potholes to allocate repair crews
“The system reported a disproportionate number of potholes in wealthier neighbourhoods. It turned out it was oversampling the younger, more affluent citizens who were digitally clued up enough to download and use the app in the first place.”
– David Wallace, “Big data has unconscious bias too,” Sept. 21, 2016
27. Tainted examples
Identify job candidates
“That is because Amazon’s computer models were trained to vet applicants by observing patterns in resumes submitted to the company over a 10-year period. Most came from men, a reflection of male dominance across the tech industry.”
– Jeffrey Dastin, “Amazon scraps secret AI recruiting tool that showed bias against women,” Oct. 9, 2018
28. Sample size disparity
“Assuming a fixed feature space, a classifier generally improves with the number of data points used to train it… The contrapositive is that less data leads to worse predictions.”
– Moritz Hardt, “How big data is unfair,” Sept. 26, 2014
29. Limited features
Diagnose heart disease
“recent research has shown that women who have heart disease experience different symptoms, causes and outcomes than men.”
“Women are often told their stress tests are normal or that they have ‘false positives.’ Bairey Merz says doctors should pay attention to symptoms such as chest pain and shortness of breath rather than relying on a stress test score.”
30. Proxies
Make college admissions decisions
“…certain rules in the system that weighed applicants on the basis of seemingly non-relevant factors, like place of birth and name.
[…] simply having a non-European name could automatically take 15 points off an applicant’s score. The commission also found that female applicants were docked three points, on average.”
– Oscar Schwartz, “Untold History of AI: Algorithmic Bias was Born in the 1980s,” April 15, 2019
36. Fairness Metrics
(COMPAS contingency table as on slide 14)
Statistical Parity Difference
Difference of the rate of favorable outcomes between the unprivileged and privileged groups.
Rate of favorable (low-risk) outcomes: BLACK 41.2%, WHITE 65.2%
Statistical Parity Difference = -24.0%
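A minimal plain-Python sketch of this metric from the slide 14 counts (favorable outcome = low risk; Black as the unprivileged group, White as the privileged group, following the slides):

```python
# Rate of the favorable (low risk) outcome per group, from the slide 14 counts.
p_unpriv = (990 + 532) / (990 + 805 + 532 + 1369)  # Black: 41.2%
p_priv = (1139 + 461) / (1139 + 349 + 461 + 505)   # White: 65.2%

spd = p_unpriv - p_priv
print(f"Statistical Parity Difference = {spd:+.1%}")  # -24.0%
```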
37. Fairness Metrics
(COMPAS contingency table as on slide 14)
Disparate Impact
Ratio of the rate of favorable outcomes between the unprivileged and privileged groups.
Rate of favorable (low-risk) outcomes: BLACK 41.2%, WHITE 65.2%
Disparate Impact = 63.2%
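Same rates, expressed as a ratio; a minimal sketch (plain Python):

```python
# Rate of the favorable (low risk) outcome per group, from the slide 14 counts.
p_unpriv = (990 + 532) / (990 + 805 + 532 + 1369)  # Black: 41.2%
p_priv = (1139 + 461) / (1139 + 349 + 461 + 505)   # White: 65.2%

di = p_unpriv / p_priv
print(f"Disparate Impact = {di:.1%}")  # 63.2%; the four-fifths rule flags ratios below 80%
```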
38. Fairness Metrics
(COMPAS contingency table as on slide 14)
Equal Opportunity Difference
Difference in true positive rates between the unprivileged and privileged groups.
True positive rate (survived, classified low risk): BLACK 55.2%, WHITE 76.5%
Equal Opportunity Difference = -21.4%
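A minimal sketch (plain Python, slide 14 counts; the true positive rate here is the share of survived defendants who were classified low risk):

```python
# True positive rate for the favorable outcome: P(classified low | survived).
tpr_unpriv = 990 / (990 + 805)   # Black: 55.2%
tpr_priv = 1139 / (1139 + 349)   # White: 76.5%

eod = tpr_unpriv - tpr_priv
print(f"Equal Opportunity Difference = {eod:+.1%}")  # -21.4%
```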
39. Fairness Metrics
(COMPAS contingency table as on slide 14)
Average Odds Difference
Average of the difference in false positive rates and true positive rates of favorable outcomes between the unprivileged and privileged groups.
Among defendants classified low risk: survived BLACK 65.0%, WHITE 71.2%; recidivated BLACK 35.0%, WHITE 28.8%
(Abs) Average Odds Diff = 6.1%
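For reference, a minimal sketch applying the stated definition directly to the slide 14 counts (favorable = low risk). Note that this direct computation gives about -20.6%; the 6.1% above instead tracks the survived/recidivated split within the low-risk group shown in the chart.

```python
# Average of the FPR difference and TPR difference (favorable = low risk).
tpr_unpriv = 990 / (990 + 805)   # P(low | survived), Black
tpr_priv = 1139 / (1139 + 349)   # P(low | survived), White
fpr_unpriv = 532 / (532 + 1369)  # P(low | recidivated), Black
fpr_priv = 461 / (461 + 505)     # P(low | recidivated), White

aod = ((fpr_unpriv - fpr_priv) + (tpr_unpriv - tpr_priv)) / 2
print(f"Average Odds Difference = {aod:+.1%}")  # about -20.6%
```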
40. Mitigate Bias
Pre-processing: mitigate bias in training data
• Disparate Impact Remover
• Learning Fair Representations
• Optimized Preprocessing
• Reweighing (see the sketch after this list)
In-processing: mitigate bias in classifiers
• Adversarial Debiasing
• ART Classifier
• Prejudice Remover
Post-processing: mitigate bias in predictions
• Calibrated Equalized Odds
• Equalized Odds
• Reject Option Classification
https://aif360.readthedocs.io/en/latest/modules/algorithms.html
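As a concrete example of one pre-processing algorithm, a minimal sketch of Reweighing with the AIF360 API; the tiny DataFrame and its column names are invented for illustration:

```python
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.algorithms.preprocessing import Reweighing

# Toy data, invented for illustration: protected attribute, one feature, a label.
df = pd.DataFrame({
    "race":  [0, 0, 0, 0, 1, 1, 1, 1],  # 0 = unprivileged, 1 = privileged
    "score": [1, 3, 2, 4, 3, 5, 4, 6],
    "label": [0, 0, 0, 1, 0, 1, 1, 1],
})

dataset = BinaryLabelDataset(df=df, label_names=["label"],
                             protected_attribute_names=["race"],
                             favorable_label=1, unfavorable_label=0)

# Reweighing assigns instance weights so that, after weighting, the favorable
# outcome is independent of the protected attribute in the training data.
rw = Reweighing(unprivileged_groups=[{"race": 0}],
                privileged_groups=[{"race": 1}])
transformed = rw.fit_transform(dataset)
print(transformed.instance_weights)  # pass these as sample weights when training
```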
41.
“there is no [machine learning] model that works best for every problem”
Based on work by Wolpert (1996), “The Lack of A Priori Distinctions between Learning Algorithms”
Photo by Hermes Rivera on Unsplash
45. Benefits of post-processing @ LinkedIn
• Model agnostic (see the threshold sketch after this list)
• Robust to application-specific business logic
• Easier integration
• No significant modifications to existing components
• Complementary to pre-processing and in-processing efforts
https://engineering.linkedin.com/blog/2018/10/building-representative-talent-search-at-linkedin
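To illustrate the "model agnostic" point, a minimal sketch of threshold-style post-processing (plain Python; the scores, groups, and threshold values are invented for illustration, not LinkedIn's actual system):

```python
# Hypothetical post-processing: per-group decision thresholds applied to the
# scores of an already-trained model. Only the outputs are adjusted.
def decide(score: float, group: str, thresholds: dict) -> bool:
    """Return the favorable decision using the group's threshold."""
    return score >= thresholds[group]

# Thresholds would be tuned on held-out data to meet a chosen fairness metric.
thresholds = {"unprivileged": 0.45, "privileged": 0.55}

for score, group in [(0.50, "unprivileged"), (0.50, "privileged")]:
    print(group, decide(score, group, thresholds))
# The underlying model never changes, so this composes with any classifier.
```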
53. Learn more about the AI Fairness 360 toolkit
• GitHub: https://github.com/IBM/AIF360
• Docs: https://aif360.readthedocs.io/en/latest/
• Demo: http://aif360.mybluemix.net/
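A quick-start sketch with the toolkit's dataset metrics (install with `pip install aif360`; the toy DataFrame is the same invented one used in the Reweighing sketch):

```python
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric

# Toy data, invented for illustration.
df = pd.DataFrame({
    "race":  [0, 0, 0, 0, 1, 1, 1, 1],
    "score": [1, 3, 2, 4, 3, 5, 4, 6],
    "label": [0, 0, 0, 1, 0, 1, 1, 1],
})
dataset = BinaryLabelDataset(df=df, label_names=["label"],
                             protected_attribute_names=["race"],
                             favorable_label=1, unfavorable_label=0)

metric = BinaryLabelDatasetMetric(dataset,
                                  unprivileged_groups=[{"race": 0}],
                                  privileged_groups=[{"race": 1}])
print("Statistical parity difference:", metric.statistical_parity_difference())
print("Disparate impact:", metric.disparate_impact())
```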
55. Other resources
• What-If Tool
An interactive visual interface for probing model behavior, including fairness analysis
https://pair-code.github.io/what-if-tool/
• Ethics & Algorithms Toolkit
A risk management framework for governments (and others, too)
http://ethicstoolkit.ai/
57. Thank you!
Zeydy Ortiz, Ph.D.
CEO & Chief Data Scientist
zortiz @ datacrunchlab.com
www.linkedin.com/in/zortiz
58. Our Profile
DataCrunch Lab, LLC is an IT professional services and technology development
organization providing customized solutions that leverage data science, machine
learning, and artificial intelligence. Our solutions transform data into value
resulting in increased operational efficiency, reduced costs, and improved
outcomes.
Our Focus
We create value by uncovering needs, co-developing the best solution,
supporting customers through implementation, and ensuring the solution is
successful. As an IBM Business Partner, we leverage the Data & AI portfolio of
products in our solutions. We serve customers across many industries including IT,
legal, retail, financial, and industrial/manufacturing sectors.
Our Success
We have 20+ years of professional experience with the ability to architect, design
and develop highly complex IT solutions focused on business outcomes. Some of
the issues we have addressed include:
• Informed business strategy - what products to produce, what features to
include, and for which markets - with predictive modeling and simulation
• Increased visibility of real-time operations by integrating multiple technologies -
messaging, natural language understanding, chatbots, data visualization, cloud
computing
• Increased operational efficiency by predicting customer churn
• Reduced IT costs by predicting process duration to efficiently allocate resources
60. References
❖ “Machine Bias,” ProPublica: https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
❖ “How We Analyzed the COMPAS Recidivism Algorithm,” ProPublica: https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm
❖ “Bias in Criminal Risk Scores Is Mathematically Inevitable, Researchers Say,” ProPublica: https://www.propublica.org/article/bias-in-criminal-risk-scores-is-mathematically-inevitable-researchers-say
❖ “Beware of the gaps in Big Data”: https://eandt.theiet.org/content/articles/2016/09/beware-of-the-gaps-in-big-data/
❖ “Big data has unconscious bias too”: https://thesumisgreater.wordpress.com/2016/09/21/big-data-has-unconscious-bias-too/
❖ “Amazon scraps secret AI recruiting tool that showed bias against women”: https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G
❖ “How big data is unfair”: https://medium.com/@mrtz/how-big-data-is-unfair-9aa544d739de
❖ “When it comes to heart disease, women and men are not equal”: https://www.cedars-sinai.edu/About-Us/HH-Landing-Pages/When-it-comes-to-heart-disease-women-and-men-are-not-equal.aspx
❖ “Untold History of AI: Algorithmic Bias was Born in the 1980s”: https://spectrum.ieee.org/tech-talk/tech-history/dawn-of-electronics/untold-history-of-ai-the-birth-of-machine-bias
❖ AI Fairness 360 Toolkit: http://aif360.mybluemix.net
❖ “Certifying and removing disparate impact”: https://arxiv.org/abs/1412.3756
❖ “Mitigating Unwanted Biases with Adversarial Learning”: https://arxiv.org/pdf/1801.07593
❖ “On Fairness and Calibration”: https://arxiv.org/abs/1709.02012
❖ “Building Representative Talent Search at LinkedIn”: https://engineering.linkedin.com/blog/2018/10/building-representative-talent-search-at-linkedin
❖ IBM Watson OpenScale: https://console.bluemix.net/catalog/services/ai-openscale