Fairness in AI: How to Identify and Fix Discrimination in Machine Learning
NICHOLAS SCHMIDT
AI PRACTICE LEADER - BLDS, LLC
PRESENTATION TO MARYLAND AI GROUP
JUNE 3, 2019
Introduction
• Problems in Fairness, Accountability, and Transparency in AI May Hinder Its Adoption and Lead to the Next “AI Winter”
• These Issues are Present in All Statistical Models – and are Potentially More Problematic in Traditional Statistics – but Have Received Renewed Scrutiny in AI
• This Talk Focuses on Fairness, which Necessarily Touches on Accountability and Transparency
  • What is fairness?
  • Why can AI be unfair or discriminatory?
  • What can we do to ensure fairness, or at least mitigate unfairness, in AI?
2
Fairness in AI
• Why Does Fairness Matter?
  • Regardless of one’s definition of fairness, everyone wants to be treated fairly
  • Ensuring fairness is a moral and ethical imperative
• Other Reasons to Care about Fairness (I hate that I have to even say this, but…):
  • You will have a better model: an unfair model gets predictions wrong
  • A fair model allows expansion into new markets
  • You may be able to avoid regulatory scrutiny and protect your reputation
  • The public is skeptical of AI, so the burden is on you: having your CEO testify before a Senate subcommittee is not a good career move
3
Examples of Unfair and Discriminatory Models
• Hiring algorithms
• Image recognition
• Recidivism risk (?)
• But Automated Systems May Decrease Bias Over Time
  • Discrimination is irrational for a utility-maximizing firm
  • Automating processes removes subjective decision making
4
What Does it Mean for AI to be Fair?
• Imperfect Statistical Models (and Decisions Generally) are Inherently Inequitable, but are Not Necessarily Systematically Unfair or Discriminatory
• Conceptually, Fairness is “The quality of treating people equally or in a way that is reasonable.” (Oxford English Dictionary)
  • This is clearly deeply subjective
• There are Numerous Mathematical Definitions of Fairness – Some of Them Contradictory:
  • Anti-classification
  • Classification parity
  • Calibration
  • “21 Definitions of Fairness”

These issues are well outlined in a recent paper by Sam Corbett-Davies and Sharad Goel, “The Measure and Mismeasure of Fairness: A Critical Review of Fair Machine Learning.” https://arxiv.org/pdf/1808.00023.pdf
“21 Definitions of Fairness,” 2018 FAT/ML Conference. https://www.youtube.com/watch?v=jIXIuYdnyyk
5
When Might AI be Unfair or Discriminatory?
• Inaccurate or Insufficient Data
• Variables Influence Outcomes Differently by Group (and Those Groups are Under-Represented in Model Development)
• Variables That are Not Fully Causally Related to the Outcome but are Correlated with the Protected Class
• Less Discriminatory Models are Available
6
Discrimination as Unfairness
• Legal Protection for Protected Classes
  • Typically includes race, color, religion, sex, disability, familial status, national origin, and age – depending on the statute (Federal, State, or local)
    • Equal Credit Opportunity Act
    • Title VII of the Civil Rights Act of 1964
    • Age Discrimination in Employment Act of 1967
  • Employment, credit, and housing are primary targets of regulation
• Types of Discrimination
  • Overt Discrimination
  • Disparate Treatment
  • Adverse Impact or Disparate Impact – Demographic Parity
7
Definitions of Fairness: Anti-Classification
• One May Not Use Protected Class Status when Making a Decision (i.e., Building a Statistical Model)
  • This corresponds to “disparate treatment” on the prior slide
  • Use of protected class status does not have to be explicit: proxies for class status are sufficient
• Ensuring Anti-Classification is a Legal Requirement in Housing, Employment, and Credit
• Anti-Classification May Cause Discrimination
8
Definitions of Fairness: Classification Parity
• Ensuring that Common Measures of Predictive Performance are Equal Across Classes
  • Equalizing false positive and false negative rates (a minimal check is sketched below)
  • Differential validity
• Examples of Dangers when Using Classification Parity:
  • Predatory lending
  • Increased crime in affected communities
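A minimal sketch of how one might check classification parity, assuming binary outcomes, predictions, and group labels held in NumPy arrays (all names here are illustrative, not from the deck):

```python
import numpy as np

def error_rates(y_true, y_pred):
    """Return (false positive rate, false negative rate) for binary arrays."""
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    tp = np.sum((y_pred == 1) & (y_true == 1))
    return fp / (fp + tn), fn / (fn + tp)

def classification_parity_gaps(y_true, y_pred, group):
    """Classification parity holds when both gaps are approximately zero."""
    fpr_p, fnr_p = error_rates(y_true[group == "protected"], y_pred[group == "protected"])
    fpr_c, fnr_c = error_rates(y_true[group == "control"], y_pred[group == "control"])
    return {"fpr_gap": fpr_p - fpr_c, "fnr_gap": fnr_p - fnr_c}
```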
9
Definitions of Fairness: Calibration
• Model Outcomes, Conditioned on Risk, are Independent of Protected Class Status
• A Calibrated Model Will Predict that:
  • Protected class members who have an estimated X% chance of defaulting truly default at an X% rate
  • At the same time, control class members who have the same estimated X% chance of defaulting truly default at the same X% rate
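A sketch of a by-group calibration check, assuming scores in [0, 1] and binary outcomes in NumPy arrays (an illustrative binned comparison, not a method prescribed in the deck):

```python
import numpy as np

def calibration_by_group(scores, outcomes, group, n_bins=10):
    """Within each score bin and each group, compare the mean predicted risk
    with the observed default rate; a calibrated model matches the two for
    every group."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    results = {}
    for g in np.unique(group):
        s, y = scores[group == g], outcomes[group == g]
        idx = np.clip(np.digitize(s, bins) - 1, 0, n_bins - 1)
        results[g] = [(s[idx == b].mean(), y[idx == b].mean())
                      for b in range(n_bins) if np.any(idx == b)]
    return results  # per group: list of (mean predicted rate, observed rate) pairs
```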
10
Contradictions in Fairness: Calibration and Classification Parity
The original article that pointed out this issue is by Jon Kleinberg, et al., “Inherent Trade-Offs in the Fair Determination of Risk Scores,” https://arxiv.org/pdf/1609.05807.pdf

Total - Counts
  Description                      Prediction 1   Prediction 2   Total
  N                                2,000          2,000          4,000
  Model Score                      70%            25%            47.5%
  Calibrated Scores:
    Number of True Defaults        1,400          500            1,900
    Number of Predicted Defaults   1,400          500            1,900
  Lender Decision - Reject Individuals with a Score >= 60.0%:
    Total Acceptances              0              2,000          2,000
    Total Rejections               2,000          0              2,000

Total - Confusion Matrix (Counts)
  Predicted         True Outcome
  Outcome           Defaults   Paid-In-Full   Total
  Default           1,400      600            2,000
  Paid-In-Full      500        1,500          2,000
  Total             1,900      2,100          4,000

Total - Confusion Matrix (Column Percentages)
  Predicted         True Outcome
  Outcome           Defaults   Paid-In-Full
  Default           74%        29%
  Paid-In-Full      26%        71%

  Accuracy    72.5%   Percent of everyone who is correctly identified.
  Precision   70.0%   Percent of estimated defaults who truly default.
  Recall      73.7%   Percent of true defaults who are estimated to default.
  F1 Score    71.8%   Harmonic mean of precision and recall.
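These summary statistics follow directly from the confusion-matrix counts; a quick arithmetic sketch using the numbers above:

```python
tp, fp = 1400, 600   # predicted default: truly defaulted / truly paid in full
fn, tn = 500, 1500   # predicted paid-in-full: truly defaulted / truly paid in full

accuracy  = (tp + tn) / (tp + fp + fn + tn)         # 0.725
precision = tp / (tp + fp)                          # 0.700
recall    = tp / (tp + fn)                          # 0.737
f1 = 2 * precision * recall / (precision + recall)  # 0.718
```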
11
Contradictions in Fairness: Calibration and Classification Parity (continued)

Non-Hispanic Whites - Counts
  Description            Prediction 1   Prediction 2   Total
  N                      900            1,100          2,000
  Model Score            70%            25%            45.3%
  Calibrated Scores:
    True Defaults        630            275            905
    Predicted Defaults   630            275            905
  Lender Decision - Reject Individuals with a Score >= 60.0%:
    Total Acceptances    0              1,100          1,100
    Total Rejections     900            0              900

African-Americans - Counts
  Description            Prediction 1   Prediction 2   Total
  N                      1,100          900            2,000
  Model Score            70%            25%            49.8%
  Calibrated Scores:
    True Defaults        770            225            995
    Predicted Defaults   770            225            995
  Lender Decision - Reject Individuals with a Score >= 60.0%:
    Total Acceptances    0              900            900
    Total Rejections     1,100          0              1,100

Non-Hispanic Whites - Confusion Matrix (Counts)
  Predicted         True Outcome
  Outcome           Defaults   Paid-In-Full   Total
  Default           630        270            900
  Paid-In-Full      275        825            1,100
  Total             905        1,095          2,000

Non-Hispanic Whites - Confusion Matrix (Column Percentages)
  Predicted         True Outcome
  Outcome           Defaults   Paid-In-Full
  Default           70%        25%
  Paid-In-Full      30%        75%

  Accuracy    72.8%
  Precision   70.0%
  Recall      69.6%
  F1 Score    69.8%

African-Americans - Confusion Matrix (Counts)
  Predicted         True Outcome
  Outcome           Defaults   Paid-In-Full   Total
  Default           770        330            1,100
  Paid-In-Full      225        675            900
  Total             995        1,005          2,000

African-Americans - Confusion Matrix (Column Percentages)
  Predicted         True Outcome
  Outcome           Defaults   Paid-In-Full
  Default           77%        33%
  Paid-In-Full      23%        67%

  Accuracy    72.3%
  Precision   70.0%
  Recall      77.4%
  F1 Score    73.5%
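A short sketch using the group counts above to make the Kleinberg, et al. trade-off concrete: precision (the calibration-style measure here) is identical across groups, yet the error rates that classification parity would equalize are not:

```python
def rates(tp, fp, fn, tn):
    return {
        "precision": tp / (tp + fp),   # share of predicted defaults that truly default
        "fpr": fp / (fp + tn),         # false positive rate
        "fnr": fn / (fn + tp),         # false negative rate
    }

whites = rates(tp=630, fp=270, fn=275, tn=825)
blacks = rates(tp=770, fp=330, fn=225, tn=675)
# Precision is 0.70 for both groups, so the scores are calibrated in this
# sense, yet the false positive rate is 270/1,095 ~ 0.25 for whites versus
# 330/1,005 ~ 0.33 for African-Americans: the two fairness criteria conflict.
```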
12
Disparities in AI Models and the Problem of Proxies
• A common concern with AI models is that they may create “proxies” for protected class status, where the complexity of the model leads to class membership being used to make decisions in a way that cannot easily be found and ameliorated (a simple diagnostic is sketched below)
• If the variables used in the model have a strong relationship with class status, spurious correlation or poor model building can lead to a proxy problem
• Measuring within-class disparities – i.e., differences in treatment that occur only for some members of a class – is much harder
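One common diagnostic, offered here as an illustrative sketch rather than the author's stated method: train a simple classifier to predict protected class status from the model's own features; an AUC well above 0.5 signals proxy potential.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def proxy_score(X, protected):
    """Fit a simple classifier to predict protected class status from the
    model's feature matrix X; AUC near 0.5 suggests weak proxies, near 1.0
    strong ones."""
    X_tr, X_te, p_tr, p_te = train_test_split(X, protected,
                                              test_size=0.3, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, p_tr)
    return roc_auc_score(p_te, clf.predict_proba(X_te)[:, 1])
```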
13
Popular Measures of Disparity
• When Measuring Disparate Impact, We Typically Use:

$$AIR = \frac{\%\ \text{Protected Class Selected}}{\%\ \text{Control Class Selected}}$$

$$SMD = 100 \times \frac{\text{Average Score}_{\text{Protected Class}} - \text{Average Score}_{\text{Control Class}}}{\text{Standard Deviation}_{\text{Population}}}$$
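Both measures are a few lines of arithmetic; a minimal sketch assuming binary selection flags, continuous scores, and group labels in NumPy arrays:

```python
import numpy as np

def adverse_impact_ratio(selected, group):
    """AIR: protected-class selection rate divided by control-class selection rate."""
    return selected[group == "protected"].mean() / selected[group == "control"].mean()

def standardized_mean_difference(score, group):
    """SMD: difference in average scores, scaled by the population standard
    deviation and expressed in percentage points."""
    gap = score[group == "protected"].mean() - score[group == "control"].mean()
    return 100 * gap / score.std()
```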
14
Less Discriminatory AI Models
• Less Discriminatory Modeling: Focus on Model Fairness Over Social Fairness
• The Same Principles Used to Find Viable Less Discriminatory Models in Traditional Statistics Can be Used in AI Models. Viable models:
  • Are similarly predictive
  • Have lower adverse impact
  • Are found through a “reasonable search”
• Searching for Alternatives in AI May Take Extra Time. However:
  • The “multiplicity of good models” means that AI models are more likely to yield viable alternatives
  • Advances in computing that have allowed AI to flourish also allow a more thorough search for alternatives
15
Less Discriminatory AI Models: Methods
• Less Discriminatory Models May be Found Through:
  • Alternative feature selection
  • Adversarial modeling
  • Algorithm selection
  • Hyperparameter searches
  • Regularization
  • Data preprocessing
• Open Source Fairness Programs
  • IBM’s AI Fairness 360 (a usage sketch follows this list)
• Try These, but Make Sure You Hire a Lawyer (and Maybe a Consultant)
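A sketch of typical AI Fairness 360 usage, assuming a pandas DataFrame df with a binary default label and a binary race attribute (the column names and group encodings are illustrative):

```python
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.algorithms.preprocessing import Reweighing

# Wrap the raw data; 'default' and 'race' are assumed column names in df.
dataset = BinaryLabelDataset(df=df, label_names=["default"],
                             protected_attribute_names=["race"])
privileged, unprivileged = [{"race": 1}], [{"race": 0}]

# Disparate impact here is the selection-rate ratio (the AIR from the
# earlier slide).
metric = BinaryLabelDatasetMetric(dataset, privileged_groups=privileged,
                                  unprivileged_groups=unprivileged)
print(metric.disparate_impact())

# One preprocessing option: reweigh observations to reduce adverse impact
# before refitting the model on the transformed dataset.
rw = Reweighing(unprivileged_groups=unprivileged, privileged_groups=privileged)
dataset_transformed = rw.fit_transform(dataset)
```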
16
Less Discriminatory AI Models: Alternative Feature Selection
• Advances in AI and alternative data mean that there may be thousands of possible features, which makes feature selection very difficult
• Smart searches make the problem more tractable. These include the use of:
  • Variable groupings, which reduce the number of potential feature combinations to be considered
  • Statistical methods for ex ante determination of likely drivers of quality and impact
  • A drop-and-replace method that allows for faster optimization (a stylized sketch follows this list)
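The deck does not spell out the drop-and-replace algorithm, so the following is a stylized reading of the idea: refit without each candidate feature and keep the alternatives that stay similarly predictive while improving the adverse impact ratio. The fit_and_score callback is an assumed helper.

```python
def search_alternatives(features, fit_and_score, auc_tolerance=0.005):
    """fit_and_score(feature_list) is an assumed helper that refits the model
    and returns (auc, air) on a holdout sample. Keep drop-one alternatives
    that remain similarly predictive while raising the adverse impact ratio."""
    base_auc, base_air = fit_and_score(features)
    viable = []
    for f in features:
        auc, air = fit_and_score([x for x in features if x != f])
        if auc >= base_auc - auc_tolerance and air > base_air:
            viable.append((f, auc, air))
    # Rank candidates by how much dropping the feature improves the AIR.
    return sorted(viable, key=lambda t: -t[2])
```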
17
Less Discriminatory AI Models: Alternative Feature Selection
[Chart from the original slide; no text content was extracted.]
18
Less Discriminatory AI Models: Adversarial NNs
Source: https://blog.godatadriven.com/fairness-in-pytorch
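The linked post builds a predictor/adversary pair; the sketch below is a compressed, generic version of that pattern (not code from the post): the adversary tries to recover protected class status from the predictor's output, and the predictor is penalized whenever it succeeds.

```python
import torch
import torch.nn as nn

predictor = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 1))
adversary = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
bce = nn.BCEWithLogitsLoss()
opt_p = torch.optim.Adam(predictor.parameters(), lr=1e-3)
opt_a = torch.optim.Adam(adversary.parameters(), lr=1e-3)
lam = 1.0  # weight on the fairness penalty

def train_step(x, y, z):
    """x: features (N, 20); y: outcome (N, 1) floats; z: protected class (N, 1) floats."""
    # 1) The adversary learns to infer z from the (detached) prediction.
    opt_a.zero_grad()
    adv_loss = bce(adversary(predictor(x).detach()), z)
    adv_loss.backward()
    opt_a.step()
    # 2) The predictor fits y while being penalized when the adversary
    #    succeeds; subtracting the adversary's loss pushes predictions
    #    toward carrying no information about z.
    opt_p.zero_grad()
    y_hat = predictor(x)
    pred_loss = bce(y_hat, y) - lam * bce(adversary(y_hat), z)
    pred_loss.backward()
    opt_p.step()
```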
19
Less Discriminatory AI Models: Other Methods
• Algorithm selection
• Hyperparameter searches
• Data preprocessing
• Regularization
20
Resources for Learning More about Fairness in AI
• ACM FAT* Conference Proceedings
• Patrick Hall’s Excellent GitHub Repository
  • https://github.com/jphall663/awesome-machine-learning-interpretability
• IBM AI Fairness 360
  • http://aif360.mybluemix.net/
• A Few Papers
  • https://arxiv.org/pdf/1609.05807.pdf
  • https://arxiv.org/pdf/1811.10154.pdf
  • http://www.aies-conference.com/wp-content/papers/main/AIES_2018_paper_162.pdf
  • https://arxiv.org/pdf/1802.04422.pdf
21
Nicholas Schmidt
BLDS, LLC
nschmidt@bldsllc.com
(312) 802-0017
https://www.linkedin.com/in/nickpschmidt/
22
