Measuring Model Fairness - Stephen Hoover


Machine learning models are increasingly used to make decisions that affect people’s lives. With this power comes a responsibility to ensure that model predictions are fair. In this talk I’ll introduce several common model fairness metrics, discuss their tradeoffs, and finally demonstrate their use with a case study analyzing anonymized data from one of Civis Analytics’s client engagements.

  1. Measuring Model Fairness. Stephen Hoover, PyData LA 2018, October 23
  2. Measuring Model Fairness. Stephen Hoover, PyData LA 2018, October 23. J. Henry Hinnefeld, Peter Cooman, Nat Mammo, and Rupert Deese, "Evaluating Fairness Metrics in the Presence of Dataset Bias"; https://arxiv.org/abs/1809.09245
  3. What is fair? https://pxhere.com/en/photo/199386
  4. Models determine whether you can buy a home... https://www.flickr.com/photos/cafecredit/26700612773
  5. and how long you spend on parole... https://www.flickr.com/photos/archivesnz/27160240521
  6. and what advertisements you see. https://www.flickr.com/photos/dno1967b/8283313605
  7. and what advertisements you see. https://www.flickr.com/photos/44313045@N08/6290270129
  8. How do you measure if your model is fair? https://pixabay.com/en/legal-scales-of-justice-judge-450202/
  9. How do you measure if your model is fair? https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
  10. How do you measure if your model is fair? https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
  11. How do you measure if your model is fair? http://www.northpointeinc.com/files/publications/Criminal-Justice-Behavior-COMPAS.pdf https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm
  12. The bias comes from your data. https://commons.wikimedia.org/wiki/File:Bias_t.png
  13. Subtlety #1: Different groups can have different ground truth positive rates. https://www.breastcancer.org/symptoms/understand_bc/statistics
  14. Certain fairness metrics make assumptions about the balance of ground truth positive rates. Disparate Impact is a popular metric that assumes the ground truth positive rates for both groups are the same (code sketch after the slide list). Certifying and removing disparate impact, Feldman et al. (https://arxiv.org/abs/1412.3756)
  15. Subtlety #2: Your data are a biased representation of ground truth. Datasets can contain label bias when a protected attribute affects the way individuals are assigned labels. "In addition, the results indicate that students from African American and Latino families are more likely than their White peers to receive expulsion or out of school suspension as consequences for the same or similar problem behavior." A dataset for predicting "student problem behavior" that used "has been suspended" for its label could contain label bias. "Race is not neutral: A national investigation of African American and Latino disproportionality in school discipline," Skiba et al.
  16. Subtlety #2: Your data are a biased representation of ground truth. Datasets can contain sample bias when a protected attribute affects the sampling process that generated your data. "We find that persons of African and Hispanic descent were stopped more frequently than whites, even after controlling for precinct variability and race-specific estimates of crime participation." A dataset for predicting contraband possession that used stop-and-frisk data could contain sample bias. "An analysis of the NYPD's stop-and-frisk policy in the context of claims of racial bias," Gelman et al.
  17. Certain fairness metrics are based on accuracy with respect to possibly biased labels. Equal Opportunity is a popular metric that compares the True Positive rates between protected groups (code sketch after the slide list). Equality of Opportunity in Supervised Learning, Hardt et al. (https://arxiv.org/pdf/1610.02413.pdf)
  18. Subtlety #3: It matters whether the modeled decision's consequences are positive or negative. When a model is punitive you might care more about False Positives. When a model is assistive you might care more about False Negatives. The point is you have to think about these questions.
  19. We can't math our way out of thinking about fairness. You still need a person to think about the ethical implications of your model. Originally people thought 'Models are just math, so they must be fair' → definitely not true. Now there's a temptation to say 'Adding this constraint will make my model fair' → still not automatically true.
  20. Can we detect real bias in real data? Spoiler: it can be tough! ● Start with real data from Civis's work ○ Features are demographics, outcome is a probability ○ Consider racial bias; white versus African American
  21. Can we detect real bias in real data? Create artificial datasets with known causal bias; then we'll see if we can detect it. ● Start with real data from Civis's work ○ Features are demographics, outcome is a probability ○ Consider racial bias; white versus African American ● Two datasets: ○ Artificially balanced: select the white subset and randomly re-assign race (code sketch after the slide list) ○ Unmodified (imbalanced) dataset
  22. Next, introduce known sample and label bias. Label bias: you're in the dataset, but protected class affects your label. Use the original dataset but modify the labels; for unbiased data, use the original, unmodified labels.
  23. Next, introduce known sample and label bias. Sample bias: protected class affects whether you're in the sample at all. Create a modified dataset with labels taken from the original data; for unbiased data, use the original, unmodified sample (code sketch after the slide list).
  24. Two experiments, four datasets each. Train an elastic net logistic regression classification model on each dataset (code sketch after the slide list).
  25. Train, predict, then measure. Apply each fairness metric to the model's probability predictions: Disparate Impact, Equal Opportunity, Equal Mis-Opportunity, Difference in Average Residuals (code sketch after the slide list).
  26. With balanced ground truth, all metrics detect bias. Good news! (Chart: each metric evaluated under the No bias, Sample bias, Label bias, and Both conditions.)
  27. With imbalanced ground truth, all metrics still detect bias... even when there isn't any bias in the "truth". (Chart: each metric evaluated under the No bias, Sample bias, Label bias, and Both conditions.)
  28. Label bias is particularly hard to distinguish from imbalanced ground truth. Which is fair, since reality has label bias in this case. (Chart: each metric evaluated under the No bias, Sample bias, Label bias, and Both conditions.)
  29. There's no one-size-fits-all solution, except for "think hard about your inputs and your outputs". ● These metrics can help
  30. There's no one-size-fits-all solution, except for "think hard about your inputs and your outputs". ● These metrics can help ● There are methods you can use in training to make your model more fair ○ (assuming you know what "fair" means for you) https://arxiv.org/pdf/1412.3756.pdf https://arxiv.org/pdf/1806.08010.pdf http://proceedings.mlr.press/v28/zemel13.pdf
  31. There's no one-size-fits-all solution, except for "think hard about your inputs and your outputs". ● These metrics can help ● There are methods you can use in training to make your model more fair ● Use a diverse team to create the models! https://imgur.com/gallery/hem9m
  32. There's no one-size-fits-all solution, except for "think hard about your inputs and your outputs". ● These metrics can help ● There are methods you can use in training to make your model more fair ● Use a diverse team to create the models! ● Know your data and check your predictions. https://pixabay.com/en/isolated-thinking-freedom-ape-1052504/
  33. "Big Data processes codify the past. They do not invent the future. Doing that requires moral imagination, and that's something only humans can provide. We have to explicitly embed better values into our algorithms, creating Big Data models that follow our ethical lead. Sometimes that will mean putting fairness ahead of profit." ― Cathy O'Neil, Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy
  34. "Big Data processes codify the past. They do not invent the future. Doing that requires moral imagination, and that's something only humans can provide. We have to explicitly embed better values into our algorithms, creating Big Data models that follow our ethical lead. Sometimes that will mean putting fairness ahead of profit." ― Cathy O'Neil, Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. J. Henry Hinnefeld, Peter Cooman, Nat Mammo, and Rupert Deese, "Evaluating Fairness Metrics in the Presence of Dataset Bias"; https://arxiv.org/abs/1809.09245. Stephen Hoover, @StephenActual
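
The sketches below expand on the metrics and experiment setup referenced in the slides. They are illustrative Python only, not the talk's actual code; every function, column, and parameter name is an assumption. First, slide 14's Disparate Impact: a minimal NumPy sketch of the ratio of positive-prediction rates between a protected group and a reference group.

    import numpy as np

    def disparate_impact(y_pred, protected):
        """Ratio of positive-prediction rates: protected group / reference group.

        y_pred: array of binary predictions (0/1).
        protected: boolean array, True for members of the protected group.
        A value near 1.0 indicates parity; the "four-fifths rule" commonly
        flags ratios below 0.8 (Feldman et al., arXiv:1412.3756).
        """
        y_pred = np.asarray(y_pred)
        protected = np.asarray(protected, dtype=bool)
        rate_protected = y_pred[protected].mean()
        rate_reference = y_pred[~protected].mean()
        return rate_protected / rate_reference

Because the metric compares prediction rates only, it implicitly assumes the two groups' ground truth positive rates match, which is exactly the assumption slide 14 calls out.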
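
Slide 17's Equal Opportunity compares True Positive rates across groups; a gap near zero satisfies Hardt et al.'s criterion. A minimal sketch, assuming binary labels and binary predictions obtained by thresholding the model's probabilities:

    import numpy as np

    def true_positive_rate(y_true, y_pred):
        """Fraction of actual positives that the model predicts as positive."""
        y_true = np.asarray(y_true)
        y_pred = np.asarray(y_pred)
        return y_pred[y_true == 1].mean()

    def equal_opportunity_gap(y_true, y_pred, protected):
        """Difference in True Positive rates: protected group minus reference group."""
        y_true = np.asarray(y_true)
        y_pred = np.asarray(y_pred)
        protected = np.asarray(protected, dtype=bool)
        tpr_protected = true_positive_rate(y_true[protected], y_pred[protected])
        tpr_reference = true_positive_rate(y_true[~protected], y_pred[~protected])
        return tpr_protected - tpr_reference

Since the True Positive rate is computed against the observed labels, any label bias in those labels propagates directly into the metric, which is the caveat slide 17 raises.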
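
Slide 21's artificially balanced dataset keeps only the reference-group rows and then randomly re-assigns race, so the protected attribute carries no real signal. A pandas sketch; the column name, category labels, and 50/50 split are assumptions, not the actual Civis schema.

    import numpy as np
    import pandas as pd

    def make_balanced_dataset(df: pd.DataFrame, race_col: str = "race",
                              reference: str = "white",
                              protected_group: str = "african_american",
                              seed: int = 0) -> pd.DataFrame:
        """Keep only reference-group rows, then randomly relabel race so the
        protected attribute is independent of the features and labels."""
        rng = np.random.default_rng(seed)
        balanced = df[df[race_col] == reference].copy()
        balanced[race_col] = rng.choice([reference, protected_group],
                                        size=len(balanced))
        return balanced

The unmodified (imbalanced) dataset is just the original DataFrame, so the two experiments differ only in whether the ground truth positive rates of the two groups can differ.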
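
Slides 22 and 23 inject known label bias and sample bias. The companion paper (arXiv:1809.09245) specifies the exact injection mechanisms; the sketch below is only one plausible version, assuming a binarized 0/1 outcome column, with the flip/drop direction and the 20% rates chosen purely for illustration.

    import numpy as np
    import pandas as pd

    def add_label_bias(df, label_col="y", race_col="race",
                       protected_group="african_american",
                       flip_rate=0.2, seed=0):
        """Label bias: everyone stays in the dataset, but membership in the
        protected group changes some labels (here, positives flipped to 0)."""
        rng = np.random.default_rng(seed)
        biased = df.copy()
        candidates = (biased[race_col] == protected_group) & (biased[label_col] == 1)
        flips = candidates & (rng.random(len(biased)) < flip_rate)
        biased.loc[flips, label_col] = 0
        return biased

    def add_sample_bias(df, label_col="y", race_col="race",
                        protected_group="african_american",
                        drop_rate=0.2, seed=0):
        """Sample bias: membership in the protected group changes the chance of
        appearing in the sample at all; labels are taken from the original data."""
        rng = np.random.default_rng(seed)
        candidates = (df[race_col] == protected_group) & (df[label_col] == 1)
        drops = candidates & (rng.random(len(df)) < drop_rate)
        return df[~drops].copy()

The "no bias" datasets are the ones these functions would leave untouched: the original labels for the label-bias experiment and the original sample for the sample-bias experiment.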
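
Slide 24 trains an elastic net logistic regression on each dataset. One way to do that with scikit-learn is sketched below, assuming a numeric feature matrix; the talk may well have used different tooling, and the solver, l1_ratio, and C values are placeholder defaults rather than the settings from the case study.

    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    def fit_elastic_net_classifier(X, y):
        """Elastic net penalized logistic regression (mixed L1/L2 penalty).
        In scikit-learn, only the 'saga' solver supports penalty='elasticnet'."""
        model = make_pipeline(
            StandardScaler(),
            LogisticRegression(penalty="elasticnet", solver="saga",
                               l1_ratio=0.5, C=1.0, max_iter=5000),
        )
        return model.fit(X, y)

The probability predictions for slide 25 then come from model.predict_proba(X)[:, 1].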
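
Slide 25 applies four metrics to the model's probability predictions. Disparate Impact and Equal Opportunity are sketched above (after thresholding the probabilities into 0/1 predictions); the two below cover the rest, with "Equal Mis-Opportunity" read here as the False Positive rate analogue of Equal Opportunity, an interpretation rather than a definition taken from the talk.

    import numpy as np

    def equal_mis_opportunity_gap(y_true, y_pred, protected):
        """Difference in False Positive rates: protected group minus reference
        group (interpreting 'Equal Mis-Opportunity' as the FPR counterpart of
        Equal Opportunity)."""
        y_true = np.asarray(y_true)
        y_pred = np.asarray(y_pred)
        protected = np.asarray(protected, dtype=bool)

        def fpr(t, p):
            # Fraction of actual negatives that were predicted positive.
            return p[t == 0].mean()

        return (fpr(y_true[protected], y_pred[protected])
                - fpr(y_true[~protected], y_pred[~protected]))

    def average_residual_difference(y_true, p_pred, protected):
        """Difference in Average Residuals: mean(y - p) in the protected group
        minus mean(y - p) in the reference group. A nonzero gap means the model
        systematically over- or under-predicts for one group."""
        y_true = np.asarray(y_true, dtype=float)
        p_pred = np.asarray(p_pred, dtype=float)
        protected = np.asarray(protected, dtype=bool)
        resid = y_true - p_pred
        return resid[protected].mean() - resid[~protected].mean()

A quick check on one dataset could threshold at 0.5, pass the resulting 0/1 predictions to disparate_impact, equal_opportunity_gap, and equal_mis_opportunity_gap, and pass the raw probabilities to average_residual_difference.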
