AI Bias Oxford 2017

Invited talk on bias in AI and its implications for decision making. What is bias? Can we remove it? Includes a look at specific examples, e.g. Oxford entry and COMPAS.

Published in: Technology

  1. 1. ISSUES IN AI, POLICING AND CRIMINAL JUSTICE: BIAS Dr Janet Bastiman @yssybyl janet.bastiman@story-stream.com STORY-STREAM.COM.
  2. 2. What is AI? "Any system that makes a decision that appears to be intelligent from specific inputs" John McCarthy (1955) AI -> Machine Learning -> Deep Learning -> (G)AI
  3. 3. Visualizing and Understanding Convolutional Networks, Zeiler and Fergus 2013 https://arxiv.org/pdf/1311.2901.pdf Deep if there's more than one stage of non-linear feature transformation
  4. 4. What is Bias? "an unwarranted correlation between input variables and output classification" "erroneous assumptions in the learning algorithm resulting in missing relevant relationships" - bias (underfitting) "noise is modelled rather than the valid outputs" - variance (overfitting)
  5. 5. Under vs Over
  6. 6. Poor data is also a problem
  7. 7. Algorithm too simple Prediction: Cow Prediction: Horse
  8. 8. Industry problems • Nothing is (externally) peer reviewed • IP is in the training data, network architecture and test set, so it is hidden • Results are cherry picked / exaggerated
  9. 9. This might look fine but would be embarrassing and potentially contract ending for a brand – how disastrous could this be for a decision on a person’s life?
  10. 10. Machine Washing • Overcomplicating AI deliberately to promote its abilities • Removes desire to question as it sounds too difficult • Pretending something is AI when it’s not We cannot have transparency when the general public are deliberately misled with statistics in the interest of sensationalism.
  11. 11. Bias in data gathering • Ignorance of the problem – building a Lego kit without the picture or instructions • Asking biased questions to lead the model – presupposing the answer • Limiting the data to a set that supports the hypothesis
  12. 12. Data choices Must be: • representative of real world data • varied • sufficient • able to define the predictions well There will always be exceptions - a good model will handle these sensibly. Ignorance of the problem space will lead to poor models
  13. 13. Data choices ... Biased data will result in biased models. E.g. Oxbridge entry is inherently biased: • state school candidates are: • less likely to apply • more likely to apply to oversubscribed courses Hence there is an observed bias towards privately educated students getting places. Racial disparities are exacerbated when the inherent data bias is not understood
  14. 14. COMPAS: Is there bias? If so, where? • Only looked at individuals arrested for crimes • Exact algorithms and training data unknown • 137-point questionnaire to feed risk score • Questions appear unsuitable • Testing showed a significant correlation between risk scores and reoffending • Black individuals were given higher risk scores than white individuals for the same crimes Was it biased?
  15. 15. COMPAS: Self-Validation • 5575 individuals • 50.2% Black, 42.0% White, 7.8% Other • 86.3% male • All had been assessed by COMPAS for risk of reoffending and were then monitored for 12 months • Over half the individuals scored low (1-3) • The report switches between percentages, percentage changes and percentages per category to show results in the best light http://criminology.fsu.edu/wp-content/uploads/Validation-of-the-COMPAS-Risk-Assessment-Classification-Instrument.pdf
  16. 16. COMPAS: Self-Validation
  17. 17. COMPAS: ProPublica Study • Same individuals • Concluded that there was fundamental bias, as the false positive rate for black defendants categorised as high risk was twice that for white defendants Two conflicting statistical studies… https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm
  18. 18. Bias-free but skewed population? • Chouldechova (2016) reviewed both • For every risk score s in the set, the test is fair if the probability of reoffending is independent of group membership • $P(Y=1 \mid S=s, R=b) = P(Y=1 \mid S=s, R=w)$ • COMPAS adheres well to this fairness condition • However, FNR and FPR are related by: • $\mathrm{FPR} = \frac{p}{1-p} \cdot \frac{1-\mathrm{PPV}}{\mathrm{PPV}} \cdot (1-\mathrm{FNR})$, where $p$ is the recidivism prevalence within the group • If recidivism is not equal between the two groups then a fair test score cannot have equal FPR and FNR across the two groups • Either the prediction is unbiased or the error is unbiased https://arxiv.org/pdf/1610.07524.pdf
  19. 19. Inherent trade-offs • Kleinberg et al (2016) • Is statistical parity possible? Or should we strive for balance between classes, so that the chance of making a mistake about an individual does not depend on their group? • Also determined independently that you cannot balance unbiased predictions with unbiased errors • They define more complex feature vectors 𝜎 and avoid the case where 𝜎 is not defined or incomplete for some individuals • Rigorous proof that you cannot balance all sides https://arxiv.org/pdf/1609.05807.pdf
  20. 20. Making data fair? • Hardt et al (2016) • Ignore all protected attributes? Ineffective due to redundant encodings and other connected attributes • Demographic parity? If the probability within a class varies you cannot have positive parity and error parity at the same time • Propose a new parity criterion (equalised odds), aiming to equalise both true positive and false positive rates across groups • “fairer” is still subjective https://ttic.uchicago.edu/~nati/Publications/HardtPriceSrebro2016.pdf
  21. 21. Summary • Cannot maximise predictive success and equalise error across unbalanced classes • Several approaches exist; which one fits depends on the goal of the algorithm • Input data needs to be unbiased “All models are wrong, but some are useful – how wrong do they have to be to not be useful?” – George E. P. Box Human “gut feel” decisions are accepted: when they are challenged, explaining some of the contributing factors is considered enough. Are we holding AI predictive models to an unachievably high standard that we do not apply to humans?
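
The trade-off in slides 18 and 21 can be checked numerically. The sketch below is not part of the talk: it assumes the relation from Chouldechova (2016), FPR = p/(1-p) · (1-PPV)/PPV · (1-FNR), with p the prevalence of reoffending in a group. The function names and every number are hypothetical and chosen only for illustration; none of them are COMPAS data.

```python
"""Numerical check of the FPR/FNR relation from Chouldechova (2016).

All counts and rates below are hypothetical illustration values,
not COMPAS data.
"""


def rates_from_counts(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute prevalence, PPV, FNR and FPR directly from a confusion matrix."""
    total = tp + fp + fn + tn
    return {
        "p": (tp + fn) / total,  # prevalence: share who actually reoffend
        "ppv": tp / (tp + fp),   # positive predictive value
        "fnr": fn / (tp + fn),   # false negative rate
        "fpr": fp / (fp + tn),   # false positive rate
    }


def implied_fpr(p: float, ppv: float, fnr: float) -> float:
    """FPR implied by the relation p/(1-p) * (1-PPV)/PPV * (1-FNR)."""
    if not 0.0 < p < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    if not 0.0 < ppv <= 1.0:
        raise ValueError("PPV must lie in (0, 1]")
    return p / (1.0 - p) * (1.0 - ppv) / ppv * (1.0 - fnr)


# 1) The relation is an identity: it reproduces the FPR measured
#    directly from a (hypothetical) confusion matrix.
observed = rates_from_counts(tp=42, fp=28, fn=18, tn=112)
check = implied_fpr(observed["p"], observed["ppv"], observed["fnr"])

# 2) Two hypothetical groups with the same PPV and the same FNR but
#    different prevalence are forced to have different FPRs.
fpr_group_a = implied_fpr(p=0.5, ppv=0.6, fnr=0.3)
fpr_group_b = implied_fpr(p=0.4, ppv=0.6, fnr=0.3)

if __name__ == "__main__":
    print("FPR from counts:  ", observed["fpr"])
    print("FPR from relation:", check)
    print("Group A FPR:", fpr_group_a)
    print("Group B FPR:", fpr_group_b)
```

With PPV and FNR held equal, the group with the higher prevalence ends up with the higher false positive rate. This is the sense in which either the prediction or the error can be unbiased, but not both.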
