
Tags: algorithmic decisions, fairness, machine learning, provenance, transparency

An invited talk given at "Supporting Algorithm Accountability using Provenance", a ProvenanceWeek 2018 workshop, London, July 12th, 2018.


  1. Transparency and fairness of predictive models, and the provenance of the data used to build them: thoughts and challenges. Paolo Missier, School of Computing, Newcastle University. Supporting Algorithm Accountability using Provenance, a ProvenanceWeek 2018 workshop, London, July 12th, 2018.
  2. One of my favourite books. How much of Big Data is my data? Is data the problem? Or the algorithms? Or how much we trust them? Is there a problem at all?
  3. What matters? Decisions made based on algorithmically generated knowledge:
     • automatically filtering job applicants
     • approving loans or other credit
     • approving access to benefits schemes
     • predicting insurance risk levels
     • user profiling for policing purposes and to predict risk of criminal recidivism
     • identifying health risk factors
     • …
  4. GDPR and algorithmic decision-making. Article 22 (Automated individual decision-making, including profiling), paragraph 1, prohibits any "decision based solely on automated processing, including profiling" which "significantly affects" a data subject. It stands to reason that an algorithm can only be explained if the trained model can be articulated and understood by a human. It is reasonable to suppose that any adequate explanation would provide an account of how input features relate to predictions:
     - Is the model more or less likely to recommend a loan if the applicant is a minority?
     - Which features play the largest role in prediction?
     B. Goodman and S. Flaxman, "European Union regulations on algorithmic decision-making and a 'right to explanation,'" Proc. 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), Jun. 2016.
  5. Interest is qualified and increasing.
  6. Transparency and interpretability. Of algorithms: ML approaches, model explanations. Of data: data-based explanations, provenance?
  7. Interpretability (of machine learning models). Interpretability: the ability to provide a qualitative understanding of the relationship between the input variables and the response.
     - Transparency: Are features understandable? Which features are more important?
     - Post hoc interpretability: natural-language explanations, visualisations of models, explanations by example ("this tumor is classified as malignant because to the model it looks a lot like these other tumors").
     Z. C. Lipton, "The Mythos of Model Interpretability," Proc. 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), Jun. 2016.
     W. Samek, T. Wiegand, and K.-R. Müller, "Explainable Artificial Intelligence: Understanding, Visualizing and Interpreting Deep Learning Models," Aug. 2017.
  8. Black-box approaches.
     - Model agnostic: an explainer should be able to explain any model, and thus be model-agnostic (i.e. treat the original model as a black box).
     - Local fidelity: for an explanation to be meaningful it must at least be locally faithful, i.e. it must correspond to how the model behaves in the vicinity of the instance being predicted.
  9. Occlusion testing. M. T. Ribeiro, S. Singh, and C. Guestrin, "'Why Should I Trust You?': Explaining the Predictions of Any Classifier," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16), 2016, pp. 1135–1144.
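To make the idea concrete, here is a minimal sketch (not from the slides) of occlusion testing for a tabular classifier: each feature is "occluded" in turn by replacing it with a baseline value, and the drop in predicted probability is taken as a rough importance score. The dataset, model and choice of baseline are illustrative assumptions.

```python
# Minimal occlusion-testing sketch (illustrative; not the exact method used in the talk).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000).fit(X, y)

baseline = X.mean(axis=0)          # value used to "occlude" a feature
instance = X[0].copy()             # the prediction we want to explain
p_orig = model.predict_proba([instance])[0, 1]

importances = []
for j in range(X.shape[1]):
    occluded = instance.copy()
    occluded[j] = baseline[j]      # hide feature j
    p_occ = model.predict_proba([occluded])[0, 1]
    importances.append(p_orig - p_occ)   # a large drop => feature j mattered

top = np.argsort(np.abs(importances))[::-1][:5]
print("most influential features (by occlusion):", top)
```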
  10. Expected accuracy is not enough for trust: an SVM classifier with 94% accuracy … but questionable!
  11. LIME. Model agnostic; locally faithful: the explanation must correspond to how the model behaves in the vicinity of the instance being predicted. M. T. Ribeiro, S. Singh, and C. Guestrin, "'Why Should I Trust You?': Explaining the Predictions of Any Classifier," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16), 2016, pp. 1135–1144.
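A minimal usage sketch (not from the slides) of the `lime` package on a generic tabular classifier. The dataset and model are stand-ins; the point is the `explain_instance` call, which perturbs the instance, queries the black box, and fits a sparse local linear surrogate whose weights serve as the explanation.

```python
# Minimal LIME sketch on a tabular model (illustrative use of the `lime` package).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

data = load_breast_cancer()
X, y = data.data, data.target
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

explainer = LimeTabularExplainer(
    X,
    feature_names=list(data.feature_names),
    class_names=list(data.target_names),
    discretize_continuous=True,
)
# Explain a single prediction with a sparse local surrogate model
exp = explainer.explain_instance(X[0], model.predict_proba, num_features=5)
print(exp.as_list())   # (feature condition, weight) pairs for this one instance
```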
  12. Other model explanation approaches.
     1. Black Box Explanations through Transparent Approximations (BETA) [1]: decision-set approximation of black-box models; fidelity + interpretability of the explanation; global (unlike LIME).
     2. Intelligible additive models [2]: generalized additive models (GAMs); with pairwise interactions, GA2Ms.
     [1] H. Lakkaraju, E. Kamar, R. Caruana, and J. Leskovec, "Interpretable & Explorable Approximations of Black Box Models," arXiv preprint arXiv:1707.01154, 2017.
     [2] R. Caruana, Y. Lou, J. Gehrke, P. Koch, M. Sturm, and N. Elhadad, "Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission," in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015, pp. 1721–1730.
  13. Data → Model → Predictions. [Pipeline diagram: raw datasets → data collection → population data / instances → pre-processing → features → model → the predicted "you": ranking, score, class.] Key decisions are made during data collection: Where does the data come from? What's in the dataset? This complements current ML approaches to model interpretability.
  14. Possible roles for provenance. 1) Data acquisition: provenance → transparency → trust.
  15. Data → Model → Predictions. [Same pipeline diagram as slide 13.] Key decisions are made during data collection (where does the data come from? what's in the dataset?) and data preparation (how was it pre-processed?). 1. Can we explain these decisions? 2. Are these explanations useful?
  16. Explaining data preparation (Paolo Missier, Computing; Dennis Prangle, Stats). [Same pipeline diagram as slide 13.]
     Pre-processing steps: integration, cleaning, outlier removal, normalisation, feature selection, class rebalancing, sampling, stratification, …
     Data acquisition and wrangling: How were datasets acquired? How recently? For what purpose? Are they being reused / repurposed? What is their quality?
     Implemented as scripts (Python / TensorFlow, Pandas, Spark) or workflows (Knime, …).
     Provenance → transparency.
  17. Provenance for transparency.
     1. Collection: program-level, system-level.
     2. Representation: W3C PROV (for interoperability); multiple proprietary formats (for efficient encoding).
     3. Querying / analysis: RDBMS, GDBMS, RDF / SPARQL.
     What to record: the configuration of each pre-processing step and the data dependency graph. Which kind of normalisation did you apply? Was the data (down/up) sampled? How? How did you define / remove outliers? How did you window your time series? Was the data repurposed (acquired from a repository)? How was the original protocol defined? (A small PROV sketch follows below.)
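As a concrete illustration of the "representation" point, here is a minimal sketch (not from the slides) that records a single pre-processing step as W3C PROV using the Python `prov` package. The identifiers, attribute names and values are illustrative assumptions.

```python
# A minimal sketch of recording one pre-processing step as W3C PROV,
# using the Python `prov` package. All names and attributes are illustrative.
from prov.model import ProvDocument

doc = ProvDocument()
doc.add_namespace('ex', 'http://example.org/titanic#')

# Entities: the raw dataset and the cleaned dataset derived from it
doc.entity('ex:raw_titanic_csv', {'ex:rows': 891})
doc.entity('ex:titanic_imputed', {'ex:rows': 891})

# Activity: the imputation step, with its configuration recorded as attributes
doc.activity('ex:impute_age', other_attributes={
    'ex:column': 'Age',
    'ex:strategy': 'mean age per Pclass',
})

doc.used('ex:impute_age', 'ex:raw_titanic_csv')            # the step read the raw data
doc.wasGeneratedBy('ex:titanic_imputed', 'ex:impute_age')  # ... and produced the cleaned data
doc.wasDerivedFrom('ex:titanic_imputed', 'ex:raw_titanic_csv')  # data dependency edge

print(doc.get_provn())   # PROV-N serialisation; doc.serialize() gives PROV-JSON
```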
  18. Example: the classic "Titanic" dataset. Can you predict survival probabilities? A simple logistic regression analysis.
     Survived - survival (0 = no; 1 = yes)
     Pclass - passenger class (1 = 1st; 2 = 2nd; 3 = 3rd)
     Name - name
     Sex - sex
     Age - age
     SibSp - number of siblings/spouses aboard
     Parch - number of parents/children aboard
     Ticket - ticket number
     Fare - passenger fare (British pounds)
     Cabin - cabin
     Embarked - port of embarkation (C = Cherbourg; Q = Queenstown; S = Southampton)
  19. Enable analysis of data pre-processing. The data preparation workflow includes a number of decisions:
     - Managing missing values: Age is missing in 177 of the 891 records (714 have a value); "Pclass is a good predictor for Age", so impute missing Age values using the average age per Pclass.
     - Dropping irrelevant attributes: 'PassengerId', 'Name', 'Ticket', 'Cabin'.
     - Dropping correlated features (?): drop 'Fare', 'Pclass'.
     - Is the target class balanced?
  20. Example: missing values imputation (see the sketch below).
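A minimal sketch of the imputation and dropping decisions described on the previous slides, assuming the standard Kaggle Titanic train.csv. The column names follow that file; the path and the exact drop choices are illustrative.

```python
# Illustrative pandas version of the data preparation decisions on slides 19-20.
import pandas as pd

df = pd.read_csv("train.csv")                       # 891 rows; Age has 177 missing values

# Drop attributes judged irrelevant for prediction
df = df.drop(columns=["PassengerId", "Name", "Ticket", "Cabin"])

# "Pclass is a good predictor for Age": impute missing ages with the mean age per class
df["Age"] = df["Age"].fillna(df.groupby("Pclass")["Age"].transform("mean"))

# Drop features considered redundant / correlated (a decision worth recording as provenance)
df = df.drop(columns=["Fare", "Pclass"])

print(df.isna().sum())                              # Age should now have no missing values
```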
  21. Exploring the effect of alternative pre-processing. [Diagram: the same raw dataset D is prepared by two alternative pipelines P1 and P2, yielding D1 and D2; models M1 and M2 are learned from them; for the same instance x they predict y1 and y2, with y1 ≠ y2.] How can knowledge of P1 and P2 help understand why y1 ≠ y2? Examples: alternative imputation methods for missing values; boosting the minority class / downsampling the majority class.
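A minimal sketch of comparing two alternative preparation pipelines (P1, P2) on the same Titanic data and inspecting where their predictions disagree. The column names follow the standard train.csv; the two pipelines themselves are illustrative stand-ins for the alternatives mentioned on the slide.

```python
# Compare predictions produced after two alternative pre-processing pipelines.
import pandas as pd
from sklearn.linear_model import LogisticRegression

def prepare(df, strategy):
    """P1: impute Age with the global median; P2: impute Age with the mean per Pclass."""
    df = df[["Survived", "Pclass", "Sex", "Age", "SibSp", "Parch"]].copy()
    if strategy == "P1":
        df["Age"] = df["Age"].fillna(df["Age"].median())
    else:  # "P2"
        df["Age"] = df["Age"].fillna(df.groupby("Pclass")["Age"].transform("mean"))
    df["Sex"] = (df["Sex"] == "female").astype(int)
    return df

raw = pd.read_csv("train.csv")
preds = {}
for p in ("P1", "P2"):
    d = prepare(raw, p)
    X, y = d.drop(columns=["Survived"]), d["Survived"]
    model = LogisticRegression(max_iter=1000).fit(X, y)
    preds[p] = model.predict(X)

disagree = preds["P1"] != preds["P2"]
print(f"{disagree.sum()} of {len(raw)} predictions change between P1 and P2")
```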
  22. Also: the script alludes to human decisions. How do we capture these decisions? To what extent can they be inferred from code?
  23. Correlation analysis. Is Pclass really a good predictor for Age? Why drop both Pclass and Fare? Alternative pre-processing: 1. drop Age only (nearly identical performance, F1 = 0.77 vs 0.76); 2. use Sex and Pclass only. (A correlation check is sketched below.)
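A minimal sketch of the kind of correlation check behind the "Pclass predicts Age" decision, again assuming the standard Titanic train.csv; the columns inspected are illustrative.

```python
# Quick check of how Age relates to Pclass and Fare before deciding how to impute / drop.
import pandas as pd

df = pd.read_csv("train.csv")

# Pairwise correlations among the candidate columns (NaNs are ignored pairwise)
print(df[["Age", "Pclass", "Fare"]].corr())

# Mean age per passenger class: the gap between classes is what justifies
# (or fails to justify) using Pclass to impute missing Age values.
print(df.groupby("Pclass")["Age"].mean())
```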
  24. Possible roles for provenance.
     1) Data acquisition: provenance → transparency → trust.
     2) Data transformation: provenance → explanations. Is data preparation correct? Is the training data fit to learn from? What is the effect of alternative pre-processing? Can we infer data prep decisions from pre-processing code?
  25. Bias (in ML). Bias: "any basis for choosing one generalization [hypothesis] over another, other than strict consistency with the observed training instances." (*)
     - Absolute bias: certain hypotheses are entirely eliminated from the hypothesis space, e.g. the a priori choice of model (decision trees, SVM, NN, …).
     - Relative bias: certain hypotheses are preferred over others, e.g. "prefer shallow, simple decision trees to deep ones".
     (*) T. M. Mitchell, "The need for biases in learning generalizations," Tech. rep. CBM-TR-117, Rutgers University, New Brunswick, NJ, 1980.
  26. Fairness and bias: the (notorious) COMPAS case. COMPAS is increasingly popular within the criminal justice system and is used or considered for use in pre-trial decision-making (USA). 1: the initial claim. Black defendants who did not recidivate over a two-year period were nearly twice as likely to be misclassified as higher risk compared to their white counterparts (45 percent vs. 23 percent); white defendants who re-offended within the next two years were mistakenly labeled low risk almost twice as often as black re-offenders (48 percent vs. 28 percent).
     J. Angwin, J. Larson, S. Mattu, and L. Kirchner, "Machine Bias: There's software used across the country to predict future criminals. And it's biased against blacks," ProPublica, 2016. https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm
  27. Model fairness and data bias. "In this paper we show that the differences in false positive and false negative rates cited as evidence of racial bias in the ProPublica article are a direct consequence of applying an instrument that is free from predictive bias to a population in which recidivism prevalence differs across groups." COMPAS complies with the test fairness condition: the observed P(Y | S = s) is largely independent of R.
     A. Chouldechova, "Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments," Big Data, vol. 5, no. 2, pp. 153–163, Jun. 2017.
  28. COMPAS scores are skewed. Based on 6,172 defendants who had not been arrested for a new offense or who had recidivated within two years:
     - scores for white defendants were skewed toward lower-risk categories, while black defendants were evenly distributed across scores;
     - there are large discrepancies in FPR and FNR between black and white defendants;
     - … but this does not mean that the score itself is unfair.
  29. FPR / FNR. The slide defines the positive predictive value (PPV) of the score, the recidivism prevalence within groups, the false positive rate and the false negative rate; the test fairness condition (2.1) can be expressed as the constraint that the PPV does not depend on R. (A reconstruction of the definitions follows below.) Conclusion: when the recidivism prevalence differs between two groups, a test-fair score cannot have equal FPR and FNR across those groups.
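A reconstruction of the quantities the slide refers to, following the notation of the cited Chouldechova (2017) paper (the slide's own symbols, e.g. Sc, may differ slightly); the final identity is the relation from that paper that forces the trade-off stated above.

```latex
% S is the risk score, s_{HR} the high-risk threshold, Y the recidivism outcome, R the group.
\begin{align*}
\mathrm{PPV}(r)  &= \Pr(Y = 1 \mid S > s_{HR},\, R = r) && \text{positive predictive value}\\
p_r              &= \Pr(Y = 1 \mid R = r)                && \text{recidivism prevalence within group } r\\
\mathrm{FPR}(r)  &= \Pr(S > s_{HR} \mid Y = 0,\, R = r)  && \text{false positive rate}\\
\mathrm{FNR}(r)  &= \Pr(S \le s_{HR} \mid Y = 1,\, R = r) && \text{false negative rate}
\end{align*}
% Test fairness: PPV(r) is the same for all groups. The identity below links the quantities,
% so equal PPV combined with unequal prevalence p_r forces FPR and FNR to differ across groups:
\[
\mathrm{FPR} \;=\; \frac{p}{1-p}\,\frac{1-\mathrm{PPV}}{\mathrm{PPV}}\,\bigl(1-\mathrm{FNR}\bigr).
\]
```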
  30. The actual "provenance" of the analysis: https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm. Data acquisition + transformation → model bias and fairness: Can knowledge of data prep explain model bias? Does data prep introduce / remove bias?
  31. Fairness: many possible definitions. (*) M. J. Kusner, J. Loftus, C. Russell, and R. Silva, "Counterfactual Fairness," in Advances in Neural Information Processing Systems 30, 2017, pp. 4066–4076.
  32. Causality and counterfactual fairness. [Causal diagram: driver's race A (protected) → red-car preference X (observable); aggressive driving U (latent) → X and → accident rate Y (predicted).]
     - Individuals belonging to a race A are more likely to drive red cars (A → X).
     - However, race is not a good predictor for either U or Y.
     - Aggressive drivers tend to prefer red cars (U → X).
     Using X to predict Y leads to a counterfactually unfair model: it may charge individuals of a certain race more than others, even though no race is more likely to have an accident. Is knowledge of data prep useful at all to determine this kind of fairness?
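For reference, the formal definition behind this example, as given (up to notation) in the Kusner et al. (2017) paper cited on the previous slide: a predictor is counterfactually fair if intervening on the protected attribute in the causal model, holding everything else fixed, does not change the distribution of the prediction.

```latex
% Counterfactual fairness (Kusner et al., 2017), stated for predictor \hat{Y},
% protected attribute A, observed features X and latent background variables U:
\[
\Pr\bigl(\hat{Y}_{A \leftarrow a}(U) = y \mid X = x,\, A = a\bigr)
\;=\;
\Pr\bigl(\hat{Y}_{A \leftarrow a'}(U) = y \mid X = x,\, A = a\bigr)
\qquad \text{for all } y \text{ and all } a'.
\]
```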
  33. Possible roles for provenance.
     1) Data acquisition: provenance → transparency → trust.
     2) Data transformation: provenance → explanations. Is data preparation correct? Is the training data fit to learn from? What is the effect of alternative pre-processing?
     3) Data acquisition + transformation → model bias and fairness. Is provenance useful to diagnose an unfair / biased model? Does data prep introduce / remove bias?
  34. Opportunities and challenges: summary.
     1) Data acquisition: provenance → transparency → trust.
     2) Data transformation: provenance → explanations. Is data preparation correct? Is the training data fit to learn from? What is the effect of alternative pre-processing?
     3) Data acquisition + transformation → model bias and fairness. Is provenance useful to diagnose an unfair / biased model? Does data prep introduce / remove bias?
  35. A few initial references.
     [1] C. O'Neil, Weapons of Math Destruction. Crown Books, 2016.
     [2] B. Goodman and S. Flaxman, "European Union regulations on algorithmic decision-making and a 'right to explanation,'" Proc. 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), Jun. 2016.
     [3] M. T. Ribeiro, S. Singh, and C. Guestrin, "'Why Should I Trust You?': Explaining the Predictions of Any Classifier," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16), 2016, pp. 1135–1144.
     [4] H. Lakkaraju, S. H. Bach, and J. Leskovec, "Interpretable Decision Sets: A Joint Framework for Description and Prediction," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 1675–1684.
     [5] K. Yang and J. Stoyanovich, "Measuring Fairness in Ranked Outputs," in Proceedings of the 29th International Conference on Scientific and Statistical Database Management (SSDBM '17), 2017, pp. 1–6.
     [6] T. Gebru et al., "Datasheets for Datasets," 2018.
     [7] Z. Abedjan, L. Golab, and F. Naumann, "Profiling relational data: a survey," VLDB J., vol. 24, no. 4, pp. 557–581, 2015.
     [8] A. Weller, "Challenges for Transparency," in Proceedings of the 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016).
     [9] R. Caruana, Y. Lou, J. Gehrke, P. Koch, M. Sturm, and N. Elhadad, "Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission," in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015, pp. 1721–1730.
  36. Thank you. Paolo.Missier@newcastle.ac.uk, School of Computing, Newcastle University. http://tinyurl.com/paolomissier | LinkedIn: www.linkedin.com/in/paolomissier | Twitter: @Pmissier
