PyCon Colombia Keynote 2018


Machine learning is responsible for major breakthroughs and touches many aspects of our lives, but if we do not think carefully about our data and the errors our algorithms make, we can build machine learning products that have unintended consequences. This talk outlines the problem and suggests a few concrete steps toward building safer machine learning products.



  1. WITH GREAT MACHINE LEARNING COMES GREAT RESPONSIBILITY Deborah Hanus @deborahhanus
  2. MACHINE LEARNING IS THE FUTURE (Image: Chatbots Life)
  3. WHY IS ARTIFICIAL INTELLIGENCE SOMETIMES SO … UNINTELLIGENT?
  4. BACKGROUND
  5. MACHINE LEARNING IS THE FUTURE (Image: Chatbots Life)
  6. WHAT IS MACHINE LEARNING? PROBABILITY & STATISTICS, LINEAR ALGEBRA, COMPUTER SCIENCE (Images: UW, Wikipedia, Charpentier)
  7. WHAT IS MACHINE LEARNING? PROBABILITY & STATISTICS, LINEAR ALGEBRA, COMPUTER SCIENCE
  8. + MACHINE LEARNING
  9. MACHINE LEARNING IS ABOUT DATA (Image: xkcd)
  10. Chart: 82% / 13% / 5% (Credit: Gil Press, Forbes)
  11. MACHINE LEARNING IN REAL LIFE: “WHY ISN’T MY ALGORITHM CONVERGING?”
  12. MACHINE LEARNING IN REAL LIFE: “IT CONVERGED! I’M DONE.”
  13. Coded Gaze: Joy Buolamwini (Image: NYT)
  14. PREDICTING RECIDIVISM: similar petty theft, similar drug charge (ProPublica, Machine Bias)
  15. YOUR RESULTS ARE ONLY AS GOOD AS YOUR DATA
  16. HOW CAN WE IMPROVE MACHINE LEARNING? • Explore & understand your data • Explore your errors • Make your results interpretable
  17. HOW CAN WE IMPROVE MACHINE LEARNING? • Explore & understand your data • Explore your errors • Make your results interpretable
  18. WHERE DO WE GET DATA? • Curated datasets (e.g. Kaggle) • Curated APIs (e.g. Twitter) • Careful collaborators • Web scraping, aka “The Internet”
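The last option on slide 18 is the one most teams reach for, so a minimal sketch of the usual requests + BeautifulSoup pattern may help; the URL and CSS selector are hypothetical, and a real scrape should respect the site's terms and robots.txt:

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical target page; a real scrape needs the site's actual markup.
url = "https://example.com/articles"
response = requests.get(url, timeout=10)
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")
# Pull the text of every headline-like element (the selector is a placeholder).
headlines = [h.get_text(strip=True) for h in soup.select("h2.title")]
print(headlines[:5])
```

Data gathered this way inherits whatever skews the source site has, which is exactly the bias problem the next slides illustrate.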
  19. LABELED FACES IN THE WILD http://vis-www.cs.umass.edu/lfw/
  20. LABELED FACES IN THE WILD http://vis-www.cs.umass.edu/lfw/
  21. LABELED FACES IN THE WILD (Image: The Trouble with Bias, NIPS 2017)
  22. WORD EMBEDDINGS. Word embedding: a method of representing words as vectors. word2vec: a group of models that produce word embeddings. (Source: https://en.wikipedia.org/wiki/Word2vec)
  23. WORD EMBEDDINGS
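The bias that slides 22-24 point at can be probed directly. A minimal sketch using gensim, assuming a pretrained word2vec file is available locally (the filename is a placeholder):

```python
from gensim.models import KeyedVectors

# Placeholder path: any word2vec-format vector file works here.
vectors = KeyedVectors.load_word2vec_format("word2vec.bin", binary=True)

# Classic analogy probe: man is to doctor as woman is to ...?
# Embeddings trained on human text tend to surface stereotyped answers.
for word, score in vectors.most_similar(positive=["doctor", "woman"],
                                        negative=["man"], topn=5):
    print(f"{word}\t{score:.3f}")
```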
  24. DATA OFTEN EXPRESSES HUMAN BIAS
  25. EXPLORE YOUR DATA
  26. https://pair-code.github.io/facets/
  27. VISUALIZATION TOOLS
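Facets (slide 26) is a browser-based visualization tool, but the same first pass works in plain pandas. A minimal exploration sketch; the CSV file and the label column name are hypothetical:

```python
import pandas as pd

df = pd.read_csv("training_data.csv")  # hypothetical dataset

# Summary statistics surface outliers and impossible values early.
print(df.describe(include="all"))

# Missing values per column: silent gaps often hide systematic bias.
print(df.isna().sum().sort_values(ascending=False))

# Class balance: a skewed label distribution skews the model.
print(df["label"].value_counts(normalize=True))
```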
  28. UNDERSTAND YOUR DATA
  29. UNDERSTAND YOUR DATA. Patient vitals (e.g. heart rate, blood oxygenation, blood pressure). A white hospital sheet reads as 62% oxygenation.
  30. UNDERSTAND YOUR DATA. Data: electronic health records. Goal: find how long a patient has had diabetes. • Diagnosis code • Compare to control • Define a rule
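A hedged sketch of what "define a rule" on slide 30 might look like in pandas; the table layout, column names, and the ICD-10 prefix are illustrative assumptions, and any real code list needs clinical review:

```python
import pandas as pd

# Hypothetical EHR extract: one row per (patient, visit) with a diagnosis code.
records = pd.read_csv("ehr_diagnoses.csv", parse_dates=["visit_date"])

# Rule: the earliest visit carrying a diabetes code approximates onset.
# ICD-10 E11.* covers type 2 diabetes; assumed here for illustration.
is_diabetes = records["icd10_code"].str.startswith("E11")
onset = (records[is_diabetes]
         .groupby("patient_id")["visit_date"]
         .min()
         .rename("approx_onset_date"))
print(onset.head())
```

Such a rule is only as trustworthy as the records behind it, which is why the next slides turn to domain knowledge.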
  31. WE HAVE A PROBLEM …
  32. DATA COLLECTION / DATA ANALYSIS
  33. COMMUNICATION
  34. UNDERSTAND YOUR DATA. How can a researcher get domain knowledge? • Become a medical professional • Find a great collaborator
  35. UNDERSTAND YOUR DATA. How can a data scientist understand their data? • Collect their own data • Collaborate closely with engineers
  36. HOW CAN WE IMPROVE MACHINE LEARNING? • Explore & understand your data • Explore your errors • Make your results interpretable
  37. EXPLORE YOUR ERRORS • What did your model misclassify? • Are any of those errors systematic?
  38. ANALYZE ERRORS. Confusion matrix: false positives vs. false negatives. Example: a medical test for HIV or cancer. A false positive can mean unnecessary treatment (e.g. antibiotics); a false negative can mean complications go untreated. No matter how you define your test, you must choose a trade-off between false positives and false negatives.
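A minimal scikit-learn sketch of reading the four cells of a binary confusion matrix; the labels below are toy data:

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 1, 0, 1, 0]  # toy ground truth
y_pred = [0, 1, 1, 0, 1, 0, 1, 0]  # toy model output

# For binary labels, ravel() returns the cells in this fixed order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TN={tn}  FP={fp}  FN={fn}  TP={tp}")
```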
  39. LABELING SENSITIVE CONTENT. Example: YouTube started labeling sensitive content. Here you might bias toward false positives, because failing to label content that should have been labeled is worse than labeling content that isn't sensitive.
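Biasing a classifier toward one error type usually comes down to moving the decision threshold. A self-contained sketch on synthetic data; the logistic regression and the 0.2 cutoff are illustrative choices, not the talk's actual pipeline:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy stand-in for a content classifier: random features, binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = rng.integers(0, 2, size=200)
clf = LogisticRegression().fit(X, y)

probs = clf.predict_proba(X)[:, 1]  # P(sensitive) for each item

# The default cutoff is 0.5; lowering it flags more items as sensitive,
# accepting extra false positives to reduce missed labels (false negatives).
threshold = 0.2
flagged = probs >= threshold
print(f"Flagged {flagged.sum()} of {len(flagged)} items at threshold {threshold}")
```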
  40. ANALYZE ERRORS
  41. HOW CAN WE IMPROVE MACHINE LEARNING? • Explore & understand your data • Explore your errors • Make your results interpretable
  42. WHAT IS INTERPRETABILITY?
  43. INTERPRETABLE MODELS https://www.eugdpr.org/eugdpr.org.html
  44. WHAT IS INTERPRETABILITY? • Interpretability of machine learning models • Interpretability of results
  45. WHAT IS INTERPRETABILITY? • Interpretability of machine learning models • Interpretability of results
  46. INTERPRETABLE MODELS
  47. INTERPRETABLE MODELS. LIME: Local Interpretable Model-agnostic Explanations. Intuition: https://www.kdnuggets.com/2016/08/introduction-local-interpretable-model-agnostic-explanations-lime.html
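A minimal LIME sketch on tabular data, assuming the lime package alongside scikit-learn; the model, features, and class names are all toy stand-ins:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

# Toy training data standing in for a real feature matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
clf = RandomForestClassifier(random_state=0).fit(X, y)

explainer = LimeTabularExplainer(
    X, feature_names=["f0", "f1", "f2"], class_names=["neg", "pos"])

# Explain a single prediction: which features pushed it toward "pos"?
explanation = explainer.explain_instance(X[0], clf.predict_proba, num_features=3)
print(explanation.as_list())
```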
  48. WHAT IS INTERPRETABILITY? • Interpretability of machine learning models • Interpretability of results
  49. INTERPRETABLE RESULTS: build things that people understand enough to trust.
  50. INTERPRETABLE MODELS (Credit: Doshi-Velez & Kim, 2017)
  51. CONCRETE STEPS TO RESPONSIBLE MACHINE LEARNING • Understand your data. • Make time to analyze your errors. • Make results interpretable. • If you do not know something, find someone who does. @deborahhanus
