Big Data for Machine Learning: How things have changed over the last decade

Talk delivered at Rocky Mountain Data Con 2016

Published in: Data & Analytics

  1. 2/4/2017 reveal.js file:///Users/diana/Documents/Presentations/Rocky%20Mountain%20DataCon%202016/reveal.js-3.3.0/index.html?print-pdf#/ 1/37 Big Data for Machine Learning: How things have changed over the last decade. Diana Pfeil, @dianam
  3. Big Data 10 years ago
  6. via GIPHY
  7. The invariant principles of big data engineering
  8. DRY/Open it up
     • make it generic
     • communicate it with excellent documentation
     • open it up to all
  9. Back to the deployment process
     • "machine learning": [Item1, Item2, Item3, ...]
     • service.getValue()
     • P13nFileBasedMappingService
  10. via GIPHY
  13. Source: http://iquantny.tumblr.com/post/83696310037/meet-the-fire-hydrant-that-unfairly-nets-nyc
  15. Keep It Simple
     • Try the simplest approach that will work
     • Modularize: do one thing well
  17. Avoid relational databases
     • administrative hassle
     • complexity
     • performance
  18. Just add a server
  20. Measure Everything
  21. What has changed
     • choice
     • so much literature
     • so much open source
     • open data
     • accessibility of big data to all
     • data ethics
  24. Models for sentencing
  25. Failure modes for Northpointe recidivism model
                                              White   African American
      Labeled high risk, did not re-offend    23%     45%
      Labeled low risk, did re-offend         48%     28%
  26. Dangerous Model Territory
      Source: https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
  27. scale: significant impact on everyone's lives
  28. unfair: illegal or unjust factors used in decision-making
  29. opaque: model is not open or reviewable by those affected
  30. no feedback loop: model does not course-correct
  31. via GIPHY
  32. The future
     • Technologies will continue to evolve at incredible speeds
     • DevOps will get easier and cheaper because serverless!
     • Machine learning will still require thinking and domain expertise
  33. Questions? @dianam
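
The failure-mode table for the Northpointe model (slide 25) reports two kinds of error per group. As a minimal sketch of how such a comparison can be computed, the Python below assumes the percentages are conditional on the actual outcome: a false positive rate (labeled high risk among people who did not re-offend) and a false negative rate (labeled low risk among people who did re-offend). The function name `error_rates_by_group`, the record layout, and the toy records are hypothetical; none of them come from the talk or from the ProPublica data.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Compute false positive and false negative rates per group.

    records: iterable of (group, labeled_high_risk, reoffended) tuples,
    where the last two fields are booleans. This layout is an assumption
    made for illustration only.
    """
    counts = defaultdict(lambda: {"fp": 0, "tn": 0, "fn": 0, "tp": 0})
    for group, high_risk, reoffended in records:
        if high_risk and not reoffended:
            counts[group]["fp"] += 1  # labeled high risk, did not re-offend
        elif not high_risk and not reoffended:
            counts[group]["tn"] += 1  # labeled low risk, did not re-offend
        elif not high_risk and reoffended:
            counts[group]["fn"] += 1  # labeled low risk, did re-offend
        else:
            counts[group]["tp"] += 1  # labeled high risk, did re-offend

    rates = {}
    for group, c in counts.items():
        did_not_reoffend = c["fp"] + c["tn"]
        did_reoffend = c["fn"] + c["tp"]
        rates[group] = {
            # Rates are None when a group has no cases of that outcome.
            "false_positive_rate": c["fp"] / did_not_reoffend if did_not_reoffend else None,
            "false_negative_rate": c["fn"] / did_reoffend if did_reoffend else None,
        }
    return rates


# Made-up toy records for two hypothetical groups, "A" and "B".
toy_records = [
    ("A", True, False), ("A", False, False), ("A", False, False), ("A", False, False),
    ("A", True, True), ("A", False, True),
    ("B", True, False), ("B", True, False), ("B", False, False), ("B", False, False),
    ("B", True, True), ("B", True, True), ("B", True, True), ("B", False, True),
]

rates = error_rates_by_group(toy_records)
for group, r in sorted(rates.items()):
    print(group, r)
```

On the toy records, group A has a false positive rate of 0.25 and a false negative rate of 0.5, while group B has the reverse. A model can show this kind of asymmetry between groups even when its overall accuracy looks similar for both, which is why the two error types are worth measuring separately.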