Lessons learned

Lessons Learned: Machine Learning and Technical Debt

Published in: Software

  1. Lessons Learned: Machine Learning and Technical Debt • Matthew Kirk • @mjkirk
  2. Who uses data?
  3. Responsive Enterprise
  4. A Golden Opportunity
  5. The Danger
  6. The High-Interest Debt of Machine Learning
  7. What we’re covering • Boundary Erosion • Data Dependencies • Spaghetti Code • The Real World
  8. `whoami` • O’Reilly author of Thoughtful Machine Learning (use AUTHD for a discount on oreilly.com) • Former Financial Quant • Independent Consultant • @mjkirk
  9. Boundary Erosion • Entanglement • Visibility Debt
  10. Entanglement
  11. Entanglement: Solution • Isolate models as much as possible • Regularization
  12. Visibility Debt
  13. Solutions • Keeping an API log • Monitoring tool use • No sharing of usernames :)
  14. Data Dependencies • Unstable • Underutilized
  15. Unstable Data
  16. Solution • Versioning • Keep a specific version of a dataset, for instance a timestamped version of language data.
  17. Underutilized
  18. Solution • Feature engineering: PCA, ICA, Random Feature Selection, VIMP, etc.
  19. Spaghetti Code • Glue Code • Pipeline Jungle • Experimental Paths • Configuration Debt
  20. Glue Code • R, MATLAB, Python, Java: all to use that one implementation
  21. Solution • Write your own implementation of the algorithm…
  22. Pipeline Jungle
  23. Conway’s Law
  24. The Clymb’s Database V1.0 • PS: No monitoring on any of this.
  25. Clymb DB V2.0
  26. Solution • Map systems and reduce • Reduce organizational disconnects by attending stand-ups and being part of the engineering team
  27. Experimental Paths
  28. Solution: Tombstones! • `def run_this_once_in_prod!; Tombstone.new('2014-01-02'); end` • When you think something is dead, put a Tombstone on it • https://www.youtube.com/watch?v=29UXzfQWOhQ
  29. Configuration Debt
  30. Solution • Find optimal configurations regularly • Revisit the initial configuration with new data points.
  31. External World Changes • Fixed Thresholds • Correlation Changes
  32. Fixed Thresholds • Laws change: the drinking age used to be 19 in many states.
  33. Solution • Rebuild, or include accuracy in the cost your model minimizes. • Min Cost = Actual - Predicted
  34. Correlations Change
  35. Solution • Be careful when trying to find causal evidence. Ask what happens if the model doesn’t work. • Iterate often
  36. Questions?
  37. The Blissful Land of Opportunity
  38. Lessons Learned in One Slide (Danger: Solution) • Entanglement: Regularize or isolate models • Visibility Debt: Keep an access log of who uses what • Unstable Data: Version datasets • Underutilized Data: Trim by finding better features • Glue Code: Write your own implementations • Pipeline Jungle: Find minimum cut in systems • Experimental Paths: Use Tombstones • Configuration Debt: Reconfigure with new datasets • Fixed Thresholds: Include accuracy as part of model • Correlation Changes: Trim non-causal data from models
  39. Links and Contact • @mjkirk • matt@matthewkirk.com • Machine Learning: The High-Interest Credit Card of Technical Debt: https://bit.ly/1zs9TXi • Is that code dead?: http://bit.ly/1sg0B1L
  40. Photo Sources • Cost of gigabyte: http://royal.pingdom.com/2011/12/19/would-you-pay-7260-for-a-3-tb-drive-charting-hdd-and-ssd-prices-over-time/ • Golden Opportunity: https://flic.kr/p/7xvfZr • Problems are Opportunities: https://flic.kr/p/ifFos • Master Charge: https://flic.kr/p/noQUh1 • Erosion: https://flic.kr/p/9agH2q • Coupler: https://flic.kr/p/ppm9HG • Fruit Loops: https://flic.kr/p/5rkLhP • Somewhere in Quản Bạ, Hà Giang: https://flic.kr/p/q4K9Bo • Data Dependencies: https://flic.kr/p/dVq7vg • Unstable!: https://flic.kr/p/s7RLj • Underutilized Piano: https://flic.kr/p/2sZVP • Spaghetti: https://flic.kr/p/tuwkp • Glue: https://flic.kr/p/6L13SK • Pipelines at Google: https://flic.kr/p/pvLQG2
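
Slide 28 shows only the call site of a tombstone, not the class behind it. The following is a minimal sketch of what such a marker might look like, assuming it simply records the date it was planted and where it was hit. The `Tombstone` class body, the `Hit` struct, and the in-memory `hits` list are all hypothetical; a production version would more plausibly write to a log or metrics service.

```ruby
require "date"

# Hypothetical Tombstone: the slide gives only `Tombstone.new('2014-01-02')`,
# so everything inside this class is an assumption made for illustration.
class Tombstone
  # One record per execution of tombstoned code.
  Hit = Struct.new(:planted_on, :location, :seen_at)

  # Assumed storage: an in-memory list kept on the class itself.
  @hits = []

  class << self
    attr_reader :hits
  end

  # planted_on is the date the code was declared "probably dead".
  def initialize(planted_on)
    location = caller(1, 1)&.first.to_s # where the tombstone was hit, if known
    self.class.hits << Hit.new(Date.parse(planted_on), location, Time.now)
  end
end

# The method from the slide: if this ever runs, the code is not dead.
def run_this_once_in_prod!
  Tombstone.new("2014-01-02")
end
```

If `Tombstone.hits` is still empty some weeks after the planted date, the method is a candidate for deletion; any recorded hit says where the supposedly dead path is still being reached.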

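Slide 16 recommends keeping a specific, timestamped version of a dataset. One way this could be sketched, assuming snapshots are plain files identified by name, is shown below. The helper names `versioned_name` and `pin_snapshot` and the file name `language_data.csv` are hypothetical and do not come from the deck.

```ruby
# Hypothetical helper: derive an immutable, timestamped name for a snapshot.
def versioned_name(basename, taken_at)
  ext = File.extname(basename)
  stem = File.basename(basename, ext)
  # getutc returns a UTC copy, so the caller's Time object is left untouched.
  "#{stem}-#{taken_at.getutc.strftime('%Y%m%dT%H%M%SZ')}#{ext}"
end

# Hypothetical helper: resolve the exact snapshot a model was built against,
# failing loudly instead of silently falling back to the newest file.
def pin_snapshot(snapshot_names, pinned)
  snapshot_names.find { |name| name == pinned } ||
    raise(ArgumentError, "missing snapshot #{pinned}")
end
```

Pinning by exact name means a model keeps reading the data it was trained on, even as newer snapshots of the same source appear next to it.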