Productionalizing Machine Learning Models: The Good, the Bad, and the Ugly

Data science teams tend to put a lot of thought into developing a predictive model to address a business need, tackling data processing, model development, training and validation. After validation, the model then tends to get rolled out into production without much testing. While software engineering best practices have been around for a long time, until recently no formal guidelines existed for checking the code quality of a machine learning pipeline.

The talk will cover tips and best practices for writing more robust, production-ready predictive model pipelines. We know that code is never perfect; Irina will also share the pains and lessons learned from her experience productionalizing and maintaining four customer-facing models at four different companies: in online advertising, consulting, finance, and fashion.

  1. 1. Productionalizing Machine Learning Models: Good, Bad and Ugly. Irina Kukuyeva, Ph.D. SoCal Python, January 30, 2018
  2. 2. 2 My Background: Ph.D.; Online Advertising; Consulting (IoT, Healthcare, Media & Entertainment, Finance, CPG, Retail, Video Game Publisher); Healthcare; Fashion
  3. 3. 3 Spectrum of (Select) Production Environments
      Online Advertising*: Revenue $1M+/year; near-real time
      Consulting: Revenue $3M+/year; ASAP
      Fashion: Customer retention; near-real time (e.g., “Love, love, love leather/faux leather jackets! Especially the blue!” → positive sentiment)
      Finance: Automation; daily
  4. 4. What the ML Model Lifecycle Really Looks Like: what the customer described; what the data looked like; what the DS began to build; what the budget could buy; what the pivot looked like; what code got reused; what got tested [1]; what got pushed to prod; what got documented; what the customer wanted 4
  5. 5. 5 Agenda
      Step 1: Establish Business Use Case (Appendix)
      Step 2: Data QA (other talks)
      Step 3: ML Development (other talks)
      Step 4: Pre-prod
      Step 5: Prod
  6. 6. Step 4: Pre-Production 6
  7. 7. 7 Pre-Production: Pay Down Tech Debt. Technical Debt — borrowing against the future to trade off code quality for speed of delivery now [2], [3] • Incur debt: write code, including ML pipelines [4] • Pay down debt: extensively test and refactor the pipeline end-to-end [5]
  8. 8. 8 Pre-Production: Pay Down Tech Debt → Test 1: Joel’s Test: 12 Steps to Better Code [6] • Spec? • Source control? Best practices? [7] • One-step build? Daily builds? CI? • Bug database? Release schedule? QA? • Fix bugs before writing new code? [8]
  9. 9. 9 Pre-Production: Pay Down Tech Debt
      Test 1: Joel’s Test: 12 Steps to Better Code … in practice
      (stand-up; sprint planning; version control; fix bugs first; bug tracking; release schedule; one-step build; project tooling; environments):
      Consulting: daily stand-up; sprint planning; version control*; fix bugs first; bugs emailed/db; release on bugfix; one-step build; Atlassian suite; virtual machines
      Online Advertising*: daily stand-up; sprint planning; —; —; bug database; —; —; —; —
      Finance: —; —; version control*; fix bugs first; —; —; one-step build; Trello; —
      Fashion: weekly stand-up; —; version control; fix bugs first; —; release on bugfix; one-step build; PRD, Trello; virtual env, CircleCI, Docker
  10. 10. 10 Pre-Production: Pay Down Tech Debt → Test 2: ML Test Score [9], [10] • Data and feature quality • Model development • ML infrastructure • ML monitoring [11]
  11. 11. 11 Pre-Production: Pay Down Tech Debt → Other tips — ML: • Choose the simplest model appropriate for the task and production environment • Test the model against (simulated) “ground truth” or a second implementation [12] • Evaluate the effects of floating point [12] • Validate the model beyond accuracy [13]
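To make the second-implementation and floating-point tips concrete, here is a minimal pytest-style sketch; the two predict functions and the tolerance are illustrative assumptions, not code from the talk:

```python
import numpy as np

def predict_fast(X, w):
    # Vectorized implementation that would ship to production.
    return X @ w

def predict_reference(X, w):
    # Deliberately naive second implementation, kept only for testing.
    return np.array([sum(xi * wi for xi, wi in zip(row, w)) for row in X])

def test_implementations_agree():
    rng = np.random.default_rng(42)
    X = rng.normal(size=(100, 5))
    w = rng.normal(size=5)
    # Compare with a tolerance, not exact equality: the two implementations
    # accumulate floating-point rounding error differently.
    np.testing.assert_allclose(predict_fast(X, w), predict_reference(X, w), rtol=1e-10)
```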
  12. 12. 12 Pre-Production: Pay Down Tech Debt → Other tips — Code: • What is your production environment? • Set up logging • Add an else to every if, and try/except with error handling • DRY → refactor • Add regression tests • Comment liberally (explain “why”) and keep comments up to date [20] [14]
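A minimal sketch of the logging and else/try-except tips; the record fields, feature extraction, and scoring function are hypothetical:

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger(__name__)

def extract_features(record):
    # Hypothetical feature extraction; raises KeyError on malformed input.
    return [record["amount"], record["num_visits"]]

def score(record, model):
    try:
        features = extract_features(record)
    except KeyError as exc:
        # try/except + error: log the failure with context, then re-raise
        # instead of letting the pipeline die with a bare traceback.
        logger.error("Malformed record %s: missing %s", record.get("id"), exc)
        raise
    if model is None:
        # Explicit fallback branch, so the "no model" path is a logged
        # decision rather than a silent crash.
        logger.warning("No model loaded; returning default score")
        return 0.0
    else:
        return model.predict([features])[0]
```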
  13. 13. 13 Pre-Production: Pay Down Tech Debt
      Test 2: ML Test Score … in practice:
      – Data and Feature Quality –
      Consulting: minimal time to add a new feature; unsuitable features excluded
      Online Advertising*: —
      Fashion: test input features (typing, pytest)
      Finance: minimal time to add a new feature; privacy built-in
      – Model Development –
      Simulated ground truth; baseline model + 2nd implementation; rolling refresh; performance overall + for those most likely to click; proxy + actual metrics; code review (PR); hyperparameter optimization (sklearn.GridSearchCV); baseline model; bias correction; proxy + actual metrics
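The hyperparameter-optimization entry refers to scikit-learn’s grid search. A minimal sketch on a synthetic dataset; the estimator, parameter grid, and scoring metric are placeholder assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# The grid and scorer below are placeholders; in practice both should
# follow from the business metric the model is judged on.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    scoring="roc_auc",
    cv=5,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```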
  14. 14. 14 Pre-Production: Pay Down Tech Debt
      Test 2: ML Test Score … in practice (cont’d):
      – ML Infrastructure –
      Consulting: loosely coupled fcns; central repo for clients; regression testing; one-step build, prod
      Online Advertising*: streaming
      Fashion: loosely coupled fcns; streaming API (sanic); integration test (pytest); one-step build, prod
      Finance: loosely coupled fcns; streaming; one-step build, prod*; reproducibility of training
      – ML Monitoring –
      Logging (logging); software + package versions check; data availability check; missing data check; evaluates empty + factual responses; local = prod env (virtualenv, Docker); comp time (timeit)
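A sketch of what the missing-data and comp-time checks might look like; the threshold, column names, and sample data are assumptions:

```python
import logging
import timeit

import pandas as pd

logger = logging.getLogger(__name__)

def check_missing_data(df: pd.DataFrame, max_missing_frac: float = 0.05) -> bool:
    """Flag columns whose missing-value rate exceeds the threshold."""
    missing = df.isna().mean()
    bad = missing[missing > max_missing_frac]
    for col, frac in bad.items():
        logger.warning("Column %r is %.1f%% missing", col, 100 * frac)
    return bad.empty

# Comp time (timeit): time the check itself, as a stand-in for timing
# any step of the pipeline.
df = pd.DataFrame({"amount": [1.0, None, 3.0], "age": [25, 31, None]})
elapsed = timeit.timeit(lambda: check_missing_data(df), number=100)
logger.info("100 missing-data checks took %.3fs", elapsed)
```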
  15. 15. 15 Pre-Production: Pay Down Tech Debt – Test 2: ML Test Score … in practice (cont’d)
  16. 16. Step 5: Production 16
  17. 17. 17 Production: Deploy Code and Monitor Performance → One-button push to prod branch/repo → Model Rollout → Monitoring [15]
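The slide does not spell out what gets monitored; one common, minimal check is to compare live prediction statistics against a training-time baseline. A sketch under that assumption, with an arbitrary threshold:

```python
import logging

import numpy as np

logger = logging.getLogger(__name__)

def check_prediction_drift(train_preds, live_preds, threshold=0.1):
    """Alert if the live prediction mean drifts from the training baseline.

    Deliberately simple; real monitoring would also track feature
    distributions, latency, and error rates.
    """
    drift = abs(float(np.mean(live_preds)) - float(np.mean(train_preds)))
    if drift > threshold:
        logger.error("Prediction drift %.3f exceeds threshold %.3f", drift, threshold)
    else:
        logger.info("Prediction drift %.3f within bounds", drift)
    return drift
```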
  18. 18. Step 6: Post-Production 18
  19. 19. 19 Post-Production: Keep Code Up and Running → Documentation + QA get cut first → Debugging, debugging, debugging → code is never perfect → Bugfix vs. feature → Post-mortem → Use the product → Support and training [16]
  20. 20. 20 Post-Production: Align Business and Team Goals → Team targets: deadlines and revenue goals → Team competitions [17]
  21. 21. 21 Key Takeaways → Communication, tooling, logging, documentation, debugging → Automatically evaluate all components of ML pipeline → High model accuracy is not always the answer → Scope down, then scale up [18]
  22. 22. 22 You Did It! Code is in prod! Celebrate!
  23. 23. 23 You Did It! Code is in prod! Celebrate! … But not too hard. Tomorrow you start on v2.
  24. 24. Questions? 24
  25. 25. 25 References
      [1] http://www.projectcartoon.com/cartoon/1
      [2] https://research.google.com/pubs/pub43146.html
      [3] https://www.linkedin.com/pulse/when-your-tech-debt-comes-due-kevin-scott
      [4] https://www.sec.gov/news/press-release/2013-222
      [5] http://dilbert.com/strip/2017-01-03
      [6] https://www.joelonsoftware.com/2000/08/09/the-joel-test-12-steps-to-better-code/
      [7] https://solidgeargroup.com/wp-content/uploads/2016/07/tower_cheatsheet_white_EN_0.pdf
      [8] http://geek-and-poke.com/geekandpoke/2014/2/23/dev-cycle-friday-evening-edition
      [9] https://www.eecs.tufts.edu/~dsculley/papers/ml_test_score.pdf
      [10] http://martin.zinkevich.org/rules_of_ml/rules_of_ml.pdf
      [11] https://www.slideshare.net/Tech_InMobi/building-machine-learning-pipelines
      [12] https://www.researchgate.net/publication/262569275_Testing_Scientific_Software_A_Systematic_Literature_Review
      [13] https://classeval.wordpress.com/introduction/basic-evaluation-measures/
      [14] https://xkcd.com/1024/
      [15] https://xkcd.com/1319/
      [16] https://www.devrant.io/search?term=debugging
      [17] https://marketoonist.com/2015/03/hackathons.html
      [18] https://s-media-cache-ak0.pinimg.com/originals/9c/25/08/9c25082f5c4d3477124356e45673d426.png
      [19] https://www.pinterest.com/pin/177258935306351629/
      [20] http://kellysutton.com/2017/09/01/comment-drift.html
  26. 26. 26 Appendix: Establish Business Use Case → Kick-off meeting with stakeholders: • Discuss the use case, motivation and scope • Find out the format of the deliverable and how the team will use it • Brainstorm and discuss potential solutions • Iron out deadlines, checkpoints and the ongoing support structure • Ask about the prod env (if appropriate) • Scope down, then scale up • Close the meeting with a recap of action items. Key takeaways: communication + clear expectations [19]
