
Productionalizing Machine Learning Models: The Good, The Bad and The Ugly

Data science teams tend to put a lot of thought into developing a predictive model to address a business need, tackling data processing, model development, training, and validation. After validation, the model then tends to get rolled out into production without much testing. And while software engineering best practices have been around for a long time, until recently no formal guidelines existed for checking the quality of the code in a machine learning pipeline.

The talk covers tips and best practices for writing more robust, production-ready predictive model pipelines. We know that code is never perfect; Irina will also share the pains and lessons learned from her experience productionalizing and maintaining four customer-facing models at four different companies: in online advertising, consulting, finance, and fashion.


  1. Productionalizing Machine Learning Models: Good, Bad and Ugly. Irina Kukuyeva, Ph.D. SoCal PyData, April 26, 2018
  2. My Background: Ph.D.; industry experience across Fashion, Consulting, IoT, Healthcare, Media & Entertainment, Finance, CPG Retail, Video Game Publishing, and Online Advertising
  3. Spectrum of (Select) Production Environments:
     Fashion: customer retention, near-real time (e.g., "Love, love, love leather/faux leather jackets! Especially the blue!" → positive sentiment).
     Consulting: revenue $3M+/year, ASAP.
     Online Advertising*: revenue $1M+/year, near-real time.
     Finance: automation, daily.
  4. What the ML Model Lifecycle Really Looks Like [1]: what the customer described, what the data looked like, what the DS began to build, what the budget could buy, what the pivot looked like, what code got reused, what got tested, what got pushed to prod, what got documented, what the customer wanted
  5. Agenda (mapped onto the lifecycle cartoon [1]): Step 1: establish the business use case (Appendix); Step 2: Data QA (other talks); Step 3: ML development (other talks); Step 4: Pre-prod; Step 5: Prod
  6. Step 4: Pre-Production
  7. Pre-Production: Pay Down Tech Debt. Technical debt: borrowing against the future to trade off code quality against speed of delivery now [2], [3] • Incur debt: write code, including ML pipelines [4], [21] • Pay down debt: extensively test and refactor the pipeline end-to-end [5]
  8. Pre-Production: Pay Down Tech Debt [8] → Test 1: Joel's Test: 12 Steps to Better Code [6] • Spec? • Source control? Best practices? [7] • One-step build? Daily builds? CI? • Bug database? Release schedule? QA? • Fix bugs before writing new code?
  9. Pre-Production: Pay Down Tech Debt. Test 1: Joel's Test: 12 Steps to Better Code … in practice:
     Consulting: daily stand-up; sprint planning; version control*; fix bugs first; bugs emailed/tracked in a db; release on bugfix; one-step build; Atlassian suite; virtual machines.
     Fashion: weekly stand-up; version control; fix bugs first; release on bugfix; one-step build; PRD, Trello; virtualenv, Docker, CircleCI, cookiecutter.
     Online Advertising*: daily stand-up; sprint planning; bug database.
     Finance: version control*; fix bugs first; one-step build; Trello.
  10. Pre-Production: Pay Down Tech Debt. Test 1: Joel's Test: 12 Steps to Better Code … in practice:
  11. Pre-Production: Pay Down Tech Debt [11] → Test 2: ML Test Score [9], [10] • Data and feature quality • Model development • ML infrastructure • ML monitoring
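     To make the "data and feature quality" item concrete, here is a minimal pytest-style sketch of a feature test; the build_features helper and its columns are hypothetical stand-ins, not code from the talk:

         import numpy as np
         import pandas as pd

         def build_features(raw: pd.DataFrame) -> pd.DataFrame:
             # Hypothetical feature builder standing in for a real pipeline step.
             out = raw.copy()
             out["log_spend"] = np.log1p(out["spend"].clip(lower=0))
             return out

         def test_features_have_expected_schema_and_ranges():
             raw = pd.DataFrame({"spend": [0.0, 10.0, 250.0]})
             feats = build_features(raw)
             assert "log_spend" in feats.columns          # expected feature exists
             assert feats["log_spend"].notna().all()      # no missing values introduced
             assert (feats["log_spend"] >= 0).all()       # values in the expected range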
  12. Pre-Production: Pay Down Tech Debt → Other tips for ML: • Choose the simplest model that is appropriate for the task and the prod environment • Test the model against (simulated) "ground truth" or a 2nd implementation [12] • Evaluate the effects of floating point [12] • Model validation beyond AUC [13]
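     One way to act on the "(simulated) ground truth or 2nd implementation" tip is to fit the same model two independent ways on simulated data and require agreement. A minimal sketch, assuming scikit-learn and NumPy; the function names, data, and tolerances are illustrative:

         import numpy as np
         from sklearn.linear_model import LinearRegression

         def fit_ols_normal_equation(X, y):
             # Second, independent implementation: closed-form OLS via least squares.
             Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # add intercept column
             beta = np.linalg.lstsq(Xb, y, rcond=None)[0]
             return beta[0], beta[1:]

         def test_model_against_second_implementation():
             rng = np.random.default_rng(0)
             X = rng.normal(size=(200, 3))
             true_coef = np.array([1.5, -2.0, 0.5])
             y = X @ true_coef + 3.0 + rng.normal(scale=0.01, size=200)  # simulated ground truth

             model = LinearRegression().fit(X, y)
             intercept2, coef2 = fit_ols_normal_equation(X, y)

             # Allow for floating-point differences between the two implementations.
             np.testing.assert_allclose(model.intercept_, intercept2, rtol=1e-6)
             np.testing.assert_allclose(model.coef_, coef2, rtol=1e-6)
             # The fit should also recover the simulated ground-truth coefficients.
             np.testing.assert_allclose(model.coef_, true_coef, atol=0.01)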
  13. Pre-Production: Pay Down Tech Debt → Other tips for code [14]: • What is the production environment? • Set up logging • Add an else to every if, and error handling to every try/except • DRY → refactor • Add regression tests • Comment liberally (explain "why") and keep comments up to date [20] • Lint
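     A minimal sketch of the logging and else/try/except advice, assuming a scikit-learn-style model object; the logger name and the predict_one helper are illustrative:

         import logging

         logging.basicConfig(
             level=logging.INFO,
             format="%(asctime)s %(levelname)s %(name)s: %(message)s",
         )
         logger = logging.getLogger("ml_pipeline")

         def predict_one(model, features):
             # Guard the happy path: log and re-raise on bad input instead of
             # letting the pipeline fail silently in prod.
             try:
                 prediction = model.predict([features])[0]
             except (ValueError, TypeError):
                 logger.exception("Prediction failed for features=%r", features)
                 raise
             else:
                 logger.info("Prediction succeeded: %s", prediction)
                 return prediction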
  14. Pre-Production: Pay Down Tech Debt. Test 2: ML Test Score … in practice:
     Data and feature quality: Consulting: minimal time to add a new feature; unsuitable features excluded. Fashion: test input features (typing, pytest). Finance: minimal time to add a new feature; privacy built in.
     Model development: Consulting: baseline model; bias correction; proxy + actual metrics. Online Advertising*: simulated ground truth; baseline model + a 2nd implementation; rolling refresh; performance overall + on users most likely to click; proxy + actual metrics. Fashion: code review (PRs); hyperparameter optimization (sklearn GridSearchCV).
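     For the hyperparameter-optimization item, a short sketch with scikit-learn's GridSearchCV (imported from sklearn.model_selection in current releases); the estimator, grid, and data are illustrative, not the ones used in the talk:

         from sklearn.datasets import make_classification
         from sklearn.ensemble import RandomForestClassifier
         from sklearn.model_selection import GridSearchCV

         X, y = make_classification(n_samples=500, n_features=10, random_state=0)

         # Exhaustive search over a small, explicit grid, scored by cross-validated AUC.
         grid = GridSearchCV(
             estimator=RandomForestClassifier(random_state=0),
             param_grid={"n_estimators": [50, 100], "max_depth": [3, 5, None]},
             cv=5,
             scoring="roc_auc",
         )
         grid.fit(X, y)
         print(grid.best_params_, grid.best_score_)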
  15. Pre-Production: Pay Down Tech Debt. Test 2: ML Test Score … in practice:
     ML infrastructure: Consulting: loosely coupled functions; central repo for clients; regression testing; one-step build to prod. Online Advertising*: streaming. Fashion: loosely coupled functions; streaming API (sanic); integration tests (pytest); one-step build to prod. Finance: loosely coupled functions; streaming; one-step build to prod*; reproducibility of training.
     ML monitoring: missing data check; logging; software + package version checks; data availability check; logging (logging); evaluation of empty + factual responses; local = prod env (virtualenv, Docker); compute time (timeit).
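     One possible shape for the data-availability and missing-data checks, using the logging and timeit modules named on the slide; the threshold and column names are assumptions:

         import logging
         import timeit

         import pandas as pd

         logging.basicConfig(level=logging.INFO)
         logger = logging.getLogger("ml_monitoring")

         def check_data_availability(df: pd.DataFrame, required_columns: list) -> bool:
             # Required columns present, rows exist, and missing data stays bounded.
             missing_cols = set(required_columns) - set(df.columns)
             if missing_cols:
                 logger.error("Missing required columns: %s", sorted(missing_cols))
                 return False
             if df.empty:
                 logger.error("No rows available for scoring")
                 return False
             null_fraction = df[required_columns].isna().mean().max()
             if null_fraction > 0.1:  # illustrative threshold
                 logger.warning("Max missing-data fraction %.2f exceeds threshold", null_fraction)
                 return False
             return True

         frame = pd.DataFrame({"age": [25, 31, None], "spend": [10.0, 12.5, 9.9]})
         ok = check_data_availability(frame, ["age", "spend"])
         # Track compute time of the check itself, as on the slide.
         elapsed = timeit.timeit(lambda: check_data_availability(frame, ["age", "spend"]), number=100)
         print(ok, elapsed)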
  16. Pre-Production: Pay Down Tech Debt. Test 2: ML Test Score … in practice (cont'd):
  17. Step 5: Production
  18. Production: Deploy Code and Monitor Performance [15] → One-button push to prod branch/repo → Model rollout → Monitoring
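     A hedged sketch of what the monitoring step could look like in code: alert when live prediction scores drift from a training-time baseline. The drift statistic, threshold, and numbers are illustrative, not from the talk:

         import numpy as np

         def predictions_look_healthy(scores, baseline_mean, baseline_std, z_threshold=3.0):
             # Cheap post-deploy monitor: z-score of today's mean prediction
             # against the training-time baseline distribution.
             scores = np.asarray(scores, dtype=float)
             z = abs(scores.mean() - baseline_mean) / (baseline_std / np.sqrt(len(scores)))
             return z < z_threshold

         # e.g., live scores vs. a training baseline of 0.30 with std 0.12
         print(predictions_look_healthy([0.28, 0.33, 0.31, 0.35], 0.30, 0.12))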
  19. Step 6: Post-Production
  20. Post-Production: Keep Code Up and Running [16] → Documentation + QA → Debugging, debugging, debugging → Bugfix vs. feature → Container management → Post-mortems → Use the product → Support and training
  21. Post-Production: Align Business and Team Goals → Team targets: deadlines and revenue goals → Team competitions [17]
  22. Key Takeaways → Communication, tooling, logging, documentation, debugging → Automatically evaluate all components of the ML pipeline → High model AUC is not always the answer → Scope down, then scale up [18]
  23. You Did It! Code is in prod! Celebrate!
  24. You Did It! Code is in prod! Celebrate! … But not too hard. Tomorrow you start on v2.
  25. Questions? https://goo.gl/DjkCBn
  26. References:
     [1] http://www.projectcartoon.com/cartoon/1
     [2] https://research.google.com/pubs/pub43146.html
     [3] https://www.linkedin.com/pulse/when-your-tech-debt-comes-due-kevin-scott
     [4] https://www.sec.gov/news/press-release/2013-222
     [5] http://dilbert.com/strip/2017-01-03
     [6] https://www.joelonsoftware.com/2000/08/09/the-joel-test-12-steps-to-better-code/
     [7] https://solidgeargroup.com/wp-content/uploads/2016/07/tower_cheatsheet_white_EN_0.pdf
     [8] http://geek-and-poke.com/geekandpoke/2014/2/23/dev-cycle-friday-evening-edition
     [9] https://www.eecs.tufts.edu/~dsculley/papers/ml_test_score.pdf
     [10] http://martin.zinkevich.org/rules_of_ml/rules_of_ml.pdf
     [11] https://www.slideshare.net/Tech_InMobi/building-machine-learning-pipelines
     [12] https://www.researchgate.net/publication/262569275_Testing_Scientific_Software_A_Systematic_Literature_Review
     [13] https://classeval.wordpress.com/introduction/basic-evaluation-measures/
     [14] https://xkcd.com/1024/
     [15] https://xkcd.com/1319/
     [16] https://www.devrant.io/search?term=debugging
     [17] https://marketoonist.com/2015/03/hackathons.html
     [18] https://s-media-cache-ak0.pinimg.com/originals/9c/25/08/9c25082f5c4d3477124356e45673d426.png
     [19] https://www.pinterest.com/pin/177258935306351629/
     [20] http://kellysutton.com/2017/09/01/comment-drift.html
     [21] http://www.safetyresearch.net/blog/articles/toyota-unintended-acceleration-and-big-bowl-%E2%80%9Cspaghetti%E2%80%9D-code
  27. Appendix: Establish Business Use Case [19] → Kick-off meeting with stakeholders: • Discuss the use case, motivation, and scope • Brainstorm and discuss potential solutions • Agree on the format of the deliverable and the prod environment (if applicable) • Iron out deadlines, checkpoints, and the ongoing support structure • Scope down, then scale up • Close the meeting with a recap of action items. Key takeaways: communication + clear expectations
