
Improving How We Deliver Machine Learning Models (XCONF 2019)

In this talk, we share better ways of working that help with common challenges faced in ML projects.

Repos:
1. https://github.com/ThoughtWorksInc/ml-app-template
2. https://github.com/ThoughtWorksInc/ml-cd-starter-kit

Demo videos:
1. Dockerised setup https://www.youtube.com/watch?v=S6kWaXQ530k
2. Installing cross-cutting services (e.g. GoCD, MLFlow, EFK): https://www.youtube.com/watch?v=p8jKTlcpnks
3. Rolling back harmful models: https://www.youtube.com/watch?v=rNfrgaRTz7c

Improving How We Deliver Machine Learning Models (XCONF 2019)

  1. IMPROVING HOW WE DELIVER MACHINE LEARNING MODELS (23 April 2019)
  2. WHO ARE WE? David Tan, Developer @ ThoughtWorks. Jonathan Heng, Developer @ ThoughtWorks.
  3. THE PLAN TODAY: Why do we need to improve the ML workflow? What are some better practices? How can we put these practices into practice?
  4. TEMPERATURE CHECK. Who has... ● Trained an ML model before? ● Deployed an ML model for fun? ● Deployed an ML model at work? ● Deployed a model using an automated CI pipeline?
  5. WHAT’S THE PROBLEM?
  6. OBSERVATION: One of these is not like the others. Continuous delivery practices can help us change this.
  7. JOURNEY ON AN ML PROJECT: Jupyter Notebooks → ??? → Production
  8. IT’S NEVER JUST ML: We got 99 problems and machine learning ain’t one. Potentially unethical outcomes. How can we help people do difficult things? Source: Machine Learning: The High-Interest Credit Card of Technical Debt (Google, 2014)
  9. HELPING PEOPLE DO DIFFICULT THINGS. Sensible defaults: ● Two reference repos: ○ github.com/ThoughtWorksInc/ml-cd-starter-kit ○ github.com/ThoughtWorksInc/ml-app-template ● Better ways of working: ○ 6 common problems and suggested solutions
  10. LET’S GO: 6 common problems and suggested solutions
  11. 1. WORKS ON MY MACHINE
  12. 1. WORKS ON MY MACHINE. To start, simply: • docker build … • docker run ...
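Once the image is built and the container is running, a quick smoke test confirms the service answers requests. A minimal sketch in Python, assuming the containerised app exposes a /predict endpoint on port 8080 (both the route and the payload shape are assumptions, not the template's documented API):

      import requests

      # Hypothetical endpoint and payload; adjust to the app's actual API.
      URL = "http://localhost:8080/predict"
      payload = {"features": [1.0, 2.0, 3.0]}

      response = requests.post(URL, json=payload, timeout=5)
      response.raise_for_status()  # fail loudly if the container is unhealthy
      print("Prediction:", response.json())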
  13. 1. WORKS ON MY MACHINE: Demo
  14. 2. NO DATA / DATA SUCKS
  15. 2. NO DATA / DATA SUCKS. Mitigation measures: ● Think about data access before starting the project ● Collect better data with every release (more on this a few slides from now) ● “Wizard of Oz” / fake it till we make it ○ Provide the interface, but without the ML implementation (see the sketch below). (Diagram: business problem, data, feature engineering, ML model.)
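One way to fake it till we make it: serve a simple heuristic behind the same interface the real model will eventually sit behind, so consumers can integrate from day one. A minimal Flask sketch (the route, payload, and heuristic are illustrative assumptions):

      from flask import Flask, jsonify, request

      app = Flask(__name__)

      @app.route("/predict", methods=["POST"])
      def predict():
          data = request.get_json()
          # Wizard of Oz: a hard-coded heuristic stands in for the ML model.
          fare_estimate = 10.0 + 2.5 * data.get("distance_km", 0)
          return jsonify({"prediction": fare_estimate,
                          "model_version": "heuristic-v0"})

      if __name__ == "__main__":
          app.run(host="0.0.0.0", port=8080)

When a trained model is ready, it replaces the heuristic without changing the contract.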
  16. 3. DEPLOYMENTS ARE COMPLICATED
  17. 3. DEPLOYMENTS ARE COMPLICATED. Mitigation measures: ● CI pipeline ● Deploy early and often ● Tracer bullet ● Bring the pain forward
  18. 3. DEPLOY EARLY AND OFTEN. (Pipeline diagram: local env → push → source code repository → trigger → run unit tests → feedback.) Source: Continuous Delivery (Jez Humble, Dave Farley)
  19. 3. DEPLOY EARLY AND OFTEN. (Pipeline diagram: local env → push → source code repository → trigger → run unit tests → train and evaluate model → feedback.) Source: Continuous Delivery (Jez Humble, Dave Farley)
  20. 3. DEPLOY EARLY AND OFTEN. (Pipeline diagram: local env → push → source code repository → trigger → run unit tests → train and evaluate model → deploy candidate model to staging, with artifacts stored in an artifact repository → feedback.) Source: Continuous Delivery (Jez Humble, Dave Farley)
  21. 3. DEPLOY EARLY AND OFTEN. (Pipeline diagram: local env → push → source code repository → trigger → run unit tests → train and evaluate model → deploy candidate model to staging → deploy model to production, with artifacts stored in an artifact repository.) Source: Continuous Delivery (Jez Humble, Dave Farley)
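Each pipeline stage maps to a script the CI server runs. A sketch of the “train and evaluate model” step, which gates promotion on a metric threshold (the dataset, model, and threshold are placeholders):

      # train.py: run by the "train and evaluate model" CI stage.
      import sys

      from sklearn.datasets import fetch_california_housing
      from sklearn.ensemble import RandomForestRegressor
      from sklearn.metrics import mean_squared_error
      from sklearn.model_selection import train_test_split

      X, y = fetch_california_housing(return_X_y=True)
      X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

      model = RandomForestRegressor(n_estimators=50, random_state=42)
      model.fit(X_train, y_train)

      rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
      print(f"rmse: {rmse:.3f}")

      # A non-zero exit fails the build, so a bad model never reaches staging.
      if rmse > 1.0:
          sys.exit("Candidate model rejected: rmse above threshold")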
  22. 4. HOW DO WE CHOOSE BETWEEN CANDIDATE MODELS?
  23. 4. CHOOSING BUILDS: Include model evaluation metrics in the CI pipeline
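MLFlow (used in the demo) can record the parameters and metrics of every CI training run so candidate models can be compared side by side. A minimal sketch using the standard MLflow tracking API (the server URL and values are placeholders):

      import mlflow

      # Point at the shared tracking server, e.g. the one provisioned by
      # ml-cd-starter-kit; typically configured via MLFLOW_TRACKING_URI in CI.
      mlflow.set_tracking_uri("http://mlflow.example.com:5000")
      mlflow.set_experiment("my-ml-app")

      with mlflow.start_run():
          mlflow.log_param("n_estimators", 50)
          mlflow.log_metric("rmse", 0.42)
          mlflow.log_metric("r2", 0.87)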
  24. 5. HOW’S THE MODEL DOING IN THE WILD?
  25. 5. OBSERVE! Monitoring service usage. Benefit #1: Feedback on the production model.
  26. 5. OBSERVE! Monitoring model output. Benefit #1: Feedback on the production model.
  27. 5. OBSERVE! Monitoring model inputs ● Could help identify training-serving skew. Benefit #1: Feedback on the production model.
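For the EFK stack to chart inputs and outputs, the service can emit one structured log line per prediction. A sketch using Python's standard logging module with JSON payloads (the field names are illustrative):

      import json
      import logging
      import sys

      logging.basicConfig(level=logging.INFO, stream=sys.stdout)
      logger = logging.getLogger("predictions")

      def log_prediction(inputs: dict, prediction: float, model_version: str) -> None:
          # One JSON document per prediction; Fluentd ships container output to
          # Elasticsearch, where Kibana can chart serving inputs against
          # training data to surface drift and training-serving skew.
          logger.info(json.dumps({
              "event": "prediction",
              "model_version": model_version,
              "inputs": inputs,
              "prediction": prediction,
          }))

      log_prediction({"distance_km": 3.2}, prediction=18.0, model_version="my-image:42")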
  28. 5. OBSERVE! Benefit #2: Interpretability of predictions
  29. 5. OBSERVE! Benefit #3: Closing the data collection loop. (Diagram: model service logs → data turking → data/feature repository → train & test model → evaluate models → deploy model → back to the model service. Legend: flow of data; flow of model.)
  30. 5. OBSERVE! Benefit #4: Ability to measure the goodness of any model. (Pipeline: (git push) → build_and_test → evaluate_model → deploy_staging → deploy_prod → evaluate_model_w_new_data, recording model = my-image:$BUILD_ID, r_2 = 0.7, rmse = 42.)
  31. 5. OBSERVE! Benefit #4: Ability to measure the goodness of any model
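The evaluate_model_w_new_data stage can score any tagged model artifact against data logged since it shipped. A sketch, assuming predictions and eventual ground truth have already been joined into a CSV, and the model was saved with joblib (the paths and column names are placeholders):

      import joblib
      import pandas as pd
      from sklearn.metrics import mean_squared_error, r2_score

      # Placeholders: produced by the data collection loop above.
      df = pd.read_csv("new_labelled_data.csv")
      model = joblib.load("model.joblib")  # the artifact tagged my-image:$BUILD_ID

      y_pred = model.predict(df.drop(columns=["label"]))
      print("r_2:", r2_score(df["label"], y_pred))
      print("rmse:", mean_squared_error(df["label"], y_pred) ** 0.5)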
  32. 5. HOW’S THE MODEL IN THE WILD? OBSERVE! Summing up: ● Mitigation measures ○ Logging + monitoring ● Benefits ○ Feedback on production models ○ Interpretability (how did the model decide on this particular prediction?) ○ Better data for training ○ Better (unseen) data for evaluating candidate/champion models
  33. 5. HOW’S THE MODEL IN THE WILD? Demo: GoCD, MLFlow, Kubernetes + Helm, Elasticsearch, Fluentd, Kibana, Grafana
  34. 5. HOW’S THE MODEL IN THE WILD? Demo
  35. 6. HARMFUL MODELS IN PRODUCTION
  36. 6. HARMFUL MODELS IN PRODUCTION. Actual news headlines: ● PredPol algorithm reinforces racial biases in policing data ● Recruiting tool shows bias against women. Image source: I’m an AI researcher, and here’s what scares me about AI (Rachel Thomas)
  37. 6. HARMFUL MODELS IN PRODUCTION. Mitigation measures: ● Discuss and define what “bad” looks like in our context ● “Black Mirror” retros ● Measure unfairness ○ Make fairness a measurable fitness function (see the sketch below) ● Data ethics checklist (link) ● Human-in-the-loop / appeal processes ● Ability to recover from harmful models
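Making fairness a fitness function means computing it like any other metric and failing the build when it regresses. A sketch of one simple measure, demographic parity difference (the column names and threshold are assumptions; pick measures that fit your context):

      import pandas as pd

      def demographic_parity_difference(df: pd.DataFrame,
                                        group_col: str,
                                        outcome_col: str) -> float:
          # Gap between the highest and lowest positive-outcome rate across
          # groups; 0.0 means every group receives favourable outcomes equally.
          rates = df.groupby(group_col)[outcome_col].mean()
          return float(rates.max() - rates.min())

      predictions = pd.DataFrame({
          "gender": ["f", "f", "f", "f", "m", "m", "m", "m"],
          "approved": [1, 0, 1, 1, 1, 0, 1, 0],
      })
      gap = demographic_parity_difference(predictions, "gender", "approved")
      assert gap <= 0.3, f"Unfair model: parity gap {gap:.2f} exceeds threshold"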
  38. 6. HARMFUL MODELS IN PRODUCTION. Demo: rollback to the last good build
  39. SUMMING UP: How can we make it easier to do the right thing?
  40. MAKE IT EASIER TO DO THE RIGHT THING ● Better ways of working: ○ Environment management ○ Closing the data collection loop ○ Deploy early and often ○ Automated tracking of hyperparameters and metrics ○ Logging and monitoring ○ Do no harm ● Two reference repos: ○ github.com/ThoughtWorksInc/ml-cd-starter-kit ○ github.com/ThoughtWorksInc/ml-app-template
  41. github.com/ThoughtWorksInc/ml-cd-starter-kit: provisions and configures cross-cutting services (GoCD, EFK + Grafana, MLFlow). github.com/ThoughtWorksInc/ml-app-template: project boilerplate template with ● Unit tests ● Train model ● Test model metrics ● Dockerised setup ● CI pipeline stored as code ● Hyperparameters and metrics of each training run tracked on CI ● Logging (predictions, inputs, explanatory variables)
  42. SUMMING UP. (Diagram: notebook / playground → commit and push → continuous delivery loop of experiment/develop → test → deploy → monitor → PROD (maybe).)
  43. FURTHER READING ● https://www.thoughtworks.com/intelligent-empowerment ● www.continuousdelivery.com
  44. THANK YOU. David Tan / Jonathan Heng. davidtan+jonheng@thoughtworks.com
