
Emerging Best Practices for Machine Learning Engineering - Lex Toumbourou (By ThoughtWorks)



In this talk, Lex will walk through some of the emerging best practices for Machine Learning engineering and look at how they compare to those of traditional software development. He will be covering topics including Product Management; Research and Development; Deployment; QA and Lifecycle Management of Machine Learning projects.

Published in: Technology


  1. EMERGING BEST PRACTICES FOR MACHINE LEARNING ENGINEERING, Lex Toumbourou
  2. Ready to make your mark? /careers ©ThoughtWorks 2019 Commercial in Confidence
  3. Emerging Best Practices for Machine Learning Engineering, Lex Toumbourou
  4. Why this talk?
     ● Aimed at individuals and organisations getting started with Machine Learning.
     ● Reduce uncertainty, cost and time to deliver.
     ● Based on my own experience, that of my colleagues, and the work of various other authors.
  5. Talk overview
     ● Review of best practices or “sensible defaults” in software projects.
     ● Consider the challenges that ML projects introduce.
     ● Practices by ML project phase.
  6. Motivating example: Petfinder.my Adoption Prediction
     ● Predict how long a pet will take to be adopted based on its profile.
     ● Combination of structured, NLP and image data.
     ● Generates a continual supply of training data.
     https://www.kaggle.com/c/petfinder-adoption-prediction
  7. Terminology (review of terminology used throughout the talk)
     ● Supervised learning - the process of learning a predictive function from a training dataset of input/output pairs.
     ● Model - another name for the learned function and its parameters.
     ● Features - another word for the inputs of a model (sometimes engineered).
     ● Training set - the whole collection of feature -> output pairs used to train the model.
     ● Validation set - a portion of the training data set aside to tune the model.
     ● Test set - the data used to evaluate the model.
  8. Part 1: Software engineering “best practices”
  9. Waterfall
     ● Based on the assumption that sufficient upfront planning saves time and money on rework later in the project.
     ● Slow feedback loop.
     ● Doesn’t account for unforeseeable complexity.
  10. Iterative/Agile development
     ● Work in cycles (“sprints”) of requirements, design, code, and release.
     ● Rapid Application Development (RAD), Rational Unified Process, Extreme Programming (XP), Scrum, Kanban etc.
     https://blog.itil.org/2014/08/allgemein/what-it-service-management-can-learn-from-the-agile-manifesto-and-vice-versa/
  11. Modern software excellence
     ● Continuous delivery.
     ● Fast feedback.
     ● Rigorous testing.
     ● Continuous integration.
     ● Sophisticated version control.
     ● Infrastructure as code (DevOps).
     From Continuous Delivery in a Nutshell by Zaiku
  12. Part 2: Challenges of ML projects
  13. Uncertain outcomes
     ● Paradigm shift for product managers.
     ● Is this even a problem I can effectively solve with Machine Learning?
  14. Training data requirements
     ● Unstructured problems: collecting big datasets is (or used to be) a big barrier to entry.
     ● Structured datasets in the wild are often spread across multiple sources with different governance policies.
  15. Reproducibility requirements
     ● State must be consistent to allow experiments to build upon each other.
     ● Large datasets and artifacts don’t fit into traditional version control tools.
  16. Slow feedback
     ● Large models can take from hours to days to train.
     ● Models can be fiddly and difficult to train.
  17. Model drift
     ● Models trained on today’s data come with no guarantee that they will work on future data.
  18. Blackbox-ness
     ● Hard to assess “correctness”.
     ● Production results may differ from dev results.
     ● New class of concerns for QA and support.
  19. Part 3: Phases of ML projects
  20. ML project overview: Plan -> Collect -> Prepare -> Train -> Deploy
  21. Project-wide practices
  22. Focus on product, not tech (Project-wide practices)
     ● Can you validate the product idea without training any models?
     ● Is there an open-source or vendor solution that will get you close?
     ● Tip: if you use a vendor solution, you still need to evaluate its performance with a test set.
  23. Fast cycle time (Project-wide practices)
     ● Start small and increase complexity as needed.
     ● “If you're not embarrassed by the first version of your product (model), you've launched too late” - Reid Hoffman
  24. Consistent code structure (Project-wide practices)
     ● Document where to put things and create a linter-enforced style guide. http://flake8.pycqa.org/en/latest/
     ● Use Cookiecutter Data Science to reduce “bike-shedding” and decision fatigue. https://github.com/drivendata/cookiecutter-data-science
  25. Plan (phases: 1. Plan, 2. Collect, 3. Prepare, 4. Train, 5. Deploy)
  26. Consider implications (Plan)
     ● What happens if the model is bad?
     ● What are the implications of what I’m optimising?
     ● Do we need a human in the loop?
     https://www.slideshare.net/ThoughtWorks/social-implications-of-bias-in-machine-learning-fiona-coath-by-thoughtworks-133798261
  27. Pick an evaluation metric (Plan)
     ● Choose a “main” (even single) evaluation metric after considering your problem and your understanding of the data. https://www.coursera.org/lecture/machine-learning-projects/single-number-evaluation-metric-wIKkC
     ● Establish a baseline for the metric by predicting at random or by always predicting the majority class.
     https://www.biochemia-medica.com/en/journal/22/3/10.11613/BM.2012.031
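As a rough illustration of the majority-class baseline mentioned on this slide, the sketch below computes the accuracy a model would get by always predicting the most common training label. The label values and the choice of accuracy as the metric are assumptions made for the example, not something the talk prescribes.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_class_baseline(y_train, y_test):
    """Accuracy obtained by always predicting the most common training label."""
    majority_label = Counter(y_train).most_common(1)[0][0]
    return accuracy(y_test, [majority_label] * len(y_test))


# Hypothetical adoption-speed labels, invented for this example.
y_train = [4, 2, 2, 1, 2, 3, 4, 2]
y_test = [2, 4, 2, 1, 2]

baseline = majority_class_baseline(y_train, y_test)
print(f"Majority-class baseline accuracy: {baseline:.2f}")
```

Any trained model should beat this number on the same test set before it is worth further investment.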
  28. Plan test set (Plan)
     ● The test set should be production data.
     ● The test set shouldn’t overlap with the training set.
     ● Newer data is (usually) most important.
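One way to honour both "no overlap" and "newer data matters most" is to split by time, so the newest records form the test set. The sketch below assumes each record carries a hypothetical `listed_on` date and `pet_id`; both field names are invented for illustration.

```python
from datetime import date


def time_based_split(records, cutoff):
    """Put records listed before `cutoff` in train and the rest in test."""
    train = [r for r in records if r["listed_on"] < cutoff]
    test = [r for r in records if r["listed_on"] >= cutoff]
    return train, test


# Hypothetical records; the field names are assumptions.
records = [
    {"pet_id": "a", "listed_on": date(2018, 11, 3)},
    {"pet_id": "b", "listed_on": date(2018, 12, 20)},
    {"pet_id": "c", "listed_on": date(2019, 1, 15)},
    {"pet_id": "d", "listed_on": date(2019, 2, 1)},
]
train, test = time_based_split(records, cutoff=date(2019, 1, 1))

# The two sets must not share any example.
overlap = {r["pet_id"] for r in train} & {r["pet_id"] for r in test}
print(len(train), len(test), overlap)
```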
  29. Determine run criteria (Plan)
     ● How will our production infrastructure constrain our model?
     ● How fast does the inference need to be?
  30. Collect (phases: 1. Plan, 2. Collect, 3. Prepare, 4. Train, 5. Deploy)
  31. Data scientist builds dataset (Collect)
     ● If you are building models, you should have a good understanding of how the dataset was collected.
     ● Active learning can make this fast. https://platform.ai/ https://prodi.gy
     (Figure: building a labelled dataset with platform.ai)
  32. Small data first (Collect)
     ● Small datasets can (sometimes) go a long way.
     ● Transfer learning works for image classification, natural language processing and even structured data.
  33. More data > solution complexity (Collect)
     ● “Most people overestimate the cost associated with gathering and labeling data, and underestimate the hardship of solving problems in a data starved environment.” - Emmanuel Ameisen https://blog.insightdatascience.com/how-to-deliver-on-machine-learning-projects-c8d82ce642b0
  34. Share collected data (Collect)
     ● Package and share collected datasets. https://dvc.org https://quiltdata.com
     ● Encourage centralised & compliant storage (data lakes).
  35. Prepare (phases: 1. Plan, 2. Collect, 3. Prepare, 4. Train, 5. Deploy)
  36. Look at your data (Prepare)
     ● Look at random examples.
     ● Plot histograms.
     ● Use Missingno to visualise missing values. https://github.com/ResidentMario/missingno
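The three checks on this slide can be approximated with the standard library alone before reaching for plotting tools. This is a minimal sketch on made-up rows; the column names are assumptions, and Missingno itself would give a much richer picture of missing values.

```python
import random
from collections import Counter

# Hypothetical rows; None marks a missing value.
rows = [
    {"age_months": 3, "breed": "mixed", "fee": 0},
    {"age_months": 24, "breed": None, "fee": 50},
    {"age_months": None, "breed": "persian", "fee": 0},
    {"age_months": 3, "breed": "mixed", "fee": None},
    {"age_months": 60, "breed": "mixed", "fee": 0},
]

# 1. Look at a few random examples (seeded so the sample is repeatable).
rng = random.Random(0)
for row in rng.sample(rows, k=2):
    print(row)

# 2. A crude text histogram of one categorical column.
breed_counts = Counter(row["breed"] for row in rows)
for breed, count in breed_counts.most_common():
    print(f"{str(breed):10s} {'#' * count}")

# 3. Count missing values per column.
missing = {col: sum(row[col] is None for row in rows) for col in rows[0]}
print(missing)
```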
  37. ML-driven exploratory data analysis (EDA) (Prepare)
     ● Aim to train a model fast, then use interpretability and SME knowledge to guide feature engineering and data collection. From Fast.ai’s Machine Learning for Coders
     ● GBM software (XGBoost, LightGBM, CatBoost) can handle missing values, categorical values and varying scales out of the box.
  38. Version artifacts and pipelines (Prepare)
     ● Version control artifacts.
     ● Track the pipelines used to generate features. https://dvc.org
     ● Use Pipenv or Poetry to track dependency chains. https://pipenv.readthedocs.io/en/latest/ https://github.com/sdispater/poetry
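The idea behind versioning artifacts and pipelines can be sketched as recording a content hash of every input alongside the parameters used, so that two runs are comparable. The helper names below are hypothetical, and this is not DVC's API or file format, only an illustration of the principle.

```python
import hashlib
import json


def content_hash(data: bytes) -> str:
    """Identify an artifact by the SHA-256 digest of its bytes."""
    return hashlib.sha256(data).hexdigest()


def record_stage(inputs: dict, params: dict) -> dict:
    """Build a manifest entry describing one pipeline stage run."""
    return {
        "inputs": {name: content_hash(blob) for name, blob in inputs.items()},
        "params": params,
    }


# Two hypothetical versions of the same raw dataset.
raw_v1 = b"pet_id,age\n1,3\n2,24\n"
raw_v2 = b"pet_id,age\n1,3\n2,24\n3,60\n"

run_1 = record_stage({"raw.csv": raw_v1}, {"min_age": 0})
run_2 = record_stage({"raw.csv": raw_v1}, {"min_age": 0})
run_3 = record_stage({"raw.csv": raw_v2}, {"min_age": 0})

print(json.dumps(run_1, indent=2))
print("same inputs and params:", run_1 == run_2)
print("data changed:", run_1 != run_3)
```

Committing such a manifest to ordinary version control keeps experiments reproducible even when the data itself is too large to commit.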
  39. Practise good code hygiene (Prepare)
     ● Test-driven development for feature engineering code: unit, integration, etc.
     ● Refactor into modules.
     ● Fix bugs with your features before worrying about hyperparameters.
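A small example of what test-driven feature engineering might look like: a feature function with its edge cases pinned down by tests. The feature (`description_length_bucket`) and its 20-word threshold are invented for illustration. The tests are plain functions that a runner such as pytest could collect; here they are simply called directly.

```python
def description_length_bucket(description):
    """Bucket a free-text description by word count: none, short or long."""
    if not description or not description.strip():
        return "none"
    return "short" if len(description.split()) < 20 else "long"


def test_missing_description():
    assert description_length_bucket(None) == "none"
    assert description_length_bucket("   ") == "none"


def test_short_description():
    assert description_length_bucket("Friendly kitten") == "short"


def test_long_description():
    assert description_length_bucket("word " * 20) == "long"


# Run the tests directly so the sketch has no external dependencies.
for test in (test_missing_description, test_short_description, test_long_description):
    test()
print("all feature tests passed")
```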
  40. Train (phases: 1. Plan, 2. Collect, 3. Prepare, 4. Train, 5. Deploy)
  41. “Easiest” models first (Train)
     ● Favour simple, interpretable models initially.
     ● GBMs are a great default choice for structured data.
     https://towardsdatascience.com/interpretable-machine-learning-with-xgboost-9ec80d148d27
  42. Fast feedback (Train)
     ● Overfit first.
     ● Train on samples, small images etc. while testing experiments; aim to keep training time under 5 minutes.
     ● Draw the validation set from the same distribution as the test set.
     https://www.bridgewateruk.com/2016/08/working-large-company-vs-working-small-company-pros-cons/
  43. Transfer learning (Train)
     ● (Almost) always start with a pretrained model if possible.
     ● Transfer learning works for image classification and, more recently, natural language processing. See “Universal Language Model Fine-tuning for Text Classification” and “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”.
     https://machinelearningmastery.com/transfer-learning-for-deep-learning/
  44. Constrain training to run criteria (Train)
     ● Constrain model selection to suit the run criteria.
     ● CatBoost, as an alternative to XGBoost, supports fast inference and model size regularization.
  45. Perform error analysis (Train)
     ● Error analysis by hand: look at 100 examples of errors and determine common themes.
     ● View the most confident and least confident predictions.
     ● Feature importance and ablation.
     From https://www.kdnuggets.com/2018/01/error-analysis-your-rescue.html based on ideas by Andrew Ng
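A minimal sketch of the first two error-analysis steps: collect the misclassified examples for review by hand, and rank predictions by confidence. The tuple layout and the example values are assumptions for illustration.

```python
# Each prediction: (example_id, true_label, predicted_label, confidence).
# All values are hypothetical.
predictions = [
    ("pet-1", "fast", "fast", 0.95),
    ("pet-2", "slow", "fast", 0.91),
    ("pet-3", "slow", "slow", 0.55),
    ("pet-4", "fast", "slow", 0.52),
    ("pet-5", "slow", "slow", 0.88),
]

# Errors to inspect by hand, most confident (most surprising) first.
errors = [p for p in predictions if p[1] != p[2]]
most_confident_errors = sorted(errors, key=lambda p: p[3], reverse=True)

# The predictions the model was least sure about, right or wrong.
by_confidence = sorted(predictions, key=lambda p: p[3], reverse=True)
least_confident = by_confidence[-2:]

print("confident mistakes:", [p[0] for p in most_confident_errors])
print("least confident:", [p[0] for p in least_confident])
```

Confidently wrong examples often point to labelling problems or missing features, while low-confidence examples suggest where more data might help.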
  46. Deploy (phases: 1. Plan, 2. Collect, 3. Prepare, 4. Train, 5. Deploy)
  47. Go to prod early (Deploy)
     ● Test your model on data and conditions in prod early.
     ● A/B deployments: the new model receives inputs alongside the production model to compare performance.
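One possible reading of "the new model receives inputs alongside the production model" is a shadow-style setup: the production model's prediction is served, while the candidate's prediction is only logged for later comparison. The sketch below makes that assumption explicit; both models, the log structure and the error metric are hypothetical.

```python
def production_model(profile):
    """Hypothetical current model: predicts days until adoption."""
    return 30.0


def candidate_model(profile):
    """Hypothetical new model under evaluation."""
    return 10.0 if profile["age_months"] < 12 else 40.0


shadow_log = []


def predict(profile):
    """Serve the production prediction and record the candidate's for comparison."""
    served = production_model(profile)
    try:
        shadow = candidate_model(profile)
    except Exception:  # a failing candidate must never break serving
        shadow = None
    shadow_log.append({"input": profile, "served": served, "shadow": shadow})
    return served


def mean_abs_error(log, actuals, key):
    """Mean absolute error of the logged predictions stored under `key`."""
    return sum(abs(entry[key] - actual) for entry, actual in zip(log, actuals)) / len(actuals)


profiles = [{"age_months": 3}, {"age_months": 36}]
results = [predict(p) for p in profiles]
actual_days = [12.0, 45.0]  # outcomes observed later (made up)

prod_mae = mean_abs_error(shadow_log, actual_days, "served")
cand_mae = mean_abs_error(shadow_log, actual_days, "shadow")
print(f"production MAE={prod_mae}, candidate MAE={cand_mae}")
```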
  48. Validate inputs (and outputs) (Deploy)
     ● Fast feedback on prod data not accounted for in the training / test set.
     ● Pydantic validates using Python types. https://github.com/samuelcolvin/pydantic
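To show the shape of input validation without adding a dependency, the sketch below uses a standard-library dataclass with hand-written checks. It is a stand-in for the idea, not Pydantic's API; Pydantic would derive similar checks from the type annotations. The `PetProfile` fields are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PetProfile:
    """Hypothetical model input; field names are illustrative only."""

    age_months: int
    description: str
    photo_count: int

    def __post_init__(self):
        for name in ("age_months", "photo_count"):
            value = getattr(self, name)
            # bool is a subclass of int, so reject it explicitly.
            if not isinstance(value, int) or isinstance(value, bool):
                raise TypeError(f"{name} must be an int, got {type(value).__name__}")
            if value < 0:
                raise ValueError(f"{name} must be non-negative, got {value}")
        if not isinstance(self.description, str):
            raise TypeError("description must be a str")


def parse_request(payload):
    """Return (profile, None) on success or (None, error message) on failure."""
    try:
        return PetProfile(**payload), None
    except (TypeError, ValueError) as exc:
        return None, str(exc)


ok, err = parse_request({"age_months": 6, "description": "Playful", "photo_count": 2})
bad, bad_err = parse_request({"age_months": -1, "description": "Playful", "photo_count": 2})
print(ok, err)
print(bad, bad_err)
```

Rejected payloads are worth logging: they are exactly the production data that the training and test sets did not account for.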
  49. Minimise ops (Deploy)
     ● Aim for serverless and minimal infrastructure.
     ● Automate deployments.
     ● Developers and data scientists on call.
  50. Monitor metric (Deploy)
     ● Monitor the metric by continually building new test sets.
     ● Track performance over time.
     ● Schedule retraining.
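Tracking the metric over time can feed a simple retraining trigger. The sketch below flags retraining when the average of the most recent scores drops below the launch baseline by more than a tolerance; the window size, tolerance and weekly accuracy figures are all assumptions chosen for illustration.

```python
def needs_retraining(metric_history, baseline, tolerance=0.05, window=3):
    """True when the mean of the last `window` scores falls below baseline - tolerance."""
    if len(metric_history) < window:
        return False
    recent = metric_history[-window:]
    return sum(recent) / window < baseline - tolerance


# Hypothetical accuracy on a fresh test set built each week.
weekly_accuracy = [0.81, 0.80, 0.79, 0.74, 0.72, 0.71]
baseline_accuracy = 0.80

alert = needs_retraining(weekly_accuracy, baseline_accuracy)
print("retrain" if alert else "ok")
```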
  51. Accessible interpretability tools (Deploy)
     ● Data scientists should create tools to make the model accessible to all.
     ● Interpretability dashboards to make predictions against real data and view interpretations. https://www.thoughtworks.com/clients/arkose-labs
     See “"Why Should I Trust You?": Explaining the Predictions of Any Classifier” and “A Unified Approach to Interpreting Model Predictions”.
  52. Conclusion
     ● Research is uncertain but we can define clear goals.
     ● Data can be collected iteratively.
     ● Carefully track data, artifacts and pipelines for reproducibility.
     ● Aim for fast feedback while training models.
     ● Deploy early and monitor production.
     ● Make interpretability tools accessible to the organisation.
  53. Thank you. Lex Toumbourou, lext@thoughtworks.com, @lexandstuff
  54. Questions?
