
Taming the reproducibility crisis


In data science, the scientific part is often forgotten - the workflows, tools, and practices that are popular tend to yield experiments that cannot be repeated. Experiments that are not reliable cannot tell us whether changes improve products or not. What works fine during initial development is inadequate for sustainable development of machine learning products. In this presentation, you will learn:
- Why reproducibility matters for data science.
- The practices and workflows that cause reproducibility problems.
- How to build technical environments and processes that enable reproducibility and iterative development of machine learning products.


Taming the reproducibility crisis

  1. Taming the reproducibility crisis. Nordic data science and machine learning summit, 2019-10-16. Lars Albertsson, founder @ Scling (www.scling.com)
  2. A typical data science journey, phase one (“The lab”) ● Data scientists in a corner ○ Weak engineering support ○ Weak product connection ● Manually exported data ● Single version of datasets ○ Good reproducibility ● Reuse data until overfit
  3. Tell us a story ● Great tool for ○ Displaying ○ Data storytelling ○ Personal playground ● Less suitable for ○ Scientific experimentation ○ Collaboration ○ Production ● Hidden state, out-of-order execution, difficult to reuse, weak IDE, no lint, no modularity, scaling, … ○ Joel Grus: “I don’t like notebooks”, https://youtu.be/7jiPeIFXb6U
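The notebook shortcomings listed above (hidden state, no modularity, difficult to reuse) are usually tackled by extracting the logic into plain, importable modules that can be unit tested. A minimal sketch of that extraction, assuming pandas; the function, column, and file names are hypothetical and not from the talk:

# features.py - notebook cell logic pulled out into an importable, testable module
# (illustrative sketch; names are hypothetical)
import pandas as pd

def add_session_length(events: pd.DataFrame) -> pd.DataFrame:
    """Derive session length in seconds; no hidden notebook state involved."""
    out = events.copy()
    out["session_length_s"] = (
        out["session_end"] - out["session_start"]
    ).dt.total_seconds()
    return out

# test_features.py - the same logic is now trivially unit-testable
def test_add_session_length():
    df = pd.DataFrame({
        "session_start": pd.to_datetime(["2019-10-16 10:00:00"]),
        "session_end": pd.to_datetime(["2019-10-16 10:05:00"]),
    })
    assert add_session_length(df)["session_length_s"].iloc[0] == 300.0

The notebook can then import add_session_length and remain a thin display layer, matching the “great for display, less suitable for production” split above.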
  4. A typical data science journey, next step (diagram: The lab → Flowing experiment, fed from a data lake)
  5. A typical data science journey, phase two (“Flowing experiment”, data lake) ● Data scientists + data engineers ○ Pipelines, fresh data ● Historical data. Whoosh! ● Train on all data until now ○ Changes every day ● Evaluate model on new data ○ Avoid manual overfit. Swell! ○ Changes every day
  6. Changing data, volatile workflows ● Data scientists + data engineers ○ Pipelines, fresh data ● Historical data. Whee! ● Train on all data until now ○ Changes every day ● Evaluate model on new data ○ Avoid manual overfit. Swell! ○ Changes every day
  7. Data unscience
  8. Data unengineering
  9. Data unengineering (dialogue) “How do I test the pipeline?” “You temporarily change the output path and run manually. See if the output looks good.” “What if I forget to change the path?” “Don’t do that.”
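A safer answer to the dialogue on slide 9 is to make the output location a parameter, so tests write to an isolated directory and there is no production path to forget to change. A minimal sketch using pytest’s tmp_path fixture; the job logic and names are hypothetical:

# pipeline.py - the output path is injected, never hard-coded
# (illustrative sketch; names are hypothetical)
import json
from pathlib import Path

def run_job(input_records: list, output_dir: Path) -> Path:
    """Count records per user and write the result as JSON."""
    counts = {}
    for record in input_records:
        counts[record["user"]] = counts.get(record["user"], 0) + 1
    out_file = output_dir / "counts.json"
    out_file.write_text(json.dumps(counts))
    return out_file

# test_pipeline.py - pytest provides an isolated temporary directory
def test_run_job(tmp_path):
    out = run_job([{"user": "a"}, {"user": "a"}, {"user": "b"}], tmp_path)
    assert json.loads(out.read_text()) == {"a": 2, "b": 1}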
  10. Typical data science journey, phase three (diagram: The lab → Flowing experiment (data lake) → Integrated iterative (data lake))
  11. Reproducibility starts to matter ● Initially large strides ● Diminishing returns → Precision measurement → Reproducible experiment ● Tame reproducibility or slow innovation (diagram: Integrated iterative, data lake)
  12. Reproducibility, technical view ● Train on known data ○ Batch only, never streaming ○ Explicitly enumerated datasets ○ No live sources ○ Not “all data that we have” or “latest state” ● Mastering workflow orchestration is key ● Lineage & provenance will become focus ○ Current tooling inadequate ● Random == not reproducible ○ But necessary for training
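Two of the points on slide 12 can be made concrete in a few lines: train on explicitly enumerated datasets rather than “all data that we have”, and pin the randomness that training needs. A sketch assuming date-partitioned Parquet files and scikit-learn; the paths and parameters are hypothetical:

# Train on an explicitly enumerated, immutable set of partitions,
# never on a live source or the "latest state".
# (illustrative sketch; paths and parameters are hypothetical)
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

TRAINING_PARTITIONS = [
    "data/events/2019-10-13.parquet",
    "data/events/2019-10-14.parquet",
    "data/events/2019-10-15.parquet",
]

def train():
    df = pd.concat(pd.read_parquet(p) for p in TRAINING_PARTITIONS)
    # Randomness is necessary for training, but it must be pinned to stay reproducible.
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(df.drop(columns=["label"]), df["label"])
    return model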
  13. Heathen data science - 2019: Please send me a copy of the latest model. 20 steps from model to production. 6 months to production. Different data science / development / QA teams. Which data were used to build this model? Bash script for building model from data. No feedback loop from operations to data scientists. Which hyperparameters should we use? The model can only be applied in this environment. Code represented as notebooks.
  14. Heathen software engineering - 1999: Please send me a copy of the latest source code. 20 steps from source to production. Different development / QA / operations teams. Which files were used to build this version? Bash script for building artifacts from source. No feedback loop from operations to development. Which compiler flags should we use? 6 month release cycle. The build only works on this machine. Code represented as UML models.
  15. Two decades of software engineering later: Team formation along value streams - DevOps. Everything as configuration (or code), in version control. Immutable artifacts, (hermetically) rebuilt from source. Continuous delivery (and continuous integration) with swift quality feedback.
  16. Value stream team formation - DevOps (diagram: Development, QA, and Operations silos reorganized into value streams for Product A, B, and C)
  17. Value stream team formation - DataOps (diagram: Data science, Data engineering, and Operations silos reorganized into value streams for Product A, B, and C)
  18. Immutable artifacts from source (diagram: source code → build system → artifacts (ELF, WAR, container image), tested and deployed through a CI/CD pipeline) ● Nobody wants to work without ○ But some still do ● Strong workflow from source ○ Not yet hermetic ● Immutable code artifacts
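One common way to get the immutable, source-traceable artifacts of slide 18 is to tag every build with the commit it was built from, so an artifact can always be traced back to, and rebuilt from, source. A small sketch assuming git and docker are available on the PATH; the image name is hypothetical:

# build.py - tag the container image with the exact source revision
# (illustrative sketch; assumes git and docker are installed, image name is hypothetical)
import subprocess

def build_image() -> str:
    sha = subprocess.check_output(
        ["git", "rev-parse", "--short", "HEAD"], text=True
    ).strip()
    tag = f"registry.example.com/model-service:{sha}"
    # Rebuild from source; the resulting image is never mutated, only replaced.
    subprocess.run(["docker", "build", "-t", tag, "."], check=True)
    return tag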
  19. Immutable models from raw data (diagram: cold store data → data pipeline under workflow orchestration → container image + stored model, evaluated and deployed) ● Nobody wants to work without ○ Most are not aware ● Strong workflow from raw data ○ Hermetic? ● Immutable data artifacts ● Key component: workflow orchestrator ● Train in batch ○ Reproducible ○ Infer in batch/stream/online
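The “key component: workflow orchestrator” point can be illustrated with Luigi, one widely used orchestrator (the talk does not name a specific tool): each task declares an explicit, dated output that is produced once and never mutated. The task, paths, and model choice below are hypothetical:

# train_model.py - an orchestrated task with an explicit, dated, immutable output
# (illustrative sketch; Luigi is only one possible orchestrator, names are hypothetical)
import datetime
import pickle

import luigi
import pandas as pd
from sklearn.linear_model import LogisticRegression

class TrainModel(luigi.Task):
    date = luigi.DateParameter()

    def output(self):
        # One model artifact per training date; produced once, never overwritten.
        return luigi.LocalTarget(f"artifacts/model/{self.date}/model.pkl",
                                 format=luigi.format.Nop)

    def run(self):
        # Explicitly enumerated input partition, not "latest state".
        df = pd.read_parquet(f"data/features/{self.date}.parquet")
        model = LogisticRegression(random_state=42)
        model.fit(df.drop(columns=["label"]), df["label"])
        with self.output().open("w") as f:
            pickle.dump(model, f)

if __name__ == "__main__":
    luigi.build([TrainModel(date=datetime.date(2019, 10, 15))], local_scheduler=True)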
  20. Everything as config (or code) ● Business logic ● Test code + test data ● Application configuration ● Deployment & dev tooling ● Infrastructure ● Monitoring, alert, other ops. Some in weak languages (YAML, HCL). Prefer code.
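The “prefer code” advice on slide 20 is often realized by expressing configuration as typed, version-controlled code instead of loose YAML or HCL. A minimal sketch; the fields and values are hypothetical:

# config.py - configuration as code: typed, reviewable, and in version control
# (illustrative sketch; fields and values are hypothetical)
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainingConfig:
    training_partitions: tuple
    n_estimators: int = 100
    random_state: int = 42  # pinned for reproducibility

PROD = TrainingConfig(
    training_partitions=(
        "data/events/2019-10-13.parquet",
        "data/events/2019-10-14.parquet",
    ),
)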
  21. Everything as config (or code) ● Model code ● Test code + test data ○ Fuzzy testing - solved problem ● Hyperparameters ● Deployment & dev tooling ● Infrastructure ● Monitoring, alert, other ops. (Diagram: size = effort, colour = code complexity; a small ML box surrounded by configuration, data collection, data verification, feature extraction, machine resource management, analysis tools, process management tools, serving infrastructure, and monitoring. Credits: “Hidden Technical Debt in Machine Learning Systems”, Google, NIPS 2015.)
  22. Continuous delivery (+ CI) with swift feedback ● Short time from code to production feedback ● There is no tradeoff between speed and reliability (diagram: Integrated iterative)
  23. Swift feedback for machine learning ● Siloed: 6+ months ○ Cultural work ● Autonomous: 1 month ○ Technical work, reproducibility ● Coordinated: days (diagram: Integrated iterative, data lake)
  24. Skip to phase three (diagram: The lab → Flowing experiment (data lake) → Integrated iterative (data lake))
  25. Mix data scientists with developers, QA, ops ● Expect conflicts in work methods ● Facilitate mutual learning ● Limit scope of weak tools & workflows ○ But don’t force clunky tools on data scientists. Reproducibility is a technical problem with a human solution. Mark Coleman: “Inextricably Linked: Reproducibility & Productivity in Data Science & AI”, https://youtu.be/eORATxPx1Bw
  26. Who’s talking? Lars Albertsson, @lalleal. Ex: Google, Spotify, Schibsted, freelance. Founder: Scling - data-value-as-a-service ● Siloed: 6+ months ● Autonomous: 1 month ● Coordinated: days (diagram: Integrated iterative, data lake)
