Fri benghiat gil-odsc-data-kitchen-data science to dataops

DataOps

Published in: Data & Analytics

  1. Copyright © 2017 by DataKitchen, Inc. All Rights Reserved.
  2. Copyright © 2018 by DataKitchen, Inc. All Rights Reserved. Agenda: How to go from Data Science to Data Operations (#DataOps). Introductions; Data Science Challenges; What is DataOps?; Seven Shocking Steps to DataOps; Pulling it together.
  3. Keep this question in mind: What can I take from this session and use on Monday?
  4. For slides, contact gil@DataKitchen.io
  5. Speaker: Gil Benghiat, co-Founder of DataKitchen, Founder, VP of Products (gil@datakitchen.io). A series of data-centric software projects. 🎓 Applied Math / Biology @ Brown; 🎓 Computer Science @ Stanford; 🏢 Bell Labs, Sybase, PhaseForward, LeapFrogRx.
  6. DataKitchen DataOps Software Platform. Main features: (1) orchestrate complex data pipelines; (2) deploy new ideas to production; (3) automate tests and monitor quality. Enables: (1) fast delivery of analytics; (2) high data quality; (3) using your favorite tools and data stores.
  7. Agenda: Introductions; • Data Science Challenges; What is DataOps?; Seven Shocking Steps to DataOps; Pulling it together.
  8. Figure 1: Only a small fraction of real-world ML systems is composed of the ML code, as shown by the small black box in the middle. The required surrounding infrastructure is vast and complex. (Source: Google, Advances in Neural Information Processing Systems 28, NIPS 2015.)
  9. Model building: Business Need → Prep Data → Feature Extraction → Build Model → Evaluate Model → Deploy Model → Monitor Model. Iterate, Test and Improve.
  10. Agenda: Introductions; Data Science Challenges; • What is DataOps?; Seven Shocking Steps to DataOps; Pulling it together.
  11. Genesis of DataOps. People, Process, Organization. Technical Environment = 7 steps.
  12. DataOps: It Began With Agile. Roles: Data Engineer, Data Scientist, Data Analyst, Business Partner. Agile Development is a mindset: (1) collaborate with your customers; (2) respond to change; (3) measure progress by working analytics; (4) release frequently (most important first); (5) get feedback on your releases; (6) adjust your behavior to become more effective. 4 Values, 12 Principles. Be Pragmatic, Not Dogmatic.
  13. Focus on Value
  14. Agenda: Introductions; Data Science Challenges; What is DataOps?; • Seven Shocking Steps to DataOps; Pulling it together.
  15. Seven Steps to DataOps: (1) Orchestrate Two Journeys; (2) Add Tests; (3) Use a Version Control System; (4) Branch and Merge; (5) Use Multiple Environments; (6) Reuse & Containerize; (7) Parameterize Your Processing.
  16. Journey 1: Orchestrate data to customer value. Analytic processes are like manufacturing: materials (data) and production outputs (refined data, charts, graphs, models). Access: Python Code; Transform: SQL Code, ETL; Model: R Code; Visualize: Tableau Workbook; Report: Tableau Online. ❶
  17. Journey 2: Speed ideas to production. Analytic processes are like software development: deliverables continually move from development to production. Diverse Team: Data Engineers, Data Scientists, Data Analysts. Diverse Tools. Diverse Customers: Business Customer, Products & Systems. ❶
  18. Innovation and Value Pipeline Together. Focus on both orchestration and deployment while automating and monitoring quality. Don't want to break production when I deploy my changes. Don't want to learn about data quality issues from my customers. ❶
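
The Value Pipeline described in the slides above (access, transform, model, report) can be sketched as an ordered list of steps run by a small orchestrator. This is a minimal illustration, not DataKitchen's implementation: the step functions, the hard-coded rows, and the `run_pipeline` helper are all hypothetical, and the Tableau visualization step is left out.

```python
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]
Step = Callable[[Context], Context]


def access(ctx: Context) -> Context:
    # Stand-in for "Access: Python Code"; a real step would read a data store.
    rows = [
        {"region": "east", "sales": 120},
        {"region": "west", "sales": 80},
        {"region": "east", "sales": 50},
    ]
    return {**ctx, "raw": rows}


def transform(ctx: Context) -> Context:
    # Stand-in for "Transform: SQL Code, ETL"; sums sales per region.
    totals: Dict[str, int] = {}
    for row in ctx["raw"]:
        totals[row["region"]] = totals.get(row["region"], 0) + row["sales"]
    return {**ctx, "totals": totals}


def model(ctx: Context) -> Context:
    # Trivial placeholder "model": each region's share of total sales.
    grand_total = sum(ctx["totals"].values())
    shares = {region: total / grand_total for region, total in ctx["totals"].items()}
    return {**ctx, "shares": shares}


def report(ctx: Context) -> Context:
    # Stand-in for the reporting step; renders a plain-text summary.
    lines = [f"{region}: {share:.0%}" for region, share in sorted(ctx["shares"].items())]
    return {**ctx, "report": "\n".join(lines)}


PIPELINE: List[Tuple[str, Step]] = [
    ("access", access),
    ("transform", transform),
    ("model", model),
    ("report", report),
]


def run_pipeline(steps: List[Tuple[str, Step]]) -> Context:
    """Run each step in order, passing the accumulated context along."""
    ctx: Context = {}
    completed: List[str] = []
    for name, step in steps:
        ctx = step(ctx)
        completed.append(name)
    ctx["completed"] = completed
    return ctx


result = run_pipeline(PIPELINE)
print(result["report"])
```

Keeping the step order in data (`PIPELINE`) rather than in control flow is one way to make the same orchestration reusable across development and production runs.
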
  19. Add Tests: Monitor quality. Data Quality Monitoring: ensure that data quality remains high in the Value Pipeline. Code Quality Monitoring: before promoting work, running new and old tests gives high confidence that the change did not break anything in the Innovation Pipeline. ❷
  20. Automate Monitoring & Tests In Production. Test every step and every tool in your Value Pipeline: Are data inputs free from issues? Is your business logic still correct? Are your outputs consistent? And save test results! Access: Python Code; Transform: SQL Code, ETL; Model: R Code; Visualize: Tableau Workbook; Report: Tableau Online. ❷
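
The three questions on the slide above (inputs, business logic, outputs) and the advice to save test results can be sketched as follows. The check functions, field names, and the JSON-lines record format are assumptions made for illustration; the slides do not prescribe a format.

```python
import io
import json
from datetime import datetime, timezone
from typing import Any, Dict, List, TextIO


def inputs_are_clean(rows: List[Dict[str, Any]]) -> bool:
    """Are data inputs free from issues? Here: sales present and non-negative."""
    return all(isinstance(r.get("sales"), (int, float)) and r["sales"] >= 0 for r in rows)


def logic_is_correct(rows: List[Dict[str, Any]], totals: Dict[str, float]) -> bool:
    """Is the business logic still correct? Regional totals must add up to the raw sum."""
    return sum(totals.values()) == sum(r["sales"] for r in rows)


def outputs_are_consistent(totals: Dict[str, float], expected_regions: List[str]) -> bool:
    """Are outputs consistent? The same set of regions should appear on every run."""
    return set(totals) == set(expected_regions)


def run_checks(rows, totals, expected_regions) -> List[Dict[str, Any]]:
    run_at = datetime.now(timezone.utc).isoformat()
    inputs_ok = inputs_are_clean(rows)
    checks = {
        "inputs_clean": inputs_ok,
        # Only computed when the inputs are clean, since it sums the sales values.
        "logic_correct": inputs_ok and logic_is_correct(rows, totals),
        "outputs_consistent": outputs_are_consistent(totals, expected_regions),
    }
    return [{"test": name, "passed": passed, "run_at": run_at} for name, passed in checks.items()]


def save_results(results: List[Dict[str, Any]], stream: TextIO) -> None:
    """Append one JSON record per test so a history builds up across runs."""
    for record in results:
        stream.write(json.dumps(record) + "\n")


rows = [{"region": "east", "sales": 120}, {"region": "west", "sales": 80}]
totals = {"east": 120, "west": 80}
results = run_checks(rows, totals, expected_regions=["east", "west"])

history = io.StringIO()  # stands in for a log file or a results table
save_results(results, history)
```
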
  21. Support Multiple Types Of Tests. Testing data is not just pass/fail in your Value Pipeline. Support test types: Error (stop the line); Warning (investigate later); Info (list of changes). Keep test history: Statistical Process Control. ❷
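
One way to read the three test types and the statistical process control hint on the slide above is sketched below. The `Severity` enum, the `StopTheLine` exception, and the three-sigma control limit are assumptions chosen for illustration; the slides name the ideas but not an implementation.

```python
from enum import Enum
from statistics import mean, stdev
from typing import List, Tuple


class Severity(Enum):
    ERROR = "error"      # stop the line
    WARNING = "warning"  # investigate later
    INFO = "info"        # list of changes


class StopTheLine(Exception):
    """Raised when a failed ERROR-level test should halt the pipeline."""


def handle(results: List[Tuple[str, Severity, bool]]) -> List[Tuple[str, str]]:
    """Raise on a failed ERROR test; collect failed WARNING and INFO tests for follow-up."""
    follow_up = []
    for name, severity, passed in results:
        if passed:
            continue
        if severity is Severity.ERROR:
            raise StopTheLine(name)
        follow_up.append((severity.value, name))
    return follow_up


def within_control_limits(history: List[float], value: float, sigmas: float = 3.0) -> bool:
    """Simple control check: is the new value within `sigmas` sample standard
    deviations of the historical mean? Needs at least two historical points."""
    center = mean(history)
    spread = stdev(history)
    return abs(value - center) <= sigmas * spread


# Hypothetical history of daily row counts saved from earlier runs.
row_counts = [1000, 1010, 990, 1005, 995]
today_ok = within_control_limits(row_counts, 1012)
today_suspicious = within_control_limits(row_counts, 1100)
```

Keeping saved history is what makes the control-limit check possible: the threshold comes from past runs rather than from a hand-picked constant.
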
  22. Types of Tests ❷
  23. Example Tests: Simple ❷
  24. For the Innovation Pipeline, Tests Are Also For Code: Keep Data Fixed. Deploy Feature. Run all tests here before promoting. ❷
  25. Use a Version Control System. At the end of the day, analytic work is all just code. Access: Python Code; Transform: SQL Code, ETL Code; Model: R Code; Visualize: Tableau Workbook XML; Report: Tableau Online. Source Code Control. ❸
  26. Branch & Merge. Source Code Control: branching and merging enable people to safely work on their own tasks. ❹
  27. Example branch and merge pattern (diagram): Sprint 1, Sprint 2; branches f1, f2, f3, f5; main / master / trunk. ❹
  28. Use Multiple Environments. Your analytic work requires coordinating tools and hardware. Analytic Environment: Access: Python Code; Transform: SQL Code, ETL Code; Model: R Code; Visualize: Tableau Workbook XML; Report: Tableau Online. ❺
  29. Use Multiple Environments. Provide an Analytic Environment for each branch: analysts and data scientists need a controlled environment for their experiments; engineers need a place to develop outside of production; update production only after all tests are run! ❺
  30. Use Multiple Environments. Provide an Analytic Environment for each branch: analysts and data scientists need a controlled environment for their experiments; engineers need a place to develop outside of production; update production only after all tests are run! ❺
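
A minimal sketch of "an Analytic Environment for each branch" follows. The branch name `main`, the schema naming scheme, and both helper functions are hypothetical. The promotion gate also assumes that tests must pass, not merely run, before production is updated.

```python
import re
from typing import Dict

PRODUCTION_BRANCH = "main"  # assumption: production is built from this branch


def environment_for(branch: str) -> Dict[str, str]:
    """Map a version-control branch to an isolated environment description."""
    if branch == PRODUCTION_BRANCH:
        return {"name": "production", "schema": "analytics"}
    # Turn the branch name into a safe identifier, e.g. "feature/f1" -> "feature_f1".
    slug = re.sub(r"[^a-z0-9]+", "_", branch.lower()).strip("_")
    return {"name": f"dev_{slug}", "schema": f"analytics_dev_{slug}"}


def can_promote(test_results: Dict[str, bool]) -> bool:
    """Allow a production update only if tests were run and every one passed."""
    return bool(test_results) and all(test_results.values())


sandbox = environment_for("feature/f1-churn")
```

Deriving the environment from the branch name means an experiment never needs to touch production objects, and promotion becomes an explicit, testable decision.
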
  31. Reuse & Containerize. Containerize: (1) manage the environment for each model (e.g. Docker, VM, AMI); (2) practice environment version control: make production and development areas identical. Reuse: (1) the code; (2) data. ❻
  32. Parameterize Your Processing. Think of your pipeline like a big function. Named sets of parameters will increase your velocity. With parameters, you can vary inputs, outputs, and steps in the workflow. You can make a time machine. ❼
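
The "pipeline as a big function" idea on the slide above can be sketched with named parameter sets. The set names, the `as_of` cutoff used for the "time machine", and the `run_analytics` function are illustrative assumptions. In this sketch `source` and `target` are labels only; the rows are passed in directly.

```python
from datetime import date
from typing import Any, Dict, List, Optional

# Hypothetical input data, dated so that a cutoff can hide later rows.
ROWS = [
    {"day": date(2018, 1, 15), "amount": 100},
    {"day": date(2018, 1, 30), "amount": 50},
    {"day": date(2018, 2, 10), "amount": 25},
]

# Named sets of parameters: inputs, outputs, and workflow steps all vary.
PARAMETER_SETS: Dict[str, Dict[str, Any]] = {
    "production": {
        "source": "sales_prod",
        "target": "reports_prod",
        "steps": ["access", "transform", "model", "report"],
        "as_of": None,
    },
    "dev_quick": {
        "source": "sales_sample",
        "target": "reports_dev",
        "steps": ["access", "transform"],
        "as_of": None,
    },
    "time_machine": {
        "source": "sales_prod",
        "target": "reports_dev",
        "steps": ["access", "transform", "model", "report"],
        "as_of": date(2018, 1, 31),
    },
}


def run_analytics(
    rows: List[Dict[str, Any]],
    source: str,
    target: str,
    steps: List[str],
    as_of: Optional[date] = None,
) -> Dict[str, Any]:
    """Run the pipeline as one function of its parameters.

    `as_of` replays the pipeline as if it were an earlier date by ignoring
    rows dated after the cutoff.
    """
    cutoff = as_of or date.today()
    visible = [r for r in rows if r["day"] <= cutoff]
    return {
        "source": source,
        "target": target,
        "steps_run": list(steps),
        "rows_used": len(visible),
        "total": sum(r["amount"] for r in visible),
    }


replay = run_analytics(ROWS, **PARAMETER_SETS["time_machine"])
latest = run_analytics(ROWS, **PARAMETER_SETS["production"])
quick = run_analytics(ROWS, **PARAMETER_SETS["dev_quick"])
```

Switching from a production run to a historical replay is then a matter of choosing a different named set, with no change to the pipeline code.
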
  33. Agenda: Introductions; Data Science Challenges; What is DataOps?; Seven Shocking Steps to DataOps; • Pulling it together.
  34. Model building: Business Need → Prep Data → Feature Extraction → Build Model → Evaluate Model → Deploy Model → Monitor Model. Iterate, Test and Improve.
  35. The 7 Steps and Data Science Journeys. Columns: Tests, Version Control, Branch and Merge, Environments, Reuse / Containerize, Parameterize. Rows: Business Need: Agile; Prep Data: x x x x x x x; Feature Extraction: x x x x x x x; Build Model: x x x x x x x; Evaluate Model: x; Deploy Model: x x x x x x x; Monitor Model: x.
  36. Make a note to yourself: What can I take from this session and use on Monday?
  37. For slides, contact gil@DataKitchen.io. Thank you for attending.