
The lean principles of DataOps

Modern data processing environments resemble factory lines, transforming raw data into valuable data products. The lean principles that have successfully transformed manufacturing are equally applicable to data processing, and are well aligned with the new trend known as DataOps. In this presentation, we will explain how lean and DataOps principles can be implemented as technical data processing solutions and processes in order to eliminate waste and improve data innovation speed. We will go through how to eliminate the following types of waste in data processing systems:

* Cognitive waste - unclear source of truth, dependency sprawl, duplication, ambiguity.
* Operational waste - overhead for deployment, upgrades, and incident recovery.
* Delivery waste - friction and delay in development, testing, and deployment.
* Product waste - misalignment with business value, detachment from use cases, push-driven development, vanity quality assurance.

We will primarily focus on technical solutions, but eliminating some of the waste mentioned requires organisational refactoring.



  1. The lean principles of DataOps
     Berlin Buzzwords, 2020-06-08
     Lars Albertsson, Founder, Scling
     Christopher Bergh, CEO & Head Chef, DataKitchen
     www.scling.com
  2. Scling - data-value-as-a-service
     [Diagram: data lake and stream storage]
     ● Extract value from your data
     ● Data platform + custom data pipelines
     ● Imitate data leaders:
       ○ Quick idea-to-production
       ○ Operational efficiency
     Our marketing strategy:
     ● Promiscuously share knowledge
       ○ On slides devoid of glossy polish
  3. 1994: OS/2 Warp CID installation
     "Grmbl, who reinstalled my machine?"
  4. IT craft to factory
     ● Security → DevSecOps
     ● Waterfall → Agile
     ● Application delivery → Containers
     ● Traditional operations → DevOps
     ● Traditional QA → CI/CD
     ● Infrastructure → Infrastructure as code
  5. Data factories
     ● Security → DevSecOps
     ● Waterfall → Agile
     ● Application delivery → Containers
     ● Traditional operations → DevOps
     ● Traditional QA → CI/CD
     ● Infrastructure → Infrastructure as code
     ● DB-oriented architecture → Data factories, data pipelines, DataOps
  6. The Toyota Way
     Selected lean principles:
     ● Long-term over short-term
     ● The right process will produce the right results
     ● Eliminate waste (muda)
     ● Continuous improvement (kaizen)
     ● Use pull systems to avoid unnecessary production
     ● Quality takes precedence (jidoka)
       ○ Stop to fix problems
     ● Standardised tasks and processes
     ● Reliable technology that serves people and process
     ● Develop your people
     ● Decisions slowly by consensus
     ● Relentless reflection (hansei), organisational learning
  7. Common waste species
     ● Cognitive waste
     ● Delivery waste
     ● Operational waste
     ● Product waste
  8. Cognitive waste
     ● Why do we have 25 time formats?
       ○ ISO 8601, UTC assumed
       ○ ISO 8601 + timezone
       ○ Millis since epoch, UTC
       ○ Nanos since epoch, UTC
       ○ Millis since epoch, user local time
       ○ …
       ○ Float of seconds since epoch, as string. WTF?!?
     ● my-kafka-topic-name, your_topic_name
     ● Definition of an order:
       ○ Abandoned cart?
       ○ Payment refused?
       ○ Returned goods?
       ○ Free promotion?
     ● Data entity source of truth
       ○ MySQL, Kafka, data lake?
  9. What causes cognitive waste?
     ● We are autonomous!
       ○ Teams can choose technology, format, process, ...
     ● Cognitive debt
       ○ Short-term over long-term
       ○ Decisions without consensus
     ● Recognition and rewards
       ○ "You have made a similar independent pipeline, great work!"
  10. Avoiding cognitive waste
      ● Reusing semantic definitions
      ● Reusing code & technical definitions
        ○ Code transparency & sharing
        ○ Standardised technology
        ○ Document decisions & consensus process
      ● Read-only sharing not enough
        ○ Must be empowered to change for reuse and to improve quality
        ○ Standardised processes
  11. Eliminating cognitive waste
      ● Refactoring code, semantics, docs
      ● Low risk - what will I break downstream?
        ○ Standardised, automated, trusted QA process
        ○ End-to-end pipeline testing
      ● "Creating a pipeline - one day! Replace old pipeline - 18 months."
  12. Delivery waste
      ● Friction from code to production
        ○ Ideal: idea, research, write code + tests, done. Everything else is friction.
      ● Code inventory
        ○ Code not yet fully utilised
      ● Data inventory
        ○ Data not yet fully processed
  13. Data product quality assurance
      ● Product quality = f(code, data)
        ○ Cannot do full QA on code only
        ○ Only real data is production data
      ● Test in production
        ○ Quick QA cycle = quick production deployment
        ○ Measure, monitor, validate
  14. Eliminating delivery friction
      ● In theory simple - scrutinise everything
        ○ Positive engineering: writing code, tests, docs, refactor, improve
        ○ All else is negative
      ● You are limited by your assumptions
        ○ State of practice far from state of art
      "But the test suite takes 3 hours."
      "We have this checklist."
      "Security must approve."
      "X must be released before Y."
      "That is another team's job."
      "We don't have access."
      "We must test in staging first."
      "We haven't performance tested yet."
  15. So get rid of the waste
      Resources:
      No tradeoff between speed and quality!
  16. Code inventory
      ● Code not yet fully utilised
      ● Code on its way to production
        ○ In a notebook
        ○ Waiting for approval
        ○ Waiting for release
        ○ Internally released, waiting for dependants to upgrade
      ● Tests not fully used
        ○ Cover code (shared component), but not yet executed
  17. Data inventory
      ● Data collected, but not yet fully processed
        ○ Traditional lazy joins & SQL processing at runtime
      ● Eliminate with eager processing = pipeline
        ○ Process, join, denormalise
      ● Fatal problems → offline crash
        ○ "Andon" cord - stop and fix before significant harm is done
  18. Operational waste
      ● Friction in operational manoeuvres
        ○ Fear of mistakes
      ● Cost of incidents
        ○ Time to recovery
        ○ Impact of incident
        ○ Frequency of incidents
  19. Separating offline and online
      [Diagram: Raw, Orders, Fraud model, Fraud service; Replication / Backup; careful handover at each boundary between the online and offline parts]
      Standard procedures for the online parts; lightweight procedures for the offline part:
      ● QA driven by internal efficiency
      ● Continuous deployment
      ● New pipeline < 1 day
      ● Upgrade < 1 hour
      ● Bug recovery < 1 hour
  20. Cost of a software error
      Online
      ● User impact
      ● Data corruption
      ● Cascading corruption
      ● Unbounded recovery
  21. Cost of a software error
      [Diagram: chain of jobs and streams]
      Online
      ● User impact
      ● Data corruption
      ● Cascading corruption
      ● Unbounded recovery
      Nearline
      ● Data corruption
      ● Downstream impact
      ● Bounded recovery
  22. Cost of a software error
      [Diagram: chain of jobs and streams]
      Online
      ● User impact
      ● Data corruption
      ● Cascading corruption
      ● Unbounded recovery
      Nearline
      ● Data corruption
      ● Downstream impact
      ● Bounded recovery
      Offline
      ● Temporary data corruption
      ● Downstream impact
      ● Easy recovery
  23. Data processing tradeoff
      [Diagram: jobs and streams placed on a spectrum from online through nearline to offline, trading data speed (towards online) against innovation speed (towards offline)]
  24. Product waste
      ● Work not driven by use case
      ● Unrealised data potential due to friction
        ○ Unawareness of data
        ○ Difficulty to use data
      ● Hidden quality problems
      ● Collaboration and communication overhead
      Data democratisation - making data accessible and usable
  25. Waste: Your Team’s Time Not Well Spent
      Copyright 2020 by DataKitchen, Inc. All Rights Reserved.
      [Chart: percentage of time the team currently spends per week on Errors & Operational Tasks, New Features & Data For Customers, and Improvements & Debt]
      Challenges:
      • Complex roles
      • Complex organizations
      • Complex toolchains
      • Complex data
      • Complex collaboration
  26. Waste: Data Analytics is like the US Auto Industry in the 1970s
      [Diagram: current state of the data analytics team: high production errors; Dev → Prod deployment latency of weeks, months]
      Challenges:
      • Slow to add new features, address consumer requests rapidly, and handle changing data sets
      • Lack of trust by data consumers
      • Slow model deployment, slow to move to cloud
      • Team morale
  27. Waste: Conway’s Law and Data Pipelines
      Data Analytics Follows Conway's Law
      The structure of how teams are organized to do Data Science, Data Engineering, Analytics, and Production is reflected in their data pipelines.
  28. Waste: A cornucopia of collaboration complexity
      [Diagram: four arrangements of development (D) and production (P) data analytic teams: Centralized Dev; Centralized Dev & Prod; Decentralized Dev; Decentralized Dev & Prod. Roles: DE, DS, BI]
      • How do we create together without conflicts? (Data Engineer & Data Scientist)
      • How do we deploy safely and rapidly? (Data Team and Production Team)
      • How do we balance centralized control vs. self-service freedom? (Home Office Data Team and Line of Business Analysts)
      • How do we reuse/incorporate what another team deployed? (Multiple Data & Production Teams in Many Orgs)
  29. Why? Data Teams Are Suffering
      Data teams are caught between three competing forces:
      • Unaware Data Providers – unaware that they send crappy, late, and error-prone data sets
      • Demanding Data Consumers – demand trusted, original insight at the speed of Amazon delivery
      • Critical Supporting Teams – need flawless ongoing production and collaboration with other teams/people
      These forces make for:
      • A beaten-down, distraught, disempowered work environment
      • Teams that cannot create and innovate
      • Lack of trust all around
  30. DataOps – Solution To That Suffering
      DataOps – The technical practices, cultural norms, and architecture that enable:
      • Rapid cycles of experimentation and innovation to delivery of new insights to our customers
      • Low error rates
      • Collaboration across complex sets of people, technology, and environments
      • Clear measurement and monitoring of results
      [Diagram: People, Process, Organization; Technical Environment]
      "Organizations that adopt a DevOps- and DataOps-based approach are more successful in implementing end-to-end, reliable, robust, scalable and repeatable solutions." Sumit Pal, Gartner, November 2018 (Source: Gartner)
  31. DataOps Benefit: Lower Cost, More Insight
      [Chart: percentage of time the team spends per week, before vs. after DataOps, on New Features & Data For Customers, Errors & Operational Tasks, and Process Improvements & Tech Debt Reduction]
  32. DataOps Benefit: Faster, Better & Happier
      • Before DataOps: high production errors; Dev → Prod deployment latency of weeks, months
      • After DataOps: low errors; Dev → Prod deployment latency of hours & minutes
  33. DevOps vs DataOps (and all those *Opses)
      Lean, Learning Organization, and W. Edwards Deming principles: focus on low errors, cycle time, collaboration, and measurement
      ● Business management concept, shared by industrial manufacturing teams, IT and software teams, and data science, engineering and analytics teams
      ● Organizational / team management method: Six Sigma, Total Quality Management (manufacturing); Agile, Kanban, Scrum, DA, etc. (team management)
      ● Technical environment and process: DevOps, AIOps, DevSecOps, DataOps, ModelOps, MLOps, … GitOps
  37. What You Do Is Much Less Important Than How You Do It
      “We realized that the true problem, the true difficulty, and where the greatest potential is – is building the machine that makes the machine. It’s building the factory.” – Elon Musk
      94% of causes were common cause. We often attribute problems to a specific case, and look for a person to blame, rather than focusing on the underlying process. – Dr Deming
  38. Questions?
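
Slide 8 lists many encodings of the same instant. Below is a minimal Python sketch of agreeing on one canonical format at the pipeline boundary. It assumes that ISO 8601 in UTC is the chosen standard. The function name `to_iso8601_utc` and the encoding labels are hypothetical. The "millis since epoch, user local time" variant is deliberately left out, because converting it requires each user's time zone.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_iso8601_utc(value, kind: str) -> str:
    """Convert one of several legacy time encodings to ISO 8601 in UTC.

    `kind` is a hypothetical label naming the source encoding.
    """
    if kind == "millis":  # integer milliseconds since epoch, UTC
        dt = EPOCH + timedelta(milliseconds=int(value))
    elif kind == "nanos":  # integer nanoseconds since epoch, UTC
        # timedelta has microsecond resolution, so sub-microsecond digits are dropped
        dt = EPOCH + timedelta(microseconds=int(value) // 1000)
    elif kind == "float_seconds_str":  # float seconds since epoch, stored as a string
        dt = EPOCH + timedelta(seconds=float(value))
    elif kind == "iso_utc_assumed":  # ISO 8601 without an offset, UTC assumed
        dt = datetime.fromisoformat(value).replace(tzinfo=timezone.utc)
    elif kind == "iso_with_tz":  # ISO 8601 with an explicit offset
        dt = datetime.fromisoformat(value).astimezone(timezone.utc)
    else:
        raise ValueError(f"unknown time encoding: {kind}")
    return dt.isoformat(timespec="milliseconds")
```

Converting once, at ingestion, means that downstream pipelines only ever see a single time format.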
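
Slides 8 and 10 ask what counts as an order and suggest reusing semantic definitions. Below is a minimal sketch of a single shared definition that every pipeline imports instead of deciding for itself. It assumes a hypothetical `OrderEvent` record. The business rule it encodes is an illustrative assumption, not the rule from the talk: abandoned carts and refused payments are not orders, while returns and free promotions are.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    CART_ABANDONED = "cart_abandoned"
    PAYMENT_REFUSED = "payment_refused"
    PAID = "paid"
    RETURNED = "returned"


@dataclass(frozen=True)
class OrderEvent:
    order_id: str
    status: Status
    amount_cents: int  # 0 for a free promotion


def is_order(event: OrderEvent) -> bool:
    """Single shared definition of an order (hypothetical example rule).

    Abandoned carts and refused payments are excluded; returned goods
    and free promotions (paid with amount 0) still count as orders.
    """
    return event.status in (Status.PAID, Status.RETURNED)
```

Changing the definition then means changing one function, and end-to-end tests for every consuming pipeline show what the change breaks downstream.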
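
Slide 11 names end-to-end pipeline testing as one way to make refactoring low risk. Below is a minimal sketch that assumes two hypothetical pipeline steps, `clean_orders` and `revenue_per_user`. The test feeds a small, fixed input through the whole chain and checks the final output. Any change to either step that breaks the downstream result therefore fails the test.

```python
from collections import defaultdict


def clean_orders(raw):
    """Step 1 (hypothetical): drop records without a user and normalise user ids."""
    return [
        {"user": r["user"].strip().lower(), "amount": r["amount"]}
        for r in raw
        if r.get("user")
    ]


def revenue_per_user(orders):
    """Step 2 (hypothetical): aggregate order amounts per user."""
    totals = defaultdict(int)
    for o in orders:
        totals[o["user"]] += o["amount"]
    return dict(totals)


def run_pipeline(raw):
    """The whole chain, wired the same way as in production."""
    return revenue_per_user(clean_orders(raw))


def test_pipeline_end_to_end():
    # A small, fixed input that exercises the full chain, not each step in isolation.
    raw = [
        {"user": " Alice ", "amount": 100},
        {"user": "alice", "amount": 50},
        {"user": None, "amount": 999},
        {"user": "bob", "amount": 20},
    ]
    assert run_pipeline(raw) == {"alice": 150, "bob": 20}
```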
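
Slide 13 recommends testing in production by measuring, monitoring, and validating the data that a pipeline produces. Below is a minimal sketch of such post-run checks, assuming records are Python dicts. The function `validate_output`, its default thresholds, and the choice of checks are all hypothetical. The checks cover non-empty output, row count relative to the previous run, and the null fraction of a key column.

```python
from dataclasses import dataclass


@dataclass
class Check:
    name: str
    passed: bool
    detail: str


def validate_output(rows, previous_count, key, max_null_fraction=0.01, min_count_ratio=0.5):
    """Compare a freshly produced dataset against simple, hypothetical expectations."""
    count = len(rows)
    nulls = sum(1 for r in rows if r.get(key) is None)
    null_fraction = nulls / count if count else 1.0
    # Compare with the previous run to catch sudden drops in volume.
    ratio = count / previous_count if previous_count else 1.0
    return [
        Check("non_empty", count > 0, f"{count} rows"),
        Check("count_vs_previous", ratio >= min_count_ratio, f"ratio {ratio:.2f}"),
        Check(f"null_{key}", null_fraction <= max_null_fraction, f"{null_fraction:.1%} null"),
    ]
```

The failing checks can be reported to monitoring. Alternatively, they can block publication of the dataset, depending on how severe the problem is.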
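
Slide 17 describes two ideas. The first is to eliminate data inventory through eager processing: process, join, and denormalise in the pipeline rather than at query time. The second is to stop an offline job when a fatal problem appears, like pulling an andon cord. Below is a minimal sketch of both, assuming in-memory lists of dicts. The names `denormalise_orders` and `AndonStop` and the 5% unmatched threshold are hypothetical.

```python
class AndonStop(Exception):
    """Raised to stop the pipeline before bad data propagates downstream."""


def denormalise_orders(orders, users, max_unmatched_fraction=0.05):
    """Eagerly join orders with user attributes into self-contained records."""
    users_by_id = {u["user_id"]: u for u in users}
    joined, unmatched = [], 0
    for o in orders:
        user = users_by_id.get(o["user_id"])
        if user is None:
            unmatched += 1
            continue
        # Copy the attributes consumers need, so no join is required at query time.
        joined.append({**o, "country": user["country"]})
    if orders and unmatched / len(orders) > max_unmatched_fraction:
        # Pull the "andon" cord: fail the offline job instead of publishing a degraded dataset.
        raise AndonStop(f"{unmatched}/{len(orders)} orders lack a matching user")
    return joined
```

Because the job runs offline, crashing is cheap. Consumers keep reading the previous, correct output until the problem is fixed and the job reruns.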
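
Slide 22 contrasts offline errors, which cause only temporary data corruption and allow easy recovery, with online errors. One common way to get that property is to make each batch job a deterministic function of immutable raw input. Each job run then overwrites whole output partitions, so recovering from a bug is just a matter of rerunning the fixed job. Below is a sketch of that approach. The dict-based storage and the names `run_partition` and `backfill` are hypothetical.

```python
from typing import Callable, Dict, Iterable, List

Dataset = Dict[str, List[int]]  # partition key (e.g. a date) -> records


def run_partition(job: Callable[[List[int]], List[int]], raw: Dataset, out: Dataset, partition: str) -> None:
    """Recompute one output partition from immutable raw input and overwrite it."""
    # Overwrite, never append: rerunning the same job gives the same result.
    out[partition] = job(raw[partition])


def backfill(job: Callable[[List[int]], List[int]], raw: Dataset, out: Dataset, partitions: Iterable[str]) -> None:
    """Recover from a bug by rerunning the fixed job over the affected partitions."""
    for p in partitions:
        run_partition(job, raw, out, p)
```

Because each run overwrites its partition, running `backfill` twice with the fixed job produces the same output as running it once. The raw input is never modified, so a buggy run can always be undone.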
