www.scling.com
DataOps - Lean principles and practices
Data 2030 Summit, 2021-02-11
Lars Albertsson, Founder, Scling
Ask not what, but how
Ideas << execution
DataOps is the "how" of data & ML
2013: Transform @ Spotify
2014: "DataOps" term first seen
2018: Conference talk rejected
2019: Most watched recording @ Data Innovation Summit
2021: DataOps day @ Data 2030 Summit
Enabling innovation
"The actual work that went into Discover Weekly was very little, because we're reusing things we already had."
https://youtu.be/A259Yo8hBRs
https://youtu.be/ZcmJxli8WS8
https://musically.com/2018/08/08/daniel-ek-would-have-killed-discover-weekly-before-launch/
"Discover Weekly wasn't a great strategic plan and 100 engineers. It was 3 engineers that decided to build something."
"I would have killed it. All of a sudden, they shipped it. It’s one of the most loved product features that we have."
- Daniel Ek, CEO
IT craft to factory
Craft: Security, Waterfall, Application delivery, Traditional operations, Traditional QA, Infrastructure
Factory: DevSecOps, Agile, Containers, DevOps, CI/CD, Infrastructure as code
Data factories
Craft: Security, Waterfall, Application delivery, Traditional operations, Traditional QA, Infrastructure, DB-oriented architecture
Factory: DevSecOps, Agile, Containers, DevOps, CI/CD, Infrastructure as code, Data factories, data pipelines, DataOps
From craft to process
Multiple time windows
Assess ingress data quality
Repair broken data from complementary source
Forecast based on history, multiple parameter settings
Assess outcome data quality
Assess forecast success, adapt parameters
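A minimal sketch of the "assess ingress data quality" and "repair broken data from complementary source" steps. The record layout (store, date, sales) and the function names completeness and repair are assumptions made for illustration; they are not taken from the talk.

```python
def completeness(records, required=("store", "date", "sales")):
    """Fraction of records in which every required field has a value."""
    if not records:
        return 0.0
    complete = sum(
        all(record.get(field) is not None for field in required)
        for record in records
    )
    return complete / len(records)


def repair(primary, complementary, key=("store", "date")):
    """Fill missing fields in primary from the matching complementary record."""
    # Index the complementary source by the (hypothetical) business key.
    backup = {tuple(r[k] for k in key): r for r in complementary}
    repaired = []
    for record in primary:
        fallback = backup.get(tuple(record[k] for k in key), {})
        # Values present in the primary source win; gaps are filled from backup.
        present = {k: v for k, v in record.items() if v is not None}
        repaired.append({**fallback, **present})
    return repaired


primary = [
    {"store": "A", "date": "2021-02-10", "sales": None},
    {"store": "A", "date": "2021-02-11", "sales": 7},
]
complementary = [{"store": "A", "date": "2021-02-10", "sales": 5}]

before = completeness(primary)
repaired = repair(primary, complementary)
after = completeness(repaired)
```

Measuring completeness before and after the repair gives a simple quality metric that can be monitored per pipeline run.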
Naive ML
Towards sustainable production ML
Multiple models, parameters, features
Assess ingress data quality
Repair broken data from complementary source
Choose model and parameters based on performance and input data
Benchmark models
Try multiple models, measure, A/B test
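One way to read "benchmark models" and "choose model and parameters based on performance" is a walk-forward comparison of several candidate forecasters. The sketch below is an assumption-laden illustration: the two toy models, the mean absolute error metric and all function names are hypothetical choices, not the method used in the talk.

```python
def mean_absolute_error(predictions, actuals):
    """Average absolute difference between forecasts and outcomes."""
    return sum(abs(p - a) for p, a in zip(predictions, actuals)) / len(actuals)


def last_value(history):
    """Toy model: tomorrow equals today."""
    return history[-1]


def moving_average(history, window=3):
    """Toy model: tomorrow equals the mean of the last few observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def benchmark(models, series, warmup=3):
    """Score each model's one-step-ahead forecasts on the same series."""
    scores = {}
    for name, model in models.items():
        predictions = [model(series[:i]) for i in range(warmup, len(series))]
        scores[name] = mean_absolute_error(predictions, series[warmup:])
    return scores


def choose_model(scores):
    """Pick the model with the lowest benchmark error."""
    return min(scores, key=scores.get)


models = {"last_value": last_value, "moving_average": moving_average}

trend_scores = benchmark(models, [1, 2, 3, 4, 5, 6])
noisy_scores = benchmark(models, [0, 10, 0, 10, 0, 10, 0])
best_for_trend = choose_model(trend_scores)
best_for_noise = choose_model(noisy_scores)
```

Different input data favours different models here, which is the reason for re-running the benchmark continuously instead of fixing one model up front.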
The Toyota Way
Selected lean principles:
● Long-term over short-term
● The right process will produce the right results
● Eliminate waste (muda)
● Continuous improvement (kaizen)
● Use pull systems to avoid unnecessary production
● Quality takes precedence (jidoka)
○ Stop to fix problems
● Standardised tasks and processes
● Reliable technology that serves people and process
● Develop your people
● Decisions slowly by consensus
● Relentless reflection (hansei), organisational learning
Common waste species
● Cognitive waste
● Technology waste
● Delivery waste
● Operational waste
● Product waste
Companies are generally good at handling some waste forms, and blind to others.
Your blindness is your potential.
Cognitive waste
● Why do we have 25 time formats?
○ ISO 8601, UTC assumed
○ ISO 8601 + timezone
○ Millis since epoch, UTC
○ Nanos since epoch, UTC
○ Millis since epoch, user local time
○ …
○ Float of seconds since epoch, as string.
WTF?!?
● my-kafka-topic-name, your_topic_name
● Definition of an order:
○ Abandoned cart?
○ Payment refused?
○ Returned goods?
○ Free promotion?
● Data entity source of truth
○ MySQL, Kafka, data lake?
● Code and documentation sprawl
○ Repositories & branches
○ Wikis
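The time format sprawl listed above can be contained by converting every variant to one canonical form at the point of ingestion. This is a minimal sketch under stated assumptions: the function name to_utc and its unit parameter are hypothetical, and the policy that an ISO 8601 string without an offset means UTC is an explicit choice. "Millis since epoch, user local time" is deliberately not handled, since it cannot be converted without also knowing the user's time zone.

```python
from datetime import datetime, timezone


def to_utc(value, unit=None):
    """Normalise one timestamp to a timezone-aware UTC datetime.

    unit is None for ISO 8601 strings, or "s", "ms", "ns" for epoch counts.
    """
    if unit is None:
        # fromisoformat accepts offsets such as +02:00; a trailing "Z" is
        # only accepted from Python 3.11 onwards.
        parsed = datetime.fromisoformat(value)
        if parsed.tzinfo is None:
            # Policy assumption: no offset means UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc)
    scale = {"s": 1, "ms": 1_000, "ns": 1_000_000_000}[unit]
    # float() also covers the "float of seconds since epoch, as string" case.
    return datetime.fromtimestamp(float(value) / scale, tz=timezone.utc)


expected = datetime(2021, 2, 11, tzinfo=timezone.utc)
variants = [
    to_utc("2021-02-11T00:00:00"),              # ISO 8601, UTC assumed
    to_utc("2021-02-11T02:00:00+02:00"),        # ISO 8601 + timezone
    to_utc(1_613_001_600_000, "ms"),            # millis since epoch, UTC
    to_utc(1_613_001_600_000_000_000, "ns"),    # nanos since epoch, UTC
    to_utc("1613001600.0", "s"),                # float of seconds, as string
]
```

Note that converting nanoseconds through a float loses sub-microsecond precision; a pipeline that needs it would keep the integer value alongside.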
What causes cognitive waste?
● We are autonomous!
○ Teams can choose technology, format, process, ...
● Cognitive debt
○ Short-term over long-term
○ Decisions without consensus
● Recognition and rewards
○ "You have made a similar independent pipeline, great work!"
Avoiding cognitive waste
● Reusing semantic definitions
● Reusing code & technical definitions
○ Code transparency & sharing
○ Standardised technology
○ Document decisions & consensus process
● Read-only sharing not enough
○ Must be empowered to
■ change for reuse
■ improve quality
■ delete unused
○ Low risk - what will I break downstream?
○ Standardised, end-to-end QA processes
Delivery waste - code inventory
● Code not yet fully utilised
● Code on its way to production
○ In a notebook
○ Waiting for approval
○ Waiting for release
○ Internally released, waiting for dependants to upgrade
● Tests not fully used
○ Tests that cover code (shared component), but are not yet executed
Eliminating delivery waste
● Friction from code to production
○ Positive engineering: research, writing code, tests, docs, refactor, improve
○ All else is negative
● You are limited by your assumptions
○ State of practice is far from state of the art
"But the test suite takes 3 hours."
"We have this checklist."
"Security must approve."
"X must be released before Y."
"That is another team's job."
"We don't have access."
"We must test in staging first."
"We haven't performance tested yet."
So get rid of the waste. Resources:
No tradeoff between speed and quality!
Data inventory
● Data collected, but not yet fully processed
○ Traditional lazy joins & SQL processing at runtime
○ Extract-load-transform (ELT)
● Eliminate with eager processing = pipeline
○ Process, join, denormalise
○ Extract-transform-load (ETL)
● Fatal problems → offline crash
○ "Andon" cord - stop and fix before significant harm is done
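The eager processing and "Andon" cord ideas above can be sketched as a batch job that joins and denormalises up front and refuses to publish output when a fatal data problem shows up. Everything named here is a hypothetical illustration: the AndonStop exception, the denormalise_orders function, the record fields and the 1 % threshold are assumptions, not part of the talk.

```python
class AndonStop(Exception):
    """Raised to halt the pipeline before bad data is published."""


def denormalise_orders(orders, customers, max_missing_ratio=0.01):
    """Eagerly join orders with customers; stop if too many joins fail."""
    customers_by_id = {customer["id"]: customer for customer in customers}
    joined, missing = [], 0
    for order in orders:
        customer = customers_by_id.get(order["customer_id"])
        if customer is None:
            missing += 1
            continue
        joined.append({**order, "customer_country": customer["country"]})
    # Pull the cord: crash offline instead of shipping a broken dataset.
    if orders and missing / len(orders) > max_missing_ratio:
        raise AndonStop(f"{missing} of {len(orders)} orders lack a customer")
    return joined


customers = [{"id": 1, "country": "SE"}]
good_orders = [{"order_id": 10, "customer_id": 1}]
bad_orders = [{"order_id": 11, "customer_id": 1}, {"order_id": 12, "customer_id": 99}]

published = denormalise_orders(good_orders, customers)
try:
    denormalise_orders(bad_orders, customers)
    stopped = False
except AndonStop:
    stopped = True
```

Because the job runs offline, the failure only delays a dataset; downstream consumers keep reading the last good output until the problem is fixed.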
Technology waste
NoSQL
Stream processing
Spark/Flink
Hadoop
In-memory databases
Schema registry
Data catalogue
Feature store
Change data capture
Data versioning
Governance system
Data warehouse
Lakehouse
Scaled out compute
Kubernetes
Essential
Compute machines
Workflow orchestration
RDBMS
File storage
Code version control
Visualisation
Graph processing
Deep learning
Operational waste
● Friction in operational manoeuvres
○ Fear of mistakes
○ Application-specific tooling
● Cost of incidents
○ Time to recovery
○ Impact of incident
○ Frequency of incidents
Separating offline and online
Diagram labels: Orders, Replication / Backup, Raw, Fraud model, Fraud service
Prudent procedures
Lightweight procedures
● QA driven by internal efficiency
● Continuous deployment
● New pipeline < 1 day
● Upgrade < 1 hour
● Bug recovery < 1 hour
Careful handover
Data processing tradeoff
Many nines uptime (99.99.. %) vs. a couple of sevens
Data speed vs. innovation speed
Diagram labels: Offline, Nearline, Online; Job, Stream
Product waste
● Work not driven by use case
● Unrealised data potential due to friction
○ Unawareness of data
○ Difficulty to use data
● Collaboration and communication
○ Connection
○ Overhead
Data democratisation - making data accessible and usable
Form teams aligned to value flows.
Continuous improvement & learning
● Products, not projects
○ Owned, never done, always improving
● To production early
○ Minimal fear
○ Measure and monitor to learn
● Fail & iterate
○ No blame, no penalties
● Communication across organisation essential
○ Data source team - data processing team - stakeholders
Data product quality assurance
● Product quality = f(code, data)
○ Cannot do full QA on code only
○ Only real data is production data
● Test in production
○ Quick QA cycle = quick production deployment
○ Measure, monitor, validate
Infrastructure waste
● Production environment only
○ Dev, test, staging lack production data
● Dark pipelines
○ Run in parallel
○ Monitor diff vs production
○ Roll out slowly?
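A dark pipeline runs a candidate version in parallel on the same production input, and its output is compared with the live output before any rollout. The sketch below shows only the "monitor diff vs production" step; the function name diff_outputs, the id key and the record layout are assumptions for illustration.

```python
def diff_outputs(production, candidate, key="id"):
    """Compare two pipeline outputs, keyed on a (hypothetical) id field."""
    prod = {row[key]: row for row in production}
    cand = {row[key]: row for row in candidate}
    shared = prod.keys() & cand.keys()
    return {
        "only_in_production": sorted(prod.keys() - cand.keys()),
        "only_in_candidate": sorted(cand.keys() - prod.keys()),
        "changed": sorted(k for k in shared if prod[k] != cand[k]),
    }


production = [
    {"id": 1, "total": 10},
    {"id": 2, "total": 20},
    {"id": 3, "total": 30},
]
candidate = [
    {"id": 1, "total": 10},
    {"id": 2, "total": 21},
    {"id": 4, "total": 40},
]

report = diff_outputs(production, candidate)
```

An empty report, or one whose differences are all explained by the intended change, is the signal that the candidate can replace the production pipeline.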
Slow cycle - slow learning
Learning more about Lean & DataOps
Scling - data-value-as-a-service
Data value through collaboration
Customer
Data factory
Data platform & lake
data
domain expertise
Value from data!
Rapid data innovation
Learning by doing, in collaboration
