www.scling.com
Crossing the data divide
Lars Albertsson, Founder, Scling
Data Innovation Summit, 2021-10-14
1
The great capability divide
2
1000x span in availability metrics
Started 2002 / 2006, launched 2010, killed 2012
1000 person years, cost $125M
Started 2009-05-10, launched 2009-05-16
$80M revenue in 15 months
https://www.flickr.com/photos/downloadsourcefr/15944373702, CC BY 2.0
Pirate Bay founders' picture used without permission
Efficiency gap, data cost & value
● Data processing produces datasets
○ Each dataset has business value
● Proxy value/cost metric: datasets / day
○ S-M traditional: < 10
○ Bank, telecom, media: 10-1000
3
2014: 6500 datasets / day
2016: 20000 datasets / day
2017: 100B events collected / day
2018: 100000+ datasets / day, 25% of staff use BigQuery
2016: 1 600 000 000 datasets / day
Chart, value vs effort: financial, reporting; insights, data-fed features; disruptive value of data, machine learning
Manual, mechanised, industrialised
4
Manual
● Muscle-powered
● Few tools
● Human touch for every step
Mechanised
● Direct human control
● Machine tools
● Low investment, direct return
Industrialised
● Scaled processes
● Machine tools
● Challenges: scale, logistics, legal, organisation, faults, ...
Manual, mechanised, industrialised
5
Manual
● Hand-built models
● Manual deployment
● Spreadsheets
Mechanised
● Automated training
● Semi-automated deployment
● Data warehouses, notebooks
Industrialised
● Automated QA, monitoring
● Continuous deployment
● Hadoop ecosystem
Data artifacts: 100x 1000x
Road towards industrialisation
6
Data warehouse age - mechanised analytics
DW
LAMP stack age - manual analytics
Hadoop age - industrialised analytics, data-fed features, machine learning
Significant change in workflows
Early Hadoop:
● Weak indexing
● No transactions
● Weak security
● Batch transformations
Road back again
7
DW
Enterprise big data failures
Post-Hadoop "data engineering" - traditional workflows, new technology
Gap is still there
8
DW
Enterprise big data failures
Post-Hadoop "data engineering" - traditional workflows, new technology
~10 year capability gap
"data factory engineering"
Current data eng focus - narrative, tools, vendors
What conclusion from this graph?
COVID-19 fatalities / day in Sweden
9
What conclusion from this graph?
COVID-19 fatalities / day in Sweden
10
Fatalities collected during 2 days
Fatalities collected during 4 days
Fatalities collected during 10 days
Normalise data collection to compare
11
Graph by Adam Altmejd, @adamaltmejd
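One way to read "normalise data collection" is to count, for every day of death, only the fatalities that had been reported within the same number of days. A minimal sketch, assuming hypothetical records of (death date, report date, count). The record layout, the numbers and the function name `counts_at_lag` are illustrative assumptions, not taken from the talk or from the graph's author:

```python
from collections import defaultdict
from datetime import date

def counts_at_lag(reports, max_lag_days):
    """Sum fatalities per death date, keeping only reports that arrived
    within max_lag_days of the death date (hypothetical record layout)."""
    totals = defaultdict(int)
    for death_date, report_date, count in reports:
        if (report_date - death_date).days <= max_lag_days:
            totals[death_date] += count
    return dict(totals)

# Made-up numbers, for illustration only.
reports = [
    (date(2020, 4, 1), date(2020, 4, 2), 10),
    (date(2020, 4, 1), date(2020, 4, 8), 30),
    (date(2020, 4, 6), date(2020, 4, 7), 12),
]
# Both dates are now compared after the same 2 days of collection.
same_lag = counts_at_lag(reports, max_lag_days=2)
```

With equal collection windows, an old date no longer looks worse than a recent one simply because it has had more time to accumulate reports.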
Forecast for analytics with fresh data
12
Graph by Adam Altmejd, @adamaltmejd
From craft to process
13
From craft to process
14
Multiple time windows
Assess ingress data quality
Repair broken data from complementary source
Forecast based on history, multiple parameter settings
Assess outcome data quality
Assess forecast success, adapt parameters
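The forecast and assessment steps can be sketched as scaling a fresh, incomplete count by the share of the final count that history says is usually known at that lag, then measuring the error once the final count arrives. This is an assumed, deliberately simple method, not the model behind the graphs. `completion_ratio`, `forecast_final` and `relative_error` are hypothetical names, and the numbers are made up:

```python
def completion_ratio(history):
    """Share of the final count already known at a fixed lag,
    estimated from past (partial_count, final_count) pairs."""
    partial = sum(p for p, _ in history)
    final = sum(f for _, f in history)
    if final == 0:
        raise ValueError("history has no final counts")
    return partial / final

def forecast_final(fresh_count, ratio):
    """Scale a fresh, incomplete count up to an estimated final count."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return fresh_count / ratio

def relative_error(forecast, actual):
    """Assess forecast success once the final count is known."""
    return abs(forecast - actual) / actual

history = [(20, 80), (30, 120)]     # (count after 2 days, final count)
ratio = completion_ratio(history)   # 50 / 200
estimate = forecast_final(25, ratio)
```

Running the same functions for several lags or parameter settings and comparing `relative_error` is one way to close the "adapt parameters" loop.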
Naive ML
15
Sustainable production ML
16
Multiple models, parameters, features
Assess ingress data quality
Repair broken data from complementary source
Choose model and parameters based on performance and input data
Benchmark models
Try multiple models, measure, A/B test
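"Choose model and parameters based on performance" can be sketched as benchmarking every candidate on the same validation data and keeping the one with the lowest error. A minimal sketch with toy models. The function names, the models and the data are assumptions for illustration only:

```python
def mean_absolute_error(model, samples):
    """Average absolute difference between prediction and truth."""
    return sum(abs(model(x) - y) for x, y in samples) / len(samples)

def choose_model(models, validation):
    """Return (name, error) of the candidate with the lowest validation error."""
    scores = {name: mean_absolute_error(m, validation)
              for name, m in models.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]

# Toy candidates; in practice these would be trained models.
models = {
    "last_value": lambda x: x,
    "doubling": lambda x: 2 * x,
}
validation = [(1, 2), (2, 4), (3, 6)]   # made-up (input, expected) pairs
best, error = choose_model(models, validation)
```

Rerunning the selection on every new batch of input data lets the pipeline switch model when performance shifts, instead of relying on a one-off manual choice.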
Data engineering vs data factory engineering
17
How to organise
How to work
How to build
Data factory engineering principles - technology
18
Centralised, homogeneous data platform
Functional architecture
Simple technology, simple rituals
● Minimal experiment friction
○ Centralise first to establish homogeneity
● Democratised functional data processing
○ Raw data + transforms
○ Immutable datasets!
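"Raw data + transforms" with immutable datasets can be sketched as pure functions that derive a new dataset and never modify their input. A minimal sketch, assuming an in-memory stand-in for a dataset. The `Dataset` class, the `transform` helper and the example data are hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Dataset:
    name: str
    rows: tuple  # tuples are immutable, so the rows cannot change after creation

def transform(raw, name, fn):
    """Derive a new dataset from an existing one; the input is left untouched."""
    return Dataset(name=name, rows=tuple(fn(row) for row in raw.rows))

raw = Dataset("orders_raw", rows=((1, 100), (2, 250)))
# Made-up transform: add 25% to the amount, using integer arithmetic.
with_vat = transform(raw, "orders_vat", lambda r: (r[0], r[1] * 125 // 100))
```

Because the raw dataset is never overwritten, a buggy transform can be fixed and rerun from the same input.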
Data-centric innovation
● Need data from teams
○ willing?
○ backlog?
○ collected?
○ useful?
○ quality?
○ extraction?
○ data governance?
○ history?
19
Data platform
Big data - a collaboration paradigm
20
Stream storage
Data lake
Data democratised
Data factory engineering principles - architecture
21
Failure-driven design
What happens, happens in production
Fast feedback cycle, slow integration
● Batch processing is self healing
○ If you master workflow orchestration
● Low failure impact → high risk → fast cycle
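"Self healing" is read here as: when a batch job fails, the next scheduled run notices the missing output and produces it. A minimal sketch, assuming a dict as a stand-in for dataset storage and a hypothetical `job` callable. No real workflow orchestrator API is shown:

```python
from datetime import date, timedelta

def missing_partitions(existing, start, end):
    """Dates in [start, end] that have no output dataset yet."""
    days = (end - start).days + 1
    wanted = [start + timedelta(days=i) for i in range(days)]
    return [d for d in wanted if d not in existing]

def backfill(existing, start, end, job):
    """Run the job for every missing date. Rerunning after a failure
    fills only the gaps, since completed dates are skipped."""
    for day in missing_partitions(existing, start, end):
        existing[day] = job(day)
    return existing

store = {date(2021, 10, 1): "ok"}   # output that already exists
store = backfill(store, date(2021, 10, 1), date(2021, 10, 3),
                 job=lambda d: "ok")
```

The job only needs to be deterministic per date. Recovery then amounts to fixing the code and letting the schedule run again.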
22
Cost of a software error
Nearline
● Data corruption
● Downstream impact
● Bounded recovery
Offline
● Temporary data corruption
● Downstream impact
● Easy recovery
Online
● User impact
● Data corruption
● Cascading corruption
● Unbounded recovery
Data processing tradeoff
23
Many nines uptime (99.99.. %) vs a couple of sevens
Data speed vs innovation speed
Online
Nearline
Offline
Eliminate infrastructure waste
24
● Production environment only
○ Dev, test, staging lack production data
● Dark pipelines
○ Run in parallel
○ Monitor diff vs production
○ Roll out slowly?
∆?
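"Monitor diff vs production" can be sketched as comparing the keyed output of the dark pipeline with the production output for the same input. A minimal sketch, assuming both outputs fit in memory as dicts. The function name `diff_outputs` and the example records are hypothetical:

```python
def diff_outputs(production, candidate):
    """Compare two keyed outputs; return keys that are missing, extra or changed."""
    prod_keys, cand_keys = set(production), set(candidate)
    return {
        "missing": sorted(prod_keys - cand_keys),
        "extra": sorted(cand_keys - prod_keys),
        "changed": sorted(k for k in prod_keys & cand_keys
                          if production[k] != candidate[k]),
    }

# Made-up outputs from the production and the dark pipeline.
production = {"user_1": 10, "user_2": 20, "user_3": 30}
candidate = {"user_1": 10, "user_2": 21, "user_4": 40}
report = diff_outputs(production, candidate)
```

Tracking the size of each bucket over time gives a signal for when the dark pipeline is safe to roll out.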
Data factory engineering principles - engineering
25
It's a software engineering problem
Continuous process improvement
● Quality, reproducibility, versioning, deployment, monitoring, rapid change?
○ Solved software engineering problems!
● Capable, unpolished components
○ Designed for strong processes, CI/CD, testing, observability
○ Ugly interfaces
● Statistical process control, engineered
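Statistical process control applied to a pipeline can be sketched as a control check on a dataset metric such as daily row count. A minimal sketch, assuming the common three-sigma rule and a sample standard deviation over recent history. The metric, the numbers and the function names are illustrative assumptions, and production control charts often estimate the spread differently:

```python
from statistics import mean, stdev

def control_limits(history, sigmas=3.0):
    """Lower and upper control limits around the historical mean."""
    centre = mean(history)
    spread = stdev(history)
    return centre - sigmas * spread, centre + sigmas * spread

def in_control(value, history, sigmas=3.0):
    """True when the new measurement falls inside the control limits."""
    low, high = control_limits(history, sigmas)
    return low <= value <= high

row_counts = [1000, 1010, 990, 1005, 995]   # made-up daily row counts
ok_today = in_control(1002, row_counts)
ok_after_outage = in_control(400, row_counts)
```

A value outside the limits would stop the pipeline or raise an alert before the dataset reaches downstream consumers.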
SQL is a power tool, not an industrial robot
26
● No composition & abstractions
○ Hostile to testing
● Not expressive enough for mature data processing
● Hostile to data quality measurements and repair
○ Hadoop/Spark/Flink have quality primitives built in
https://threadreaderapp.com/thread/1353832649664692225.html
Data factory engineering principles - value iteration
27
Pull-driven work, initiated by business value needs
Products, not projects
Align along value flows
● Only business value counts
○ Drives work
○ Few teams along path
● Data is organic
○ Never done, always iterate
Data factory engineering principles
28
Centralised, homogeneous data platform
Functional architecture
How to organise
It's a software engineering problem
Pull-driven work, initiated by business value needs
Failure-driven design
Simple technology, simple rituals
What happens, happens in production
Fast feedback cycle, slow integration
Continuous process improvement
Products, not projects
How to work
How to build
Align along value flows
Software factory engineering principles
29
Immutable images
Agile
Statistical process control
Products
DevOps
Puppet, Ansible
Waterfall
In prod debugging
Projects
Dev + Ops
High code
Low code
What should a company do?
30
● Everything in-house
○ Works only for big tech
● Vendors - build, not buy
○ Works for families of use cases
○ So far a 10 year gap to tech elite
● Get consultants
○ No competence flow from European big tech to consultants
○ Products, not projects
● Long-term partnerships?
○ Common outside IT
○ Unfamiliar model in IT - cf. cloud resistance
Autoliv general presentation 2017
Scling - data-value-as-a-service
31
Data value through collaboration
Customer
Data factory
Data platform & lake
data
domain expertise
Value from data!
Rapid data innovation
Learning by doing, in collaboration

Crossing the data divide

  • 1. www.scling.com Crossing the data divide. Lars Albertsson, Founder, Scling. Data Innovation Summit, 2021-10-14
  • 2. The great capability divide: 1000x span in availability metrics. Started 2002 / 2006, launched 2010, killed 2012: 1000 person years, cost $125M. Started 2009-05-10, launched 2009-05-16: $80M revenue in 15 months. Image credits: https://www.flickr.com/photos/downloadsourcefr/15944373702, CC BY 2.0; Pirate Bay founders' picture used without permission
  • 3. Efficiency gap, data cost & value ● Data processing produces datasets ○ Each dataset has business value ● Proxy value/cost metric: datasets / day ○ S-M traditional: < 10 ○ Bank, telecom, media: 10-1000. Examples: 2014: 6500 datasets / day; 2016: 20000 datasets / day; 2017: 100B events collected / day; 2018: 100000+ datasets / day, 25% of staff use BigQuery; 2016: 1 600 000 000 datasets / day. Chart (axes: effort, value): Financial, reporting; Insights, data-fed features; Disruptive value of data, machine learning
  • 4. Manual, mechanised, industrialised: ● Muscle-powered ● Few tools ● Human touch for every step / ● Direct human control ● Machine tools ● Low investment, direct return / ● Scaled processes ● Machine tools ● Challenges: scale, logistics, legal, organisation, faults, ...
  • 5. Manual, mechanised, industrialised (data artifacts: 100x, 1000x): ● Hand-built models ● Manual deployment ● Spreadsheets / ● Automated training ● Semi-automated deployment ● Data warehouses, notebooks / ● Automated QA, monitoring ● Continuous deployment ● Hadoop ecosystem
  • 6. Road towards industrialisation: LAMP stack age - manual analytics. Data warehouse (DW) age - mechanised analytics. Hadoop age - industrialised analytics, data-fed features, machine learning. Significant change in workflows. Early Hadoop: ● Weak indexing ● No transactions ● Weak security ● Batch transformations
  • 7. Road back again: DW. Enterprise big data failures. Post-Hadoop "data engineering" - traditional workflows, new technology
  • 8. Gap is still there: DW. Enterprise big data failures. Post-Hadoop "data engineering" - traditional workflows, new technology. ~10 year capability gap to "data factory engineering". Current data eng focus - narrative, tools, vendors
  • 9. What conclusion from this graph? COVID-19 fatalities / day in Sweden
  • 10. What conclusion from this graph? COVID-19 fatalities / day in Sweden. Fatalities collected during 2 days; fatalities collected during 4 days; fatalities collected during 10 days
  • 11. Normalise data collection to compare. Graph by Adam Altmejd, @adamaltmejd
  • 12. Forecast for analytics with fresh data. Graph by Adam Altmejd, @adamaltmejd
  • 14. From craft to process: Multiple time windows; Assess ingress data quality; Repair broken data from complementary source; Forecast based on history, multiple parameter settings; Assess outcome data quality; Assess forecast success, adapt parameters
  • 16. Sustainable production ML: Multiple models, parameters, features; Assess ingress data quality; Repair broken data from complementary source; Choose model and parameters based on performance and input data; Benchmark models; Try multiple models, measure, A/B test
  • 17. Data engineering vs data factory engineering: How to organise; How to work; How to build
  • 18. Data factory engineering principles - technology: Centralised, homogeneous data platform; Functional architecture; Simple technology, simple rituals ● Minimal experiment friction ○ Centralise first to establish homogeneity ● Democratised functional data processing ○ Raw data + transforms ○ Immutable datasets!
  • 19. Data-centric innovation ● Need data from teams ○ willing? ○ backlog? ○ collected? ○ useful? ○ quality? ○ extraction? ○ data governance? ○ history?
  • 20. Big data - a collaboration paradigm: Data platform; Stream storage; Data lake; Data democratised
  • 21. Data factory engineering principles - architecture: Failure-driven design; What happens, happens in production; Fast feedback cycle, slow integration ● Batch processing is self healing ○ If you master workflow orchestration ● Low failure impact → high risk → fast cycle
  • 22. Cost of a software error. Online: ● User impact ● Data corruption ● Cascading corruption ● Unbounded recovery. Nearline: ● Data corruption ● Downstream impact ● Bounded recovery. Offline: ● Temporary data corruption ● Downstream impact ● Easy recovery. Diagram labels: Job, Stream
  • 23. Data processing tradeoff: Online, Nearline, Offline. Many nines uptime (99.99.. %); A couple of sevens; Data speed; Innovation speed. Diagram labels: Job, Stream
  • 24. Eliminate infrastructure waste ● Production environment only ○ Dev, test, staging lack production data ● Dark pipelines ○ Run in parallel ○ Monitor diff vs production ○ Roll out slowly? Diagram label: ∆?
  • 25. Data factory engineering principles - engineering: It's a software engineering problem; Continuous process improvement ● Quality, reproducibility, versioning, deployment, monitoring, rapid change? ○ Solved software engineering problems! ● Capable, unpolished components ○ Designed for strong processes, CI/CD, testing, observability ○ Ugly interfaces ● Statistical process control, engineered
  • 26. SQL is a power tool, not an industrial robot ● No composition & abstractions ○ Hostile to testing ● Not expressive enough for mature data processing ● Hostile to data quality measurements and repair ○ Hadoop/Spark/Flink have quality primitives built in. Source: https://threadreaderapp.com/thread/1353832649664692225.html
  • 27. Data factory engineering principles - value iteration: Pull-driven work, initiated by business value needs; Products, not projects; Align along value flows ● Only business value counts ○ Drives work ○ Few teams along path ● Data is organic ○ Never done, always iterate
  • 28. Data factory engineering principles (How to organise, How to work, How to build): Centralised, homogeneous data platform; Functional architecture; Simple technology, simple rituals; Failure-driven design; What happens, happens in production; Fast feedback cycle, slow integration; It's a software engineering problem; Continuous process improvement; Pull-driven work, initiated by business value needs; Products, not projects; Align along value flows
  • 29. Software factory engineering principles: Immutable images; Agile; Statistical process control; Products; DevOps; Puppet, Ansible; Waterfall; In prod debugging; Projects; Dev + Ops; High code; Low code
  • 30. What should a company do? ● Everything in-house ○ Works only for big tech ● Vendors - build, not buy ○ Works for families of use cases ○ So far a 10 year gap to tech elite ● Get consultants ○ No competence flow from European big tech to consultants ○ Products, not projects ● Long-term partnerships? ○ Common outside IT ○ Unfamiliar model in IT - cf. cloud resistance. Image credit: Autoliv general presentation 2017
  • 31. Scling - data-value-as-a-service: Data value through collaboration. Customer: data domain expertise. Scling: Data factory, Data platform & lake. Value from data! Rapid data innovation. Learning by doing, in collaboration
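
The "dark pipelines" idea from slide 24 (run in parallel, monitor diff vs production) could look roughly like the Python sketch below. It is not taken from the talk: the function names (`production_totals`, `dark_totals`, `diff_datasets`), the event fields and the refund rule are all hypothetical, chosen only to show a candidate transform running on the same immutable input as production and being compared before any rollout.

```python
# Sketch only: all names and the data layout are illustrative assumptions.

def production_totals(events):
    """Current production transform: sum all amounts per user."""
    totals = {}
    for event in events:
        totals[event["user"]] = totals.get(event["user"], 0) + event["amount"]
    return totals


def dark_totals(events):
    """Candidate (dark) transform: same job, but ignores negative amounts."""
    totals = {}
    for event in events:
        if event["amount"] < 0:
            continue
        totals[event["user"]] = totals.get(event["user"], 0) + event["amount"]
    return totals


def diff_datasets(production, dark):
    """Compare two keyed datasets and report where the dark output deviates."""
    shared = production.keys() & dark.keys()
    return {
        "missing": sorted(production.keys() - dark.keys()),  # only in production
        "extra": sorted(dark.keys() - production.keys()),    # only in dark output
        "changed": sorted(k for k in shared if production[k] != dark[k]),
    }


if __name__ == "__main__":
    # Both pipelines read the same raw input; neither mutates it.
    raw_events = [
        {"user": "a", "amount": 10},
        {"user": "a", "amount": -3},
        {"user": "b", "amount": 5},
        {"user": "c", "amount": -2},
    ]
    delta = diff_datasets(production_totals(raw_events), dark_totals(raw_events))
    print(delta)  # {'missing': ['c'], 'extra': [], 'changed': ['a']}
```

Because both transforms are pure functions over the same raw data, the diff can be recomputed at any time. A monitored delta like this is one possible basis for the "roll out slowly?" decision on the slide.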