SlideShare a Scribd company logo
www.scling.com
The 7 habits of
data effective companies
Lars Albertsson, Founder, Scling
2022-11-08
1
www.scling.com
Digital maturity phases
Analog Software enabled Information-based Born-digital
Cafés Elderly care Construction Cars Banks Telecom Spotify FAANG
Digital information
foundation exists
Disruptive value
from data
● More value from data (+ AI) to the right
● What differs between information stage vs born-digital?
● Presentation scope: Observed differences & underlying dynamics
2
Analog Software-enabled Information-based Born-digital
www.scling.com
Observed differences
● Impact
○ Cost
○ Value extraction
○ Capability
● 7 habits
○ Data factory - from craft to industrial process
○ Lean process - focus on latency and waste
○ Use case driven - value stream aligned
○ Centralise first
○ Architect for failure and sharing
○ It's a software engineering problem
○ Unix philosophy
3
www.scling.com
● From raw material to refined artifact that is appreciated by someone
○ Raw, collected data → data artifact (report, recommendation, …)
○ Customer with a need → sales transaction
○ Employee with an idea → product making a user happy
● Intermediate artifacts produced as side-effect
○ Indirect, partial value
Value creation streams
4
www.scling.com
Efficiency gap, data cost & value
● Data processing produces datasets
○ Each dataset has business value
● Proxy value/cost metric: datasets / day
○ S-M traditional: < 10
○ Bank, telecom, media: 100-1000
5
2014: 6500 datasets / day
2016: 20000 datasets / day
2018: 100000+ datasets / day,
25% of staff use BigQuery
2021: 500B events collected / day
2016: 1600 000 000
datasets / day
Disruptive value of data, machine learning
Financial, reporting
Insights, data-fed features
effort
value
www.scling.com
Data value cost - master data management example
MDM = single view of important entity
● Large bank
○ View of all customers
○ ~millions of records
○ Proprietary vendor MDM tool
○ 5 person team
● Spotify
○ View of all users
○ ~100 millions of records
○ Hadoop MapReduce pipeline
○ ~10% of full-time engineer
6
www.scling.com
What conclusion from this graph?
COVID-19 fatalities / day in Sweden
7
www.scling.com
What conclusion from this graph?
COVID-19 fatalities / day in Sweden
8
Fatalities collected during 2 day
Fatalities collected during 4 days
Fatalities collected during 10 days
www.scling.com
Forecast for analytics with fresh data
9
Graph by Adam Altmejd, @adamaltmejd
www.scling.com
and sometimes reach them.
Scandinavian companies reach for the stars
10
www.scling.com
Data not acted upon - value left on table
11
Changing to winter tyres at Bilia. When I arrived,
the staff could see that mechanics were currently
not fully booked, and offered me a front wheel
adjustment, which I accepted. Data value taken
from table.
Neighbour jumps in their Volvo, talks to husband
through window, and drives off. On arrival she
discovers that she has no car key - husband's key
was close enough to start car. An early warning
that key is no longer in proximity would have been
valuable.
I make the effort to submit a detailed bug report
regarding Volvo's known over-aggressive
automatic safety brake problem, in order to
provide more data for debugging. I receive no
feedback from the process, and lose interest.
Our Volvo incorrectly activates the automatic safety
brake again. This is a known problem since years
back, which affects our car occasionally. I have no
efficient channel to report time and circumstances
for this occasion, allowing Volvo to get more data
on the problem, and me to feel confident that the
issue is worked on.
After a repair at Bilia of the rear view mirror, the
car does not properly detect cars in front. Camera
near mirror is suspected. Bilia cannot obtain
camera measurements or car detection statistics
from Volvo, so a cycle of trial and error repairs
follows, taking me to the mechanic multiple times.
The Volvo phone app warns about
many things: interrupted charging, car
parked but not being charged, etc. But
not about the roof and windows left
open in the rain. Rear view mirror
electronics now need a repair.
I change the audio balance in the
Volvo with six screen clicks.
Measurements could indicate that
long click sequences at high speed
indicates poor user interface.
I send a design suggestion to Volvo with pics of a
hacked solution for a luggage problem. A simple
hook could solve it. I do not receive "We have
assigned your suggestion id X, and will get back if
it is implemented." I will not send another, and feel
less connected to the brand.
Wife has driven our Volvo. Mirrors and seats
automatically move to my position, except for the
right mirror, which I need to correct again, as I
have for years. Volvo could have measured and
detected irregular changes of settings, and found
this bug.
Wife unlocks Volvo. Driver seat switches to her
position. I take the driver's seat, pull it back, and
change to my profile. Car switches seat back to
her position, and I have to pull it back again.
When seat heating was automatically activated,
"Increase seat heating" audio command turns it
off. I make additional commands to get to the
desired state. Measurements could identify
repeated audio or UX commands.
www.scling.com
Enabling innovation
12
"The actual work that went into
Discover Weekly was very little,
because we're reusing things we
already had."
https://youtu.be/A259Yo8hBRs
https://youtu.be/ZcmJxli8WS8
https://musically.com/2018/08/08/daniel-ek-would-have-killed-discover-weekly-before-launch/
"Discover Weekly wasn't a great
strategic plan and 100 engineers.
It was 3 engineers that decided to
build something."
"I would have killed it. All of a sudden,
they shipped it. It’s one of the most
loved product features that we have."
- Daniel Ek, CEO
www.scling.com
Security Waterfall
1. Data factory - from craft to industrial process
13
Application
delivery
Traditional
operations
DevSecOps
Traditional
QA
Infrastructure
DB-oriented
architecture
Agile
Containers
DevOps CI/CD
Infrastructure
as code
Data factories,
data pipelines,
DataOps
www.scling.com
From craft to process
14
www.scling.com
From craft to process
15
Multiple time windows
Assess ingress data quality
Repair broken data from
complementary source
Forecast based on history,
multiple parameter settings
Assess outcome data quality
Assess forecast success,
adapt parameters
www.scling.com
Sustainable production machine learning
16
Multiple models,
parameters, features
Assess ingress data quality
Repair broken data from
complementary source
Choose model and parameters based
on performance and input data
Benchmark models
Try multiple models,
measure, A/B test
www.scling.com
2. Lean factories - focus on latency & waste
17
Value-stream mapping:
vs
Primary waste forms in data value streams: "Put first things first"
● Big design / modelling / architecture up front
● Redundant technical environments / components
● Human coordination
● Rules and processes no longer needed
www.scling.com
Eliminating delivery friction
18
● In theory simple - scrutinise everything
○ Positive engineering: writing code, tests, docs, refactor, improve
○ All else is negative
● You are limited by your assumptions
○ State of practice far from state of art
But the test suite
takes 3 hours.
We have this
checklist.
Security must
approve.
X must be
released before Y.
That is another
team's job.
We don't have
access.
We must test in
staging first.
We haven't
performance
tested yet.
www.scling.com
3. Use case driven, value stream aligned
● Stream-aligned teams
○ Direct business value impact
○ Few handoffs
● Platform & specialist teams serve stream-aligned teams
○ Mechanism for business value to set priority
"Begin with the end in mind"
19
www.scling.com
Value-driven teams
20
Raw
Commitment iff
onboarding
Waterfall, one team delivers to next
- Time to business value, low agility
www.scling.com
Cross functional teams
21
● Team has competence for end-to-end value creation
○ Data engineering (software, QA, DevOps)
○ Subject matter expertise
○ Product management
● Most value in technically simple things
○ Counting things
○ Data availability
○ Fancy things and machine learning - higher cost, lower ROI
www.scling.com
Push-driven vs pull-driven
22
● Technology capabilities
○ In-house
○ Vendors
● Ivory tower platforms
● Modelling up front
● Work driven by business stakeholder needs
● Teams self-sufficient to deliver value
● Common services driven by internal stakeholders
www.scling.com
Push-driven vs pull-driven
23
● Technology capabilities
○ In-house
○ Vendors
● Ivory tower platforms
● Modelling up front
● Work driven by business stakeholder needs
● Teams self-sufficient to deliver value
● Common services driven by internal stakeholders
Started 2002 / 2006,
launched 2010,
killed 2012
1000 person years,
cost $125M
Started 2009-05-10,
launched 2009-05-16
$80M revenue in 15 months
https://www.flickr.com/photos/downloadsourcefr/15944373702, CC BY 2.0
www.scling.com
4. Centralise first
● Decentralisation is a scaling decision
○ At a high cost in friction
● Most successful data lakes centralised
until very large
○ 1000+ node clusters
○ 10000+ jobs / day
○ 100000+ datasets / day
● Premature decentralisation harmful
○ Without strong homogeneity → silos
○ The data mesh is not for you
24
www.scling.com
Data-centric innovation
● Need data from teams
○ willing?
○ backlog?
○ collected?
○ useful?
○ quality?
○ extraction?
○ data governance?
○ history?
25
www.scling.com
Data platform
Big data - a collaboration paradigm
26
Stream storage?
Data lake
Data
democratised
www.scling.com
5. Architect for failure and sharing
● Data processing is fragile
○ Spans across domains
○ Spans across time
○ Immature, heterogeneous components
○ Few engineers / value flow
● Accept high failure probability, reduce impact
○ Functional architecture
○ Offline - online separation
○ Versioned data for fallback
27
www.scling.com
Data lake
Transformation
Cold
pond
Data factories
28
Mutation
Immutable,
shareable
www.scling.com
Online vs offline
29
Online Offline
ETL
Cold
pond
Online
Data feature
Dataset
Job
Pipeline
Service
Service
www.scling.com
OO vs functional architecture
REST DAO RDBMS Stream processing Data lake
OO λ
ORM Document DB Data warehouse Lakehouse Frozen lake
30
www.scling.com
● Forgiving environment
○ Machine errors
○ Human errors
● Friendly to experiments
○ "Dark pipelines" run in parallel
● Operationally efficient
○ Separate dev, test, staging environments not necessary
○ Self-healing
Raw, immutable data + transformation process
31
∆?
www.scling.com
Life of an error, microservices
32
● My service, bad code!
○ Bad data spreads instantly
● Different recovery strategies - ad-hoc
○ No standard patterns
○ No guaranteed recovery
● High cost of error
○ Proactive QA - staging environment
Service
Data
warehouse
ETL
Queue
DB
www.scling.com
Life of an error, batch pipelines
33
● My processing job, bad code!
1. Revert serving datasets to old
2. Fix bug
3. Remove faulty datasets
4. Deploy
5. Backfill is automatic (Luigi)
Done!
● Low cost of error
○ Reactive QA
○ Production environment sufficient
www.scling.com
Life of an error, frozen lake
34
● My processing job, bad code!
1. Revert serving datasets to old
2. Fix bug
3. Bump pipeline version
4. Deploy
5. Backfill is automatic (Luigi)
Done!
● Low cost of error
○ Reactive QA
○ Production environment sufficient
www.scling.com 35
Life of an error, streaming
● Works for a single job, not pipeline. :-(
Job
Stream
Stream Stream
Stream Stream Stream
Job
Job
Stream Stream Stream
Job
Job Job
Reprocessing in Kafka Streams
www.scling.com
6. It's a software engineering problem
● Data value can be tackled from different angles
○ Strategy
○ Data science
○ Technology
○ Engineering
● Things happen when it becomes a software engineering problem
○ Engineering = building internal artifacts for value extraction
○ New ways of working - manifest in processes
○ New ways of collaborating - manifest in shared structures
36
www.scling.com
Process engineering
● Idea - deploy - feedback loop
○ Find and understand code & data
■ Absence of walls
○ Continuous integration & deployment
■ Integration == integration
● Automate toil
○ Pull-driven
○ Carrots > whips - "Golden path" platform service
● Continuous improvement
○ Visualise friction
○ Feedback & priority loops between teams
○ Sharpen the saw
37
www.scling.com
● Functional QA
○ Feedback on code
○ Testing challenge
○ Guard rail for development speed
● Data product QA
○ Feedback on code + data
○ Monitoring & observability challenge
● High quality demands high code
○ Low/no-code incompatible with SWE processes for speed & quality
○ SQL lacks convenient support for data quality measurements and improvements
Quality assurance is a fundament for speed
38
15 minutes!
www.scling.com
Big vendor Few wide scope components Many small scope components
Tied to vendor evolution Perpetual integration work Amortised integration work
● Successful companies tend to own the mortar
7. Unix philosophy - do one thing well
39
www.scling.com
One-way vs two-way decisions
40
● Architect for two-way decisions
○ Small, replaceable components
○ Non-destructive computations
○ Sharing without risk
● One way: privacy architecture
Expensive
to reverse
Easy to
reverse
www.scling.com
One-way vs two-way decisions
41
● Architect for two-way decisions
○ Small, replaceable components
○ Non-destructive computations
○ Sharing without risk
● One way: privacy architecture
Expensive
to reverse
Easy to
reverse
● Data pipeline deployment evolution
○ Spotify 2014 - 2018
1. Self-contained jar file. Ad-hoc continuous deploy flow.
2. Docker container on VM pool
3. Docker container on Kubernetes
logs
CI
dev
env
o11y
www.scling.com
8th habit - find your voice
● Your business use cases define your needs
○ Not technology
○ Not vendors
○ Not trends
○ Not fear-of-missing-out on AI, blockchains, …
● How do non-big-tech companies find own path and data success?
○ Technology?
○ Consultants?
○ Grow organically?
■ Very little data success experience available on job market
○ New modes of collaboration?
42
www.scling.com
Scling - data-factory-as-a-service
43
Data value through collaboration
Customer
Data factory
Data platform & lake
data
domain
expertise
Value from data!
Rapid data
innovation
Learning by doing,
in collaboration
www.scling.com
Tech has massive impact on society
44
Product?
Supplier?
Employer?
Make an active
choice whether to
have an impact!
Cloud?

More Related Content

What's hot

Process Automation: an Update from the Trenches
Process Automation: an Update from the TrenchesProcess Automation: an Update from the Trenches
Process Automation: an Update from the Trenches
Kris Verlaenen
 
The Business Glossary, Data Dictionary, Data Catalog Trifecta
The Business Glossary, Data Dictionary, Data Catalog TrifectaThe Business Glossary, Data Dictionary, Data Catalog Trifecta
The Business Glossary, Data Dictionary, Data Catalog Trifecta
georgefirican
 
MDM Strategy & Roadmap
MDM Strategy & RoadmapMDM Strategy & Roadmap
MDM Strategy & Roadmapvictorlbrown
 
Apache Kafka® and the Data Mesh
Apache Kafka® and the Data MeshApache Kafka® and the Data Mesh
Apache Kafka® and the Data Mesh
ConfluentInc1
 
Top 5 Event Streaming Use Cases for 2021 with Apache Kafka
Top 5 Event Streaming Use Cases for 2021 with Apache KafkaTop 5 Event Streaming Use Cases for 2021 with Apache Kafka
Top 5 Event Streaming Use Cases for 2021 with Apache Kafka
Kai Wähner
 
Mistakes - I’ve made a few. Blunders in event-driven architecture | Simon Aub...
Mistakes - I’ve made a few. Blunders in event-driven architecture | Simon Aub...Mistakes - I’ve made a few. Blunders in event-driven architecture | Simon Aub...
Mistakes - I’ve made a few. Blunders in event-driven architecture | Simon Aub...
HostedbyConfluent
 
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Flink Forward
 
Data Architecture Strategies
Data Architecture StrategiesData Architecture Strategies
Data Architecture Strategies
DATAVERSITY
 
Data modelling 101
Data modelling 101Data modelling 101
Data modelling 101
Christopher Bradley
 
Geek Sync | Data Architecture and Data Governance: A Powerful Data Management...
Geek Sync | Data Architecture and Data Governance: A Powerful Data Management...Geek Sync | Data Architecture and Data Governance: A Powerful Data Management...
Geek Sync | Data Architecture and Data Governance: A Powerful Data Management...
IDERA Software
 
Process Oriented Architecture
Process Oriented ArchitectureProcess Oriented Architecture
Process Oriented Architecture
Alan McSweeney
 
Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?
DATAVERSITY
 
Snowflake Best Practices for Elastic Data Warehousing
Snowflake Best Practices for Elastic Data WarehousingSnowflake Best Practices for Elastic Data Warehousing
Snowflake Best Practices for Elastic Data Warehousing
Amazon Web Services
 
Data Warehousing 2016
Data Warehousing 2016Data Warehousing 2016
Data Warehousing 2016
Kent Graziano
 
Data Center PUE Reconsidered
Data Center PUE Reconsidered Data Center PUE Reconsidered
Data Center PUE Reconsidered Raritan
 
Simplifying Disaster Recovery with Delta Lake
Simplifying Disaster Recovery with Delta LakeSimplifying Disaster Recovery with Delta Lake
Simplifying Disaster Recovery with Delta Lake
Databricks
 
Building Data Quality pipelines with Apache Spark and Delta Lake
Building Data Quality pipelines with Apache Spark and Delta LakeBuilding Data Quality pipelines with Apache Spark and Delta Lake
Building Data Quality pipelines with Apache Spark and Delta Lake
Databricks
 
Hands-On: Managing Slowly Changing Dimensions Using TD Workflow
Hands-On: Managing Slowly Changing Dimensions Using TD WorkflowHands-On: Managing Slowly Changing Dimensions Using TD Workflow
Hands-On: Managing Slowly Changing Dimensions Using TD Workflow
Treasure Data, Inc.
 
OpenText ApplicationXtender -the basics
OpenText ApplicationXtender -the basicsOpenText ApplicationXtender -the basics
OpenText ApplicationXtender -the basics
Christopher Wynder
 
Data Leadership - Stop Talking About Data and Start Making an Impact!
Data Leadership - Stop Talking About Data and Start Making an Impact!Data Leadership - Stop Talking About Data and Start Making an Impact!
Data Leadership - Stop Talking About Data and Start Making an Impact!
DATAVERSITY
 

What's hot (20)

Process Automation: an Update from the Trenches
Process Automation: an Update from the TrenchesProcess Automation: an Update from the Trenches
Process Automation: an Update from the Trenches
 
The Business Glossary, Data Dictionary, Data Catalog Trifecta
The Business Glossary, Data Dictionary, Data Catalog TrifectaThe Business Glossary, Data Dictionary, Data Catalog Trifecta
The Business Glossary, Data Dictionary, Data Catalog Trifecta
 
MDM Strategy & Roadmap
MDM Strategy & RoadmapMDM Strategy & Roadmap
MDM Strategy & Roadmap
 
Apache Kafka® and the Data Mesh
Apache Kafka® and the Data MeshApache Kafka® and the Data Mesh
Apache Kafka® and the Data Mesh
 
Top 5 Event Streaming Use Cases for 2021 with Apache Kafka
Top 5 Event Streaming Use Cases for 2021 with Apache KafkaTop 5 Event Streaming Use Cases for 2021 with Apache Kafka
Top 5 Event Streaming Use Cases for 2021 with Apache Kafka
 
Mistakes - I’ve made a few. Blunders in event-driven architecture | Simon Aub...
Mistakes - I’ve made a few. Blunders in event-driven architecture | Simon Aub...Mistakes - I’ve made a few. Blunders in event-driven architecture | Simon Aub...
Mistakes - I’ve made a few. Blunders in event-driven architecture | Simon Aub...
 
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
 
Data Architecture Strategies
Data Architecture StrategiesData Architecture Strategies
Data Architecture Strategies
 
Data modelling 101
Data modelling 101Data modelling 101
Data modelling 101
 
Geek Sync | Data Architecture and Data Governance: A Powerful Data Management...
Geek Sync | Data Architecture and Data Governance: A Powerful Data Management...Geek Sync | Data Architecture and Data Governance: A Powerful Data Management...
Geek Sync | Data Architecture and Data Governance: A Powerful Data Management...
 
Process Oriented Architecture
Process Oriented ArchitectureProcess Oriented Architecture
Process Oriented Architecture
 
Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?
 
Snowflake Best Practices for Elastic Data Warehousing
Snowflake Best Practices for Elastic Data WarehousingSnowflake Best Practices for Elastic Data Warehousing
Snowflake Best Practices for Elastic Data Warehousing
 
Data Warehousing 2016
Data Warehousing 2016Data Warehousing 2016
Data Warehousing 2016
 
Data Center PUE Reconsidered
Data Center PUE Reconsidered Data Center PUE Reconsidered
Data Center PUE Reconsidered
 
Simplifying Disaster Recovery with Delta Lake
Simplifying Disaster Recovery with Delta LakeSimplifying Disaster Recovery with Delta Lake
Simplifying Disaster Recovery with Delta Lake
 
Building Data Quality pipelines with Apache Spark and Delta Lake
Building Data Quality pipelines with Apache Spark and Delta LakeBuilding Data Quality pipelines with Apache Spark and Delta Lake
Building Data Quality pipelines with Apache Spark and Delta Lake
 
Hands-On: Managing Slowly Changing Dimensions Using TD Workflow
Hands-On: Managing Slowly Changing Dimensions Using TD WorkflowHands-On: Managing Slowly Changing Dimensions Using TD Workflow
Hands-On: Managing Slowly Changing Dimensions Using TD Workflow
 
OpenText ApplicationXtender -the basics
OpenText ApplicationXtender -the basicsOpenText ApplicationXtender -the basics
OpenText ApplicationXtender -the basics
 
Data Leadership - Stop Talking About Data and Start Making an Impact!
Data Leadership - Stop Talking About Data and Start Making an Impact!Data Leadership - Stop Talking About Data and Start Making an Impact!
Data Leadership - Stop Talking About Data and Start Making an Impact!
 

Similar to The 7 habits of data effective companies.pdf

Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
Lars Albertsson
 
Crossing the data divide
Crossing the data divideCrossing the data divide
Crossing the data divide
Lars Albertsson
 
Dynniq & GoDataDriven - Shaping the future of traffic with IoT and AI
Dynniq & GoDataDriven - Shaping the future of traffic with IoT and AIDynniq & GoDataDriven - Shaping the future of traffic with IoT and AI
Dynniq & GoDataDriven - Shaping the future of traffic with IoT and AI
BigDataExpo
 
The lean principles of data ops
The lean principles of data opsThe lean principles of data ops
The lean principles of data ops
Lars Albertsson
 
Webinar | So You Think You Know the Cloud: Hosting Alternatives You May Not K...
Webinar | So You Think You Know the Cloud: Hosting Alternatives You May Not K...Webinar | So You Think You Know the Cloud: Hosting Alternatives You May Not K...
Webinar | So You Think You Know the Cloud: Hosting Alternatives You May Not K...
Peak Hosting
 
Fifth Edition Architecture Week @Gothenburg 141009
Fifth Edition Architecture Week @Gothenburg 141009Fifth Edition Architecture Week @Gothenburg 141009
Fifth Edition Architecture Week @Gothenburg 141009Capgemini
 
Secure software supply chain on a shoestring budget
Secure software supply chain on a shoestring budgetSecure software supply chain on a shoestring budget
Secure software supply chain on a shoestring budget
Lars Albertsson
 
Machine Learning at Scale with MLflow and Apache Spark
Machine Learning at Scale with MLflow and Apache SparkMachine Learning at Scale with MLflow and Apache Spark
Machine Learning at Scale with MLflow and Apache Spark
Databricks
 
Holistic data application quality
Holistic data application qualityHolistic data application quality
Holistic data application quality
Lars Albertsson
 
Agile software architecture
Agile software architectureAgile software architecture
Agile software architecture
Scott Hsieh
 
Yhat - Applied Data Science - Feb 2016
Yhat - Applied Data Science - Feb 2016Yhat - Applied Data Science - Feb 2016
Yhat - Applied Data Science - Feb 2016
Austin Ogilvie
 
Automating Screenshot Testing Component Library
Automating Screenshot Testing Component LibraryAutomating Screenshot Testing Component Library
Automating Screenshot Testing Component Library
Applitools
 
Building Resiliency and Agility with Data Virtualization for the New Normal
Building Resiliency and Agility with Data Virtualization for the New NormalBuilding Resiliency and Agility with Data Virtualization for the New Normal
Building Resiliency and Agility with Data Virtualization for the New Normal
Denodo
 
Connecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud PlatformConnecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud Platform
ConnectaDigital
 
DataOps - Lean principles and lean practices
DataOps - Lean principles and lean practicesDataOps - Lean principles and lean practices
DataOps - Lean principles and lean practices
Lars Albertsson
 
Cms Solution 07162010
Cms Solution 07162010Cms Solution 07162010
Cms Solution 07162010
larrybaker90
 
[코세나, kosena] Auto ML, H2O.ai의 제조분야 AI 활용 사례
[코세나, kosena] Auto ML, H2O.ai의 제조분야 AI 활용 사례[코세나, kosena] Auto ML, H2O.ai의 제조분야 AI 활용 사례
[코세나, kosena] Auto ML, H2O.ai의 제조분야 AI 활용 사례
kosena
 
Agile Network India | Industry 4.0 - Digital Agility Vade Mecum | Atul Jadhav
Agile Network India | Industry 4.0 - Digital Agility Vade Mecum | Atul JadhavAgile Network India | Industry 4.0 - Digital Agility Vade Mecum | Atul Jadhav
Agile Network India | Industry 4.0 - Digital Agility Vade Mecum | Atul Jadhav
AgileNetwork
 
Accelerate ML Deployment with H2O Driverless AI on AWS
Accelerate ML Deployment with H2O Driverless AI on AWSAccelerate ML Deployment with H2O Driverless AI on AWS
Accelerate ML Deployment with H2O Driverless AI on AWS
Sri Ambati
 

Similar to The 7 habits of data effective companies.pdf (20)

Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Crossing the data divide
Crossing the data divideCrossing the data divide
Crossing the data divide
 
Dynniq & GoDataDriven - Shaping the future of traffic with IoT and AI
Dynniq & GoDataDriven - Shaping the future of traffic with IoT and AIDynniq & GoDataDriven - Shaping the future of traffic with IoT and AI
Dynniq & GoDataDriven - Shaping the future of traffic with IoT and AI
 
The lean principles of data ops
The lean principles of data opsThe lean principles of data ops
The lean principles of data ops
 
Webinar | So You Think You Know the Cloud: Hosting Alternatives You May Not K...
Webinar | So You Think You Know the Cloud: Hosting Alternatives You May Not K...Webinar | So You Think You Know the Cloud: Hosting Alternatives You May Not K...
Webinar | So You Think You Know the Cloud: Hosting Alternatives You May Not K...
 
Fifth Edition Architecture Week @Gothenburg 141009
Fifth Edition Architecture Week @Gothenburg 141009Fifth Edition Architecture Week @Gothenburg 141009
Fifth Edition Architecture Week @Gothenburg 141009
 
Secure software supply chain on a shoestring budget
Secure software supply chain on a shoestring budgetSecure software supply chain on a shoestring budget
Secure software supply chain on a shoestring budget
 
Machine Learning at Scale with MLflow and Apache Spark
Machine Learning at Scale with MLflow and Apache SparkMachine Learning at Scale with MLflow and Apache Spark
Machine Learning at Scale with MLflow and Apache Spark
 
Holistic data application quality
Holistic data application qualityHolistic data application quality
Holistic data application quality
 
Agile software architecture
Agile software architectureAgile software architecture
Agile software architecture
 
Cms solution 08072010
Cms solution 08072010Cms solution 08072010
Cms solution 08072010
 
Yhat - Applied Data Science - Feb 2016
Yhat - Applied Data Science - Feb 2016Yhat - Applied Data Science - Feb 2016
Yhat - Applied Data Science - Feb 2016
 
Automating Screenshot Testing Component Library
Automating Screenshot Testing Component LibraryAutomating Screenshot Testing Component Library
Automating Screenshot Testing Component Library
 
Building Resiliency and Agility with Data Virtualization for the New Normal
Building Resiliency and Agility with Data Virtualization for the New NormalBuilding Resiliency and Agility with Data Virtualization for the New Normal
Building Resiliency and Agility with Data Virtualization for the New Normal
 
Connecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud PlatformConnecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud Platform
 
DataOps - Lean principles and lean practices
DataOps - Lean principles and lean practicesDataOps - Lean principles and lean practices
DataOps - Lean principles and lean practices
 
Cms Solution 07162010
Cms Solution 07162010Cms Solution 07162010
Cms Solution 07162010
 
[코세나, kosena] Auto ML, H2O.ai의 제조분야 AI 활용 사례
[코세나, kosena] Auto ML, H2O.ai의 제조분야 AI 활용 사례[코세나, kosena] Auto ML, H2O.ai의 제조분야 AI 활용 사례
[코세나, kosena] Auto ML, H2O.ai의 제조분야 AI 활용 사례
 
Agile Network India | Industry 4.0 - Digital Agility Vade Mecum | Atul Jadhav
Agile Network India | Industry 4.0 - Digital Agility Vade Mecum | Atul JadhavAgile Network India | Industry 4.0 - Digital Agility Vade Mecum | Atul Jadhav
Agile Network India | Industry 4.0 - Digital Agility Vade Mecum | Atul Jadhav
 
Accelerate ML Deployment with H2O Driverless AI on AWS
Accelerate ML Deployment with H2O Driverless AI on AWSAccelerate ML Deployment with H2O Driverless AI on AWS
Accelerate ML Deployment with H2O Driverless AI on AWS
 

More from Lars Albertsson

Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
Lars Albertsson
 
Schema management with Scalameta
Schema management with ScalametaSchema management with Scalameta
Schema management with Scalameta
Lars Albertsson
 
How to not kill people - Berlin Buzzwords 2023.pdf
How to not kill people - Berlin Buzzwords 2023.pdfHow to not kill people - Berlin Buzzwords 2023.pdf
How to not kill people - Berlin Buzzwords 2023.pdf
Lars Albertsson
 
Ai legal and ethics
Ai   legal and ethicsAi   legal and ethics
Ai legal and ethics
Lars Albertsson
 
The right side of speed - learning to shift left
The right side of speed - learning to shift leftThe right side of speed - learning to shift left
The right side of speed - learning to shift left
Lars Albertsson
 
Mortal analytics - Covid-19 and the problem of data quality
Mortal analytics - Covid-19 and the problem of data qualityMortal analytics - Covid-19 and the problem of data quality
Mortal analytics - Covid-19 and the problem of data quality
Lars Albertsson
 
Data ops in practice - Swedish style
Data ops in practice - Swedish styleData ops in practice - Swedish style
Data ops in practice - Swedish style
Lars Albertsson
 
Data democratised
Data democratisedData democratised
Data democratised
Lars Albertsson
 
Engineering data quality
Engineering data qualityEngineering data quality
Engineering data quality
Lars Albertsson
 
Eventually, time will kill your data processing
Eventually, time will kill your data processingEventually, time will kill your data processing
Eventually, time will kill your data processing
Lars Albertsson
 
Taming the reproducibility crisis
Taming the reproducibility crisisTaming the reproducibility crisis
Taming the reproducibility crisis
Lars Albertsson
 
Eventually, time will kill your data pipeline
Eventually, time will kill your data pipelineEventually, time will kill your data pipeline
Eventually, time will kill your data pipeline
Lars Albertsson
 
Data ops in practice
Data ops in practiceData ops in practice
Data ops in practice
Lars Albertsson
 
Kubernetes as data platform
Kubernetes as data platformKubernetes as data platform
Kubernetes as data platform
Lars Albertsson
 
Don't build a data science team
Don't build a data science teamDon't build a data science team
Don't build a data science team
Lars Albertsson
 
Big data == lean data
Big data == lean dataBig data == lean data
Big data == lean data
Lars Albertsson
 
Privacy by design
Privacy by designPrivacy by design
Privacy by design
Lars Albertsson
 
Test strategies for data processing pipelines, v2.0
Test strategies for data processing pipelines, v2.0Test strategies for data processing pipelines, v2.0
Test strategies for data processing pipelines, v2.0
Lars Albertsson
 
10 ways to stumble with big data
10 ways to stumble with big data10 ways to stumble with big data
10 ways to stumble with big data
Lars Albertsson
 
Protecting privacy in practice
Protecting privacy in practiceProtecting privacy in practice
Protecting privacy in practice
Lars Albertsson
 

More from Lars Albertsson (20)

Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Schema management with Scalameta
Schema management with ScalametaSchema management with Scalameta
Schema management with Scalameta
 
How to not kill people - Berlin Buzzwords 2023.pdf
How to not kill people - Berlin Buzzwords 2023.pdfHow to not kill people - Berlin Buzzwords 2023.pdf
How to not kill people - Berlin Buzzwords 2023.pdf
 
Ai legal and ethics
Ai   legal and ethicsAi   legal and ethics
Ai legal and ethics
 
The right side of speed - learning to shift left
The right side of speed - learning to shift leftThe right side of speed - learning to shift left
The right side of speed - learning to shift left
 
Mortal analytics - Covid-19 and the problem of data quality
Mortal analytics - Covid-19 and the problem of data qualityMortal analytics - Covid-19 and the problem of data quality
Mortal analytics - Covid-19 and the problem of data quality
 
Data ops in practice - Swedish style
Data ops in practice - Swedish styleData ops in practice - Swedish style
Data ops in practice - Swedish style
 
Data democratised
Data democratisedData democratised
Data democratised
 
Engineering data quality
Engineering data qualityEngineering data quality
Engineering data quality
 
Eventually, time will kill your data processing
Eventually, time will kill your data processingEventually, time will kill your data processing
Eventually, time will kill your data processing
 
Taming the reproducibility crisis
Taming the reproducibility crisisTaming the reproducibility crisis
Taming the reproducibility crisis
 
Eventually, time will kill your data pipeline
Eventually, time will kill your data pipelineEventually, time will kill your data pipeline
Eventually, time will kill your data pipeline
 
Data ops in practice
Data ops in practiceData ops in practice
Data ops in practice
 
Kubernetes as data platform
Kubernetes as data platformKubernetes as data platform
Kubernetes as data platform
 
Don't build a data science team
Don't build a data science teamDon't build a data science team
Don't build a data science team
 
Big data == lean data
Big data == lean dataBig data == lean data
Big data == lean data
 
Privacy by design
Privacy by designPrivacy by design
Privacy by design
 
Test strategies for data processing pipelines, v2.0
Test strategies for data processing pipelines, v2.0Test strategies for data processing pipelines, v2.0
Test strategies for data processing pipelines, v2.0
 
10 ways to stumble with big data
10 ways to stumble with big data10 ways to stumble with big data
10 ways to stumble with big data
 
Protecting privacy in practice
Protecting privacy in practiceProtecting privacy in practice
Protecting privacy in practice
 

Recently uploaded

06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfUnleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Enterprise Wired
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Subhajit Sahu
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
GetInData
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
u86oixdj
 

Recently uploaded (20)

06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfUnleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
 

The 7 habits of data effective companies.pdf

  • 1. www.scling.com The 7 habits of data effective companies Lars Albertsson, Founder, Scling 2022-11-08 1
  • 2. www.scling.com Digital maturity phases Analog Software enabled Information-based Born-digital Cafés Elderly care Construction Cars Banks Telecom Spotify FAANG Digital information foundation exists Disruptive value from data ● More value from data (+ AI) to the right ● What differs between information stage vs born-digital? ● Presentation scope: Observed differences & underlying dynamics 2 Analog Software-enabled Information-based Born-digital
  • 3. www.scling.com Observed differences ● Impact ○ Cost ○ Value extraction ○ Capability ● 7 habits ○ Data factory - from craft to industrial process ○ Lean process - focus on latency and waste ○ Use case driven - value stream aligned ○ Centralise first ○ Architect for failure and sharing ○ It's a software engineering problem ○ Unix philosophy 3
  • 4. www.scling.com ● From raw material to refined artifact that is appreciated by someone ○ Raw, collected data → data artifact (report, recommendation, …) ○ Customer with a need → sales transaction ○ Employee with an idea → product making a user happy ● Intermediate artifacts produced as side-effect ○ Indirect, partial value Value creation streams 4
  • 5. www.scling.com Efficiency gap, data cost & value ● Data processing produces datasets ○ Each dataset has business value ● Proxy value/cost metric: datasets / day ○ S-M traditional: < 10 ○ Bank, telecom, media: 100-1000 5 2014: 6500 datasets / day 2016: 20000 datasets / day 2018: 100000+ datasets / day, 25% of staff use BigQuery 2021: 500B events collected / day 2016: 1600 000 000 datasets / day Disruptive value of data, machine learning Financial, reporting Insights, data-fed features effort value
  • 6. www.scling.com Data value cost - master data management example MDM = single view of important entity ● Large bank ○ View of all customers ○ ~millions of records ○ Proprietary vendor MDM tool ○ 5 person team ● Spotify ○ View of all users ○ ~100 millions of records ○ Hadoop MapReduce pipeline ○ ~10% of full-time engineer 6
  • 7. www.scling.com What conclusion from this graph? COVID-19 fatalities / day in Sweden 7
  • 8. www.scling.com What conclusion from this graph? COVID-19 fatalities / day in Sweden 8 Fatalities collected during 2 day Fatalities collected during 4 days Fatalities collected during 10 days
  • 9. www.scling.com Forecast for analytics with fresh data 9 Graph by Adam Altmejd, @adamaltmejd
  • 10. www.scling.com and sometimes reach them. Scandinavian companies reach for the stars 10
  • 11. www.scling.com Data not acted upon - value left on table 11 Changing to winter tyres at Bilia. When I arrived, the staff could see that mechanics were currently not fully booked, and offered me a front wheel adjustment, which I accepted. Data value taken from table. Neighbour jumps in their Volvo, talks to husband through window, and drives off. On arrival she discovers that she has no car key - husband's key was close enough to start car. An early warning that key is no longer in proximity would have been valuable. I make the effort to submit a detailed bug report regarding Volvo's known over-aggressive automatic safety brake problem, in order to provide more data for debugging. I receive no feedback from the process, and lose interest. Our Volvo incorrectly activates the automatic safety brake again. This is a known problem since years back, which affects our car occasionally. I have no efficient channel to report time and circumstances for this occasion, allowing Volvo to get more data on the problem, and me to feel confident that the issue is worked on. After a repair at Bilia of the rear view mirror, the car does not properly detect cars in front. Camera near mirror is suspected. Bilia cannot obtain camera measurements or car detection statistics from Volvo, so a cycle of trial and error repairs follows, taking me to the mechanic multiple times. The Volvo phone app warns about many things: interrupted charging, car parked but not being charged, etc. But not about the roof and windows left open in the rain. Rear view mirror electronics now need a repair. I change the audio balance in the Volvo with six screen clicks. Measurements could indicate that long click sequences at high speed indicates poor user interface. I send a design suggestion to Volvo with pics of a hacked solution for a luggage problem. A simple hook could solve it. I do not receive "We have assigned your suggestion id X, and will get back if it is implemented." I will not send another, and feel less connected to the brand. Wife has driven our Volvo. Mirrors and seats automatically move to my position, except for the right mirror, which I need to correct again, as I have for years. Volvo could have measured and detected irregular changes of settings, and found this bug. Wife unlocks Volvo. Driver seat switches to her position. I take the driver's seat, pull it back, and change to my profile. Car switches seat back to her position, and I have to pull it back again. When seat heating was automatically activated, "Increase seat heating" audio command turns it off. I make additional commands to get to the desired state. Measurements could identify repeated audio or UX commands.
  • 12. www.scling.com Enabling innovation 12 "The actual work that went into Discover Weekly was very little, because we're reusing things we already had." https://youtu.be/A259Yo8hBRs https://youtu.be/ZcmJxli8WS8 https://musically.com/2018/08/08/daniel-ek-would-have-killed-discover-weekly-before-launch/ "Discover Weekly wasn't a great strategic plan and 100 engineers. It was 3 engineers that decided to build something." "I would have killed it. All of a sudden, they shipped it. It’s one of the most loved product features that we have." - Daniel Ek, CEO
  • 13. www.scling.com Security Waterfall 1. Data factory - from craft to industrial process 13 Application delivery Traditional operations DevSecOps Traditional QA Infrastructure DB-oriented architecture Agile Containers DevOps CI/CD Infrastructure as code Data factories, data pipelines, DataOps
  • 15. www.scling.com From craft to process 15 Multiple time windows Assess ingress data quality Repair broken data from complementary source Forecast based on history, multiple parameter settings Assess outcome data quality Assess forecast success, adapt parameters
  • 16. www.scling.com Sustainable production machine learning 16 Multiple models, parameters, features Assess ingress data quality Repair broken data from complementary source Choose model and parameters based on performance and input data Benchmark models Try multiple models, measure, A/B test
  • 17. www.scling.com 2. Lean factories - focus on latency & waste 17 Value-stream mapping: vs Primary waste forms in data value streams: "Put first things first" ● Big design / modelling / architecture up front ● Redundant technical environments / components ● Human coordination ● Rules and processes no longer needed
  • 18. www.scling.com Eliminating delivery friction 18 ● In theory simple - scrutinise everything ○ Positive engineering: writing code, tests, docs, refactor, improve ○ All else is negative ● You are limited by your assumptions ○ State of practice far from state of art But the test suite takes 3 hours. We have this checklist. Security must approve. X must be released before Y. That is another team's job. We don't have access. We must test in staging first. We haven't performance tested yet.
  • 19. www.scling.com 3. Use case driven, value stream aligned ● Stream-aligned teams ○ Direct business value impact ○ Few handoffs ● Platform & specialist teams serve stream-aligned teams ○ Mechanism for business value to set priority "Begin with the end in mind" 19
  • 20. www.scling.com Value-driven teams 20 Raw Commitment iff onboarding Waterfall, one team delivers to next - Time to business value, low agility
  • 21. www.scling.com Cross functional teams 21 ● Team has competence for end-to-end value creation ○ Data engineering (software, QA, DevOps) ○ Subject matter expertise ○ Product management ● Most value in technically simple things ○ Counting things ○ Data availability ○ Fancy things and machine learning - higher cost, lower ROI
  • 22. www.scling.com Push-driven vs pull-driven 22 ● Technology capabilities ○ In-house ○ Vendors ● Ivory tower platforms ● Modelling up front ● Work driven by business stakeholder needs ● Teams self-sufficient to deliver value ● Common services driven by internal stakeholders
  • 23. www.scling.com Push-driven vs pull-driven 23 ● Technology capabilities ○ In-house ○ Vendors ● Ivory tower platforms ● Modelling up front ● Work driven by business stakeholder needs ● Teams self-sufficient to deliver value ● Common services driven by internal stakeholders Started 2002 / 2006, launched 2010, killed 2012 1000 person years, cost $125M Started 2009-05-10, launched 2009-05-16 $80M revenue in 15 months https://www.flickr.com/photos/downloadsourcefr/15944373702, CC BY 2.0
  • 24. www.scling.com 4. Centralise first ● Decentralisation is a scaling decision ○ At a high cost in friction ● Most successful data lakes centralised until very large ○ 1000+ node clusters ○ 10000+ jobs / day ○ 100000+ datasets / day ● Premature decentralisation harmful ○ Without strong homogeneity → silos ○ The data mesh is not for you 24
  • 25. www.scling.com Data-centric innovation ● Need data from teams ○ willing? ○ backlog? ○ collected? ○ useful? ○ quality? ○ extraction? ○ data governance? ○ history? 25
  • 26. www.scling.com Data platform Big data - a collaboration paradigm 26 Stream storage? Data lake Data democratised
  • 27. www.scling.com 5. Architect for failure and sharing ● Data processing is fragile ○ Spans across domains ○ Spans across time ○ Immature, heterogeneous components ○ Few engineers / value flow ● Accept high failure probability, reduce impact ○ Functional architecture ○ Offline - online separation ○ Versioned data for fallback 27
  • 29. www.scling.com Online vs offline 29 Online Offline ETL Cold pond Online Data feature Dataset Job Pipeline Service Service
  • 30. www.scling.com OO vs functional architecture REST DAO RDBMS Stream processing Data lake OO λ ORM Document DB Data warehouse Lakehouse Frozen lake 30
  • 31. www.scling.com ● Forgiving environment ○ Machine errors ○ Human errors ● Friendly to experiments ○ "Dark pipelines" run in parallel ● Operationally efficient ○ Separate dev, test, staging environments not necessary ○ Self-healing Raw, immutable data + transformation process 31 ∆?
  • 32. www.scling.com Life of an error, microservices 32 ● My service, bad code! ○ Bad data spreads instantly ● Different recovery strategies - ad-hoc ○ No standard patterns ○ No guaranteed recovery ● High cost of error ○ Proactive QA - staging environment Service Data warehouse ETL Queue DB
  • 33. www.scling.com Life of an error, batch pipelines 33 ● My processing job, bad code! 1. Revert serving datasets to old 2. Fix bug 3. Remove faulty datasets 4. Deploy 5. Backfill is automatic (Luigi) Done! ● Low cost of error ○ Reactive QA ○ Production environment sufficient
  • 34. www.scling.com Life of an error, frozen lake 34 ● My processing job, bad code! 1. Revert serving datasets to old 2. Fix bug 3. Bump pipeline version 4. Deploy 5. Backfill is automatic (Luigi) Done! ● Low cost of error ○ Reactive QA ○ Production environment sufficient
  • 35. www.scling.com 35 Life of an error, streaming ● Works for a single job, not pipeline. :-( Job Stream Stream Stream Stream Stream Stream Job Job Stream Stream Stream Job Job Job Reprocessing in Kafka Streams
  • 36. www.scling.com 6. It's a software engineering problem ● Data value can be tackled from different angles ○ Strategy ○ Data science ○ Technology ○ Engineering ● Things happen when it becomes a software engineering problem ○ Engineering = building internal artifacts for value extraction ○ New ways of working - manifest in processes ○ New ways of collaborating - manifest in shared structures 36
  • 37. www.scling.com Process engineering ● Idea - deploy - feedback loop ○ Find and understand code & data ■ Absence of walls ○ Continuous integration & deployment ■ Integration == integration ● Automate toil ○ Pull-driven ○ Carrots > whips - "Golden path" platform service ● Continuous improvement ○ Visualise friction ○ Feedback & priority loops between teams ○ Sharpen the saw 37
  • 38. www.scling.com ● Functional QA ○ Feedback on code ○ Testing challenge ○ Guard rail for development speed ● Data product QA ○ Feedback on code + data ○ Monitoring & observability challenge ● High quality demands high code ○ Low/no-code incompatible with SWE processes for speed & quality ○ SQL lacks convenient support for data quality measurements and improvements Quality assurance is a fundament for speed 38 15 minutes!
  • 39. www.scling.com Big vendor Few wide scope components Many small scope components Tied to vendor evolution Perpetual integration work Amortised integration work ● Successful companies tend to own the mortar 7. Unix philosophy - do one thing well 39
  • 40. www.scling.com One-way vs two-way decisions 40 ● Architect for two-way decisions ○ Small, replaceable components ○ Non-destructive computations ○ Sharing without risk ● One way: privacy architecture Expensive to reverse Easy to reverse
  • 41. www.scling.com One-way vs two-way decisions 41 ● Architect for two-way decisions ○ Small, replaceable components ○ Non-destructive computations ○ Sharing without risk ● One way: privacy architecture Expensive to reverse Easy to reverse ● Data pipeline deployment evolution ○ Spotify 2014 - 2018 1. Self-contained jar file. Ad-hoc continuous deploy flow. 2. Docker container on VM pool 3. Docker container on Kubernetes logs CI dev env o11y
  • 42. www.scling.com 8th habit - find your voice ● Your business use cases define your needs ○ Not technology ○ Not vendors ○ Not trends ○ Not fear-of-missing-out on AI, blockchains, … ● How do non-big-tech companies find own path and data success? ○ Technology? ○ Consultants? ○ Grow organically? ■ Very little data success experience available on job market ○ New modes of collaboration? 42
  • 43. www.scling.com Scling - data-factory-as-a-service 43 Data value through collaboration Customer Data factory Data platform & lake data domain expertise Value from data! Rapid data innovation Learning by doing, in collaboration
  • 44. www.scling.com Tech has massive impact on society 44 Product? Supplier? Employer? Make an active choice whether to have an impact! Cloud?