SlideShare a Scribd company logo
1 of 23
Download to read offline
www.scling.com
Industrialised data -
the key to AI success
Lars Albertsson, Founder, Scling
2024-04-25
1
www.scling.com
Myth:
● We are all doing quite ok
● 2-10x leader-to-rear span
The great capability divide
2
capability in X
# orgs
www.scling.com
Myth:
● We are all doing quite ok
● 2-10x leader-to-rear span
The great capability divide
3
capability in X
# orgs
capability in X
# orgs
Reality:
● Few leaders in each area
● 100-10000x leader-to-rear span
www.scling.com
Capability KPIs
DORA research / State of DevOps report:
● Deployment frequency
● Lead time for changes
● Change failure rate
● Time to restore service
Small elite
~1000x span
4
Observed differences in data organisations:
● Lead time from idea to production
● Time to mend / change pipeline
● Number of pipelines / developer
● Number of datasets / day / developer
Small elite
100 - 10000x span (or more)
www.scling.com
Efficiency gap, data cost & value
● Data processing produces datasets
○ Each dataset has business value
● Proxy value/cost metric: datasets / day
○ S-M traditional: < 10
○ Bank, telecom, media: 100-1000
5
2014: 6500 datasets / day
2016: 20000 datasets / day
2018: 100000+ datasets / day,
25% of staff use BigQuery
2021: 500B events collected / day
2016: 1600 000 000
datasets / day
Disruptive value of data, machine learning
Financial, reporting
Insights, data-fed features
effort
value
www.scling.com
Enabling innovation
6
"The actual work that went into
Discover Weekly was very little,
because we're reusing things we
already had."
https://youtu.be/A259Yo8hBRs
https://youtu.be/ZcmJxli8WS8
https://musically.com/2018/08/08/daniel-ek-would-have-killed-discover-weekly-before-launch/
"Discover Weekly wasn't a great
strategic plan and 100 engineers.
It was 3 engineers that decided to
build something."
"I would have killed it. All of a sudden,
they shipped it. It’s one of the most
loved product features that we have."
- Daniel Ek, CEO
www.scling.com
Data not acted upon - value left on table
7
The car has sensors for precipitation,
temperature, and window state. I would
have liked to receive a mobile app
warning when all windows were down
during a snowy night.
www.scling.com
Data not acted upon - value left on table
8
Changing to winter tyres at Bilia. When I arrived,
the staff could see that mechanics were currently
not fully booked, and offered me a discounted
front wheel adjustment, which I accepted.
Data value taken from table.
Neighbour jumps in their Volvo, talks to husband
through window, and drives off. On arrival she
discovers that she has no car key - husband's key
was close enough to start car. An early warning
that key is no longer in proximity would have been
valuable.
I make the effort to submit a detailed bug report
regarding Volvo's known over-aggressive
automatic safety brake problem, in order to
provide more data for debugging. I receive no
feedback from the process, and lose interest.
Our Volvo incorrectly activates the automatic safety
brake again. This is a known problem since years
back, which affects our car occasionally. I have no
efficient channel to report time and circumstances
for this occasion, allowing Volvo to get more data
on the problem, and me to feel confident that the
issue is worked on.
After a repair at Bilia of the rear view mirror, the
car does not properly detect cars in front. Camera
near mirror is suspected. Bilia cannot obtain
camera measurements or car detection statistics
from Volvo, so a cycle of trial and error repairs
follows, taking me to the mechanic multiple times.
The car has sensors for precipitation,
temperature, and window state. I would
have liked to receive a mobile app
warning when all windows were down
during a snowy night.
I send a design suggestion to Volvo with pics of a
hacked solution for a luggage problem. A simple
hook could solve it. I do not receive "We have
assigned your suggestion id X, and will get back if
it is implemented." I will not send another, and feel
less connected to the brand.
Wife has driven our Volvo. Mirrors and seats
automatically move to my position, except for the
right mirror, which I need to correct again, as I
have for years. Volvo could have measured and
detected irregular changes of settings, and found
this bug.
Wife unlocks Volvo. Driver seat switches to her
position. I take the driver's seat, pull it back, and
change to my profile. Car switches seat back to
her position, and I have to pull it back again.
When seat heating was automatically activated,
"Increase seat heating" audio command turns it
off. I make additional commands to get to the
desired state. Measurements could identify
repeated audio or UX commands.
The Volvo phone app warns about many things:
interrupted charging, car parked but not being charged,
etc. But not about the roof and windows left open in the
rain. Rear view mirror electronics now need a repair.
I change the audio balance in the
Volvo with six screen clicks.
Measurements could indicate that
long click sequences at high speed
indicates poor user interface.
The car keeps informing me
that there is a software
upgrade, but no matter what I
do, nothing gets upgraded.
Map updates fail to auto
install. Support cannot obtain
diagnose information. Manual
installation attempts fail
without error messages.
www.scling.com
IT craft to factory
9
Security Waterfall
Application
delivery
Traditional
operations
Traditional
QA
Infrastructure
DevSecOps Agile
Containers
DevOps CI/CD
Infrastructure
as code
www.scling.com
Security Waterfall
Data factories
10
Application
delivery
Traditional
operations
DevSecOps
Traditional
QA
Infrastructure
DB-oriented
architecture
Agile
Containers
DevOps CI/CD
Infrastructure
as code
Data factories,
data pipelines,
DataOps
www.scling.com
From craft to process
11
www.scling.com
From craft to process
12
Multiple time windows
Assess ingress data quality
Repair broken data from
complementary source
Forecast based on history,
multiple parameter settings
Assess outcome data quality
Assess forecast success,
adapt parameters
www.scling.com
Naive machine learning
13
www.scling.com
Sustainable production machine learning
14
Multiple models,
parameters, features
Assess ingress data quality
Repair broken data from
complementary source
Choose model and parameters based
on performance and input data
Benchmark models
Try multiple models,
measure, A/B test
www.scling.com
Generative AI → more complex data engineering
15
Input
LLM
Output
Generative AI is simple?
Unless you want
● Relevance
● Correct facts
● Multiple modalities
● Prevent undesirable outcomes
www.scling.com
● Training an adapted large language model is expensive and difficult
● RAG hack:
○ Find relevant document chunks
○ Concatenate with prompt (context)
○ Combined general and specific information!
● How to search?
○ Keywords / embeddings / neural?
○ Rerank chunks?
● How to chunk documents?
○ Boundaries
○ Chunks need higher context
■ E.g. "These uses of X may cause fire: <snip> …"
Retrieval-augmented generation (RAG)
16
Input
LLM
Output
Search
engine
Document
corpus
Document
chunks
www.scling.com
RAG experiment: Household appliance interface
● Products that will be disrupted by generative AI?
○ How many different washing machine programs do you use?
○ What if you could talk to the machine? Show your stained shirt?
● Naive chunks:
"Saturate with prewash stain remover or nonflammable dry cleaning fluid. Baby formula, dairy
products, egg."
● Domain-specific data engineering divides generative AI products from demos.
17
www.scling.com
Generative AI → more complex data engineering
18
Input
LLM
Output
Generative AI is simple? Want multi-modal?
Correct facts too?
Safety mechanisms to
avoid undesirable use?
Add relevance? →
Retrieval augmented
generation (RAG)
Input
LLM
Output
Search engine Document
corpus
Document
snippets
Nvidia: Nemo Guardrails
Langchain: Multi-modal RAG
USTC + UCLA + Google: Corrective RAG
www.scling.com
Data engineering in the future
19
DW
~10 year capability gap
"data factory engineering"
Enterprise big data failures
"Modern data stack" -
traditional workflows, new technology
4GL / UML phase of data engineering
Data engineering education
www.scling.com
Don't build. Grow.
20
● Every data / AI project has either
○ failed
○ cost a fortune
● Every leading data company has
○ solved product challenges at hand
○ improved process / org / ways of working
○ had data / AI success through enabled teams
● Brief recipe for success
○ Align teams with value chains
○ Automation + data for current challenges
○ Centralised, immutable data in pipelines
○ It's a software engineering problem
■ QA, composability, DevOps, …
○ Feedback cycle speed < hour (day)
Technology
change
Process
change
Negative ROI
Positive ROI
Sustainable
data+AI feature
development
We must
be AI first!
Solve today's
challenges with
data & automation
www.scling.com
● Other disciplines?
○ Shared innovation
○ E.g. Foxconn, Autoliv
● Tech leaders
○ Grow, don't build
○ Product-driven
21
Data factory adoption
Observations:
● Industrial success requires veterans
● Not enough factory experience around
● Minimal flow of veterans to incumbents /
consultants
● Artisanal tools (Data warehouse / low-code / …)
inadequate for domain-specific AI
● Industrial tools not helpful without veterans
Belief:
We need new ways of
collaboration.
Customer
Data factory
Data platform & lake
data
domain
expertise
Value from data!
Data innovation
as leaders do it
Learning by
doing, in
collaboration
www.scling.com
What we learnt
22
Success under good circumstances
● Match Spotify's numbers per developer
● 10-1000x client's own capability
● Data + AI innovation on par with leaders
Challenge: Data capability → innovation
● Domain-specific competence
● "Innovation building blocks" needed
○ Agile, pull vs push, iterative
○ Digital thinking
○ Aligned, cross-functional teams
○ Product focus
○ Value chain alignment
Greater challenge: "How hard can it be?"
● Unawareness of data divide
www.scling.com
What we learnt, what we need
23
Humble, but competent clients
● Data with value potential
● Domain experts
● Innovation building blocks
● Humble regarding data divide
Other ways to slice the challenge?
Partners that bridge innovation gap
● Digitalisation in some vertical
● Interested in new business models
Success under good circumstances
● Match Spotify's numbers per developer
● 10-1000x client's own capability
● Data + AI innovation on par with leaders
Challenge: Data capability → innovation
● Domain-specific competence
● "Innovation building blocks" needed
○ Agile, pull vs push, iterative
○ Digital thinking
○ Aligned, cross-functional teams
○ Product focus
○ Value chain alignment
Greater challenge: "How hard can it be?"
● Unawareness of data divide

More Related Content

Similar to Industrialised data - the key to AI success.pdf

The lean principles of data ops
The lean principles of data opsThe lean principles of data ops
The lean principles of data opsLars Albertsson
 
Imvac 2018 rev9_public_optimized
Imvac 2018 rev9_public_optimizedImvac 2018 rev9_public_optimized
Imvac 2018 rev9_public_optimizedPaulo Cipriano
 
Rockwell Automation TechED 2017 - AP08 - Givaudan
Rockwell Automation TechED 2017 - AP08 - GivaudanRockwell Automation TechED 2017 - AP08 - Givaudan
Rockwell Automation TechED 2017 - AP08 - GivaudanRockwell Automation
 
Using Customer Development to get Traction in a Crowded Space
Using Customer Development to get Traction in a Crowded SpaceUsing Customer Development to get Traction in a Crowded Space
Using Customer Development to get Traction in a Crowded SpaceOutlyer
 
Leveraging Cloud data to optimize your product decisions and Agile processes ...
Leveraging Cloud data to optimize your product decisions and Agile processes ...Leveraging Cloud data to optimize your product decisions and Agile processes ...
Leveraging Cloud data to optimize your product decisions and Agile processes ...AgileSparks
 
DevOps at EMC NYC August 2015 - Modernize your apps to drive organizational e...
DevOps at EMC NYC August 2015 - Modernize your apps to drive organizational e...DevOps at EMC NYC August 2015 - Modernize your apps to drive organizational e...
DevOps at EMC NYC August 2015 - Modernize your apps to drive organizational e...Jonas Rosland
 
How to Better Manage Technical Debt While Innovating on DevOps
How to Better Manage Technical Debt While Innovating on DevOpsHow to Better Manage Technical Debt While Innovating on DevOps
How to Better Manage Technical Debt While Innovating on DevOpsDynatrace
 
Secure software supply chain on a shoestring budget
Secure software supply chain on a shoestring budgetSecure software supply chain on a shoestring budget
Secure software supply chain on a shoestring budgetLars Albertsson
 
[코세나, kosena] Auto ML, H2O.ai의 제조분야 AI 활용 사례
[코세나, kosena] Auto ML, H2O.ai의 제조분야 AI 활용 사례[코세나, kosena] Auto ML, H2O.ai의 제조분야 AI 활용 사례
[코세나, kosena] Auto ML, H2O.ai의 제조분야 AI 활용 사례kosena
 
Enough Blame for System Performance Issues
Enough Blame for System Performance IssuesEnough Blame for System Performance Issues
Enough Blame for System Performance IssuesMahesh Vallampati
 
Building Beautiful High Performance Connected Car Applications
Building Beautiful High Performance Connected Car ApplicationsBuilding Beautiful High Performance Connected Car Applications
Building Beautiful High Performance Connected Car ApplicationsJason Wiener
 
Building Resiliency and Agility with Data Virtualization for the New Normal
Building Resiliency and Agility with Data Virtualization for the New NormalBuilding Resiliency and Agility with Data Virtualization for the New Normal
Building Resiliency and Agility with Data Virtualization for the New NormalDenodo
 
Balance agility and governance with #TrueDataOps and The Data Cloud
Balance agility and governance with #TrueDataOps and The Data CloudBalance agility and governance with #TrueDataOps and The Data Cloud
Balance agility and governance with #TrueDataOps and The Data CloudKent Graziano
 
Analytics on system z final
Analytics on system z finalAnalytics on system z final
Analytics on system z finalPeter Schouboe
 
Company presentation english 1 2015
Company presentation english 1 2015Company presentation english 1 2015
Company presentation english 1 2015Locanisag
 
Gl wand oracle baug may 2013
Gl wand oracle baug may 2013Gl wand oracle baug may 2013
Gl wand oracle baug may 2013Marianne Nørskov
 
DataOps - Lean principles and lean practices
DataOps - Lean principles and lean practicesDataOps - Lean principles and lean practices
DataOps - Lean principles and lean practicesLars Albertsson
 
Hey IT, Meet OT with Hima Mukkamala
Hey IT, Meet OT with Hima MukkamalaHey IT, Meet OT with Hima Mukkamala
Hey IT, Meet OT with Hima Mukkamalagogo6
 
Accelerate ML Deployment with H2O Driverless AI on AWS
Accelerate ML Deployment with H2O Driverless AI on AWSAccelerate ML Deployment with H2O Driverless AI on AWS
Accelerate ML Deployment with H2O Driverless AI on AWSSri Ambati
 

Similar to Industrialised data - the key to AI success.pdf (20)

The lean principles of data ops
The lean principles of data opsThe lean principles of data ops
The lean principles of data ops
 
Imvac 2018 rev9_public_optimized
Imvac 2018 rev9_public_optimizedImvac 2018 rev9_public_optimized
Imvac 2018 rev9_public_optimized
 
Rockwell Automation TechED 2017 - AP08 - Givaudan
Rockwell Automation TechED 2017 - AP08 - GivaudanRockwell Automation TechED 2017 - AP08 - Givaudan
Rockwell Automation TechED 2017 - AP08 - Givaudan
 
Using Customer Development to get Traction in a Crowded Space
Using Customer Development to get Traction in a Crowded SpaceUsing Customer Development to get Traction in a Crowded Space
Using Customer Development to get Traction in a Crowded Space
 
Leveraging Cloud data to optimize your product decisions and Agile processes ...
Leveraging Cloud data to optimize your product decisions and Agile processes ...Leveraging Cloud data to optimize your product decisions and Agile processes ...
Leveraging Cloud data to optimize your product decisions and Agile processes ...
 
DevOps at EMC NYC August 2015 - Modernize your apps to drive organizational e...
DevOps at EMC NYC August 2015 - Modernize your apps to drive organizational e...DevOps at EMC NYC August 2015 - Modernize your apps to drive organizational e...
DevOps at EMC NYC August 2015 - Modernize your apps to drive organizational e...
 
How to Better Manage Technical Debt While Innovating on DevOps
How to Better Manage Technical Debt While Innovating on DevOpsHow to Better Manage Technical Debt While Innovating on DevOps
How to Better Manage Technical Debt While Innovating on DevOps
 
Secure software supply chain on a shoestring budget
Secure software supply chain on a shoestring budgetSecure software supply chain on a shoestring budget
Secure software supply chain on a shoestring budget
 
[코세나, kosena] Auto ML, H2O.ai의 제조분야 AI 활용 사례
[코세나, kosena] Auto ML, H2O.ai의 제조분야 AI 활용 사례[코세나, kosena] Auto ML, H2O.ai의 제조분야 AI 활용 사례
[코세나, kosena] Auto ML, H2O.ai의 제조분야 AI 활용 사례
 
Enough Blame for System Performance Issues
Enough Blame for System Performance IssuesEnough Blame for System Performance Issues
Enough Blame for System Performance Issues
 
Building Beautiful High Performance Connected Car Applications
Building Beautiful High Performance Connected Car ApplicationsBuilding Beautiful High Performance Connected Car Applications
Building Beautiful High Performance Connected Car Applications
 
Building Resiliency and Agility with Data Virtualization for the New Normal
Building Resiliency and Agility with Data Virtualization for the New NormalBuilding Resiliency and Agility with Data Virtualization for the New Normal
Building Resiliency and Agility with Data Virtualization for the New Normal
 
Balance agility and governance with #TrueDataOps and The Data Cloud
Balance agility and governance with #TrueDataOps and The Data CloudBalance agility and governance with #TrueDataOps and The Data Cloud
Balance agility and governance with #TrueDataOps and The Data Cloud
 
Analytics on system z final
Analytics on system z finalAnalytics on system z final
Analytics on system z final
 
Company presentation english 1 2015
Company presentation english 1 2015Company presentation english 1 2015
Company presentation english 1 2015
 
Gl wand oracle baug may 2013
Gl wand oracle baug may 2013Gl wand oracle baug may 2013
Gl wand oracle baug may 2013
 
DataOps - Lean principles and lean practices
DataOps - Lean principles and lean practicesDataOps - Lean principles and lean practices
DataOps - Lean principles and lean practices
 
Hey IT, Meet OT with Hima Mukkamala
Hey IT, Meet OT with Hima MukkamalaHey IT, Meet OT with Hima Mukkamala
Hey IT, Meet OT with Hima Mukkamala
 
AltoWeb_SPEED_Overview-2001
AltoWeb_SPEED_Overview-2001AltoWeb_SPEED_Overview-2001
AltoWeb_SPEED_Overview-2001
 
Accelerate ML Deployment with H2O Driverless AI on AWS
Accelerate ML Deployment with H2O Driverless AI on AWSAccelerate ML Deployment with H2O Driverless AI on AWS
Accelerate ML Deployment with H2O Driverless AI on AWS
 

More from Lars Albertsson

Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Schema management with Scalameta
Schema management with ScalametaSchema management with Scalameta
Schema management with ScalametaLars Albertsson
 
How to not kill people - Berlin Buzzwords 2023.pdf
How to not kill people - Berlin Buzzwords 2023.pdfHow to not kill people - Berlin Buzzwords 2023.pdf
How to not kill people - Berlin Buzzwords 2023.pdfLars Albertsson
 
Data engineering in 10 years.pdf
Data engineering in 10 years.pdfData engineering in 10 years.pdf
Data engineering in 10 years.pdfLars Albertsson
 
The right side of speed - learning to shift left
The right side of speed - learning to shift leftThe right side of speed - learning to shift left
The right side of speed - learning to shift leftLars Albertsson
 
Mortal analytics - Covid-19 and the problem of data quality
Mortal analytics - Covid-19 and the problem of data qualityMortal analytics - Covid-19 and the problem of data quality
Mortal analytics - Covid-19 and the problem of data qualityLars Albertsson
 
Data ops in practice - Swedish style
Data ops in practice - Swedish styleData ops in practice - Swedish style
Data ops in practice - Swedish styleLars Albertsson
 
Engineering data quality
Engineering data qualityEngineering data quality
Engineering data qualityLars Albertsson
 
Eventually, time will kill your data processing
Eventually, time will kill your data processingEventually, time will kill your data processing
Eventually, time will kill your data processingLars Albertsson
 
Taming the reproducibility crisis
Taming the reproducibility crisisTaming the reproducibility crisis
Taming the reproducibility crisisLars Albertsson
 
Eventually, time will kill your data pipeline
Eventually, time will kill your data pipelineEventually, time will kill your data pipeline
Eventually, time will kill your data pipelineLars Albertsson
 
Kubernetes as data platform
Kubernetes as data platformKubernetes as data platform
Kubernetes as data platformLars Albertsson
 
Don't build a data science team
Don't build a data science teamDon't build a data science team
Don't build a data science teamLars Albertsson
 
Test strategies for data processing pipelines, v2.0
Test strategies for data processing pipelines, v2.0Test strategies for data processing pipelines, v2.0
Test strategies for data processing pipelines, v2.0Lars Albertsson
 
10 ways to stumble with big data
10 ways to stumble with big data10 ways to stumble with big data
10 ways to stumble with big dataLars Albertsson
 

More from Lars Albertsson (20)

Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Schema management with Scalameta
Schema management with ScalametaSchema management with Scalameta
Schema management with Scalameta
 
How to not kill people - Berlin Buzzwords 2023.pdf
How to not kill people - Berlin Buzzwords 2023.pdfHow to not kill people - Berlin Buzzwords 2023.pdf
How to not kill people - Berlin Buzzwords 2023.pdf
 
Data engineering in 10 years.pdf
Data engineering in 10 years.pdfData engineering in 10 years.pdf
Data engineering in 10 years.pdf
 
Ai legal and ethics
Ai   legal and ethicsAi   legal and ethics
Ai legal and ethics
 
The right side of speed - learning to shift left
The right side of speed - learning to shift leftThe right side of speed - learning to shift left
The right side of speed - learning to shift left
 
Mortal analytics - Covid-19 and the problem of data quality
Mortal analytics - Covid-19 and the problem of data qualityMortal analytics - Covid-19 and the problem of data quality
Mortal analytics - Covid-19 and the problem of data quality
 
Data ops in practice - Swedish style
Data ops in practice - Swedish styleData ops in practice - Swedish style
Data ops in practice - Swedish style
 
Data democratised
Data democratisedData democratised
Data democratised
 
Engineering data quality
Engineering data qualityEngineering data quality
Engineering data quality
 
Eventually, time will kill your data processing
Eventually, time will kill your data processingEventually, time will kill your data processing
Eventually, time will kill your data processing
 
Taming the reproducibility crisis
Taming the reproducibility crisisTaming the reproducibility crisis
Taming the reproducibility crisis
 
Eventually, time will kill your data pipeline
Eventually, time will kill your data pipelineEventually, time will kill your data pipeline
Eventually, time will kill your data pipeline
 
Data ops in practice
Data ops in practiceData ops in practice
Data ops in practice
 
Kubernetes as data platform
Kubernetes as data platformKubernetes as data platform
Kubernetes as data platform
 
Don't build a data science team
Don't build a data science teamDon't build a data science team
Don't build a data science team
 
Big data == lean data
Big data == lean dataBig data == lean data
Big data == lean data
 
Privacy by design
Privacy by designPrivacy by design
Privacy by design
 
Test strategies for data processing pipelines, v2.0
Test strategies for data processing pipelines, v2.0Test strategies for data processing pipelines, v2.0
Test strategies for data processing pipelines, v2.0
 
10 ways to stumble with big data
10 ways to stumble with big data10 ways to stumble with big data
10 ways to stumble with big data
 

Recently uploaded

Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 

Recently uploaded (20)

Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 

Industrialised data - the key to AI success.pdf

  • 1. www.scling.com Industrialised data - the key to AI success Lars Albertsson, Founder, Scling 2024-04-25 1
  • 2. www.scling.com Myth: ● We are all doing quite ok ● 2-10x leader-to-rear span The great capability divide 2 capability in X # orgs
  • 3. www.scling.com Myth: ● We are all doing quite ok ● 2-10x leader-to-rear span The great capability divide 3 capability in X # orgs capability in X # orgs Reality: ● Few leaders in each area ● 100-10000x leader-to-rear span
  • 4. www.scling.com Capability KPIs DORA research / State of DevOps report: ● Deployment frequency ● Lead time for changes ● Change failure rate ● Time to restore service Small elite ~1000x span 4 Observed differences in data organisations: ● Lead time from idea to production ● Time to mend / change pipeline ● Number of pipelines / developer ● Number of datasets / day / developer Small elite 100 - 10000x span (or more)
  • 5. www.scling.com Efficiency gap, data cost & value ● Data processing produces datasets ○ Each dataset has business value ● Proxy value/cost metric: datasets / day ○ S-M traditional: < 10 ○ Bank, telecom, media: 100-1000 5 2014: 6500 datasets / day 2016: 20000 datasets / day 2018: 100000+ datasets / day, 25% of staff use BigQuery 2021: 500B events collected / day 2016: 1600 000 000 datasets / day Disruptive value of data, machine learning Financial, reporting Insights, data-fed features effort value
  • 6. www.scling.com Enabling innovation 6 "The actual work that went into Discover Weekly was very little, because we're reusing things we already had." https://youtu.be/A259Yo8hBRs https://youtu.be/ZcmJxli8WS8 https://musically.com/2018/08/08/daniel-ek-would-have-killed-discover-weekly-before-launch/ "Discover Weekly wasn't a great strategic plan and 100 engineers. It was 3 engineers that decided to build something." "I would have killed it. All of a sudden, they shipped it. It’s one of the most loved product features that we have." - Daniel Ek, CEO
  • 7. www.scling.com Data not acted upon - value left on table 7 The car has sensors for precipitation, temperature, and window state. I would have liked to receive a mobile app warning when all windows were down during a snowy night.
  • 8. www.scling.com Data not acted upon - value left on table 8 Changing to winter tyres at Bilia. When I arrived, the staff could see that mechanics were currently not fully booked, and offered me a discounted front wheel adjustment, which I accepted. Data value taken from table. Neighbour jumps in their Volvo, talks to husband through window, and drives off. On arrival she discovers that she has no car key - husband's key was close enough to start car. An early warning that key is no longer in proximity would have been valuable. I make the effort to submit a detailed bug report regarding Volvo's known over-aggressive automatic safety brake problem, in order to provide more data for debugging. I receive no feedback from the process, and lose interest. Our Volvo incorrectly activates the automatic safety brake again. This is a known problem since years back, which affects our car occasionally. I have no efficient channel to report time and circumstances for this occasion, allowing Volvo to get more data on the problem, and me to feel confident that the issue is worked on. After a repair at Bilia of the rear view mirror, the car does not properly detect cars in front. Camera near mirror is suspected. Bilia cannot obtain camera measurements or car detection statistics from Volvo, so a cycle of trial and error repairs follows, taking me to the mechanic multiple times. The car has sensors for precipitation, temperature, and window state. I would have liked to receive a mobile app warning when all windows were down during a snowy night. I send a design suggestion to Volvo with pics of a hacked solution for a luggage problem. A simple hook could solve it. I do not receive "We have assigned your suggestion id X, and will get back if it is implemented." I will not send another, and feel less connected to the brand. Wife has driven our Volvo. Mirrors and seats automatically move to my position, except for the right mirror, which I need to correct again, as I have for years. Volvo could have measured and detected irregular changes of settings, and found this bug. Wife unlocks Volvo. Driver seat switches to her position. I take the driver's seat, pull it back, and change to my profile. Car switches seat back to her position, and I have to pull it back again. When seat heating was automatically activated, "Increase seat heating" audio command turns it off. I make additional commands to get to the desired state. Measurements could identify repeated audio or UX commands. The Volvo phone app warns about many things: interrupted charging, car parked but not being charged, etc. But not about the roof and windows left open in the rain. Rear view mirror electronics now need a repair. I change the audio balance in the Volvo with six screen clicks. Measurements could indicate that long click sequences at high speed indicates poor user interface. The car keeps informing me that there is a software upgrade, but no matter what I do, nothing gets upgraded. Map updates fail to auto install. Support cannot obtain diagnose information. Manual installation attempts fail without error messages.
  • 9. www.scling.com IT craft to factory 9 Security Waterfall Application delivery Traditional operations Traditional QA Infrastructure DevSecOps Agile Containers DevOps CI/CD Infrastructure as code
  • 12. www.scling.com From craft to process 12 Multiple time windows Assess ingress data quality Repair broken data from complementary source Forecast based on history, multiple parameter settings Assess outcome data quality Assess forecast success, adapt parameters
  • 14. www.scling.com Sustainable production machine learning 14 Multiple models, parameters, features Assess ingress data quality Repair broken data from complementary source Choose model and parameters based on performance and input data Benchmark models Try multiple models, measure, A/B test
  • 15. www.scling.com Generative AI → more complex data engineering 15 Input LLM Output Generative AI is simple? Unless you want ● Relevance ● Correct facts ● Multiple modalities ● Prevent undesirable outcomes
  • 16. www.scling.com ● Training an adapted large language model is expensive and difficult ● RAG hack: ○ Find relevant document chunks ○ Concatenate with prompt (context) ○ Combined general and specific information! ● How to search? ○ Keywords / embeddings / neural? ○ Rerank chunks? ● How to chunk documents? ○ Boundaries ○ Chunks need higher context ■ E.g. "These uses of X may cause fire: <snip> …" Retrieval-augmented generation (RAG) 16 Input LLM Output Search engine Document corpus Document chunks
  • 17. www.scling.com RAG experiment: Household appliance interface ● Products that will be disrupted by generative AI? ○ How many different washing machine programs do you use? ○ What if you could talk to the machine? Show your stained shirt? ● Naive chunks: "Saturate with prewash stain remover or nonflammable dry cleaning fluid. Baby formula, dairy products, egg." ● Domain-specific data engineering divides generative AI products from demos. 17
  • 18. www.scling.com Generative AI → more complex data engineering 18 Input LLM Output Generative AI is simple? Want multi-modal? Correct facts too? Safety mechanisms to avoid undesirable use? Add relevance? → Retrieval augmented generation (RAG) Input LLM Output Search engine Document corpus Document snippets Nvidia: Nemo Guardrails Langchain: Multi-modal RAG USTC + UCLA + Google: Corrective RAG
  • 19. www.scling.com Data engineering in the future 19 DW ~10 year capability gap "data factory engineering" Enterprise big data failures "Modern data stack" - traditional workflows, new technology 4GL / UML phase of data engineering Data engineering education
  • 20. www.scling.com Don't build. Grow. 20 ● Every data / AI project has either ○ failed ○ cost a fortune ● Every leading data company has ○ solved product challenges at hand ○ improved process / org / ways of working ○ had data / AI success through enabled teams ● Brief recipe for success ○ Align teams with value chains ○ Automation + data for current challenges ○ Centralised, immutable data in pipelines ○ It's a software engineering problem ■ QA, composability, DevOps, … ○ Feedback cycle speed < hour (day) Technology change Process change Negative ROI Positive ROI Sustainable data+AI feature development We must be AI first! Solve today's challenges with data & automation
  • 21. www.scling.com ● Other disciplines? ○ Shared innovation ○ E.g. Foxconn, Autoliv ● Tech leaders ○ Grow, don't build ○ Product-driven 21 Data factory adoption Observations: ● Industrial success requires veterans ● Not enough factory experience around ● Minimal flow of veterans to incumbents / consultants ● Artisanal tools (Data warehouse / low-code / …) inadequate for domain-specific AI ● Industrial tools not helpful without veterans Belief: We need new ways of collaboration. Customer Data factory Data platform & lake data domain expertise Value from data! Data innovation as leaders do it Learning by doing, in collaboration
  • 22. www.scling.com What we learnt 22 Success under good circumstances ● Match Spotify's numbers per developer ● 10-1000x client's own capability ● Data + AI innovation on par with leaders Challenge: Data capability → innovation ● Domain-specific competence ● "Innovation building blocks" needed ○ Agile, pull vs push, iterative ○ Digital thinking ○ Aligned, cross-functional teams ○ Product focus ○ Value chain alignment Greater challenge: "How hard can it be?" ● Unawareness of data divide
  • 23. www.scling.com What we learnt, what we need 23 Humble, but competent clients ● Data with value potential ● Domain experts ● Innovation building blocks ● Humble regarding data divide Other ways to slice the challenge? Partners that bridge innovation gap ● Digitalisation in some vertical ● Interested in new business models Success under good circumstances ● Match Spotify's numbers per developer ● 10-1000x client's own capability ● Data + AI innovation on par with leaders Challenge: Data capability → innovation ● Domain-specific competence ● "Innovation building blocks" needed ○ Agile, pull vs push, iterative ○ Digital thinking ○ Aligned, cross-functional teams ○ Product focus ○ Value chain alignment Greater challenge: "How hard can it be?" ● Unawareness of data divide