Big data == lean data

Lars Albertsson, Founder & Data Engineer
www.mimeria.com
Big Data == Lean Data
Agila Sverige, 2018-05-30
Lars Albertsson
www.mapflat.com, www.mimeria.com
Service-oriented architectures
● Services own data
● Heterogeneous coupling
[Diagram: Service, App and DB nodes wired together through ad hoc integrations: Poll, Aggregate logs, NFS, Hourly dump, Queue, scp, HTTP, and ETL into a Data warehouse]
Service-oriented organisations
● Teams own services
Data-centric innovation
● Need data from teams
○ willing?
○ backlog?
○ collected?
○ useful?
○ extraction?
○ data governance?
○ history?
Innovation value stream mapping
● Need data from teams
○ willing?
○ backlog?
○ collected?
○ useful?
○ extraction?
○ data governance?
○ history?
Enter Big Data
● What is Big Data?
[Diagram labels: AI magic, Clusters, Weird technology, Spoiled developers, ?]
A collaboration paradigm
[Diagram labels: Stream storage, Data lake, Data democratised]
Onboarding driven by use case
[Diagram: Data lake]
Data platform == collaboration platform
[Diagram: Data lake]
Balance of success
[Diagram: Data lake]
Balance planning & architecture
● Homogeneity
● Governance
● Coordination
with business-value-driven activities
Coupling by design
[Diagram: Data lake]
● Coordination >> autonomy
● Homogeneity >> heterogeneity
Data agility
[Diagram: Data lake]
● Siloed: 6+ months
● Autonomous: 1 month
● Coordinated: days
[Diagram annotations: ∆, ∆, Latency?]
A journey of learning
[Diagram: Data lake]
● End-to-end == feedback, value
● Scale == cost
● All data now == waterfall
End-to-end >> scale
[Diagram labels: AI magic, Clusters, Weird technology (?); Workflow orchestration, Proof of value (!)]
Can I have AI now?
● Crawl, walk, run
[Pyramid diagram, top to bottom: AI, Deep learning, A/B testing, Machine learning, Analytics, Segments, Curation, Anomaly detection, Data infrastructure, Pipelines, Instrumentation, Data collection]
Credits: “The data science hierarchy of needs”, Monica Rogati
A journey of many years
● Simple == max value
○ Reporting
○ Forecasting, risk
○ User notification
● AI first == waterfall
[Pyramid diagram repeated from the previous slide, annotated with Value and Effort]
Credits: “The data science hierarchy of needs”, Monica Rogati
Who's talking?
Lars Albertsson
Mapflat - independent consultant
Mimeria - data-value-as-a-service
[Pyramid diagram repeated from the previous slides]
Credits: “The data science hierarchy of needs”, Monica Rogati