Organising for Data Success

Lars Albertsson
Data Architect, Schibsted Media Group
Bio
● SICS - test and debug technology for distributed systems
● Sun - high-end server verification
● Google - Hangouts, engineering productivity
● Recorded Future - data ingestion, data quality
● Cinnober - stock exchange engines
● Spotify - data processing, music data modelling
● Schibsted Media Group - data architect
Path to profit
Big data path to profit
You start out simple
User behaviour
Things get complex
[Diagram labels: User content, Professional content, Ads, User behaviour, Systems, Ads, System diagnostics, Recommendations, Data-based features, Curated content, Pushing, Business intelligence, Experiments, Exploration]
Presentation objectives
Conway’s law
“Organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations.”
Better organise to match desired design, then.
Startup mode
Collection -> Ingestion -> Cold store -> Batch process -> Analytics
Ingestion
Big corp future
[Diagram labels: Collection, Ingestion, Cold store, Batch process (Real-time process), Analytics, Reports, Pushing, Features, Legacy DBs]
User hidden - agility
User visible - robustness
Data is gold
Don’t drop it - make it one team’s focus
Reliable path source -> cold store
Minimal complexity
Human & machine fault tolerance
Collection -> Ingestion -> Cold store -> Batch process -> Analytics
Data pipelines
Form teams that are driven by business cases & need
Forward-oriented -> filters implicitly applied
Beware of: duplication, tech chaos/autonomy, privacy loss
Data platform, pipeline chains
Common data infrastructure
Productivity, privacy, end-to-end agility, complexity
Beware: producer-consumer disconnect
Example case: Spotify
~50M active users, 5-10 TB/day, 20PB
100-200 people touch data daily
Autonomous team and tech culture
Stabilising data platform
+ Business-driven pipes, enabled teams
- Productivity, end-to-end agility, privacy, stability, duplication, security
Morning coffee =
Example case: Schibsted Prod & Tech
10-200M users, 5+TB/day, 0-1PB
Blocket, Aftonbladet, Leboncoin, Finn, VG, ...
Grew 1-100 people in 1 year, 20 touch data
Big corp culture, governance
Fast-forwarded to platform stage, reverted to autonomy
+ Privacy, security, modern high-level components
- Productivity, stability, forward-driven, dependent teams
Survival utilities, technology
Heed ecosystem direction
Follow leaders
Twitter, LinkedIn, Facebook, Airbnb, Netflix
Technology has no overlap with yesterday’s
Keep up
Survival utilities, ingestion
Data owners should export data
Difficult, needs attention
Pull database/API from Hadoop/Spark = DDoS
Quickly hand off incoming data to reliable storage
Measure loss and latency
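
One way to read "Measure loss and latency" is to compare what producers report as sent with what actually landed in the cold store. The sketch below is illustrative only: the `Event` record, its `sent_at` and `stored_at` fields, and the `ingestion_report` helper are hypothetical names, not something taken from the talk.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Event:
    """Hypothetical event record; the field names are assumptions."""
    event_id: str
    sent_at: float    # producer timestamp, seconds since epoch
    stored_at: float  # time the event landed in cold storage


def ingestion_report(sent_ids, stored_events):
    """Compare producer-side ids with the events that reached cold storage."""
    sent = set(sent_ids)
    stored = {e.event_id: e for e in stored_events}
    lost = sent - set(stored)
    # Latency is only meaningful for events the producer claims to have sent.
    latencies = [e.stored_at - e.sent_at
                 for e in stored.values() if e.event_id in sent]
    return {
        "sent": len(sent),
        "stored": len(latencies),
        "loss_ratio": len(lost) / len(sent) if sent else 0.0,
        "median_latency_s": median(latencies) if latencies else None,
        "max_latency_s": max(latencies) if latencies else None,
    }


if __name__ == "__main__":
    stored = [
        Event("a", 100.0, 101.5),
        Event("b", 100.0, 103.0),
        Event("c", 101.0, 102.0),
    ]
    # Event "d" was sent but never stored, so it counts as lost.
    print(ingestion_report(["a", "b", "c", "d"], stored))
```

In practice the counts would probably come from producer-side counters and a scan of the stored dataset rather than in-memory lists; the comparison itself stays the same.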
Survival utilities, workflow
Productive workflow from day one
Upstream easily breaks downstream
No off-the-shelf tools
Privacy strategy from day one
Data spreads like weed
Expect machine and human error
Capability to rebuild from cold store
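
"Capability to rebuild from cold store" can be illustrated with a toy job whose output is a pure function of the raw events. This is a minimal sketch under assumptions: the `COLD_STORE` list, the event fields, and the two job versions are made up for the example and stand in for real storage and real batch jobs.

```python
from collections import Counter

# Hypothetical raw events as they might sit, unmodified, in a cold store.
COLD_STORE = [
    {"user": "u1", "action": "play"},
    {"user": "u2", "action": "play"},
    {"user": "u1", "action": "PLAY "},  # messy record from a buggy client
    {"user": "u1", "action": "skip"},
]


def plays_per_user_v1(raw_events):
    """First version of the job: silently misses the messy 'PLAY ' record."""
    return Counter(e["user"] for e in raw_events if e["action"] == "play")


def plays_per_user_v2(raw_events):
    """Fixed version: normalises the action before filtering."""
    return Counter(e["user"] for e in raw_events
                   if e["action"].strip().lower() == "play")


def rebuild(job, raw_events):
    """Derived data depends only on raw data, so a rebuild is just a rerun."""
    return dict(job(raw_events))


if __name__ == "__main__":
    print(rebuild(plays_per_user_v1, COLD_STORE))  # output of the buggy job
    print(rebuild(plays_per_user_v2, COLD_STORE))  # rebuilt after the fix
```

Because the raw events are never mutated, the human error in the first job version is repaired by deploying the fix and rerunning, with no data lost.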
Parting words
1. Keep things simple
2. Don’t drop data
3. Focus on productive developer workflows
4. Choose right components
Open source is safer
Avoid rolling your own
Bonus slides
Personae - important characteristics
Architect
- Technology updated
- Holistic: productivity, privacy
- Identify and facilitate governance
Backend developer
- Simplicity oriented
- Engineering practices obsessed
- Adapt to data world
Product owner
- Trace business value to upstream design
- Find most ROI through difficult questions
Manager
- Explain what and why
- Facilitate process to determine how
- Enable, enable, enable
Devops
- Always increase automation
- Enable, don’t control
Data scientist
- Capable programmer
- Product oriented
Cloud or not?
+ Operations
+ Security
+ Responsive scaling
- Development workflows
- Privacy
- Vendor lock-in