
Organising for Data Success

Advice for building a successful data processing organisation, from an engineering perspective.

Published in: Data & Analytics

  1. Organising for Data Success. Lars Albertsson, Data Architect, Schibsted Media Group
  2. Bio: SICS - test and debug technology for distributed systems; Sun - high-end server verification; Google - Hangouts, engineering productivity; Recorded Future - data ingestion, data quality; Cinnober - stock exchange engines; Spotify - data processing, music data modelling; Schibsted Media Group - data architect
  3. Path to profit
  4. Big data path to profit
  5. You start out simple: User behaviour
  6. Things get complex: User content; Professional content; Ads; User behaviour; Systems; Ads; System diagnostics; Recommendations; Data-based features; Curated content; Pushing; Business intelligence; Experiments; Exploration
  7. Presentation objectives
  8. Conway’s law: “Organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations.” Better organise to match desired design, then.
  9. Startup mode: Collection; Ingestion; Cold store; Batch process; Analytics
  10. Big corp future: Ingestion; Collection; Ingestion; Cold store; Batch process (Real-time process); Analytics; Reports; Pushing; Features; Legacy DBs; User hidden - agility; User visible - robustness
  11. Don’t drop it - make it one team’s focus; Reliable path source -> cold store; Minimal complexity; Human & machine fault tolerance; Data is gold; Collection; Ingestion; Cold store; Batch process; Analytics
  12. Form teams that are driven by business cases & need; Forward-oriented -> filters implicitly applied; Beware of: duplication, tech chaos/autonomy, privacy loss; Data pipelines
  13. Data platform, pipeline chains; Common data infrastructure; Productivity, privacy, end-to-end agility, complexity; Beware: producer-consumer disconnect
  14. Example case: Spotify; ~50M active users, 5-10 TB/day, 20 PB; 100-200 people touch data daily; Autonomous team and tech culture; Stabilising data platform; + Business-driven pipes, enabled teams; - Productivity, end-to-end agility, privacy, stability, duplication, security; Morning coffee =
  15. Example case: Schibsted Prod & Tech; 10-200M users, 5+ TB/day, 0-1 PB; Blocket, Aftonbladet, Leboncoin, Finn, VG, ...; Grew 1-100 people in 1 year, 20 touch data; Big corp culture, governance; Fast-forwarded to platform stage, reverted to autonomy; + Privacy, security, modern high-level components; - Productivity, stability, forward-driven, dependent teams
  16. Survival utilities, technology: Heed ecosystem direction; Follow leaders: Twitter, LinkedIn, Facebook, AirBnB, Netflix; Technology has no overlap with yesterday’s; Keep up
  17. Survival utilities, ingestion: Data owners should export data; Difficult, needs attention; Pull database/API from Hadoop/Spark = DDoS; Quickly hand off incoming data to reliable storage; Measure loss and latency
  18. Survival utilities, workflow: Productive workflow from day one; Upstream easily breaks downstream; No off-the-shelf tools; Privacy strategy from day one; Data spreads like weed; Expect machine and human error; Capability to rebuild from cold store
  19. Parting words: 1. Keep things simple; 2. Don’t drop data; 3. Focus on productive developer workflows; 4. Choose right components; Open source is safer; Avoid rolling your own
  20. Bonus slides
  21. Personae - important characteristics: Architect (Technology updated; Holistic: productivity, privacy; Identify and facilitate governance). Backend developer (Simplicity oriented; Engineering practices obsessed; Adapt to data world). Product owner (Trace business value to upstream design; Find most ROI through difficult questions). Manager (Explain what and why; Facilitate process to determine how; Enable, enable, enable). Devops (Always increase automation; Enable, don’t control). Data scientist (Capable programmer; Product oriented).
  22. Cloud or not? + Operations; + Security; + Responsive scaling; - Development workflows; - Privacy; - Vendor lock-in
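
Slide 17 advises measuring loss and latency on the ingestion path. The slides do not say how, so the following is only a minimal sketch of one possible approach. It assumes every event carries a unique ID and a timestamp set at the source, and that the cold store records when each event arrived. The names `Emitted`, `Stored` and `ingestion_report` are hypothetical and do not come from the presentation.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Emitted:
    event_id: str      # assumed unique per event at the source
    emitted_at: float  # seconds since epoch, set by the source


@dataclass(frozen=True)
class Stored:
    event_id: str
    stored_at: float   # seconds since epoch, set when written to cold store


def ingestion_report(emitted: Iterable[Emitted],
                     stored: Iterable[Stored]) -> Dict[str, float]:
    """Compare what the sources emitted with what reached cold store."""
    # Keep the earliest arrival per event, in case of duplicate delivery.
    first_arrival: Dict[str, float] = {}
    for s in stored:
        previous = first_arrival.get(s.event_id)
        if previous is None or s.stored_at < previous:
            first_arrival[s.event_id] = s.stored_at

    emitted_list = list(emitted)
    # Latency is only defined for events that actually arrived.
    latencies = sorted(
        first_arrival[e.event_id] - e.emitted_at
        for e in emitted_list
        if e.event_id in first_arrival
    )
    total = len(emitted_list)
    lost = total - len(latencies)
    return {
        "emitted": total,
        "lost": lost,
        "loss_ratio": lost / total if total else 0.0,
        "median_latency_s": latencies[len(latencies) // 2] if latencies else 0.0,
        "max_latency_s": latencies[-1] if latencies else 0.0,
    }
```

In practice the source-side counts would have to come from the data owners themselves, which fits the same slide's point that data owners should export data.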

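Slide 18 pairs "Expect machine and human error" with "Capability to rebuild from cold store". One way to read this, sketched below under assumptions not stated in the slides, is that derived datasets are computed by jobs that only read the raw events, so a buggy job can be fixed and simply rerun. The event fields (`user`, `type`) and all function names are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List

RawEvent = Dict[str, str]
Job = Callable[[Iterable[RawEvent]], Dict[str, int]]


def plays_per_user_buggy(events: Iterable[RawEvent]) -> Dict[str, int]:
    # Human error: counts every event, including skips.
    return dict(Counter(e["user"] for e in events))


def plays_per_user_fixed(events: Iterable[RawEvent]) -> Dict[str, int]:
    # Corrected job: counts only events of type "play".
    return dict(Counter(e["user"] for e in events if e["type"] == "play"))


def rebuild(cold_store: List[RawEvent], job: Job) -> Dict[str, int]:
    """Recompute a derived dataset from the raw events.

    The job receives a tuple snapshot, so it cannot add or remove
    events in the cold store itself.
    """
    return job(tuple(cold_store))
```

Because the raw events are never modified, replacing the buggy job with the fixed one and calling `rebuild` again yields a corrected dataset without any repair work on the derived data.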