Organising for Data Success


  1. Organising for Data Success. Lars Albertsson, Data Architect, Schibsted Media Group
  2. Bio ● SICS - test and debug technology for distributed systems ● Sun - high-end server verification ● Google - Hangouts, engineering productivity ● Recorded Future - data ingestion, data quality ● Cinnober - stock exchange engines ● Spotify - data processing, music data modelling ● Schibsted Media Group - data architect
  3. Path to profit
  4. Big data path to profit
  5. You start out simple: User behaviour
  6. Things get complex: User content, Professional content, Ads, User behaviour, Systems, Ads, System diagnostics, Recommendations, Data-based features, Curated content, Pushing, Business intelligence, Experiments, Exploration
  7. Presentation objectives
  8. Conway’s law: “Organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations.” Better organise to match desired design, then.
  9. Startup mode: Collection -> Ingestion -> Cold store -> Batch process -> Analytics
  10. Big corp future: Ingestion, Collection, Ingestion, Cold store, Batch process, (Real-time process), Analytics, Reports, Pushing, Features, Legacy DBs. User hidden - agility. User visible - robustness.
  11. Don’t drop it - make it one team’s focus. Reliable path source -> cold store. Minimal complexity. Human & machine fault tolerance. Data is gold. Collection -> Ingestion -> Cold store -> Batch process -> Analytics
  12. Form teams that are driven by business cases & need. Forward-oriented -> filters implicitly applied. Beware of: duplication, tech chaos/autonomy, privacy loss. Data pipelines.
  13. Data platform, pipeline chains. Common data infrastructure. Productivity, privacy, end-to-end agility, complexity. Beware: producer-consumer disconnect.
  14. Example case: Spotify. ~50M active users, 5-10 TB/day, 20 PB. 100-200 people touch data daily. Autonomous team and tech culture. Stabilising data platform. + Business-driven pipes, enabled teams. - Productivity, end-to-end agility, privacy, stability, duplication, security. Morning coffee =
  15. Example case: Schibsted Prod & Tech. 10-200M users, 5+ TB/day, 0-1 PB. Blocket, Aftonbladet, Leboncoin, Finn, VG, ... Grew 1-100 people in 1 year, 20 touch data. Big corp culture, governance. Fast-forwarded to platform stage, reverted to autonomy. + Privacy, security, modern high-level components. - Productivity, stability, forward-driven, dependent teams.
  16. Survival utilities, technology. Heed ecosystem direction. Follow leaders: Twitter, LinkedIn, Facebook, AirBnB, Netflix. Technology has no overlap with yesterday’s. Keep up.
  17. Survival utilities, ingestion. Data owners should export data. Difficult, needs attention. Pull database/API from Hadoop/Spark = DDoS. Quickly hand off incoming data to reliable storage. Measure loss and latency.
  18. Survival utilities, workflow. Productive workflow from day one. Upstream easily breaks downstream. No off-the-shelf tools. Privacy strategy from day one. Data spreads like weed. Expect machine and human error. Capability to rebuild from cold store.
  19. Parting words: 1. Keep things simple; 2. Don’t drop data; 3. Focus on productive developer workflows; 4. Choose right components. Open source is safer. Avoid rolling your own.
  20. Bonus slides
  21. Personae - important characteristics. Architect: Technology updated; Holistic: productivity, privacy; Identify and facilitate governance. Backend developer: Simplicity oriented; Engineering practices obsessed; Adapt to data world. Product owner: Trace business value to upstream design; Find most ROI through difficult questions. Manager: Explain what and why; Facilitate process to determine how; Enable, enable, enable. Devops: Always increase automation; Enable, don’t control. Data scientist: Capable programmer; Product oriented.
  22. + Operations, Security, Responsive scaling. - Development workflows, Privacy, Vendor lock-in. Cloud or not?
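
Slide 17 advises measuring loss and latency on the path into reliable storage. The following is a minimal Python sketch of what such a check might compute. It assumes every event carries a unique id and that the send time (at the source) and store time (in the cold store) are available as dictionaries keyed by that id. The function name, argument names, and result keys are hypothetical and do not come from the talk.

```python
def measure_loss_and_latency(sent_at, stored_at):
    """Compare events emitted by a source with events found in cold store.

    sent_at:   dict of event id -> epoch seconds when the source emitted it
    stored_at: dict of event id -> epoch seconds when it landed in cold store
    (Both inputs are illustrative assumptions, not a format from the slides.)
    """
    # Events the source emitted that never reached the cold store.
    missing_ids = sorted(set(sent_at) - set(stored_at))

    # Delivery delay for every event that did arrive.
    latencies = [stored_at[i] - sent_at[i] for i in sent_at if i in stored_at]

    return {
        "loss_ratio": len(missing_ids) / len(sent_at) if sent_at else 0.0,
        "missing_ids": missing_ids,
        "max_latency_s": max(latencies, default=None),
    }


if __name__ == "__main__":
    sent = {"a": 100.0, "b": 101.0, "c": 102.0, "d": 103.0}
    stored = {"a": 105.0, "b": 103.0, "d": 110.0}
    print(measure_loss_and_latency(sent, stored))
```

A real pipeline would likely compute these figures per time window and per source rather than over whole dictionaries in memory; the sketch only shows the two quantities the slide names.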