Netflix viewing data architecture evolution - QCon 2014

Netflix's architecture for viewing data has evolved as streaming usage has grown. Each generation was designed for the next order of magnitude and was informed by lessons from the previous one. From SQL to NoSQL, from data center to cloud, from proprietary to open source, look inside to learn how this system has evolved. (From a talk given at QConSF 2014.)

Published in: Technology

  1. Please read the notes associated with each slide for the full context of the presentation
  2. Who am I? Philip Fisher-Ogden • Director of Engineering @ Netflix • Playback Services (making “click play” work) • 6 years @ Netflix, from 10 servers to 10,000s
  3. Story: Netflix streaming – 2007 to present
  4. Device Growth: 2007: 1 device; 2008: 10s of devices; 2009: 10s of devices; 2010: 100s of devices; 2011+: 1000+ devices
  5. Experience Evolution
  6. Subscribers & Viewing: 53M global subscribers • 50 countries • >2 billion hours viewed per month
  7. Virtuous Cycle: Viewing, Improved Personalization, Better Experience
  8. Viewing Data: Who, What, When, Where, How Long
  9. Real time data use cases: What have I watched?
  10. Real time data use cases: Where was I at?
  11. Real time data use cases: What else am I watching?
  12. Session Analytics
  13. Session Analytics
  14. Generic Architecture (diagram labels): Collect, Process, Provide; Start, Stop; Stream State, Event Stream, Session Summary; Active Sessions, Last Position, Viewing History, Data Feed
  15. Architecture Evolution • Different generations • Pain points & learnings • Re-architecture motivations
  16. Real Time Data (timeline: 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, Future): SQL, NoSQL, Caching; redis, memcached
  17. Real Time Data – gen 1 (timeline: 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, Future): SQL, NoSQL, Caching; redis, memcached
  18. Real Time Data – gen 1 (diagram labels): Start, Stop; Sessions; Logs / Events; History / Position; SQL
  19. Real Time Data – gen 1 pain points • Scalability – DB scaled up not out • Event Data Analytics – ad hoc • Fixed schema
  20. Real Time Data – gen 2 (timeline: 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, Future): SQL, NoSQL, Caching; redis, memcached
  21. Real Time Data – gen 2 motivations • Scalability – Scale out not up • Flexible schema – Key/value attributes • Service oriented
  22. Real Time Data – gen 2 (diagram labels): Start, Stop; Viewing Service; NoSQL; 50 data partitions
  23. Real Time Data – gen 2 pain points • Scale out – Resharding was painful • Performance – Hot spots • Disaster Recovery – SimpleDB had no backups
  24. Real Time Data – gen 3 (timeline: 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, Future): SQL, NoSQL, Caching; redis, memcached
  25. Real Time Data – gen 3 landscape • Cassandra 0.6 • Before SSDs in AWS • Netflix in 1 AWS region
  26. Real Time Data – gen 3 motivations • Order of magnitude increase in requests • Scalability – Actually scale out rather than up
  27. Real Time Data – gen 3 (diagram labels): ViewingService; Stateful Tier (nodes 0, 1, …, n-2, n-1): Active Sessions, Latest Positions, View Summary; Stateless Tier (fallback); Sessions; Viewing History; Memcached
  28. Real Time Data – gen 3 writes (diagram labels): Start, Stop; ViewingService; Stateful Tier (nodes 0, 1, …, n-2, n-1)
  29. Real Time Data – gen 3 writes (diagram labels): Start, Stop; ViewingService; Stateful Tier (nodes 0, 1, …, n-2, n-1): Active Sessions, Latest Positions, View Summary
  30. Real Time Data – gen 3 writes (diagram labels): Start, Stop; ViewingService; Stateful Tier (nodes 0, 1, …, n-2, n-1): Active Sessions, Latest Positions, View Summary; update
  31. Real Time Data – gen 3 writes (diagram labels): Start, Stop; ViewingService; Stateful Tier (nodes 0, 1, …, n-2, n-1): Active Sessions, Latest Positions, View Summary; snapshot; Sessions
  32. Real Time Data – gen 3 writes (diagram labels): Start, Stop; ViewingService; Stateful Tier (nodes 0, 1, …, n-2, n-1): Active Sessions, Latest Positions, View Summary; Viewing History; Memcached
  33. Real Time Data – gen 3 reads (diagram labels): What have I watched?; ViewingService; Stateful Tier; View Summary; Viewing History; Memcached
  34. Real Time Data – gen 3 reads (diagram labels): Where was I at?; ViewingService; Stateful Tier; Latest Positions; Stateless Tier (fallback); Viewing History; Memcached
  35. Real Time Data – gen 3 reads (diagram labels): What else am I watching?; ViewingService; Stateful Tier; Active Sessions
  36. gen 3 – Requests Scale (Operation: Scale) • Create (start streaming): 1,000s per second • Update (heartbeat, close): 100,000s per second • Append (session events/logs): 10,000s per second • Read viewing history: 10,000s per second • Read latest position: 100,000s per second
  37. gen 3 – Cluster Scale (Cluster: Scale) • Cassandra Viewing History: ~100 hi1.4xl nodes, ~48 TB total space used • Viewing Service Stateful Tier: ~1700 r3.2xl nodes, 50 GB heap memory per node • Memcached: ~450 r3.2xl/xl nodes, ~8 TB memory used
  38. Real Time Data – gen 3 pain points • Stateful tier – Hot spots – Multi-region complexity • Monolithic service • Read-modify-write poorly suited for memcached
  39. Real Time Data – gen 3 learnings • Distributed stateful systems are hard – Go stateless, use C*/memcached/redis… • Decompose into microservices
  40. Real Time Data – gen 4 (diagram labels): ViewingService; Stateful Tier (nodes 0, 1, …, n-2, n-1): Active Sessions, Latest Positions, View Summary; Stateless Tier (fallback); Viewing History; Sessions; Memcached
  41. Real Time Data – gen 4 (diagram labels): Stateless Microservices; Stream State/Event Collectors; Data Processors; Data Services; Data Feeds
  42. Real Time Data – gen 4 (diagram labels): Data Tiers; Viewing History; Session State; Session Positions; Session Events; redis, redis
  43. Session Analytics • Summarize detailed event data • Non-real time, but near real time • Some shared logic with real time
  44. Session Analytics – Processing (timeline: 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, Future): Batch; Near Real-Time; Stream Processing; Custom Service (Java on AWS); Mantis
  45. Session Analytics – Storage (timeline: 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, Future): Batch; Near Real-Time; Stream Processing
  46. Session Analytics – gen 1 • Storage • Processing (diagram labels: Sessions, Logs)
  47. Session Analytics – gen 1 pain points • MapReduce good for batch – Not for near real time • Complexity – Code in 2 systems / frameworks – Operational burden of 2 systems
  48. Session Analytics – gen 2 • Storage • Processing (diagram labels: Session Events & Logs, Java)
  49. Session Analytics – gen 2 learnings • Reduced complexity – shared code and ops • Batch still available • New bottleneck – harder to extend logic
  50. Session Analytics – gen 3 (*) • Storage • Processing (diagram labels: Stream Processing Frameworks: Mantis, Storm, Samza, Spark Streaming)
  51. Takeaways • Polyglot Persistence – One size fits all doesn’t fit all • Strong opinions, loosely held – Design for long term, but be open to redesigns
  52. Thanks! @philip_pfo
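
The three real time use cases on slides 9 to 11 and the collect / process / provide flow on slide 14 can be illustrated with a small sketch. This is a single-process toy under stated assumptions, not Netflix's implementation: the class `ViewingState`, its method names, the `Session` fields, and the sample account, title, and device values are all hypothetical, and nothing here addresses the distribution, hot spot, or multi-region problems the deck describes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Session:
    # Hypothetical session record; field names are assumptions for illustration.
    account: str
    title: str
    device: str
    position_s: int = 0  # last reported playback position, in seconds


class ViewingState:
    """Toy in-memory stand-in for the collect / process / provide flow."""

    def __init__(self):
        self.active = {}                  # session_id -> Session
        self.positions = {}               # (account, title) -> seconds
        self.history = defaultdict(list)  # account -> titles, oldest first

    # Collect and process: start, heartbeat, and stop events update state.
    def start(self, session_id, account, title, device):
        self.active[session_id] = Session(account, title, device)

    def heartbeat(self, session_id, position_s):
        session = self.active[session_id]
        session.position_s = position_s
        self.positions[(session.account, session.title)] = position_s

    def stop(self, session_id, position_s):
        # Record the final position, then move the session into history.
        self.heartbeat(session_id, position_s)
        session = self.active.pop(session_id)
        self.history[session.account].append(session.title)

    # Provide: the three questions from the use-case slides.
    def what_have_i_watched(self, account):
        return list(self.history[account])

    def where_was_i_at(self, account, title):
        return self.positions.get((account, title), 0)

    def what_else_am_i_watching(self, account, exclude_device):
        return [
            s.title
            for s in self.active.values()
            if s.account == account and s.device != exclude_device
        ]


state = ViewingState()
state.start("s1", "acct-1", "Title A", "tv")
state.heartbeat("s1", 120)
state.start("s2", "acct-1", "Title B", "phone")
state.stop("s1", 300)

print(state.what_have_i_watched("acct-1"))            # ['Title A']
print(state.where_was_i_at("acct-1", "Title A"))      # 300
print(state.what_else_am_i_watching("acct-1", "tv"))  # ['Title B']
```

Keeping all three views in one process is what makes the toy simple. The deck's gen 3 learnings point the other way for production use: keep services stateless and push this state into external stores such as Cassandra, memcached, or redis.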
