Building a Reliable Data Store




Jeremy Edberg presents the data stores used by Netflix and Reddit, along with best practices and lessons for surviving outages. Filmed at

Jeremy Edberg is currently the Reliability Architect for Netflix, the largest video streaming service in the world. Before that he ran Reddit, an online community for sharing and discussing interesting things on the internet that does more than two billion page views a month. Both run their entire operations on Amazon’s EC2. Jeremy has keynoted at conferences such as PyCon and Cloud Connect.


Building a Reliable Data Store

  1. Jeremy Edberg, QCon SF 2012. Tweet @jedberg with feedback!
  3. Presented at QCon San Francisco (www.qconsf.com)
     Purpose of QCon
     • to empower software development by facilitating the spread of knowledge and innovation
     Strategy
     • practitioner-driven conference designed for YOU: influencers of change and innovation in your teams
     • speakers and topics driving the evolution and innovation
     • connecting and catalyzing the influencers and innovators
     Highlights
     • attended by more than 12,000 delegates since 2007
     • held in 9 cities worldwide
  5. Building a Reliable Data Store
  6. Agenda
     • CAP theory and how it applies to reliability
     • How reddit and Netflix maintain reliable data stores
     • Best Practices
     • War stories -- surviving real outages
  7. CAP Theorem
     • Consistent
     • Available
     • Partition-resistant
  8. ATM: ?
  9. ATM: AP. Limits liability by allowing only small transactions
  10. Flight Reservations: ?
  11. Flight Reservations: AP. This is why overbooking occurs
  13. The problem with CAP
     • Daniel Abadi had a problem with CAP
     • The weightings were uneven
     • A is essential in all scenarios
     • C is more important than P
     • Latency wasn’t accounted for at all
  14. PACELC: If there is a partition (P), how does the system trade off between availability and consistency (A and C); else (E), when the system is running as normal in the absence of partitions, how does the system trade off between latency (L) and consistency (C)?
  15. Partitioning
  16. Thinking like a coder: Partitions are like code branches
  17. Some examples
     • ACID systems (Postgres, Oracle, MySQL, etc.) are PC/EC
     • Cassandra is PA/EL
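
The PACELC labels on the previous slides can be read as two independent choices: what the system favors during a partition, and what it favors otherwise. A minimal sketch that spells this out, assuming a hypothetical `Pacelc` record that is not part of the talk:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pacelc:
    """Hypothetical record of a system's PACELC classification."""
    on_partition: str  # "A" (availability) or "C" (consistency)
    otherwise: str     # "L" (latency) or "C" (consistency)

    def label(self) -> str:
        return f"P{self.on_partition}/E{self.otherwise}"

    def describe(self) -> str:
        names = {"A": "availability", "C": "consistency", "L": "latency"}
        return (
            f"during a partition favors {names[self.on_partition]}; "
            f"otherwise favors {names[self.otherwise]}"
        )


# Classifications as given on the "Some examples" slide.
SYSTEMS = {
    "Postgres": Pacelc("C", "C"),
    "Cassandra": Pacelc("A", "L"),
}

for name, c in SYSTEMS.items():
    print(f"{name}: {c.label()} ({c.describe()})")
```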
  19. Reliability and $$
  20. Building for redundancy
  21. We want to make sure we are building for survival
  22. 1>2>3: Going from two to three is hard
  23. 1>2>3: Going from one to two is harder
  24. Build for Three: If possible, plan for 3 or more from the beginning.
  25. “Build for three” is the secret to success
  27. reddit
  28. Architecture
  29. Postgres
  30. Database Resiliency with Sharding
  31. Sharding
     • reddit split writes across four master databases
     • Links/Accounts/Subreddits, Comments, Votes and Misc
     • Each has at least one slave in another zone
     • Avoid reading from the master if possible
     • Wrote their own database access layer, called the “thing” layer
  32. Sample Schema
     link_thing: int id, timestamp date, int ups, int downs, bool deleted, bool spam
     link_data: int thing_id, string name, string value, char kind
  33. The thing layer
     • Postgres is used like a key/value store
     • Thing table has denormalized data
     • Data table has arbitrary keys
     • Lots of indexes tuned for our specific queries
     • Thing and data tables are on the same box, but don’t have to be
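
To illustrate the thing/data split, here is a minimal sketch that uses Python's built-in `sqlite3` as a stand-in for Postgres. The table and column names follow the sample schema; the helpers `create_link` and `load_link`, the index name, and the `kind` value are hypothetical and are not reddit's actual thing-layer API.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")  # stand-in for a Postgres connection
conn.executescript("""
    CREATE TABLE link_thing (
        id INTEGER PRIMARY KEY,
        date REAL, ups INTEGER, downs INTEGER,
        deleted INTEGER, spam INTEGER
    );
    CREATE TABLE link_data (
        thing_id INTEGER, name TEXT, value TEXT, kind TEXT
    );
    CREATE INDEX link_data_thing_id ON link_data (thing_id);
""")


def create_link(attrs):
    """Insert one thing row plus one data row per arbitrary attribute."""
    cur = conn.execute(
        "INSERT INTO link_thing (date, ups, downs, deleted, spam) "
        "VALUES (?, 0, 0, 0, 0)",
        (time.time(),),
    )
    thing_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO link_data (thing_id, name, value, kind) "
        "VALUES (?, ?, ?, ?)",
        # "s" is an assumed marker meaning "string value"
        [(thing_id, k, str(v), "s") for k, v in attrs.items()],
    )
    return thing_id


def load_link(thing_id):
    """Rebuild a link as a dict from its thing row and its data rows."""
    ups, downs = conn.execute(
        "SELECT ups, downs FROM link_thing WHERE id = ?", (thing_id,)
    ).fetchone()
    link = {"id": thing_id, "ups": ups, "downs": downs}
    rows = conn.execute(
        "SELECT name, value FROM link_data WHERE thing_id = ?", (thing_id,)
    )
    link.update(dict(rows.fetchall()))
    return link


link_id = create_link({"title": "Hello", "url": "http://example.com"})
print(load_link(link_id))
```

Because attributes live as rows in the data table, adding a new attribute to a link needs no schema change, which is what lets the relational database behave like a key/value store.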
  34. I love memcache: I make heavy use of memcached
  35. [Diagram: nodes A, B, C and items 1, 2, 3]
  36. [Diagram: node D added alongside nodes A, B, C, with items 1, 2, 3]
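
The two diagram slides appear to show a node D being added next to A, B and C. One common way to limit how many cached keys move when a cache node is added is consistent hashing. The transcript does not confirm that this is what the diagrams depict, so the following is a sketch under that assumption; the `HashRing` class and its parameters are illustrative only.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative)."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas  # virtual points per node
        self._keys = []           # sorted ring positions
        self._ring = {}           # ring position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value):
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        for i in range(self.replicas):
            pos = self._hash(f"{node}:{i}")
            self._ring[pos] = node
            bisect.insort(self._keys, pos)

    def get(self, key):
        """Return the node owning the first ring position after the key."""
        pos = self._hash(key)
        idx = bisect.bisect(self._keys, pos) % len(self._keys)
        return self._ring[self._keys[idx]]


keys = [f"key{i}" for i in range(1000)]
ring = HashRing(["A", "B", "C"])
before = {k: ring.get(k) for k in keys}
ring.add("D")
after = {k: ring.get(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all of them to D")
```

With a plain `hash(key) % number_of_nodes` scheme, adding a node remaps most keys. On the ring, only the keys that now fall to D change owner.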
  37. Cassandra
  39. Netflix
  40. Data: What does Netflix do with it all?
  41. We store it!
     • Cache (memcached)
     • Cassandra
     • RDS (MySQL)
  42. I love memcache: I make heavy use of memcached
  43. RDS (Relational Database Service)
  44. Cassandra
  45. A/B Testing
  46. A/B Testing
     Online Data            Offline Data
     Test Cell allocation   Test tracking
     Test Metadata          Retention
     Start/End date         Fraction Viewed
     UI Directives          Pages Viewed
  47. Atlas
  48. AWS Usage: Dollar amounts have been carefully removed
  49. Chronos
  50. More Things Netflix Stores in Cassandra
     • Video Quality
     • Network issues
     • Usage History
     • Playback Errors
  51. Service based architecture
  52. Netflix on AWS [Diagram; labels: 2012, IPv6]
  53. Abstraction
     • Data sources are abstracted away behind RESTful interfaces
     • Each application owns its own consistency
     • Each application can scale independently based on load
  54. Netflix autoscaling [Chart; label: Traffic Peak]
  55. The Big Oracle Database
  56. Circuit Breakers: Be liberal in what you accept, strict in what you send
  57. Cassandra
  58. Priam
  59. Cassandra Architecture
  60. Cassandra Architecture
  61. How it works
     • Replication factor
     • Quorum reads / writes
     • Bloom Filter for fast negative lookups
     • Immutable files for fast writes
     • Seed nodes
     • Multi-region
     • Gossip protocol
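
Replication factor and quorum reads/writes interact through simple arithmetic: with N replicas, a read that waits for R replicas is guaranteed to overlap the latest write acknowledged by W replicas when R + W > N. A small sketch of that rule follows; the function names are illustrative and are not Cassandra APIs.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


def read_sees_latest_write(n: int, w: int, r: int) -> bool:
    """True when every read set of size r overlaps every write set of size w."""
    return r + w > n


def tolerated_replica_failures(n: int, required: int) -> int:
    """How many replicas can be down while `required` can still respond."""
    return n - required


n = 3
q = quorum(n)
print(f"replication factor {n}: quorum is {q}")
print("quorum read + quorum write overlap:", read_sees_latest_write(n, q, q))
print("read ONE + write ONE overlap:", read_sees_latest_write(n, 1, 1))
print("replicas that may be down at quorum:", tolerated_replica_failures(n, q))
```

This is one reason a replication factor of three pairs well with quorum operations: one replica can be lost while reads and writes still overlap.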
  62. Cassandra Benefits
     • Fast writes
     • Fast negative lookups
     • Easy incremental scalability
     • Distributed -- No SPoF
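
Fast negative lookups come from the Bloom filter mentioned on the previous slide: it can answer "definitely not present" without touching disk, at the cost of occasional false positives. A minimal sketch, assuming an illustrative `BloomFilter` class with arbitrary size and hash-count parameters (this is not Cassandra's implementation):

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: no false negatives, some false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive num_hashes positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        """False means definitely absent; True means possibly present."""
        return all(self.bits[pos] for pos in self._positions(key))


bf = BloomFilter()
for row_key in ("user:1", "user:2", "user:3"):
    bf.add(row_key)

print(bf.might_contain("user:2"))    # an added key is always reported
print(bf.might_contain("user:999"))  # usually False, so the disk read is skipped
```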
  63. Why Cassandra?
     • Availability over consistency
     • Writes over reads
     • We know Java
     • Open source + support
  65. We live in an unreliable world
  69. Tips and Tricks
  70. Queues are your friend
     • Votes
     • Comments
     • Thumbnail scraper
     • Precomputed queries
     • Spam
        • processing
        • corrections
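
The idea behind the queue slide is that the request path only enqueues work, and a separate worker applies it to the data store later. A minimal sketch using Python's in-process `queue.Queue` as a stand-in for a real message queue; the names `cast_vote`, `worker` and `scores` are hypothetical, and the transcript does not say which queueing system was used.

```python
import queue
import threading

votes = queue.Queue()
scores = {}  # stand-in for the database


def worker():
    """Drain the queue and apply each vote to the store."""
    while True:
        item = votes.get()
        if item is None:  # sentinel: stop the worker
            votes.task_done()
            break
        link_id, delta = item
        scores[link_id] = scores.get(link_id, 0) + delta
        votes.task_done()


def cast_vote(link_id, delta):
    """Request path: enqueue and return immediately."""
    votes.put((link_id, delta))


t = threading.Thread(target=worker, daemon=True)
t.start()

cast_vote("link1", +1)
cast_vote("link1", +1)
cast_vote("link2", -1)

votes.put(None)  # ask the worker to stop once the queue is drained
votes.join()     # wait until everything queued has been processed
t.join()
print(scores)
```

If the store is slow or briefly unavailable, votes pile up in the queue instead of failing the user's request.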
  71. Caching is a good way to hide your failures
  72. Sometimes users notice your data inconsistency
  73. EVCache [Diagram: nodes A, B, C, D and items 1, 2, 3]
  74. Do you even need a cache?
  75. Think of SSDs as cheap RAM, not expensive disk
  76. Going multi-zone or multi-datacenter
  77. Benefits of Amazon’s Zones
     • Loosely connected
     • Low latency between zones
     • 99.95% uptime guarantee per zone
  78. Going Multi-region
  79. Leveraging Multi-region
     • 100% uptime is theoretically possible.
     • You have to replicate your data
     • This will cost money
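
As a rough illustration of why replicating across zones or regions pushes uptime toward 100%, the sketch below assumes that each location meets exactly the 99.95% figure from the zones slide and that failures are independent. Both are simplifying assumptions that the talk does not make; real outages are often correlated.

```python
HOURS_PER_YEAR = 24 * 365


def combined_availability(per_location: float, locations: int) -> float:
    """Probability that at least one of several independent locations is up."""
    return 1 - (1 - per_location) ** locations


def downtime_hours_per_year(availability: float) -> float:
    return (1 - availability) * HOURS_PER_YEAR


for locations in (1, 2, 3):
    a = combined_availability(0.9995, locations)
    print(
        f"{locations} location(s): availability {a:.10f}, "
        f"about {downtime_hours_per_year(a):.6f} hours down per year"
    )
```

Under this model availability approaches, but never reaches, 100%, and each extra location means another full copy of the data to pay for.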
  80. Other options
     • Backup datacenter
     • Backup provider
  81. Cause chaos
  82. The Monkey Theory
     • Simulate things that go wrong
     • Find things that are different
  83. The simian army
     • Chaos -- Kills random instances
     • Latency -- Slows the network down
     • Conformity -- Looks for outliers
     • Doctor -- Looks for passing health checks
     • Janitor -- Cleans up unused resources
     • Howler -- Yells about bad things
  84. The Chaos Gorilla
  85. Automate all the things!
  86. Automate all the things!
     • Application startup
     • Configuration
     • Code deployment
     • System deployment
  87. Incident Reviews
     Ask the key questions:
     • What went wrong?
     • How could we have detected it sooner?
     • How could we have prevented it?
     • How can we prevent this class of problem in the future?
     • How can we improve our behavior for next time?
  88. The Netflix way
     • Everything is “built for three”
     • Fully automated build tools to test and make packages
     • Fully automated machine image bakery
     • Fully automated image deployment
  89. All systems choices assume some part will fail at some point.
  90. Best Practices
     • Keep data in multiple Availability Zones / DCs
     • Avoid keeping state on a single instance
  91. Best Practices
     • Isolated Services
     • Three Balanced AZs
     • Triple replicated persistence
     • Isolated Regions
  92. Best Practices
     • Don’t trust your dependencies
     • Have good fallbacks
     • Use circuit breakers/dependency commands
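
A circuit breaker with a fallback can be sketched in a few lines. This is a generic illustration, not Netflix's implementation: the `CircuitBreaker` class, its thresholds, and the `flaky` dependency are all hypothetical.

```python
import time


class CircuitBreaker:
    """Stop calling a failing dependency for a while and use a fallback."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # seconds to stay open
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()  # open: skip the dependency entirely
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0  # success closes the circuit again
        return result


calls = []


def flaky():
    """Hypothetical dependency that is currently down."""
    calls.append(1)
    raise ConnectionError("dependency down")


breaker = CircuitBreaker(max_failures=2, reset_after=60.0)
results = [breaker.call(flaky, lambda: "cached") for _ in range(5)]
print(results, "dependency calls:", len(calls))
```

After two failures the breaker opens, so the remaining requests get the fallback immediately and the struggling dependency is left alone until the reset interval has passed.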
  93. Best Practices
     • Be generous in what you accept and stingy in what you give
  94. Best Practices
     • Hope for the best, assume the worst
  96. War Stories
  97. April 2011 EBS outage
  98. June 29th Outage
     • Due to a severe storm, power went out in one AZ
     • Netflix did not do well because of a bug in our internal mid-tier load balancer
     • However, Cassandra held up just fine!
  99. October 29th Outage
     • EBS degradation in one Zone
     • We did much better this time
     • Cassandra just kept running
     • MySQL not as well, but fallbacks kicked in
  100. Hurricane Sandy: The outage that never was
  101. Just a quick reminder... (Some of) Netflix is open source:
  102. Another reminder... reddit is also open source; patches are now being accepted!
  103. Netflix is hiring - or - email and tell them jedberg sent you
  104. Questions?
  106. Getting in touch Email: Twitter: @jedberg Web: Facebook: LinkedIn: reddit: