NoSQL Revolution: Under the Covers of Distributed Systems at Scale (SPOT401) | AWS re:Invent 2013



The Dynamo paper started a revolution in distributed systems. The contributions from this paper still impact the design and practices of some of the world's largest distributed systems, including those at Amazon and beyond. Building distributed systems is hard, but our goal in this session is to simplify the complexity of this topic to empower the hacker in you! Have you been bitten by the eventual consistency bug lately? We show you how to tame eventual consistency and make it a great scaling asset. As you scale up, you must be ready to deal with node, rack, and data center failures. We share insights on how to limit the blast radius of the individual components of your system, battle-tested techniques for simulating failures (network partitions, data center failure), and how we used core distributed systems fundamentals to build highly scalable, performant, durable, and resilient systems. Come watch us uncover the secret sauce behind Amazon DynamoDB, Amazon SQS, and Amazon SNS, and the fundamental tenets that define them as Internet-scale services. To turn this session into a hacker's dream, we go over design and implementation practices you can follow to build an application with virtually limitless scalability on AWS within an hour. We even share insights and secret tips on how to make the most out of one of the services released during the morning keynote.



  1. 1. SPOT 401 - Leading the NoSQL Revolution: under the covers of Distributed Systems @ scale @swami_79 @ksshams
  2. 2. what are we covering? The evolution of large scale distributed systems @ Amazon from the 90’s to today The lessons we learned and insights you can employ in your own distributed systems @swami_79 @ksshams
  3. 3. let’s start with a story about a little company called Amazon.com @swami_79 @ksshams
  4. 4. episode 1 once upon a time... (in 2000) @swami_79 @ksshams
  5. 5. a thousand miles away... (seattle) @swami_79 @ksshams
  6. 6. Amazon.com - a rapidly growing Internet-based retail business relied on relational databases @swami_79 @ksshams
  7. 7. we had 1000s of independent services @swami_79 @ksshams
  8. 8. each service managed its own state in RDBMs @swami_79 @ksshams
  9. 9. RDBMs are actually kind of cool @swami_79 @ksshams
  10. 10. first of all... SQL!! @swami_79 @ksshams
  11. 11. so it is easier to query.. @swami_79 @ksshams
  12. 12. easier to learn @swami_79 @ksshams
  13. 13. they are as versatile as a swiss army knife: complex queries, key-value access, analytics, transactions @swami_79 @ksshams
  14. 14. RDBMs are *very* similar to Swiss Army Knives @swami_79 @ksshams
  15. 15. but sometimes.. swiss army knives.. can be more than what you bargained for @swami_79 @ksshams
  16. 16. partitioning easy.. repartitioning HARD @swami_79 @ksshams
  17. 17. so we bought bigger boxes... @swami_79 @ksshams
  18. 18. Q4 was hard work at Amazon: benchmark new hardware, repartition databases, migrate to new hardware, pray ... @swami_79 @ksshams
  19. 19. RDBMs availability challenges.. @swami_79 @ksshams
  20. 20. episode 2 then.. (in 2005) @swami_79 @ksshams
  21. 21. amazon dynamo - predecessor to DynamoDB: replicated DHT with consistent hashing, optimistic replication, “sloppy quorum”, anti-entropy mechanism, object versioning. a specialist tool: limited querying capabilities, simpler consistency @swami_79 @ksshams
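Slide 21's "replicated DHT with consistent hashing" can be illustrated with a minimal sketch. Everything here (the `Ring` class, the virtual-node count, MD5 as the ring hash) is a hypothetical toy, not Dynamo's actual implementation: keys hash onto a ring, and a key's preference list is the first N distinct nodes met walking clockwise from its position.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Map a key to a point on the ring (MD5 keeps the sketch deterministic).
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    """A minimal consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=8):
        self._points = []   # sorted ring positions
        self._owner = {}    # ring position -> physical node
        for node in nodes:
            for i in range(vnodes):
                h = _hash(f"{node}#{i}")
                bisect.insort(self._points, h)
                self._owner[h] = node

    def preference_list(self, key, n=3):
        # Walk clockwise from the key's position, collecting n distinct nodes.
        start = bisect.bisect(self._points, _hash(key))
        result = []
        for i in range(len(self._points)):
            node = self._owner[self._points[(start + i) % len(self._points)]]
            if node not in result:
                result.append(node)
            if len(result) == n:
                break
        return result

ring = Ring(["A", "B", "C", "D"])
replicas = ring.preference_list("user:42", n=3)  # 3 distinct replica nodes
```

Because adding a node only changes ownership of the ring segments adjacent to its virtual nodes, only a small fraction of keys move, which is what makes "just add boxes" scaling possible.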
  22. 22. dynamo had many benefits: higher availability (we traded it off for eventual consistency), incremental scalability (no more repartitioning, no need to architect apps for peak - just add boxes), simpler querying model ==>> predictable performance @swami_79 @ksshams
  23. 23. but dynamo was not perfect... lacked strong consistency @swami_79 @ksshams
  24. 24. but dynamo was not perfect... scaling was easier, but... @swami_79 @ksshams
  25. 25. but dynamo was not perfect... steep learning curve @swami_79 @ksshams
  26. 26. but dynamo was not perfect... dynamo was a product ... ==>> not a service... @swami_79 @ksshams
  27. 27. episode 3 then.. (in 2012) @swami_79 @ksshams
  28. 28. DynamoDB • NoSQL database • fast & predictable performance • seamless scalability • easy administration “Even though we have years of experience with large, complex NoSQL architectures, we are happy to be finally out of the business of managing it ourselves.” - Don MacAskill, CEO @swami_79 @ksshams
  29. 29. services, services, services @swami_79 @ksshams
  30. 30. Amazon’s experience with services @swami_79 @ksshams
  31. 31. how do you create a successful service? @swami_79 @ksshams
  32. 32. with great services, comes great responsibility @swami_79 @ksshams
  33. 33. Architect Customer @swami_79 @ksshams
  34. 34. DynamoDB Goals and Philosophies: never compromise on durability, scale is our problem, easy to use, consistent and low latencies, scale in rps @swami_79 @ksshams
  35. 35. how to build these large scale services? @swami_79 @ksshams
  36. 36. don’t compromise on durability… @swami_79 @ksshams
  37. 37. don’t compromise on… availability @swami_79 @ksshams
  38. 38. plan for success, plan for scalability @swami_79 @ksshams
  39. 39. @swami_79 @ksshams
  40. 40. Fault tolerant design is key.. • Everything fails all the time • Planning for failures is not easy • How do you ensure your recovery strategies work correctly? @swami_79 @ksshams
  41. 41. Byzantine Generals Problem @swami_79 @ksshams
  42. 42. A simple 2-way replication system of a traditional database… Writes Primary Standby @swami_79 @ksshams
  43. 43. Standby: “P is dead, need to promote myself” / Primary: “S is dead, need to trigger a new replica” (P, P’, S) @swami_79 @ksshams
  44. 44. Improved Replication: Quorum Replica Replica Writes Replica Quorum: Successful write on a majority @swami_79 @ksshams
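Slide 44's rule, a write succeeds only when a majority of replicas acknowledge it, can be sketched as follows. The replica objects and function names are hypothetical stand-ins; a real system would send RPCs and track object versions.

```python
def quorum_write(replicas, key, value, w):
    """Attempt the write on every replica; succeed only with at least w acks."""
    acks = 0
    for replica in replicas:
        try:
            replica[key] = value  # a real system would issue an RPC here
            acks += 1
        except Exception:
            pass                  # a failed replica simply doesn't ack
    return acks >= w

def quorum_read(replicas, key, r):
    """Read until r replicas answer (no version reconciliation in this toy)."""
    values = []
    for replica in replicas:
        if key in replica:
            values.append(replica[key])
        if len(values) == r:
            break
    return values[-1] if len(values) >= r else None

class DeadReplica(dict):
    """Test stand-in for a crashed replica: every write raises."""
    def __setitem__(self, key, value):
        raise ConnectionError("replica down")

# N = 3 with majority quorums W = R = 2: W + R > N guarantees that any read
# quorum overlaps any write quorum, even with one replica down.
replicas = [{}, {}, DeadReplica()]
ok = quorum_write(replicas, "k", "v1", w=2)  # 2 of 3 ack -> success
```

The overlap condition W + R > N is what lets the system survive the loss of a minority of replicas without losing acknowledged writes.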
  45. 45. Not so easy.. a new member joins the group (Replicas A-F), reads and writes from client B, writes from client A: “Should I continue to serve reads? Should I start a new quorum?” Classic split-brain issue in replicated systems, leading to lost writes!
  46. 46. Building correct distributed systems is not straight forward.. • How do you handle replica failures? • How do you ensure there is not a parallel quorum? • How do you handle partial failures of replicas? • How do you handle concurrent failures? @swami_79 @ksshams
  47. 47. correctness is hard, but necessary
  48. 48. Formal Methods
  49. 49. Formal Methods to minimize bugs, we must have a precise description of the design
  50. 50. Formal Methods: code is too detailed; design documents and diagrams are vague & imprecise - how would you express partial failures or concurrency?
  51. 51. Formal Methods law of large numbers is your friend, until you hit large numbers so design for scale
  52. 52. TLA+ to the rescue? @swami_79 @ksshams
  53. 53. PlusCal @swami_79 @ksshams
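TLA+ and PlusCal catch design bugs by exhaustively enumerating reachable states and checking invariants against every one. As a toy illustration of that idea (in Python rather than TLA+, and vastly simpler than a real spec), here is an exhaustive check of the naive failover rule from slides 43 and 45, "promote yourself when you suspect the primary is dead":

```python
from itertools import product

def find_violations():
    """Enumerate every state of a toy model: the primary P may be alive or
    dead, and each standby (A, B) may independently suspect P is dead and
    promote itself. The invariant to check is 'at most one primary' -- the
    same exhaustive-checking idea TLC applies to a TLA+ spec."""
    violations = []
    for a_suspects, b_suspects, p_alive in product([False, True], repeat=3):
        primaries = set()
        if p_alive:
            primaries.add("P")   # the original primary keeps serving
        if a_suspects:
            primaries.add("A")   # A promotes itself
        if b_suspects:
            primaries.add("B")   # B promotes itself
        if len(primaries) > 1:   # invariant violated: split brain
            violations.append((a_suspects, b_suspects, p_alive))
    return violations

bad_states = find_violations()   # non-empty: the naive rule is unsafe
```

Even this eight-state model finds the split brain instantly; the value of a real model checker is doing the same over state spaces far too large to reason about by hand.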
  54. 54. formal methods are necessary but not sufficient.. @swami_79 @ksshams
  55. 55. customer @swami_79 @ksshams
  56. 56. don’t forget to test - no, seriously @swami_79 @ksshams
  57. 57. embrace failure and don’t be surprised: simulate failures at unit test level, fault injection testing, scale testing, datacenter testing, network brown-out testing
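A sketch of "simulate failures at unit test level": wrap the dependency in a test double that injects faults, then assert that the retry logic under test survives them. All names here (`FlakyStore`, `put_with_retries`) are hypothetical.

```python
import random

class FlakyStore:
    """Test double that fails a configurable fraction of calls --
    fault injection at the unit-test level."""

    def __init__(self, failure_rate, seed=0):
        self._data = {}
        self._rng = random.Random(seed)  # seeded so the test is repeatable
        self._failure_rate = failure_rate

    def put(self, key, value):
        if self._rng.random() < self._failure_rate:
            raise ConnectionError("injected fault")
        self._data[key] = value

def put_with_retries(store, key, value, attempts=5):
    """The code under test: retry writes a bounded number of times."""
    for _ in range(attempts):
        try:
            store.put(key, value)
            return True
        except ConnectionError:
            continue
    return False

store = FlakyStore(failure_rate=0.5, seed=42)
ok = put_with_retries(store, "order-1", "shipped")
```

Seeding the injected faults keeps the test deterministic; sweeping the seed and failure rate in a loop turns the same harness into a cheap randomized fault-injection suite.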
  58. 58. testing is a lifelong journey
  59. 59. testing is necessary but not sufficient.. @swami_79 @ksshams
  60. 60. Customer Architect @swami_79 @ksshams
  61. 61. release cycle - treading lightly: gamma (simulate real world), one box (does it work?), phased deployment, monitor (does it still work?) @swami_79 @ksshams
  62. 62. Monitor customer behavior @swami_79 @ksshams
  63. 63. measuring customer experience is key - don’t be satisfied by the average, look at the 99th percentile @swami_79 @ksshams
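Why the average is not enough: a handful of slow outliers vanish in the mean but dominate what your unluckiest customers experience. A minimal nearest-rank percentile sketch:

```python
def percentile(samples, p):
    """Nearest-rank percentile: the sample at the p-th rank position."""
    ordered = sorted(samples)
    rank = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[rank]

# 98 fast requests and two slow outliers: the average looks healthy,
# while the p99 shows the latency a real customer can hit.
latencies_ms = [10] * 98 + [1500, 2000]
average = sum(latencies_ms) / len(latencies_ms)   # ~45 ms, looks fine
p99 = percentile(latencies_ms, 99)                # 1500 ms, the real story
```

In a service making many downstream calls per request, tail latencies compound, which is why high-percentile monitoring matters even when outliers are rare.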
  64. 64. understand the scaling dimensions @swami_79 @ksshams
  65. 65. understand how your service will be abused @swami_79 @ksshams
  66. 66. let’s see these rules in action through a true story @swami_79 @ksshams
  67. 67. we were building distributed systems all over @swami_79 @ksshams
  68. 68. we needed a uniform and correct way to do consensus.. @swami_79 @ksshams
  69. 69. so we built a paxos lock library ==>> a service @swami_79 @ksshams
  70. 70. such a service is so much more useful than just leader election.. it became a distributed state store @swami_79 @ksshams
  71. 71. such a service is so much more useful than just leader election.. or a distributed state store wait wait.. you’re telling me if I poll, I can detect node failure? @swami_79 @ksshams
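The observation on slide 71, that polling the lock service doubles as failure detection, usually rests on leases: a holder keeps its lock only while it heartbeats within a TTL. A toy in-memory sketch (this hypothetical `LockService` is not the actual Amazon lock library):

```python
class LockService:
    """Toy lease-based lock: a lock is held only while its lease is fresh,
    so polling a lock's expiry doubles as failure detection."""

    def __init__(self, ttl):
        self._ttl = ttl
        self._holders = {}  # lock name -> (node, lease expiry time)

    def acquire(self, name, node, now):
        holder = self._holders.get(name)
        if holder is None or holder[1] <= now:       # free, or lease expired
            self._holders[name] = (node, now + self._ttl)
            return True
        return False

    def heartbeat(self, name, node, now):
        holder = self._holders.get(name)
        if holder and holder[0] == node and holder[1] > now:
            self._holders[name] = (node, now + self._ttl)  # renew the lease
            return True
        return False

    def is_alive(self, name, now):
        # A poller treats an expired lease as a dead holder.
        holder = self._holders.get(name)
        return holder is not None and holder[1] > now

svc = LockService(ttl=5)
got = svc.acquire("leader", "node-a", now=0)   # node-a holds until t=5
```

Passing `now` explicitly keeps the sketch testable; a real service uses its own clock, and choosing the TTL trades detection speed against tolerance for slow heartbeats.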
  72. 72. we acted quickly - and scaled up our entire fleet with more nodes doh!!!! we slowed consensus... @swami_79 @ksshams
  73. 73. understand the scaling dimensions & scale them independently... @swami_79 @ksshams
  74. 74. a lock service has 3 components.. State Store @swami_79 @ksshams
  75. 75. they must be scaled independently.. State Store @swami_79 @ksshams
  76. 76. they must be scaled independently.. State Store @swami_79 @ksshams
  77. 77. they must be scaled independently.. State Store @swami_79 @ksshams
  78. 78. let’s go over the demo from this morning
  79. 79. stream ingestion
  80. 80. stream ingestion
  81. 81. stream ingestion
  82. 82. Real-time tweet analytics using DynamoDB • Stream from Kinesis to DynamoDB • What data do we want in real-time? • (per-second, top words) • How does DynamoDB help? • Atomic counters (per-word counts in that second) • Indexed queries (top N word-counts in that second)
  83. 83. WordCount Table (Time | Word | Count):
2013-10-13T12:00 | Earth | 9
2013-10-13T12:00 | Mars | 10
2013-10-13T12:00 | Pluto | 5
2013-10-13T12:03 | Earth | 8
Local Secondary Index (Time | Count | Word):
2013-10-13T12:00 | 5 | Pluto
2013-10-13T12:00 | 9 | Earth
2013-10-13T12:00 | 10 | Mars
2013-10-13T12:03 | 8 | Earth
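The per-word atomic counters and top-N indexed query from slide 82 can be mimicked in memory. This is a stand-in, not the DynamoDB API: in DynamoDB the counter would be an UpdateItem with an atomic ADD, and the top-N query would be served by the local secondary index on Count.

```python
from collections import defaultdict

class WordCounts:
    """In-memory stand-in for the WordCount table: hash key = time bucket,
    range key = word, per-word counters, and a top-N query."""

    def __init__(self):
        self._counts = defaultdict(lambda: defaultdict(int))

    def increment(self, bucket, word, by=1):
        # DynamoDB would do this with an atomic ADD, safe under concurrency.
        self._counts[bucket][word] += by

    def top_n(self, bucket, n):
        # The LSI keeps items ordered by Count; here we just sort.
        items = self._counts[bucket].items()
        return sorted(items, key=lambda kv: kv[1], reverse=True)[:n]

wc = WordCounts()
for word, count in [("Earth", 9), ("Mars", 10), ("Pluto", 5)]:
    for _ in range(count):
        wc.increment("2013-10-13T12:00", word)

top2 = wc.top_n("2013-10-13T12:00", 2)  # the two most frequent words
```

Bucketing by second keeps each top-N query confined to a single hash key, which is what makes it cheap at stream-ingestion rates.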
  84. 84. DynamoDB cost: $0.25 / hr
  85. 85. stream ingestion
  86. 86. Aggregate queries using Redshift • Simple Redshift connector (buffer files, store in S3, call COPY command) • Manifest copy connector • 2 streams • transaction table for deduplication • manifest COPY
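The manifest-copy connector on slide 86 can be sketched as two pieces: deduplicate buffered files against a transaction table of already-loaded keys, then emit the JSON manifest that Redshift's COPY command consumes. The function names and bucket name are hypothetical stand-ins for the real S3/Redshift plumbing.

```python
import json

def files_to_load(buffered_keys, transaction_table):
    """Deduplicate against a 'transaction table' of already-loaded S3 keys,
    so a retried batch never loads the same file twice."""
    fresh = [k for k in buffered_keys if k not in transaction_table]
    transaction_table.update(fresh)  # record them as loaded
    return fresh

def build_manifest(s3_keys, bucket):
    """Build the JSON manifest a Redshift COPY command consumes, listing
    exactly the files this batch should load."""
    return json.dumps({
        "entries": [
            {"url": f"s3://{bucket}/{key}", "mandatory": True}
            for key in s3_keys
        ]
    })

loaded = set()                                       # stand-in transaction table
batch1 = files_to_load(["a.gz", "b.gz"], loaded)     # both fresh
batch2 = files_to_load(["b.gz", "c.gz"], loaded)     # retry overlaps batch 1
manifest = build_manifest(batch2, "tweet-buffer")    # only c.gz is loaded
```

Keeping the loaded-file set in a durable table (rather than this in-memory `set`) is what makes the load idempotent across connector restarts.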
  87. 87. Right tool for right job… • Canal -> DynamoDB -> Redshift -> Glacier…
  88. 88. You are not done yet.. • Listen to customer feedback • Iterate..
  89. 89. Example: DynamoDB • Start with immediate needs of reliable, super scalable, low latency datastore • Iterate • Developers wanted flexible query: Local Secondary Indexes • Developers wanted parallel loads: Parallel Scans • Mobile developers wanted direct access to their datastore: Fine-grained Access Control • Mobile developers wanted geo-awareness: Geospatial library • Developers wanted DynamoDB on their laptop: DynamoDB Local • Developers wanted richer query: Global Secondary Indexes • We will continue to innovate..
  90. 90. Sacred Tenets in Distributed Systems don’t compromise durability for performance plan for success – plan for scalability plan for failures - fault tolerance is key consistent performance is important release - think of blast radius insist on correctness @swami_79 @ksshams
  91. 91. understand scaling dimensions, observe how your service is used, monitor like a hawk, relentlessly test, scalability over features, strive for correctness @swami_79 @ksshams
  92. 92. Please give us your feedback on this presentation SPOT 401 Don’t miss SPOT 201!!! @swami_79 @ksshams
  93. 93. @swami_79 @ksshams