NoSQL Revolution: Under the Covers of Distributed Systems at Scale (SPOT401) | AWS re:Invent 2013

The Dynamo paper started a revolution in distributed systems. The contributions from this paper are still impacting the design and practices of some of the world's largest distributed systems, including those at Amazon.com and beyond. Building distributed systems is hard, but our goal in this session is to simplify the complexity of this topic to empower the hacker in you! Have you been bitten by the eventual consistency bug lately? We show you how to tame eventual consistency and make it a great scaling asset. As you scale up, you must be ready to deal with node, rack, and data center failure. We share insights on how to limit the blast radius of the individual components of your system, battle-tested techniques for simulating failures (network partitions, data center failure), and how we used core distributed systems fundamentals to build highly scalable, performant, durable, and resilient systems. Come watch us uncover the secret sauce behind Amazon DynamoDB, Amazon SQS, Amazon SNS, and the fundamental tenets that define them as Internet-scale services. To turn this session into a hacker's dream, we go over design and implementation practices you can follow to build an application with virtually limitless scalability on AWS within an hour. We even share insights and secret tips on how to make the most out of one of the services released during the morning keynote.

Transcript of "NoSQL Revolution: Under the Covers of Distributed Systems at Scale (SPOT401) | AWS re:Invent 2013"

  1. SPOT 401 - Leading the NoSQL Revolution: under the covers of Distributed Systems @ scale @swami_79 @ksshams
  2. what are we covering? The evolution of large-scale distributed systems @ Amazon from the ’90s to today The lessons we learned and insights you can employ in your own distributed systems @swami_79 @ksshams
  3. let’s start with a story about a little company called amazon.com @swami_79 @ksshams
  4. episode 1 once upon a time... (in 2000) @swami_79 @ksshams
  5. a thousand miles away... (seattle) @swami_79 @ksshams
  6. amazon.com - a rapidly growing Internet-based retail business relied on relational databases @swami_79 @ksshams
  7. we had 1000s of independent services @swami_79 @ksshams
  8. each service managed its own state in RDBMSs @swami_79 @ksshams
  9. RDBMSs are actually kind of cool @swami_79 @ksshams
  10. first of all... SQL!! @swami_79 @ksshams
  11. so it is easier to query.. @swami_79 @ksshams
  12. easier to learn @swami_79 @ksshams
  13. they are as versatile as a swiss army knife: complex queries, key-value access, analytics, transactions @swami_79 @ksshams
  14. RDBMSs are *very* similar to Swiss Army Knives @swami_79 @ksshams
  15. but sometimes.. swiss army knives.. can be more than what you bargained for @swami_79 @ksshams
  16. partitioning: easy • repartitioning: HARD.. @swami_79 @ksshams
  17. so we bought bigger boxes... @swami_79 @ksshams
  18. Q4 was hard work at Amazon: benchmark new hardware • migrate to new hardware • repartition databases • pray ... @swami_79 @ksshams
  19. RDBMS availability challenges.. @swami_79 @ksshams
  20. episode 2 then.. (in 2005) @swami_79 @ksshams
  21. amazon dynamo - predecessor to dynamoDB • replicated DHT with consistent hashing • optimistic replication • “sloppy quorum” • anti-entropy mechanism • object versioning • a specialist tool: limited querying capabilities, simpler consistency @swami_79 @ksshams
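
  slide 21 packs in several techniques; consistent hashing is the one that makes incremental scaling work: nodes and keys hash onto the same ring, so adding a node moves only a small slice of keys. a toy sketch in Python (the MD5 hash, virtual-node count, and node names are illustrative assumptions, not Dynamo’s actual parameters):

  ```python
  import hashlib
  from bisect import bisect, insort

  class ConsistentHashRing:
      """Toy consistent-hash ring: a key maps to the first node clockwise."""

      def __init__(self, nodes=(), vnodes=100):
          self.vnodes = vnodes   # virtual nodes smooth out the key distribution
          self._ring = []        # sorted list of (hash, node) pairs
          for node in nodes:
              self.add_node(node)

      @staticmethod
      def _hash(value):
          return int(hashlib.md5(value.encode()).hexdigest(), 16)

      def add_node(self, node):
          for i in range(self.vnodes):
              insort(self._ring, (self._hash(f"{node}#{i}"), node))

      def remove_node(self, node):
          self._ring = [(h, n) for h, n in self._ring if n != node]

      def node_for(self, key):
          h = self._hash(key)
          idx = bisect(self._ring, (h,)) % len(self._ring)  # wrap around the ring
          return self._ring[idx][1]

  ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
  print(ring.node_for("customer#42"))   # e.g. 'node-b'
  ```

  without virtual nodes, removing one physical node dumps its whole range on a single neighbor; with them, the load spreads across the fleet.
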
  22. dynamo had many benefits • higher availability (we traded it off for eventual consistency) • incremental scalability • no more repartitioning • no need to architect apps for peak - just add boxes • simpler querying model ==>> predictable performance @swami_79 @ksshams
  23. but dynamo was not perfect... lacked strong consistency @swami_79 @ksshams
  24. but dynamo was not perfect... scaling was easier, but... @swami_79 @ksshams
  25. but dynamo was not perfect... steep learning curve @swami_79 @ksshams
  26. but dynamo was not perfect... dynamo was a product ... ==>> not a service... @swami_79 @ksshams
  27. episode 3 then.. (in 2012) @swami_79 @ksshams
  28. DynamoDB • NoSQL database • fast & predictable performance • seamless scalability • easy administration “Even though we have years of experience with large, complex NoSQL architectures, we are happy to be finally out of the business of managing it ourselves.” - Don MacAskill, CEO @swami_79 @ksshams
  29. services, services, services @swami_79 @ksshams
  30. amazon.com’s experience with services @swami_79 @ksshams
  31. how do you create a successful service? @swami_79 @ksshams
  32. with great services comes great responsibility @swami_79 @ksshams
  33. Architect Customer @swami_79 @ksshams
  34. DynamoDB Goals and Philosophies • never compromise on durability • scale is our problem • easy to use • consistent and low latencies • scale in rps @swami_79 @ksshams
  35. how to build these large-scale services? @swami_79 @ksshams
  36. don’t compromise on durability… @swami_79 @ksshams
  37. don’t compromise on… availability @swami_79 @ksshams
  38. plan for success, plan for scalability @swami_79 @ksshams
  39. @swami_79 @ksshams
  40. Fault-tolerant design is key.. • Everything fails all the time • Planning for failures is not easy • How do you ensure your recovery strategies work correctly? @swami_79 @ksshams
  41. Byzantine Generals Problem @swami_79 @ksshams
  42. A simple 2-way replication system of a traditional database… Writes -> Primary -> Standby @swami_79 @ksshams
  43. the standby thinks “P is dead, need to promote myself” while the primary thinks “S is dead, need to trigger a new replica” - now there are two primaries, P and P’ @swami_79 @ksshams
  44. Improved Replication: Quorum - writes fan out to all replicas; Quorum: successful write on a majority @swami_79 @ksshams
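
  the quorum rule on slide 44 is easy to state precisely: a write succeeds only when a majority of replicas acknowledge it. a minimal sketch, assuming a stand-in transport with random failures (none of this is DynamoDB’s internals):

  ```python
  import random

  N = 3               # replicas
  W = N // 2 + 1      # write quorum: a majority

  def send_write(replica_id, key, value):
      """Stand-in for a network call; randomly fails to simulate a down replica."""
      if random.random() < 0.2:
          raise ConnectionError(f"replica {replica_id} unreachable")
      return True

  def quorum_write(key, value):
      acks = 0
      for r in range(N):
          try:
              if send_write(r, key, value):
                  acks += 1
          except ConnectionError:
              pass                      # tolerate individual replica failures
      if acks >= W:
          return True                   # durable: a majority holds the write
      raise RuntimeError(f"write failed: {acks}/{N} acks, needed {W}")

  quorum_write("cart#123", {"items": 2})
  ```

  with a read quorum R chosen so that R + W > N, every read quorum overlaps the last write quorum; the split-brain scenario on the next slide is what happens when nothing enforces that overlap.
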
  45. Not so easy.. a new member (Replica D) joins the group of Replicas A-F; client A’s writes and client B’s reads and writes land on different replicas, and each replica must ask: Should I continue to serve reads? Should I start a new quorum? Classic split-brain issue in replicated systems, leading to lost writes!
  46. Building correct distributed systems is not straightforward.. • How do you handle replica failures? • How do you ensure there is not a parallel quorum? • How do you handle partial failures of replicas? • How do you handle concurrent failures? @swami_79 @ksshams
  47. correctness is hard, but necessary
  48. Formal Methods
  49. Formal Methods - to minimize bugs, we must have a precise description of the design
  50. Formal Methods - code is too detailed; design documents and diagrams are vague & imprecise. how would you express partial failures or concurrency?
  51. Formal Methods - the law of large numbers is your friend, until you hit large numbers. so design for scale
  52. TLA+ to the rescue? @swami_79 @ksshams
  53. PlusCal @swami_79 @ksshams
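
  TLA+ and PlusCal are specification languages in their own right, so nothing below reproduces them; as a loose analogy only, this Python sketch brute-forces every interleaving of a tiny two-writer counter the way a model checker enumerates states, finding a lost-update bug a single test run would likely miss. everything here is illustrative:

  ```python
  from itertools import permutations

  # Each process performs two atomic steps: read the counter, then write read+1.
  # A model checker explores every interleaving; one test run sees only one.

  def run(schedule):
      counter = 0
      local = {0: None, 1: None}
      for pid, step in schedule:
          if step == "read":
              local[pid] = counter
          else:  # "write"
              counter = local[pid] + 1
      return counter

  steps = [(0, "read"), (0, "write"), (1, "read"), (1, "write")]

  violations = set()
  for order in permutations(steps):
      # keep only orders where each process reads before it writes
      if order.index((0, "read")) < order.index((0, "write")) and \
         order.index((1, "read")) < order.index((1, "write")):
          if run(order) != 2:            # invariant: both increments survive
              violations.add(order)

  print(f"{len(violations)} interleavings lose an update")   # > 0: bug found
  ```
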
  54. formal methods are necessary but not sufficient.. @swami_79 @ksshams
  55. customer @swami_79 @ksshams
  56. don’t forget to test - no, seriously @swami_79 @ksshams
  57. embrace failure and don’t be surprised: simulate failures at the unit-test level • fault-injection testing • scale testing • datacenter testing • network brownout testing
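
  “simulate failures at the unit-test level” can be as simple as making the failure path injectable. a minimal sketch, runnable under pytest; the retrying reader and the flaky test double are invented for illustration:

  ```python
  def fetch_item(client, key):
      """Read with a single retry on a transient network error."""
      for attempt in (1, 2):
          try:
              return client.get(key)
          except ConnectionError:
              if attempt == 2:
                  raise                     # second failure propagates

  class FlakyClient:
      """Test double that injects one fault, then succeeds."""
      def __init__(self):
          self.calls = 0
      def get(self, key):
          self.calls += 1
          if self.calls == 1:
              raise ConnectionError("injected fault")
          return {"key": key, "value": 42}

  def test_retry_survives_one_network_fault():
      client = FlakyClient()
      assert fetch_item(client, "k")["value"] == 42
      assert client.calls == 2              # proves the retry path actually ran
  ```
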
  58. testing is a lifelong journey
  59. testing is necessary but not sufficient.. @swami_79 @ksshams
  60. Customer Architect @swami_79 @ksshams
  61. release cycle: gamma (simulate the real world) -> one box (does it work?) -> phased deployment (treading lightly) -> monitor (does it still work?) @swami_79 @ksshams
  62. Monitor customer behavior @swami_79 @ksshams
  63. measuring customer experience is key - don’t be satisfied by the average, look at the 99th percentile @swami_79 @ksshams
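
  the p99 point on slide 63 is easy to demonstrate: an average can look healthy while one request in a hundred is terrible. a small sketch of the nearest-rank percentile over made-up latency samples:

  ```python
  import math

  def percentile(samples, p):
      """Nearest-rank percentile of a list of samples."""
      ordered = sorted(samples)
      rank = math.ceil(p / 100 * len(ordered))      # 1-indexed nearest rank
      return ordered[rank - 1]

  # 98 fast requests hide two multi-second stragglers
  latencies_ms = [10] * 98 + [1500, 2000]
  print(sum(latencies_ms) / len(latencies_ms))      # mean: 44.8 ms -- looks fine
  print(percentile(latencies_ms, 50))               # p50: 10 ms
  print(percentile(latencies_ms, 99))               # p99: 1500 ms -- the real story
  ```
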
  64. understand the scaling dimensions @swami_79 @ksshams
  65. understand how your service will be abused @swami_79 @ksshams
  66. let’s see these rules in action through a true story @swami_79 @ksshams
  67. we were building distributed systems all over amazon.com @swami_79 @ksshams
  68. we needed a uniform and correct way to do consensus.. @swami_79 @ksshams
  69. so we built a paxos lock library @swami_79 @ksshams
  70. such a service is so much more useful than just leader election.. it became a distributed state store @swami_79 @ksshams
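
  a consensus-backed lock or state store usually surfaces as a lease API: acquire with a TTL, renew while healthy. a single-process stand-in sketch, not the Paxos library itself; the class and method names are invented:

  ```python
  import time

  class LeaseStore:
      """In-memory stand-in for a consensus-backed lock/state store."""

      def __init__(self):
          self._leases = {}   # name -> (owner, expires_at)

      def try_acquire(self, name, owner, ttl_s):
          now = time.monotonic()
          holder = self._leases.get(name)
          if holder is None or holder[1] <= now:          # free or expired
              self._leases[name] = (owner, now + ttl_s)   # conditional write
              return True
          return False

      def renew(self, name, owner, ttl_s):
          holder = self._leases.get(name)
          if holder and holder[0] == owner:
              self._leases[name] = (owner, time.monotonic() + ttl_s)
              return True
          return False                                     # lost the lease

  store = LeaseStore()
  if store.try_acquire("shard-7-leader", "host-a", ttl_s=10):
      pass  # host-a leads until it fails to renew within 10s
  ```

  polling such leases is exactly the trick the next slide jokes about: when a holder stops renewing, its lease lapses, which doubles as failure detection.
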
  71. such a service is so much more useful than just leader election.. or a distributed state store. wait wait.. you’re telling me if I poll, I can detect node failure? @swami_79 @ksshams
  72. we acted quickly - and scaled up our entire fleet with more nodes. doh!!!! we slowed consensus... @swami_79 @ksshams
  73. understand the scaling dimensions & scale them independently... @swami_79 @ksshams
  74. a lock service has 3 components.. State Store @swami_79 @ksshams
  75. they must be scaled independently.. State Store @swami_79 @ksshams
  76. they must be scaled independently.. State Store @swami_79 @ksshams
  77. they must be scaled independently.. State Store @swami_79 @ksshams
  78. Let’s go over the demo from this morning
  79. stream ingestion
  80. stream ingestion
  81. stream ingestion
  82. Real-time tweet analytics using DynamoDB • Stream from Kinesis to DynamoDB • What data do we want in real time? (per-second, top words) • How does DynamoDB help? • Atomic counters (per-word counts in that second) • Indexed queries (top N word counts in that second)
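
  the atomic counters on slide 82 map to DynamoDB’s UpdateItem with an ADD action, which increments server-side instead of read-modify-write. the talk predates boto3, but a sketch with today’s boto3 SDK looks like this (the WordCount table is assumed to exist; #c aliases Count because COUNT is a DynamoDB reserved word):

  ```python
  import boto3

  table = boto3.resource("dynamodb").Table("WordCount")

  def count_word(second, word):
      """Atomically increment the (second, word) counter; ADD creates it at 1."""
      table.update_item(
          Key={"Time": second, "Word": word},
          UpdateExpression="ADD #c :one",
          ExpressionAttributeNames={"#c": "Count"},
          ExpressionAttributeValues={":one": 1},
      )

  count_word("2013-10-13T12:00", "Earth")
  ```
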
  83. WordCount Table and its Local Secondary Index:

      WordCount Table                     Local Secondary Index (sorted by Count)
      Time              Word    Count     Time              Count   Word
      2013-10-13T12:00  Earth   9         2013-10-13T12:00  5       Pluto
      2013-10-13T12:00  Mars    10        2013-10-13T12:00  9       Earth
      2013-10-13T12:00  Pluto   5         2013-10-13T12:00  10      Mars
      2013-10-13T12:03  Earth   8         2013-10-13T12:03  8       Earth
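
  the local secondary index above re-sorts each Time partition by Count, so “top N words in a second” is one descending Query with a Limit. a sketch under the same assumptions (the index name CountIndex is invented):

  ```python
  import boto3
  from boto3.dynamodb.conditions import Key

  table = boto3.resource("dynamodb").Table("WordCount")

  def top_words(second, n=3):
      """Top-n (word, count) pairs for one second, via the Count-sorted LSI."""
      resp = table.query(
          IndexName="CountIndex",                 # assumed LSI name
          KeyConditionExpression=Key("Time").eq(second),
          ScanIndexForward=False,                 # descending by Count
          Limit=n,
      )
      return [(item["Word"], int(item["Count"])) for item in resp["Items"]]

  print(top_words("2013-10-13T12:00"))   # e.g. [('Mars', 10), ('Earth', 9), ('Pluto', 5)]
  ```
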
  84. DynamoDB cost: $0.25 / hr
  85. stream ingestion
  86. Aggregate queries using Redshift • Simple Redshift connector (buffer files, store in S3, call COPY command) • Manifest copy connector: 2 streams • transaction table for deduplication • manifest copy
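
  the “simple Redshift connector” on slide 86 is a three-step loop: buffer rows into a file, put the file in S3, issue a COPY. a hedged sketch (bucket, table, and IAM role are placeholders; psycopg2 is one client that can issue the COPY, and its connection setup is not shown):

  ```python
  import boto3
  import psycopg2   # assumed Redshift/Postgres client; connect() not shown

  s3 = boto3.client("s3")
  BUCKET, TABLE = "my-ingest-bucket", "wordcounts"   # illustrative names

  def flush_to_redshift(rows, conn, batch_id):
      # 1. buffer rows into one pipe-delimited file body
      body = "".join("|".join(map(str, r)) + "\n" for r in rows)
      key = f"batches/{batch_id}.psv"
      # 2. store the file in S3
      s3.put_object(Bucket=BUCKET, Key=key, Body=body.encode())
      # 3. ask Redshift to bulk-load it with COPY
      with conn.cursor() as cur:
          cur.execute(
              f"COPY {TABLE} FROM 's3://{BUCKET}/{key}' "
              "IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy' "  # placeholder
              "DELIMITER '|'"
          )
      conn.commit()
  ```
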
  87. Right tool for the right job… • Canal -> DynamoDB -> Redshift -> Glacier…
  88. You are not done yet.. • Listen to customer feedback • Iterate..
  89. Example: DynamoDB • Start with the immediate needs: a reliable, super scalable, low latency datastore • Iterate • Developers wanted flexible query: Local Secondary Indexes • Developers wanted parallel loads: Parallel Scans • Mobile developers wanted direct access to their datastore: Fine-grained Access Control • Mobile developers wanted geo-awareness: Geospatial library • Developers wanted DynamoDB on their laptop: DynamoDB Local • Developers wanted richer query: Global Secondary Indexes • We will continue to innovate..
  90. Sacred Tenets in Distributed Systems • don’t compromise durability for performance • plan for success - plan for scalability • plan for failures - fault tolerance is key • consistent performance is important • release - think of blast radius • insist on correctness @swami_79 @ksshams
  91. • understand scaling dimensions • observe how service is used • monitor like a hawk • relentlessly test • scalability over features • strive for correctness @swami_79 @ksshams
  92. Please give us your feedback on this presentation SPOT 401 Don’t miss SPOT 201!!! @swami_79 @ksshams
  93. @swami_79 @ksshams