NoSQL Revolution: Under the Covers of Distributed Systems at Scale (SPOT401) | AWS re:Invent 2013

SPOT 401 - Leading the NoSQL Revolution: Under the Covers of Distributed Systems @ Scale
@swami_79

@ksshams

what are we covering?
• The evolution of large-scale distributed systems @ Amazon from the 90s to today
• The lessons we learned and insights you can employ in your own distributed systems

let’s start with a story about a little company called amazon.com

episode 1
once upon a time... (in 2000)

a thousand miles away... (seattle)

amazon.com, a rapidly growing Internet-based retail business, relied on relational databases

we had 1000s of independent services

each service managed its own state in RDBMSs

RDBMSs are actually kind of cool

first of all... SQL!!

so it is easier to query..

easier to learn

they are as versatile as a swiss army knife
• complex queries
• key-value access
• analytics
• transactions

RDBMSs are *very* similar to Swiss Army Knives

but sometimes.. swiss army knives.. can be more than what you bargained for

partitioning: easy
repartitioning: HARD..

so we bought bigger boxes...

Q4 was hard work at Amazon
• benchmark new hardware
• migrate to new hardware
• repartition databases
• pray ...

RDBMS availability challenges..

episode 2
then.. (in 2005)

amazon dynamo
predecessor to dynamoDB
• replicated DHT with consistent hashing
• optimistic replication
• “sloppy quorum”
• anti-entropy mechanism
• object versioning
specialist tool:
• limited querying capabilities
• simpler consistency
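
To make "replicated DHT with consistent hashing" concrete, here is a minimal Python sketch of a hash ring with virtual nodes. It is an illustration under stated assumptions, not Dynamo's actual code: the class name HashRing, the vnodes parameter, and the use of MD5 for tokens are all hypothetical choices made for this example.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring (hypothetical names, not Dynamo's implementation)."""

    def __init__(self, nodes, vnodes=8):
        self._ring = []        # sorted list of (token, node) pairs
        self._vnodes = vnodes  # virtual nodes per physical node (assumed value)
        for node in nodes:
            self.add(node)

    @staticmethod
    def _token(key):
        # Map any string to a position on the ring.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        # Each physical node owns several positions on the ring.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (self._token(f"{node}#{i}"), node))

    def preference_list(self, key, n=3):
        """Return the first n distinct nodes clockwise from the key's position."""
        if not self._ring:
            return []
        start = bisect.bisect(self._ring, (self._token(key), ""))
        owners = []
        for i in range(len(self._ring)):
            node = self._ring[(start + i) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == n:
                    break
        return owners


ring = HashRing(["A", "B", "C"])
replicas_for_key = ring.preference_list("customer-42", n=3)
```

Adding a node to such a ring only takes over the key ranges just before its own tokens, so every other key keeps its previous owner. That property is what allows capacity to grow without a full repartition.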

dynamo had many benefits
• higher availability
• we traded strong consistency for it and accepted eventual consistency
• incremental scalability
• no more repartitioning
• no need to architect apps for peak
• just add boxes
• simpler querying model ==>> predictable performance

but dynamo was not perfect...
lacked strong consistency

scaling was easier, but...

steep learning curve

dynamo was a product ... ==>> not a service...

episode 3
then.. (in 2012)

DynamoDB
• NoSQL database
• fast & predictable performance
• seamless scalability
• easy administration

“Even though we have years of experience with large, complex NoSQL architectures, we are happy to be finally out of the business of managing it ourselves.” - Don MacAskill, CEO

services, services, services

amazon.com’s experience with services

how do you create a successful service?

with great services, comes great responsibility

Architect

Customer

DynamoDB Goals and Philosophies
• never compromise on durability
• scale is our problem
• easy to use
• consistent and low latencies
• scale in rps

how to build these large scale services?

don’t compromise on durability…

don’t compromise on… availability

plan for success, plan for scalability

Fault-tolerant design is key..
• Everything fails all the time
• Planning for failures is not easy
• How do you ensure your recovery strategies work correctly?

Byzantine Generals Problem

A simple 2-way replication system of a traditional database…
Writes -> Primary -> Standby

When the primary (P) and the standby (S) lose contact, each one assumes the other has failed:
• S: “P is dead, need to promote myself” (S becomes P’)
• P: “S is dead, need to trigger new replica”

Improved Replication: Quorum
Writes -> Replica, Replica, Replica
Quorum: Successful write on a majority
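
The majority rule can be sketched in a few lines of Python. This is a simplified illustration, assuming in-memory replicas and a synchronous write path; Replica and quorum_write are hypothetical names, and a real system would also need versioning, timeouts, and repair of replicas that missed the write.

```python
class Replica:
    """In-memory stand-in for a storage node (hypothetical, for illustration)."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.data = {}

    def write(self, key, value):
        if not self.up:
            raise ConnectionError(f"replica {self.name} is unreachable")
        self.data[key] = value


def quorum_write(replicas, key, value):
    """Return True only if a strict majority of replicas acknowledged the write."""
    needed = len(replicas) // 2 + 1
    acks = 0
    for replica in replicas:
        try:
            replica.write(key, value)
            acks += 1
        except ConnectionError:
            pass  # a failed replica simply does not count toward the quorum
    return acks >= needed


# One of three replicas is down: 2 acks out of 3 is still a majority.
group = [Replica("r1"), Replica("r2"), Replica("r3", up=False)]
ok = quorum_write(group, "cart:42", "book")
```

With three replicas the write survives one failure but is rejected when two are down, which is the trade the slide describes: a write counts as successful only when most of the group has it.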

Not so easy..
New member in the group
Replicas: A, B, C, D, E, F
• Writes from client A
• Reads and Writes from client B
• Should I continue to serve reads?
• Should I start a new quorum?
Classic Split Brain Issue in Replicated systems leading to lost writes!

Building correct distributed systems is not straightforward..
• How do you handle replica failures?
• How do you ensure there is not a parallel quorum?
• How do you handle partial failures of replicas?
• How do you handle concurrent failures?
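
The "parallel quorum" question can be illustrated with a small Python sketch. It assumes, purely for illustration, that two parts of the system hold different views of group membership; the replica names and the helper functions majorities and disjoint_quorums are hypothetical.

```python
from itertools import combinations


def majorities(members):
    """All minimal majority subsets of a membership view."""
    need = len(members) // 2 + 1
    return [set(c) for c in combinations(sorted(members), need)]


def disjoint_quorums(view_a, view_b):
    """Pairs of majorities (one per view) that share no replica."""
    return [
        (qa, qb)
        for qa in majorities(view_a)
        for qb in majorities(view_b)
        if not (qa & qb)
    ]


# Everyone agrees on the membership: any two majorities overlap.
same_view = disjoint_quorums({"A", "B", "C"}, {"A", "B", "C"})

# Views diverge during a membership change: {A, B} and {D, E} are both
# "majorities", yet they have no replica in common.
split_view = disjoint_quorums({"A", "B", "C"}, {"C", "D", "E"})
```

When all replicas agree on the membership, two majorities always intersect, so one of them sees the other's write. Once the views diverge, two non-overlapping groups can each accept writes, which is the split-brain scenario that leads to lost writes.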

correctness is hard, but necessary

Formal Methods
to minimize bugs, we must have a precise description of the design

Formal Methods
• code is too detailed
• design documents and diagrams are vague & imprecise
• how would you express partial failures or concurrency?

Formal Methods
law of large numbers is your friend, until you hit large numbers
so design for scale

TLA+ to the rescue?

formal methods are necessary but not sufficient..

don’t forget to test - no, seriously

embrace failure and don’t be surprised
• simulate failures at unit test level
• fault injection testing
• scale testing
• datacenter testing
• network brown out testing

testing is necessary but not sufficient..
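
"Simulate failures at unit test level" can be as small as a test double that fails on purpose. The Python sketch below is one possible shape, not the tooling used at Amazon: FlakyStore and put_with_retry are hypothetical names, and TimeoutError stands in for whatever error the real dependency would raise.

```python
class FlakyStore:
    """Test double that fails the first `failures` writes (injected fault)."""

    def __init__(self, failures):
        self.failures = failures
        self.attempts = 0
        self.data = {}

    def put(self, key, value):
        self.attempts += 1
        if self.attempts <= self.failures:
            raise TimeoutError("injected fault")
        self.data[key] = value


def put_with_retry(store, key, value, max_attempts=3):
    """Code under test: retry a write, re-raising after the last attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            store.put(key, value)
            return attempt  # number of attempts it took
        except TimeoutError:
            if attempt == max_attempts:
                raise


# Two injected failures: the third attempt succeeds.
store = FlakyStore(failures=2)
attempts_used = put_with_retry(store, "k", "v")
```

The same double can be configured with more failures than the retry budget to check that the error surfaces to the caller instead of being swallowed.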

Customer

Architect

release cycle
• gamma: simulate real world
• one box: does it work?
• phased deployment: treading lightly
• monitor: does it still work?

Monitor customer behavior

measuring customer experience is key

don’t be satisfied by the average - look at the 99th percentile
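
A short Python sketch shows why the average can hide what the slowest customers see. The latency numbers are made up for illustration, and percentile is a hypothetical helper that uses the nearest-rank definition (other definitions interpolate and give slightly different values).

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up request latencies: 98 fast requests and 2 very slow ones.
latencies_ms = [10] * 98 + [900, 1000]

mean_ms = sum(latencies_ms) / len(latencies_ms)  # about 28.8 ms
p50_ms = percentile(latencies_ms, 50)            # 10 ms
p99_ms = percentile(latencies_ms, 99)            # 900 ms
```

Here the mean suggests a service that answers in under 30 ms, while one request in a hundred takes close to a second.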

understand the scaling dimensions

understand how your service will be abused

let’s see these rules in action through a true story

we were building distributed systems all over amazon.com

we needed a uniform and correct way to do consensus..

so we built a paxos lock library (service)

such a service is so much more useful than just leader election..
it became a distributed state store

such a service is so much more useful than just leader election..
or a distributed state store
wait wait.. you’re telling me if I poll, I can detect node failure?

we acted quickly - and scaled up our entire fleet with more nodes
doh!!!!
we slowed consensus...

understand the scaling dimensions & scale them independently...

a lock service has 3 components..

State Store

they must be scaled independently..

State Store

Let’s go over the demo from this morning

Real-time tweet analytics using DynamoDB
• Stream from Kinesis to DynamoDB
• What data do we want in real-time?
• (per-second, top words)
• How does DynamoDB help?
• Atomic counters (per-word counts in that second)
• Indexed queries (top N word-counts in that second)

WordCount Table
Time             | Word  | Count
2013-10-13T12:00 | Earth | 9
2013-10-13T12:00 | Mars  | 10
2013-10-13T12:00 | Pluto | 5
2013-10-13T12:03 | Earth | 8

Local Secondary Index
Time             | Count | Word
2013-10-13T12:00 | 5     | Pluto
2013-10-13T12:00 | 9     | Earth
2013-10-13T12:00 | 10    | Mars
2013-10-13T12:03 | 8     | Earth
Aggregate queries using Redshift
• Simple Redshift connector (buffer files, store in S3, call COPY command)
• Manifest copy connector
• 2 streams
• transaction table for deduplication
• manifest copy

Right tool for right job…
• Canal -> DynamoDB -> Redshift -> Glacier…

You are not done yet..
• Listen to customer feedback
• Iterate..

Example: DynamoDB
• Start with immediate needs of reliable, super scalable, low latency datastore
• Iterate
• Developers wanted flexible query: Local Secondary Indexes
• Developers wanted parallel loads: Parallel Scans
• Mobile developers wanted direct access to their datastore: Fine-grained Access Control
• Mobile developers wanted geo-awareness: Geospatial library
• Developers wanted DynamoDB on their laptop: DynamoDB Local
• Developers wanted richer query: Global Secondary Indexes
• We will continue to innovate..

Sacred Tenets in Distributed Systems
• don’t compromise durability for performance
• plan for success – plan for scalability
• plan for failures - fault tolerance is key
• consistent performance is important
• release - think of blast radius
• insist on correctness

• understand scaling dimensions
• observe how service is used
• monitor like a hawk
• relentlessly test
• scalability over features
• strive for correctness

Please give us your feedback on this presentation

SPOT 401

Don’t miss SPOT 201!!!
