Put Your Thinking CAP On
Tomer Gabel, Wix
JDay Lviv, 2015
Credits
Originally a talk by
Yoav Abrahami (Wix)
Based on “Call Me Maybe” by
Kyle “Aphyr” Kingsbury
Brewer’s CAP Theorem
[Diagram: a triangle with vertices Consistency, Availability and Partition Tolerance]
By Example
• I want this book!
– I add it to the cart
– Then continue
browsing
• There’s only one copy
in stock!
By Example
• I want this book!
– I add it to the cart
– Then continue
browsing
• There’s only one copy
in stock!
• … and someone else
just bought it.
Consistency
Consistency: Defined
• In a consistent
system:
All participants
see the same value
at the same time
• “Do you have this
book in stock?”
Consistency: Defined
• If our book store is an
inconsistent system:
– Two customers may
buy the book
– But there’s only one
item in inventory!
• We’ve just violated a
business constraint.
Availability
Availability: Defined
• An available system:
– Is reachable
– Responds to requests
(within SLA)
• Availability does not
guarantee success!
– The operation may fail
– “This book is no longer
available”
Availability: Defined
• What if the system is
unavailable?
– I complete the
checkout
– And click on “Pay”
– And wait
– And wait some more
– And…
• Did I purchase the
book or not?!
Partition
Tolerance
Partition Tolerance: Defined
• Partition: one or
more nodes are
unreachable
• No practical
system runs on a
single node
• So all systems are
susceptible!
[Diagram: a cluster of five nodes, A through E]
“The Network is Reliable”
• All four happen in an
IP network
• To a client, delays
and drops are the
same
• Perfect failure
detection is provably
impossible1!
[Diagram: messages between nodes A and B over time, showing the four failure modes: drop, delay, duplicate, reorder]
1 “Impossibility of Distributed Consensus with One Faulty Process”, Fischer, Lynch and Paterson
Partition Tolerance: Reified
• External causes:
– Bad network config
– Faulty equipment
– Scheduled
maintenance
• Even software causes
partitions:
– Bad network config.
– GC pauses
– Overloaded servers
• Plenty of war stories!
– Netflix
– Twilio
– GitHub
– Wix :-)
• Some hard numbers1:
– 5.2 failed devices/day
– 59K lost packets/day
– Adding redundancy
only improves by 40%
1 “Understanding Network Failures in Data Centers: Measurement, Analysis, and Implications”, Gill et al
“Proving” CAP
In Pictures
• Let’s consider a simple
system:
– Service A writes values
– Service B reads values
– Values are replicated
between nodes
• These are “ideal”
systems
– Bug-free, predictable
[Diagram: Node 1 (written by service A) and Node 2 (read by service B), both holding V0]
In Pictures
• “Sunny day scenario”:
– A writes a new value V1
– The value is replicated
to node 2
– B reads the new value
[Diagram: A writes V1 to Node 1, the value is replicated to Node 2, and B reads V1]
In Pictures
• What happens if the
network drops?
– A writes a new value V1
– Replication fails
– B still sees the old value
– The system is
inconsistent
[Diagram: A writes V1 to Node 1, replication to Node 2 fails, and B still reads V0]
In Pictures
• A possible mitigation:
synchronous replication
– A writes a new value V1
– Cannot replicate, so write is
rejected
– Both A and B still see V0
– The system is logically
unavailable
[Diagram: A’s write of V1 is rejected; both Node 1 and Node 2 remain at V0]
What does it all mean?
The network is not reliable
• Distributed systems must handle partitions
• Any modern system runs on more than one node…
• … and is therefore distributed
• Ergo, you have to choose:
– Consistency over availability
– Availability over consistency
Granularity
• Real systems comprise many operations
– “Add book to cart”
– “Pay for the book”
• Each has different properties
• It’s a spectrum, not a binary choice!
[Spectrum from Consistency to Availability: Checkout sits toward the Consistency end, the Shopping Cart toward Availability]
CAP IN THE REAL
WORLD
Kyle “Aphyr” Kingsbury
Breaking consistency
guarantees since 2013
PostgreSQL
• Traditional RDBMS
– Transactional
– ACID compliant
• Primarily a CP system
– Writes against a
master node
• “Not a distributed
system”
– Except with a client at
play!
PostgreSQL
• Writes are a simplified
2PC:
– Client votes to commit
– Server validates
transaction
– Server stores changes
– Server acknowledges
commit
– Client receives
acknowledgement
[Diagram: commit message exchange between Client and Server, with the Store step on the server]
PostgreSQL
• But what if the ack is
never received?
• The commit is already
stored…
• … but the client has
no indication!
• The system is in an
inconsistent state
[Diagram: the Server stores the commit, but the acknowledgement back to the Client is lost]
PostgreSQL
• Let’s experiment!
• 5 clients write to a
PostgreSQL instance
• We then drop the server
from the network
• Results:
– 1000 writes
– 950 acknowledged
– 952 survivors
So what can we do?
1. Accept false-negatives
– May not be acceptable for your use case!
2. Use idempotent operations
3. Apply unique transaction IDs (see the sketch below)
– Query state after the partition is resolved
• These strategies apply to any RDBMS
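
Strategies 2 and 3 combine naturally: the client picks a unique transaction ID up front, and if the acknowledgement is lost it can ask the database, once the partition heals, whether the write actually landed. A minimal sketch in Scala over plain JDBC; the orders table, its columns and the connection details are illustrative assumptions, not part of the talk.

import java.sql.DriverManager
import java.util.UUID

object SafeWrites {
  private val url = "jdbc:postgresql://db.example.com/shop"

  /** Attempt the write under a client-generated, unique transaction ID. */
  def purchase(txnId: UUID, bookId: String): Unit = {
    val conn = DriverManager.getConnection(url, "shop", "secret")
    try {
      val stmt = conn.prepareStatement(
        "INSERT INTO orders (txn_id, book_id) VALUES (?, ?)")
      stmt.setString(1, txnId.toString)
      stmt.setString(2, bookId)
      stmt.executeUpdate() // the acknowledgement for this may never arrive
    } finally conn.close()
  }

  /** After the partition is resolved, use the ID to learn what really happened. */
  def didCommit(txnId: UUID): Boolean = {
    val conn = DriverManager.getConnection(url, "shop", "secret")
    try {
      val stmt = conn.prepareStatement("SELECT 1 FROM orders WHERE txn_id = ?")
      stmt.setString(1, txnId.toString)
      stmt.executeQuery().next() // a row means the write survived
    } finally conn.close()
  }
}

If purchase throws or times out, the outcome is unknown; keeping txnId around lets the client call didCommit once connectivity returns, and retry only if the answer is false.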
MongoDB
• A document-oriented database
• Availability/scale via replica sets
– Client writes to a master node
– Master replicates writes to n replicas
• User-selectable consistency guarantees
MongoDB
• When a partition occurs:
– If the master is in the
minority, it is demoted
– The majority promotes a
new master…
– … selected by the highest
optime
MongoDB
• The cluster “heals” after partition resolution:
– The “old” master rejoins the cluster
– Acknowledged minority writes are reverted!
MongoDB
• Let’s experiment!
• Set up a 5-node
MongoDB cluster
• 5 clients write to
the cluster
• We then partition
the cluster
• … and restore it to
see what happens
MongoDB
• With write concern
unacknowledged:
– Server does not ack
writes (except TCP)
– The default prior to
November 2012
• Results:
– 6000 writes
– 5700 acknowledged
– 3319 survivors
– 42% data loss!
MongoDB
• With write concern
acknowledged:
– Server acknowledges
writes (after store)
– The default guarantee
• Results:
– 6000 writes
– 5900 acknowledged
– 3692 survivors
– 37% data loss!
MongoDB
• With write concern
replica acknowledged:
– Client specifies
minimum replicas
– Server acks after
writes to replicas
• Results:
– 6000 writes
– 5695 acknowledged
– 3768 survivors
– 33% data loss!
MongoDB
• With write concern
majority:
– For an n-node cluster, requires acknowledgement from a majority of nodes (more than n/2)
– Also called “quorum”
• Results:
– 6000 writes
– 5700 acknowledged
– 5701 survivors
– No data loss
So what can we do?
1. Keep calm and carry on
– As Aphyr puts it, “not all applications need
consistency”
– Have a reliable backup strategy
– … and make sure you drill restores!
2. Use write concern majority
– And take the performance hit
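
If majority is the right trade-off for your data, it is a small change on the client side. A minimal sketch, assuming the MongoDB Java driver’s synchronous API (3.7+) called from Scala; the replica-set address, database, collection and document fields are illustrative.

import com.mongodb.WriteConcern
import com.mongodb.client.MongoClients
import org.bson.Document

object MajorityWrite {
  def main(args: Array[String]): Unit = {
    val client = MongoClients.create(
      "mongodb://node1,node2,node3,node4,node5/?replicaSet=books")

    val orders = client
      .getDatabase("shop")
      .getCollection("orders")
      .withWriteConcern(WriteConcern.MAJORITY) // ack only once a majority holds the write

    // Blocks until more than half of the replica set has acknowledged the insert,
    // trading write latency for data that survives partitions and failovers.
    orders.insertOne(new Document("bookId", "B42").append("customer", "alice"))

    client.close()
  }
}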
The prime suspects
• Aphyr’s Jepsen tests
include:
– Redis
– Riak
– Zookeeper
– Kafka
– Cassandra
– RabbitMQ
– etcd (and consul)
– ElasticSearch
• If you’re
considering them,
go read his posts
• In fact, go read his
posts regardless
http://aphyr.com/tags/jepsen
STRATEGIES FOR
DISTRIBUTED SYSTEMS
Immutable Data
• Immutable (adj.):
“Unchanging over
time or unable to be
changed.”
• Meaning:
– No deletes
– No updates
– No merge conflicts
– Replication is trivial
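
A minimal sketch of what immutability buys in practice: state lives in an append-only log of facts, and the current value is derived by folding over it. The event and field names are illustrative, not from the talk.

// Events are never updated or deleted; even a removal from the cart is
// recorded as a new immutable fact.
sealed trait CartEvent
final case class ItemAdded(cartId: String, bookId: String)   extends CartEvent
final case class ItemRemoved(cartId: String, bookId: String) extends CartEvent

final case class EventLog(events: Vector[CartEvent] = Vector.empty) {
  // "Writing" returns a new log; replicating is just shipping the events
  // the other replica has not seen yet.
  def append(e: CartEvent): EventLog = copy(events = events :+ e)

  // Current state is derived from history.
  def cartContents(cartId: String): Set[String] =
    events.foldLeft(Set.empty[String]) {
      case (books, ItemAdded(`cartId`, book))   => books + book
      case (books, ItemRemoved(`cartId`, book)) => books - book
      case (books, _)                           => books
    }
}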
Idempotence
• An idempotent
operation:
– Can be applied one or
more times with the
same effect
• Enables retries
• Not always possible
– Side-effects are key
– Consider: payments
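
A minimal sketch of retry-safety via a client-chosen unique ID; the service, names and in-memory store are illustrative assumptions (a real payment processor would also need to deduplicate on its side).

import java.util.UUID
import scala.collection.concurrent.TrieMap

final case class Charge(paymentId: UUID, customer: String, amountCents: Long)

class PaymentService {
  private val processed = TrieMap.empty[UUID, Charge]

  // Applying the same charge once or many times has exactly the same effect:
  // only the first call for a given paymentId reaches the card processor.
  def charge(c: Charge): Charge =
    processed.getOrElseUpdate(c.paymentId, {
      // ... call the card processor exactly once for this paymentId ...
      c
    })
}

// A client that never received an acknowledgement can simply retry:
//   service.charge(Charge(id, "alice", 2499)) // second call changes nothing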
Eventual Consistency
• A design which prefers
availability
• … but guarantees that
clients will eventually see
consistent reads
• Consider git:
– Always available locally
– Converges via push/pull
– Human conflict resolution
Eventual Consistency
• The system expects
data to diverge
• … and includes
mechanisms to regain
convergence
– Partial ordering to
minimize conflicts
– A merge function to
resolve conflicts
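
The simplest merge function is last-writer-wins: each replica keeps the value together with the time it was written, and merging two divergent copies deterministically keeps the later one. A minimal sketch; the timestamp source and tie-breaking rule are illustrative assumptions.

final case class LwwRegister[A](value: A, writtenAt: Long, replica: String) {
  def write(newValue: A, now: Long, at: String): LwwRegister[A] =
    LwwRegister(newValue, now, at)

  // Deterministic on every node: the later timestamp wins, the replica ID
  // breaks ties, so all replicas converge after exchanging state.
  def merge(other: LwwRegister[A]): LwwRegister[A] =
    if (writtenAt > other.writtenAt) this
    else if (writtenAt < other.writtenAt) other
    else if (replica.compareTo(other.replica) >= 0) this
    else other
}

Last-writer-wins silently discards one of two concurrent writes, which is exactly why the partial ordering (vector clocks) and richer merge semantics (CRDTs) on the following slides are worth the effort.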
Vector Clocks
• A technique for partial ordering
• Each node has a logical clock
– The clock increases on every write
– Track the last observed clocks for each item
– Include this vector on replication
• When neither the observed nor the inbound vector descends from the other, the writes are concurrent and we have a conflict
• This lets us know when history diverged
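
A minimal sketch of the bookkeeping described above, with plain strings as node identifiers (an illustrative choice):

final case class VectorClock(entries: Map[String, Long] = Map.empty) {
  // A node ticks its own component on every write.
  def tick(node: String): VectorClock =
    VectorClock(entries.updated(node, entries.getOrElse(node, 0L) + 1))

  // This clock descends from the other if it is at least as late on every component.
  def descendsFrom(other: VectorClock): Boolean =
    other.entries.forall { case (node, t) => entries.getOrElse(node, 0L) >= t }

  // Neither side descends from the other: concurrent writes, i.e. a conflict.
  def conflictsWith(other: VectorClock): Boolean =
    !descendsFrom(other) && !other.descendsFrom(this)

  // After resolving a conflict, the merged clock is the pointwise maximum.
  def merge(other: VectorClock): VectorClock =
    VectorClock((entries.keySet ++ other.entries.keySet).map { node =>
      node -> math.max(entries.getOrElse(node, 0L), other.entries.getOrElse(node, 0L))
    }.toMap)
}

// val a = VectorClock().tick("node1")  // Map(node1 -> 1)
// val b = a.tick("node2")              // Map(node1 -> 1, node2 -> 1)
// val c = a.tick("node1")              // Map(node1 -> 2)
// b.conflictsWith(c)                   // true: the histories diverged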
CRDTs
• Commutative Replicated Data Types1
• A CRDT is a data structure that:
– Eventually converges to a consistent state
– Guarantees no conflicts on replication
1 “A comprehensive study of Convergent and Commutative Replicated Data Types”, Shapiro et al
CRDTs
• CRDTs provide specialized semantics:
– G-Counter: Monotonically increasing counter
– PN-Counter: Also supports decrements
– G-Set: A set that only supports adds
– 2P-Set: Supports removals, but each element can only be removed once (never re-added)
• OR-Sets are particularly useful
– Keep track of both additions and removals
– Can be used for shopping carts
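
A G-Counter’s merge is simply the pointwise maximum of per-node counts, much like the vector-clock merge above. The OR-Set is richer; a minimal sketch, using random UUIDs as the unique tags (an illustrative choice):

import java.util.UUID

// Every add carries a unique tag; a remove only covers the tags it has observed,
// so an add that happened concurrently on another replica always survives.
final case class ORSet[A](adds: Map[A, Set[UUID]] = Map.empty[A, Set[UUID]],
                          removes: Map[A, Set[UUID]] = Map.empty[A, Set[UUID]]) {

  def add(a: A): ORSet[A] =
    copy(adds = adds.updated(a, adds.getOrElse(a, Set.empty[UUID]) + UUID.randomUUID()))

  def remove(a: A): ORSet[A] =
    copy(removes = removes.updated(a, removes.getOrElse(a, Set.empty[UUID]) ++ adds.getOrElse(a, Set.empty[UUID])))

  def contains(a: A): Boolean =
    (adds.getOrElse(a, Set.empty[UUID]) -- removes.getOrElse(a, Set.empty[UUID])).nonEmpty

  // Merging never conflicts: it is just the union of everything both sides know.
  def merge(other: ORSet[A]): ORSet[A] =
    ORSet(union(adds, other.adds), union(removes, other.removes))

  private def union(x: Map[A, Set[UUID]], y: Map[A, Set[UUID]]): Map[A, Set[UUID]] =
    (x.keySet ++ y.keySet).map(k => k -> (x.getOrElse(k, Set.empty[UUID]) ++ y.getOrElse(k, Set.empty[UUID]))).toMap
}

Adding the same book on two partitioned replicas and removing it on one resolves cleanly after merge: the remove cancels only the tag it observed, so the concurrently added copy survives, which is usually the right behaviour for a shopping cart.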
Questions?
Complaints?
WE’RE DONE
HERE!
Thank you for listening
tomer@tomergabel.com
@tomerg
http://il.linkedin.com/in/tomergabel
Aphyr’s “Call Me Maybe” blog posts:
http://aphyr.com/tags/jepsen

Editor's Notes

  • #6 Image source: http://en.wikipedia.org/wiki/File:Seuss-cat-hat.gif
  • #7 Image source: http://en.wikipedia.org/wiki/File:Seuss-cat-hat.gif
  • #9 Photo source: http://pixabay.com/en/meerkat-zoo-animal-sand-desert-363051/
  • #10 Photo source: Unknown
  • #12 Image source: https://www.flickr.com/photos/framesofmind/8541529818/
  • #13 Image source: http://duelingcouches.blogspot.com/2008/12/patiently-waiting.html
  • #14 Image source: http://anapt.deviantart.com/art/together-157107893
  • #42 Image source: https://www.flickr.com/photos/infocux/8450190120/in/set-72157632701634780
  • #43 Image source: http://en.wikipedia.org/wiki/Great_Pyramid_of_Giza#mediaviewer/File:Kheops-Pyramid.jpg
  • #44 Image source: http://2.bp.blogspot.com/--VVPUQ06BaQ/TzmEacERFoI/AAAAAAAAEzE/e2QPIrRWQAg/s1600/washrinse.jpg
  • #46 Photo source: https://www.flickr.com/photos/luschei/1569384007