@YourTwitterHandle#Devoxx #YourTag @samklr#devoxx #CRDT
Eventual Consistency without Consensus with
CRDTs
Sam BESSALAH - @samklr
“ This talk is filled with words and terms that might
make you sound too nerdy or pedantic at dinners
with non-developer friends. ”
Use with caution.
“ Distributed Programming, generally a bad idea,
best avoided. ”
-Peter Bourgon
Why do we use distributed systems?
- Availability
- Fault tolerance
- Throughput
- Architecture
- Economics
CAP Theorem
S. Gilbert and N. A. Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services.
SIGACT News, 33(2):51–59, 2002.
http://bravenewgeek.com/from-mainframe-to-microservice-an-introduction-to-distributed-systems/
Data Consistency?
Consensus Systems,
Locking Services and
“barbaric” algorithms
Data Consistency?
- Distributed Consensus: costly and nearly impossible.
Multi-Phase Commit, State Replication
- Two-Phase Commit: blocking, dependent on the coordinator,
deadlock-prone
- Three-Phase Commit: aborts on timeouts; non-blocking, but easily
fails on network partitions
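To make the two-phase commit bullet concrete, here is a minimal in-memory sketch. The `Participant` class and the `two_phase_commit` function are hypothetical names made up for illustration; there is no network, timeout, or crash handling, which is exactly where the blocking problems listed above come from.

```python
class Participant:
    """Hypothetical participant that votes, then commits or aborts."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: vote. A "yes" voter must now wait for the coordinator.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator logic: commit only if every participant votes yes."""
    # Phase 1 (voting): ask every participant to prepare.
    votes = [p.prepare() for p in participants]
    # Phase 2 (decision): broadcast the outcome.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"
```

If the coordinator stopped between the two phases, every participant in the "prepared" state would be stuck waiting for a decision. That is the blocking behaviour the slide refers to.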
Data Consistency?
RAFT
Data Consistency?
Locking Services
Chubby (Google)
ZooKeeper, built on Zab (Yahoo)
http://bravenewgeek.com/from-mainframe-to-microservice-an-introduction-to-distributed-systems/
@aphyr
“ Consistency is a property of your data,
not of your nodes”
@aphyr
AP Systems
Eventual Consistency
- High Availability
- Low Latency
- Fault tolerant
Dynamo Systems
Riak
Voldemort
Cassandra
Conflict resolution
- Semantic Resolution
- Vector Clocks
- Last write wins (LWW)
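A minimal sketch of the last two strategies, assuming a vector clock is a plain dict from a node id to a counter and a write is a `(timestamp, value)` pair. The names `compare` and `lww` are illustrative and not taken from any of the Dynamo-style stores.

```python
def compare(a, b):
    """Compare two vector clocks (dicts: node id -> counter).

    Returns "before", "after", "equal" or "concurrent".
    """
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    # Neither clock dominates: the updates are concurrent (a real conflict).
    return "concurrent"


def lww(write_a, write_b):
    """Last write wins over (timestamp, value) pairs.

    Tuple comparison picks the larger timestamp; on equal timestamps the
    larger value wins, so the result does not depend on argument order.
    """
    return max(write_a, write_b)
```

Vector clocks detect a conflict but leave the resolution to someone else. LWW always resolves, at the price of silently dropping the losing write.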
Vector Clocks / LWW
Convergent Replicated
Data Types
CRDTs
- Take the consistency problem to the level of
Data Structures
- Their state resolves automatically (eventually) to
a single coherent value
- Maintain multiple copies of your data
CRDTs : Monotonicity
- Every new operation adds information
- Data is never immediately destroyed
- Most things are transparent to the application
CRDTs: 2 types
- State-based, or convergent (CvRDTs)
- Operation-based, or commutative (CmRDTs)
CvRDTs
- All replicas connected
- Usually at-least-once delivery semantics
- State changes advance upwards according to a partial
order
CmRDTs
- All replicas connected
- Need Reliable broadcast with ordering guarantees
- Best suited to commutative updates
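A minimal sketch of an operation-based counter, assuming each operation is a hypothetical `(op_id, delta)` pair. Increments and decrements commute, so replicas converge whatever the delivery order. The `seen` set is a stand-in for the exactly-once delivery that a reliable broadcast layer would normally provide.

```python
class OpCounter:
    """Operation-based (CmRDT-style) counter replica."""

    def __init__(self):
        self.value = 0
        self.seen = set()  # ids of operations already applied

    def apply(self, op):
        op_id, delta = op
        # Deltas are not idempotent, so a duplicate delivery must be dropped.
        if op_id in self.seen:
            return
        self.seen.add(op_id)
        self.value += delta


ops = [("a1", 5), ("b1", -2), ("a2", 1)]

in_order, shuffled = OpCounter(), OpCounter()
for op in ops:
    in_order.apply(op)
# Deliver every operation twice and in reverse order to the second replica.
for op in reversed(ops + ops):
    shuffled.apply(op)
```

Both replicas end up with the same value because addition commutes and duplicates are filtered out.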
CRDTs : Fancy words
CRDTs are idempotent, commutative monoids !!!
aka
Join Semilattices
Join Semilattice
- Idempotent: A * A = A
- Commutative: A * B = B * A
- Associative: A * (B * C) = (A * B) * C
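The three laws can be checked by brute force on a handful of sample values. This is only a sanity check over the samples given, not a proof, and `is_semilattice_merge` is a hypothetical helper name.

```python
from itertools import product


def is_semilattice_merge(merge, samples):
    """Check idempotence, commutativity and associativity on the samples."""
    idempotent = all(merge(a, a) == a for a in samples)
    commutative = all(
        merge(a, b) == merge(b, a) for a, b in product(samples, repeat=2)
    )
    associative = all(
        merge(a, merge(b, c)) == merge(merge(a, b), c)
        for a, b, c in product(samples, repeat=3)
    )
    return idempotent and commutative and associative


sets = [frozenset(), frozenset({1}), frozenset({2}), frozenset({1, 2})]
union_ok = is_semilattice_merge(lambda a, b: a | b, sets)  # set union
max_ok = is_semilattice_merge(max, [0, 1, 5])              # integer max
add_ok = is_semilattice_merge(lambda a, b: a + b, [0, 1, 5])  # plain addition
```

Set union and `max` pass all three checks, which is why they show up as merge functions in state-based CRDTs. Plain addition fails idempotence (1 + 1 is not 1), so merging the same state twice would double count.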
CvRDTs
Some common CRDTs
- Registers (LWW, Multi Valued Register)
- G-Counter
- PN Counter
- G-Set, 2P-Set and OR-Set
- Graph
- Maps
G-Counter: grow-only counter
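A minimal state-based sketch of a grow-only counter, assuming every replica has a unique node id (the ids used in examples are made up). Each replica only increments its own slot, and merging takes the per-node maximum.

```python
class GCounter:
    """State-based grow-only counter: one slot per node, merged with max."""

    def __init__(self, node_id, counts=None):
        self.node_id = node_id
        self.counts = dict(counts or {})

    def increment(self, amount=1):
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        # A replica only ever touches its own slot.
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        """Return a new counter holding the per-node maximum of both states."""
        nodes = set(self.counts) | set(other.counts)
        merged = {
            n: max(self.counts.get(n, 0), other.counts.get(n, 0)) for n in nodes
        }
        return GCounter(self.node_id, merged)
```

Because the merge is a per-node `max`, merging the same remote state twice changes nothing, so at-least-once delivery is enough.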
PN-Counter
- Positive and Negative Counter
- Uses two G-Counters
- One for increments (P) and another for
decrements (N)
- Result is the difference P - N
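A minimal sketch of that construction, with the two grow-only halves kept as plain dicts from a hypothetical node id to a count. The class and method names are illustrative.

```python
class PNCounter:
    """Counter built from two grow-only maps: P (increments), N (decrements)."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.p = {}  # node id -> total increments made by that node
        self.n = {}  # node id -> total decrements made by that node

    def increment(self, amount=1):
        self.p[self.node_id] = self.p.get(self.node_id, 0) + amount

    def decrement(self, amount=1):
        self.n[self.node_id] = self.n.get(self.node_id, 0) + amount

    def value(self):
        return sum(self.p.values()) - sum(self.n.values())

    def merge(self, other):
        """Fold the other replica's state into this one (per-node max)."""
        for mine, theirs in ((self.p, other.p), (self.n, other.n)):
            for node, count in theirs.items():
                mine[node] = max(mine.get(node, 0), count)
```

Neither half ever shrinks, so the merge stays a per-node maximum even though the visible value can go down.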
G-Set
Grow-only set: elements can only be added, never removed
2P-Set
- Two-Phase Set
- Built from two G-Sets: one for additions, one for removals
- The removal set acts as a tombstone set that keeps
deleted elements
- An element that has been removed can never be added again
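A minimal sketch of a two-phase set using two Python sets as the underlying G-Sets. The class name `TwoPSet` and its methods are illustrative.

```python
class TwoPSet:
    """Two-phase set: an add set plus a tombstone set, both grow-only."""

    def __init__(self):
        self.added = set()
        self.removed = set()  # tombstones

    def add(self, element):
        self.added.add(element)

    def remove(self, element):
        # Only elements this replica has seen added can be removed.
        if element in self.added:
            self.removed.add(element)

    def __contains__(self, element):
        return element in self.added and element not in self.removed

    def merge(self, other):
        """Return a new set: union of the add sets, union of the tombstones."""
        merged = TwoPSet()
        merged.added = self.added | other.added
        merged.removed = self.removed | other.removed
        return merged
```

Once an element is in the tombstone set, adding it again has no visible effect. The tombstones also never shrink, which is one source of the garbage collection problem.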
LWW-Element-Set
- Attach a timestamp to every “add”
and “remove”
- Greatest timestamp wins
- Close to the Cassandra model
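A minimal sketch of an LWW-Element-Set, assuming timestamps are comparable numbers supplied by the caller. Ties are resolved in favour of the remove here. That bias is an assumption of this sketch, and an add-biased variant is equally valid.

```python
class LWWElementSet:
    """Set where the latest timestamped add or remove of an element wins."""

    def __init__(self):
        self.adds = {}     # element -> latest add timestamp
        self.removes = {}  # element -> latest remove timestamp

    def add(self, element, timestamp):
        self.adds[element] = max(self.adds.get(element, timestamp), timestamp)

    def remove(self, element, timestamp):
        self.removes[element] = max(
            self.removes.get(element, timestamp), timestamp
        )

    def __contains__(self, element):
        if element not in self.adds:
            return False
        # Remove-biased: an add must be strictly newer than the last remove.
        return (
            element not in self.removes
            or self.adds[element] > self.removes[element]
        )

    def merge(self, other):
        """Fold the other replica's state into this one (latest timestamps)."""
        for mine, theirs in ((self.adds, other.adds), (self.removes, other.removes)):
            for element, ts in theirs.items():
                mine[element] = max(mine.get(element, ts), ts)
```

Unlike the 2P-Set, a removed element can come back as long as the new add carries a later timestamp. The result is only as trustworthy as the clocks that produce those timestamps.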
CRDTs in the Wild
www.jepsen.io
- Riak
- Cassandra
- Kafka
- etc.
Roshi
https://github.com/soundcloud/roshi
Eventuate
Limitations
- Garbage collection
- Not always easy to find a semilattice for every
use case
- Can induce some strange behaviours
- Might require adding a stronger consistency
model: Raft, Paxos, etc.
- Use when availability is really important
- Don’t use them for your billing application
- Definitely not a panacea

Eventual Consistency with CRDTs

Editor's Notes

  • #10 Availability: every request gets a response. Consistency: all nodes see the same data at the same time. Partition tolerance: the system keeps going despite failure or message loss.
  • #18 Availability: every request gets a response. Consistency: all nodes see the same data at the same time. Partition tolerance: the system keeps going despite failure or message loss.
  • #19 Linearizability: single-operation, single-object, real-time order. Serializability: multi-operation, multi-object, arbitrary total order.
  • #20 Availability: every request gets a response. Consistency: all nodes see the same data at the same time. Partition tolerance: the system keeps going despite failure or message loss.
  • #26 Semantic resolution: let the business decide. Vector clock: a vector clock (vclock) is a system for tracking the causality of concurrent updates to a piece of data.
  • #41 Positive and Negative Counter
  • #42 Hard to make them converge. Can build other data types on top of them. Only two operations: add an element and remove.
  • #43 Two-Phase
  • #44 Two-Phase
  • #47 Cassandra columns. Cassandra collections. Riak data type: PN-Counter.