SlideShare a Scribd company logo
REPLICATION
IN THE WILD
Ensar Basri Kahveci
REPLICATION
- Putting a data set into multiple nodes.
- Each replica has a full copy.
- A few reasons for replication:
- Performance
- Availability and fault tolerance
- Mostly used with partitioning.
NOTHING FOR FREE!
- Very easy to do when the data is immutable.
- Problems start when we have multiple copies
of the data and we want to update them.
- Two main difficulties
- Handling updates
- Handling failures
The dangers of replication and a solution
- Gray et al. [1] classify replication models by 2
parameters:
- Where to make updates: primary copy or update
anywhere
- When to make updates: eagerly or lazily
WHERE: PRIMARY COPY
- There is a single replica managing the updates.
- Concurrency control is easy.
- No conflicts and no conflict-handling logic.
- Updates are executed on the primary and
secondaries with the same order.
- When primary fails, a new primary is elected.
- Ensuring a single and good primary is hard.
WHERE: UPDATE ANYWHERE
- Each replica can initiate a transaction to make
an update.
- Complex concurrency control.
- Deadlocks or conflicts are possible.
- In practice, there is also multi-leader.
WHEN: EAGER REPLICATION
- Synchronously updates all replicas as part of
one atomic transaction.
- Provides strong consistency.
- Not very flexible. Degree of availability can
degrade on node failures.
- Consensus algorithms.
WHEN: LAZY REPLICATION
- Updates each replica with a separate
transaction.
- Updates can execute quite fast.
- Degree of availability is high.
- Eventual consistency.
- Data copies can diverge.
- Data loss or conflicts can occur.
WHERE?
WHEN?
PRIMARY COPY UPDATE ANYWHERE
EAGER
strong consistency
simple concurrency
slow
inflexible
strong consistency
complex concurrency
slow
expensive
deadlocks
LAZY
fast
eventual consistency
simple concurrency
inconsistency
fast
available
flexible
eventual consistency
inconsistency
conflicts
WHERE?
WHEN?
PRIMARY COPY UPDATE ANYWHERE
EAGER
Multi Paxos [5]
etcd and Consul (RAFT) [6]
Zookeeper (Zab) [7]
Kafka
Paxos [5]
Hazelcast Cluster State Change [12]
LAZY
Hazelcast
MongoDB
ElasticSearch
Redis
Dynamo [4]
Cassandra
Riak
PRIMARY COPY + EAGER REPLICATION
- When the primary fails, secondaries are
guaranteed to be up to date.
- Raft, Kafka etc.
- Majority approach can be used.
- In Kafka, in-sync-replica set is maintained. [11]
- Secondaries can be used for reads.
UPDATE ANYWHERE + EAGER REPLICATION
- Each replica can initiate a new transaction.
- Concurrent transactions can compete with
each other.
- Possibility of deadlocks.
- In the basic Paxos algorithm, there is no
designated leader.
PRIMARy COPY + LAZY REPLICATION
- The primary copy can execute updates fast.
- Secondaries can fall behind the primary. It is
called replication lag.
- It can lead to data loss during leader failover, but
still no conflicts.
- Secondaries can be used for reads.
UPDATE ANYWHERE + LAZY REPLICATION
- Dynamo-style [4] highly available databases.
- Quorums
- Concurrent updates are first-class citizens.
- Possibility of conflicts
- Avoiding, discarding, detecting & resolving conflicts
- Eventual convergence
- Write repair, read repair and anti-entropy
QUORUMS
- W + R > N
- W = 3, R = 1, N = 3
- W = 1, R = 3, N = 3
- W = 2, R = 2, N = 3
- If W or R is not met, consistency may be broken.
- Sloppy quorums and hinted handoff.
- Even if W and R are met, it can be still broken.
Conflict-free replicated data types (CRDTS)
- Special data types that achieve strong
eventual consistency and monotonicity [2]
- No conflicts
- Merge function has 3 properties:
- Commutative: A + B = B + A
- Associative: A + (B + C) = (A + B) + C
- Idempotent: f(f(x)) = f(x)
- Riak Data Types [3]
DISCARDING CONFLICTS: LAST WRITE WINS
- When 2 updates are concurrent, define an
arbitrary order among them.
- i.e., pretend that one of them is more recent.
- Attach a timestamp to each write.
- Cassandra uses physical timestamps [8], [9]
DETECTING CONFLICTS: VECTOR CLOCKS
- In Dynamo paper [4], each update is done
against a particular version of a data entry.
- Multiple versions of a data entry can exist together.
- Vector clocks [10] are used to track causality.
- The system can determine the authoritative version:
syntactic reconciliation
- The system cannot reconcile multiple versions:
semantic reconciliation
Resolving conflicts and EVENTUAL CONVERGENCE
- Write repair
- Read repair
- Anti-entropy
- Merkle trees
Recap
- We apply replication to make our systems
performant and fault tolerant.
- Replication suffers from core problems of
distributed systems.
- We can build many replication protocols that
vary on the 2 dimensions we discussed.
- No silver bullet. It is a world of trade-offs.
REFerences
[1] Gray, Jim, et al. "The dangers of replication and a solution." ACM SIGMOD Record 25.2 (1996): 173-182.
[2] Shapiro, Marc, et al. "Conflict-free replicated data types." Symposium on Self-Stabilizing Systems. Springer, Berlin, Heidelberg, 2011.
[3] http://docs.basho.com/riak/kv/2.2.0/learn/concepts/crdts/
[4] DeCandia, Giuseppe, et al. "Dynamo: amazon's highly available key-value store." ACM SIGOPS operating systems review 41.6 (2007): 205-220.
[5] Lamport, Leslie. "Paxos made simple." ACM Sigact News 32.4 (2001): 18-25.
[6] Ongaro, Diego, and John K. Ousterhout. "In Search of an Understandable Consensus Algorithm." USENIX Annual Technical Conference. 2014.
[7] Hunt, Patrick, et al. "ZooKeeper: Wait-free Coordination for Internet-scale Systems." USENIX annual technical conference. Vol. 8. 2010.
[8] http://www.datastax.com/dev/blog/why-cassandra-doesnt-need-vector-clocks
[9] https://aphyr.com/posts/299-the-trouble-with-timestamps
[10] Raynal, Michel, and Mukesh Singhal. "Logical time: Capturing causality in distributed systems." Computer 29.2 (1996): 49-56.
[11] http://kafka.apache.org/documentation.html#replication
[12] http://docs.hazelcast.org/docs/latest/manual/html-single/index.html#managing-cluster-and-member-states
THANKS!Any questions?

More Related Content

What's hot

Distributed Systems Theory for Mere Mortals - Java Day Istanbul May 2017
Distributed Systems Theory for Mere Mortals - Java Day Istanbul May 2017 Distributed Systems Theory for Mere Mortals - Java Day Istanbul May 2017
Distributed Systems Theory for Mere Mortals - Java Day Istanbul May 2017
Ensar Basri Kahveci
 
data replication
data replicationdata replication
data replication
Hassanein Alwan
 
Scope of parallelism
Scope of parallelismScope of parallelism
Scope of parallelism
Syed Zaid Irshad
 
MC0085 – Advanced Operating Systems - Master of Computer Science - MCA - SMU DE
MC0085 – Advanced Operating Systems - Master of Computer Science - MCA - SMU DEMC0085 – Advanced Operating Systems - Master of Computer Science - MCA - SMU DE
MC0085 – Advanced Operating Systems - Master of Computer Science - MCA - SMU DEAravind NC
 
Multiprocessing -Interprocessing communication and process sunchronization,se...
Multiprocessing -Interprocessing communication and process sunchronization,se...Multiprocessing -Interprocessing communication and process sunchronization,se...
Multiprocessing -Interprocessing communication and process sunchronization,se...
Neena R Krishna
 
Error tolerant resource allocation and payment minimization for cloud system
Error tolerant resource allocation and payment minimization for cloud systemError tolerant resource allocation and payment minimization for cloud system
Error tolerant resource allocation and payment minimization for cloud system
JPINFOTECH JAYAPRAKASH
 
Dynamic load balancing in distributed systems in the presence of delays a re...
Dynamic load balancing in distributed systems in the presence of delays  a re...Dynamic load balancing in distributed systems in the presence of delays  a re...
Dynamic load balancing in distributed systems in the presence of delays a re...Mumbai Academisc
 
dos mutual exclusion algos
dos mutual exclusion algosdos mutual exclusion algos
dos mutual exclusion algosAkhil Sharma
 
Dichotomy of parallel computing platforms
Dichotomy of parallel computing platformsDichotomy of parallel computing platforms
Dichotomy of parallel computing platforms
Syed Zaid Irshad
 
Distruted applications
Distruted applicationsDistruted applications
Distruted applications
Gabriele Santomaggio
 
DOTNET 2013 IEEE CLOUDCOMPUTING PROJECT Error tolerant resource allocation an...
DOTNET 2013 IEEE CLOUDCOMPUTING PROJECT Error tolerant resource allocation an...DOTNET 2013 IEEE CLOUDCOMPUTING PROJECT Error tolerant resource allocation an...
DOTNET 2013 IEEE CLOUDCOMPUTING PROJECT Error tolerant resource allocation an...
IEEEGLOBALSOFTTECHNOLOGIES
 
Error tolerant resource allocation and payment minimization for cloud system
Error tolerant resource allocation and payment minimization for cloud systemError tolerant resource allocation and payment minimization for cloud system
Error tolerant resource allocation and payment minimization for cloud system
IEEEFINALYEARPROJECTS
 
SYNCHRONIZATION IN MULTIPROCESSING
SYNCHRONIZATION IN MULTIPROCESSINGSYNCHRONIZATION IN MULTIPROCESSING
SYNCHRONIZATION IN MULTIPROCESSING
Aparna Bhadran
 
Management on Cloud 2011
Management on Cloud 2011Management on Cloud 2011
Management on Cloud 2011steccami
 
Distributed Shared Memory
Distributed Shared MemoryDistributed Shared Memory
Distributed Shared Memory
Prakhar Rastogi
 
Introduction to parallel processing
Introduction to parallel processingIntroduction to parallel processing
Introduction to parallel processing
Page Maker
 
CAP theorem and distributed systems
CAP theorem and distributed systemsCAP theorem and distributed systems
CAP theorem and distributed systems
Klika Tech, Inc
 
Buffer management
Buffer managementBuffer management
Buffer management
KarthigaGunasekaran1
 

What's hot (20)

Distributed Systems Theory for Mere Mortals - Java Day Istanbul May 2017
Distributed Systems Theory for Mere Mortals - Java Day Istanbul May 2017 Distributed Systems Theory for Mere Mortals - Java Day Istanbul May 2017
Distributed Systems Theory for Mere Mortals - Java Day Istanbul May 2017
 
data replication
data replicationdata replication
data replication
 
Scope of parallelism
Scope of parallelismScope of parallelism
Scope of parallelism
 
MC0085 – Advanced Operating Systems - Master of Computer Science - MCA - SMU DE
MC0085 – Advanced Operating Systems - Master of Computer Science - MCA - SMU DEMC0085 – Advanced Operating Systems - Master of Computer Science - MCA - SMU DE
MC0085 – Advanced Operating Systems - Master of Computer Science - MCA - SMU DE
 
Multiprocessing -Interprocessing communication and process sunchronization,se...
Multiprocessing -Interprocessing communication and process sunchronization,se...Multiprocessing -Interprocessing communication and process sunchronization,se...
Multiprocessing -Interprocessing communication and process sunchronization,se...
 
Error tolerant resource allocation and payment minimization for cloud system
Error tolerant resource allocation and payment minimization for cloud systemError tolerant resource allocation and payment minimization for cloud system
Error tolerant resource allocation and payment minimization for cloud system
 
Dynamic load balancing in distributed systems in the presence of delays a re...
Dynamic load balancing in distributed systems in the presence of delays  a re...Dynamic load balancing in distributed systems in the presence of delays  a re...
Dynamic load balancing in distributed systems in the presence of delays a re...
 
dos mutual exclusion algos
dos mutual exclusion algosdos mutual exclusion algos
dos mutual exclusion algos
 
Dichotomy of parallel computing platforms
Dichotomy of parallel computing platformsDichotomy of parallel computing platforms
Dichotomy of parallel computing platforms
 
Distruted applications
Distruted applicationsDistruted applications
Distruted applications
 
DOTNET 2013 IEEE CLOUDCOMPUTING PROJECT Error tolerant resource allocation an...
DOTNET 2013 IEEE CLOUDCOMPUTING PROJECT Error tolerant resource allocation an...DOTNET 2013 IEEE CLOUDCOMPUTING PROJECT Error tolerant resource allocation an...
DOTNET 2013 IEEE CLOUDCOMPUTING PROJECT Error tolerant resource allocation an...
 
Error tolerant resource allocation and payment minimization for cloud system
Error tolerant resource allocation and payment minimization for cloud systemError tolerant resource allocation and payment minimization for cloud system
Error tolerant resource allocation and payment minimization for cloud system
 
Cap Theorem
Cap TheoremCap Theorem
Cap Theorem
 
Chap 4
Chap 4Chap 4
Chap 4
 
SYNCHRONIZATION IN MULTIPROCESSING
SYNCHRONIZATION IN MULTIPROCESSINGSYNCHRONIZATION IN MULTIPROCESSING
SYNCHRONIZATION IN MULTIPROCESSING
 
Management on Cloud 2011
Management on Cloud 2011Management on Cloud 2011
Management on Cloud 2011
 
Distributed Shared Memory
Distributed Shared MemoryDistributed Shared Memory
Distributed Shared Memory
 
Introduction to parallel processing
Introduction to parallel processingIntroduction to parallel processing
Introduction to parallel processing
 
CAP theorem and distributed systems
CAP theorem and distributed systemsCAP theorem and distributed systems
CAP theorem and distributed systems
 
Buffer management
Buffer managementBuffer management
Buffer management
 

Similar to Replication in the Wild

Replication in the wild ankara cloud meetup - feb 2017
Replication in the wild   ankara cloud meetup - feb 2017Replication in the wild   ankara cloud meetup - feb 2017
Replication in the wild ankara cloud meetup - feb 2017
Onur Dayıbaşı
 
Replication in the wild ankara cloud meetup - feb 2017
Replication in the wild   ankara cloud meetup - feb 2017Replication in the wild   ankara cloud meetup - feb 2017
Replication in the wild ankara cloud meetup - feb 2017
AnkaraCloud
 
18 philbe replication stanford99
18 philbe replication stanford9918 philbe replication stanford99
18 philbe replication stanford99
ashish61_scs
 
MySQL 5.7 clustering: The developer perspective
MySQL 5.7 clustering: The developer perspectiveMySQL 5.7 clustering: The developer perspective
MySQL 5.7 clustering: The developer perspective
Ulf Wendel
 
Lecture-04-Principles of data management.pdf
Lecture-04-Principles of data management.pdfLecture-04-Principles of data management.pdf
Lecture-04-Principles of data management.pdf
manimozhi98
 
MySQL 5.7 Fabric: Introduction to High Availability and Sharding
MySQL 5.7 Fabric: Introduction to High Availability and Sharding MySQL 5.7 Fabric: Introduction to High Availability and Sharding
MySQL 5.7 Fabric: Introduction to High Availability and Sharding
Ulf Wendel
 
Talon systems - Distributed multi master replication strategy
Talon systems - Distributed multi master replication strategyTalon systems - Distributed multi master replication strategy
Talon systems - Distributed multi master replication strategy
Saptarshi Chatterjee
 
Data Engineering for Data Scientists
Data Engineering for Data Scientists Data Engineering for Data Scientists
Data Engineering for Data Scientists
jlacefie
 
Zookeeper vs Raft: Stateful distributed coordination with HA and Fault Tolerance
Zookeeper vs Raft: Stateful distributed coordination with HA and Fault ToleranceZookeeper vs Raft: Stateful distributed coordination with HA and Fault Tolerance
Zookeeper vs Raft: Stateful distributed coordination with HA and Fault Tolerance
Alluxio, Inc.
 
10 replication
10 replication10 replication
10 replication
ashish61_scs
 
Best Practice for Achieving High Availability in MariaDB
Best Practice for Achieving High Availability in MariaDBBest Practice for Achieving High Availability in MariaDB
Best Practice for Achieving High Availability in MariaDB
MariaDB plc
 
MySQL Group Replication
MySQL Group ReplicationMySQL Group Replication
MySQL Group Replication
Ulf Wendel
 
DIY: A distributed database cluster, or: MySQL Cluster
DIY: A distributed database cluster, or: MySQL ClusterDIY: A distributed database cluster, or: MySQL Cluster
DIY: A distributed database cluster, or: MySQL Cluster
Ulf Wendel
 
Scaling the server side of occasionally-connected, mobile systems with MongoDB
Scaling the server side of occasionally-connected, mobile systems with MongoDBScaling the server side of occasionally-connected, mobile systems with MongoDB
Scaling the server side of occasionally-connected, mobile systems with MongoDB
Thomas Huber
 
Understanding Data Consistency in Apache Cassandra
Understanding Data Consistency in Apache CassandraUnderstanding Data Consistency in Apache Cassandra
Understanding Data Consistency in Apache Cassandra
DataStax
 
Getting Modern With MySQL
Getting Modern With MySQLGetting Modern With MySQL
Getting Modern With MySQL
All Things Open
 
Getting modern with my sql
Getting modern with my sqlGetting modern with my sql
Getting modern with my sql
Jakob Lorberblatt
 
ZooKeeper (and other things)
ZooKeeper (and other things)ZooKeeper (and other things)
ZooKeeper (and other things)
Jonathan Halterman
 
Bhupeshbansal bigdata
Bhupeshbansal bigdata Bhupeshbansal bigdata
Bhupeshbansal bigdata Bhupesh Bansal
 
Design (Cloud systems) for Failures
Design (Cloud systems) for FailuresDesign (Cloud systems) for Failures
Design (Cloud systems) for Failures
Rodolfo Kohn
 

Similar to Replication in the Wild (20)

Replication in the wild ankara cloud meetup - feb 2017
Replication in the wild   ankara cloud meetup - feb 2017Replication in the wild   ankara cloud meetup - feb 2017
Replication in the wild ankara cloud meetup - feb 2017
 
Replication in the wild ankara cloud meetup - feb 2017
Replication in the wild   ankara cloud meetup - feb 2017Replication in the wild   ankara cloud meetup - feb 2017
Replication in the wild ankara cloud meetup - feb 2017
 
18 philbe replication stanford99
18 philbe replication stanford9918 philbe replication stanford99
18 philbe replication stanford99
 
MySQL 5.7 clustering: The developer perspective
MySQL 5.7 clustering: The developer perspectiveMySQL 5.7 clustering: The developer perspective
MySQL 5.7 clustering: The developer perspective
 
Lecture-04-Principles of data management.pdf
Lecture-04-Principles of data management.pdfLecture-04-Principles of data management.pdf
Lecture-04-Principles of data management.pdf
 
MySQL 5.7 Fabric: Introduction to High Availability and Sharding
MySQL 5.7 Fabric: Introduction to High Availability and Sharding MySQL 5.7 Fabric: Introduction to High Availability and Sharding
MySQL 5.7 Fabric: Introduction to High Availability and Sharding
 
Talon systems - Distributed multi master replication strategy
Talon systems - Distributed multi master replication strategyTalon systems - Distributed multi master replication strategy
Talon systems - Distributed multi master replication strategy
 
Data Engineering for Data Scientists
Data Engineering for Data Scientists Data Engineering for Data Scientists
Data Engineering for Data Scientists
 
Zookeeper vs Raft: Stateful distributed coordination with HA and Fault Tolerance
Zookeeper vs Raft: Stateful distributed coordination with HA and Fault ToleranceZookeeper vs Raft: Stateful distributed coordination with HA and Fault Tolerance
Zookeeper vs Raft: Stateful distributed coordination with HA and Fault Tolerance
 
10 replication
10 replication10 replication
10 replication
 
Best Practice for Achieving High Availability in MariaDB
Best Practice for Achieving High Availability in MariaDBBest Practice for Achieving High Availability in MariaDB
Best Practice for Achieving High Availability in MariaDB
 
MySQL Group Replication
MySQL Group ReplicationMySQL Group Replication
MySQL Group Replication
 
DIY: A distributed database cluster, or: MySQL Cluster
DIY: A distributed database cluster, or: MySQL ClusterDIY: A distributed database cluster, or: MySQL Cluster
DIY: A distributed database cluster, or: MySQL Cluster
 
Scaling the server side of occasionally-connected, mobile systems with MongoDB
Scaling the server side of occasionally-connected, mobile systems with MongoDBScaling the server side of occasionally-connected, mobile systems with MongoDB
Scaling the server side of occasionally-connected, mobile systems with MongoDB
 
Understanding Data Consistency in Apache Cassandra
Understanding Data Consistency in Apache CassandraUnderstanding Data Consistency in Apache Cassandra
Understanding Data Consistency in Apache Cassandra
 
Getting Modern With MySQL
Getting Modern With MySQLGetting Modern With MySQL
Getting Modern With MySQL
 
Getting modern with my sql
Getting modern with my sqlGetting modern with my sql
Getting modern with my sql
 
ZooKeeper (and other things)
ZooKeeper (and other things)ZooKeeper (and other things)
ZooKeeper (and other things)
 
Bhupeshbansal bigdata
Bhupeshbansal bigdata Bhupeshbansal bigdata
Bhupeshbansal bigdata
 
Design (Cloud systems) for Failures
Design (Cloud systems) for FailuresDesign (Cloud systems) for Failures
Design (Cloud systems) for Failures
 

More from Ensar Basri Kahveci

java.util.concurrent for Distributed Coordination - Berlin Expert Days 2019
java.util.concurrent for Distributed Coordination - Berlin Expert Days 2019java.util.concurrent for Distributed Coordination - Berlin Expert Days 2019
java.util.concurrent for Distributed Coordination - Berlin Expert Days 2019
Ensar Basri Kahveci
 
java.util.concurrent for Distributed Coordination, Riga DevDays 2019
java.util.concurrent for Distributed Coordination, Riga DevDays 2019java.util.concurrent for Distributed Coordination, Riga DevDays 2019
java.util.concurrent for Distributed Coordination, Riga DevDays 2019
Ensar Basri Kahveci
 
java.util.concurrent for Distributed Coordination, GeeCON Krakow 2019
java.util.concurrent for Distributed Coordination, GeeCON Krakow 2019java.util.concurrent for Distributed Coordination, GeeCON Krakow 2019
java.util.concurrent for Distributed Coordination, GeeCON Krakow 2019
Ensar Basri Kahveci
 
java.util.concurrent for Distributed Coordination, JEEConf 2019
java.util.concurrent for Distributed Coordination, JEEConf 2019java.util.concurrent for Distributed Coordination, JEEConf 2019
java.util.concurrent for Distributed Coordination, JEEConf 2019
Ensar Basri Kahveci
 
Replication Distilled: Hazelcast Deep Dive @ In-Memory Computing Summit San F...
Replication Distilled: Hazelcast Deep Dive @ In-Memory Computing Summit San F...Replication Distilled: Hazelcast Deep Dive @ In-Memory Computing Summit San F...
Replication Distilled: Hazelcast Deep Dive @ In-Memory Computing Summit San F...
Ensar Basri Kahveci
 
Replication Distilled: Hazelcast Deep Dive - Berlin Expert Days 2018
Replication Distilled: Hazelcast Deep Dive - Berlin Expert Days 2018Replication Distilled: Hazelcast Deep Dive - Berlin Expert Days 2018
Replication Distilled: Hazelcast Deep Dive - Berlin Expert Days 2018
Ensar Basri Kahveci
 
From AP to CP and Back: The Curious Case of Hazelcast (jdk.io 2018)
From AP to CP and Back: The Curious Case of Hazelcast (jdk.io 2018)From AP to CP and Back: The Curious Case of Hazelcast (jdk.io 2018)
From AP to CP and Back: The Curious Case of Hazelcast (jdk.io 2018)
Ensar Basri Kahveci
 
Distributed Systems Theory for Mere Mortals - Software Craftsmanship Turkey
Distributed Systems Theory for Mere Mortals - Software Craftsmanship TurkeyDistributed Systems Theory for Mere Mortals - Software Craftsmanship Turkey
Distributed Systems Theory for Mere Mortals - Software Craftsmanship Turkey
Ensar Basri Kahveci
 
Distributed Systems Theory for Mere Mortals - Topconf Dusseldorf October 2017
Distributed Systems Theory for Mere Mortals - Topconf Dusseldorf October 2017Distributed Systems Theory for Mere Mortals - Topconf Dusseldorf October 2017
Distributed Systems Theory for Mere Mortals - Topconf Dusseldorf October 2017
Ensar Basri Kahveci
 
Ankara Jug - Practical Functional Programming with Scala
Ankara Jug - Practical Functional Programming with ScalaAnkara Jug - Practical Functional Programming with Scala
Ankara Jug - Practical Functional Programming with Scala
Ensar Basri Kahveci
 

More from Ensar Basri Kahveci (10)

java.util.concurrent for Distributed Coordination - Berlin Expert Days 2019
java.util.concurrent for Distributed Coordination - Berlin Expert Days 2019java.util.concurrent for Distributed Coordination - Berlin Expert Days 2019
java.util.concurrent for Distributed Coordination - Berlin Expert Days 2019
 
java.util.concurrent for Distributed Coordination, Riga DevDays 2019
java.util.concurrent for Distributed Coordination, Riga DevDays 2019java.util.concurrent for Distributed Coordination, Riga DevDays 2019
java.util.concurrent for Distributed Coordination, Riga DevDays 2019
 
java.util.concurrent for Distributed Coordination, GeeCON Krakow 2019
java.util.concurrent for Distributed Coordination, GeeCON Krakow 2019java.util.concurrent for Distributed Coordination, GeeCON Krakow 2019
java.util.concurrent for Distributed Coordination, GeeCON Krakow 2019
 
java.util.concurrent for Distributed Coordination, JEEConf 2019
java.util.concurrent for Distributed Coordination, JEEConf 2019java.util.concurrent for Distributed Coordination, JEEConf 2019
java.util.concurrent for Distributed Coordination, JEEConf 2019
 
Replication Distilled: Hazelcast Deep Dive @ In-Memory Computing Summit San F...
Replication Distilled: Hazelcast Deep Dive @ In-Memory Computing Summit San F...Replication Distilled: Hazelcast Deep Dive @ In-Memory Computing Summit San F...
Replication Distilled: Hazelcast Deep Dive @ In-Memory Computing Summit San F...
 
Replication Distilled: Hazelcast Deep Dive - Berlin Expert Days 2018
Replication Distilled: Hazelcast Deep Dive - Berlin Expert Days 2018Replication Distilled: Hazelcast Deep Dive - Berlin Expert Days 2018
Replication Distilled: Hazelcast Deep Dive - Berlin Expert Days 2018
 
From AP to CP and Back: The Curious Case of Hazelcast (jdk.io 2018)
From AP to CP and Back: The Curious Case of Hazelcast (jdk.io 2018)From AP to CP and Back: The Curious Case of Hazelcast (jdk.io 2018)
From AP to CP and Back: The Curious Case of Hazelcast (jdk.io 2018)
 
Distributed Systems Theory for Mere Mortals - Software Craftsmanship Turkey
Distributed Systems Theory for Mere Mortals - Software Craftsmanship TurkeyDistributed Systems Theory for Mere Mortals - Software Craftsmanship Turkey
Distributed Systems Theory for Mere Mortals - Software Craftsmanship Turkey
 
Distributed Systems Theory for Mere Mortals - Topconf Dusseldorf October 2017
Distributed Systems Theory for Mere Mortals - Topconf Dusseldorf October 2017Distributed Systems Theory for Mere Mortals - Topconf Dusseldorf October 2017
Distributed Systems Theory for Mere Mortals - Topconf Dusseldorf October 2017
 
Ankara Jug - Practical Functional Programming with Scala
Ankara Jug - Practical Functional Programming with ScalaAnkara Jug - Practical Functional Programming with Scala
Ankara Jug - Practical Functional Programming with Scala
 

Recently uploaded

ethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.pptethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.ppt
Jayaprasanna4
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
karthi keyan
 
Democratizing Fuzzing at Scale by Abhishek Arya
Democratizing Fuzzing at Scale by Abhishek AryaDemocratizing Fuzzing at Scale by Abhishek Arya
Democratizing Fuzzing at Scale by Abhishek Arya
abh.arya
 
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&BDesign and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Sreedhar Chowdam
 
block diagram and signal flow graph representation
block diagram and signal flow graph representationblock diagram and signal flow graph representation
block diagram and signal flow graph representation
Divya Somashekar
 
TECHNICAL TRAINING MANUAL GENERAL FAMILIARIZATION COURSE
TECHNICAL TRAINING MANUAL   GENERAL FAMILIARIZATION COURSETECHNICAL TRAINING MANUAL   GENERAL FAMILIARIZATION COURSE
TECHNICAL TRAINING MANUAL GENERAL FAMILIARIZATION COURSE
DuvanRamosGarzon1
 
Gen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdfGen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdf
gdsczhcet
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
gerogepatton
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
MdTanvirMahtab2
 
The Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfThe Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdf
Pipe Restoration Solutions
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Dr.Costas Sachpazis
 
Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdf
Kamal Acharya
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
fxintegritypublishin
 
WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234
AafreenAbuthahir2
 
Cosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdfCosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdf
Kamal Acharya
 
power quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptxpower quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptx
ViniHema
 
HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generation
Robbie Edward Sayers
 
Architectural Portfolio Sean Lockwood
Architectural Portfolio Sean LockwoodArchitectural Portfolio Sean Lockwood
Architectural Portfolio Sean Lockwood
seandesed
 
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
bakpo1
 
Railway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdfRailway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdf
TeeVichai
 

Recently uploaded (20)

ethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.pptethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.ppt
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
 
Democratizing Fuzzing at Scale by Abhishek Arya
Democratizing Fuzzing at Scale by Abhishek AryaDemocratizing Fuzzing at Scale by Abhishek Arya
Democratizing Fuzzing at Scale by Abhishek Arya
 
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&BDesign and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
 
block diagram and signal flow graph representation
block diagram and signal flow graph representationblock diagram and signal flow graph representation
block diagram and signal flow graph representation
 
TECHNICAL TRAINING MANUAL GENERAL FAMILIARIZATION COURSE
TECHNICAL TRAINING MANUAL   GENERAL FAMILIARIZATION COURSETECHNICAL TRAINING MANUAL   GENERAL FAMILIARIZATION COURSE
TECHNICAL TRAINING MANUAL GENERAL FAMILIARIZATION COURSE
 
Gen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdfGen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdf
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
 
The Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfThe Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdf
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
 
Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdf
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
 
WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234
 
Cosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdfCosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdf
 
power quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptxpower quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptx
 
HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generation
 
Architectural Portfolio Sean Lockwood
Architectural Portfolio Sean LockwoodArchitectural Portfolio Sean Lockwood
Architectural Portfolio Sean Lockwood
 
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
 
Railway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdfRailway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdf
 

Replication in the Wild

  • 2. REPLICATION - Putting a data set into multiple nodes. - Each replica has a full copy. - A few reasons for replication: - Performance - Availability and fault tolerance - Mostly used with partitioning.
  • 3. NOTHING FOR FREE! - Very easy to do when the data is immutable. - Problems start when we have multiple copies of the data and we want to update them. - Two main difficulties - Handling updates - Handling failures
  • 4. The dangers of replication and a solution - Gray et al. [1] classify replication models by 2 parameters: - Where to make updates: primary copy or update anywhere - When to make updates: eagerly or lazily
  • 5. WHERE: PRIMARY COPY - There is a single replica managing the updates. - Concurrency control is easy. - No conflicts and no conflict-handling logic. - Updates are executed on the primary and secondaries with the same order. - When primary fails, a new primary is elected. - Ensuring a single and good primary is hard.
  • 6. WHERE: UPDATE ANYWHERE - Each replica can initiate a transaction to make an update. - Complex concurrency control. - Deadlocks or conflicts are possible. - In practice, there is also multi-leader.
  • 7. WHEN: EAGER REPLICATION - Synchronously updates all replicas as part of one atomic transaction. - Provides strong consistency. - Not very flexible. Degree of availability can degrade on node failures. - Consensus algorithms.
  • 8. WHEN: LAZY REPLICATION - Updates each replica with a separate transaction. - Updates can execute quite fast. - Degree of availability is high. - Eventual consistency. - Data copies can diverge. - Data loss or conflicts can occur.
  • 9. WHERE? WHEN? PRIMARY COPY UPDATE ANYWHERE EAGER strong consistency simple concurrency slow inflexible strong consistency complex concurrency slow expensive deadlocks LAZY fast eventual consistency simple concurrency inconsistency fast available flexible eventual consistency inconsistency conflicts
  • 10. WHERE? WHEN? PRIMARY COPY UPDATE ANYWHERE EAGER Multi Paxos [5] etcd and Consul (RAFT) [6] Zookeeper (Zab) [7] Kafka Paxos [5] Hazelcast Cluster State Change [12] LAZY Hazelcast MongoDB ElasticSearch Redis Dynamo [4] Cassandra Riak
  • 11. PRIMARY COPY + EAGER REPLICATION - When the primary fails, secondaries are guaranteed to be up to date. - Raft, Kafka etc. - Majority approach can be used. - In Kafka, in-sync-replica set is maintained. [11] - Secondaries can be used for reads.
  • 12. UPDATE ANYWHERE + EAGER REPLICATION - Each replica can initiate a new transaction. - Concurrent transactions can compete with each other. - Possibility of deadlocks. - In the basic Paxos algorithm, there is no designated leader.
  • 13. PRIMARy COPY + LAZY REPLICATION - The primary copy can execute updates fast. - Secondaries can fall behind the primary. It is called replication lag. - It can lead to data loss during leader failover, but still no conflicts. - Secondaries can be used for reads.
  • 14. UPDATE ANYWHERE + LAZY REPLICATION - Dynamo-style [4] highly available databases. - Quorums - Concurrent updates are first-class citizens. - Possibility of conflicts - Avoiding, discarding, detecting & resolving conflicts - Eventual convergence - Write repair, read repair and anti-entropy
  • 15. QUORUMS - W + R > N - W = 3, R = 1, N = 3 - W = 1, R = 3, N = 3 - W = 2, R = 2, N = 3 - If W or R is not met, consistency may be broken. - Sloppy quorums and hinted handoff. - Even if W and R are met, it can be still broken.
  • 16. Conflict-free replicated data types (CRDTS) - Special data types that achieve strong eventual consistency and monotonicity [2] - No conflicts - Merge function has 3 properties: - Commutative: A + B = B + A - Associative: A + (B + C) = (A + B) + C - Idempotent: f(f(x)) = f(x) - Riak Data Types [3]
  • 17. DISCARDING CONFLICTS: LAST WRITE WINS - When 2 updates are concurrent, define an arbitrary order among them. - i.e., pretend that one of them is more recent. - Attach a timestamp to each write. - Cassandra uses physical timestamps [8], [9]
  • 18. DETECTING CONFLICTS: VECTOR CLOCKS - In Dynamo paper [4], each update is done against a particular version of a data entry. - Multiple versions of a data entry can exist together. - Vector clocks [10] are used to track causality. - The system can determine the authoritative version: syntactic reconciliation - The system cannot reconcile multiple versions: semantic reconciliation
  • 19. Resolving conflicts and EVENTUAL CONVERGENCE - Write repair - Read repair - Anti-entropy - Merkle trees
  • 20. Recap - We apply replication to make our systems performant and fault tolerant. - Replication suffers from core problems of distributed systems. - We can build many replication protocols that vary on the 2 dimensions we discussed. - No silver bullet. It is a world of trade-offs.
  • 21. REFerences [1] Gray, Jim, et al. "The dangers of replication and a solution." ACM SIGMOD Record 25.2 (1996): 173-182. [2] Shapiro, Marc, et al. "Conflict-free replicated data types." Symposium on Self-Stabilizing Systems. Springer, Berlin, Heidelberg, 2011. [3] http://docs.basho.com/riak/kv/2.2.0/learn/concepts/crdts/ [4] DeCandia, Giuseppe, et al. "Dynamo: amazon's highly available key-value store." ACM SIGOPS operating systems review 41.6 (2007): 205-220. [5] Lamport, Leslie. "Paxos made simple." ACM Sigact News 32.4 (2001): 18-25. [6] Ongaro, Diego, and John K. Ousterhout. "In Search of an Understandable Consensus Algorithm." USENIX Annual Technical Conference. 2014. [7] Hunt, Patrick, et al. "ZooKeeper: Wait-free Coordination for Internet-scale Systems." USENIX annual technical conference. Vol. 8. 2010. [8] http://www.datastax.com/dev/blog/why-cassandra-doesnt-need-vector-clocks [9] https://aphyr.com/posts/299-the-trouble-with-timestamps [10] Raynal, Michel, and Mukesh Singhal. "Logical time: Capturing causality in distributed systems." Computer 29.2 (1996): 49-56. [11] http://kafka.apache.org/documentation.html#replication [12] http://docs.hazelcast.org/docs/latest/manual/html-single/index.html#managing-cluster-and-member-states