- Keeping copies of a data set on multiple nodes.
- Each replica has a full copy.
- A few reasons for replication:
- Availability and fault tolerance
- Mostly used with partitioning.
NOTHING FOR FREE!
- Very easy to do when the data is immutable.
- Problems start when we have multiple copies
of the data and we want to update them.
- Two main difficulties
- Handling updates
- Handling failures
The dangers of replication and a solution
- Gray et al. classify replication models by 2 dimensions:
- Where to make updates: primary copy or update anywhere
- When to make updates: eagerly or lazily
WHERE: PRIMARY COPY
- There is a single replica managing the updates.
- Concurrency control is easy.
- No conflicts and no conflict-handling logic.
- Updates are executed on the primary and
secondaries in the same order.
- When the primary fails, a new primary is elected.
- Ensuring a single, healthy primary is hard.
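A minimal sketch of the ordering rule above, assuming the primary stamps every update with a sequence number and each secondary buffers anything that arrives early. The names `Replica` and `Primary` are hypothetical; failure detection and primary election are left out.

```python
class Replica:
    """Applies updates strictly in sequence-number order."""

    def __init__(self):
        self.data = {}
        self.next_seq = 0   # sequence number this replica expects next
        self.pending = {}   # updates that arrived out of order

    def receive(self, seq, key, value):
        self.pending[seq] = (key, value)
        # Drain the buffer for as long as the next expected update is present.
        while self.next_seq in self.pending:
            k, v = self.pending.pop(self.next_seq)
            self.data[k] = v
            self.next_seq += 1


class Primary:
    """The single replica that decides the order of updates."""

    def __init__(self):
        self.replica = Replica()
        self.log = []

    def write(self, key, value):
        entry = (len(self.log), key, value)   # position in the log is the order
        self.log.append(entry)
        self.replica.receive(*entry)
        return entry                          # what gets shipped to secondaries


primary = Primary()
secondary = Replica()
e0 = primary.write("x", 1)
e1 = primary.write("x", 2)
secondary.receive(*e1)   # arrives early, stays buffered
secondary.receive(*e0)   # unblocks both updates
```

Even though the network reordered the two updates, the secondary ends up in the same state as the primary because it applies them by sequence number.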
WHERE: UPDATE ANYWHERE
- Each replica can initiate a transaction to make updates.
- Complex concurrency control.
- Deadlocks or conflicts are possible.
- In practice, there is also multi-leader.
WHEN: EAGER REPLICATION
- Synchronously updates all replicas as part of
one atomic transaction.
- Provides strong consistency.
- Not very flexible. Degree of availability can
degrade on node failures.
- Consensus algorithms.
WHEN: LAZY REPLICATION
- Updates each replica with a separate transaction.
- Updates can execute quite fast.
- Degree of availability is high.
- Eventual consistency.
- Data copies can diverge.
- Data loss or conflicts can occur.
PRIMARY COPY UPDATE ANYWHERE
Multi-Paxos
etcd and Consul (Raft)
ZooKeeper (Zab)
Hazelcast Cluster State Change 
PRIMARY COPY + EAGER REPLICATION
- When the primary fails, secondaries are
guaranteed to be up to date.
- Raft, Kafka, etc.
- Majority approach can be used.
- In Kafka, an in-sync replica (ISR) set is maintained.
- Secondaries can be used for reads.
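A rough sketch of the majority approach, assuming a write counts as successful only once a majority of the cluster (primary included) has stored it. `Secondary` and `EagerPrimary` are hypothetical names; retries, catch-up of lagging replicas, and leader election are not modeled.

```python
class Secondary:
    def __init__(self, up=True):
        self.up = up
        self.log = []

    def append(self, entry):
        """Store the entry and acknowledge it, unless this node is down."""
        if not self.up:
            return False
        self.log.append(entry)
        return True


class EagerPrimary:
    def __init__(self, secondaries):
        self.secondaries = secondaries
        self.log = []

    def write(self, entry):
        """Replicate synchronously; succeed only with a majority of acks."""
        self.log.append(entry)
        acks = 1 + sum(s.append(entry) for s in self.secondaries)  # 1 = primary
        cluster_size = 1 + len(self.secondaries)
        return acks >= cluster_size // 2 + 1


s1, s2 = Secondary(), Secondary(up=False)
primary = EagerPrimary([s1, s2])
ok_with_one_down = primary.write("x=1")   # 2 of 3 nodes have it
s1.up = False
ok_with_two_down = primary.write("x=2")   # only 1 of 3 nodes has it
```

With one secondary down the write still succeeds; with two down it is rejected, which is the availability cost of eager replication.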
UPDATE ANYWHERE + EAGER REPLICATION
- Each replica can initiate a new transaction.
- Concurrent transactions can compete with each other.
- Possibility of deadlocks.
- In the basic Paxos algorithm, there is no leader; any replica can propose a value.
PRIMARY COPY + LAZY REPLICATION
- The primary copy can execute updates fast.
- Secondaries can fall behind the primary. This is
called replication lag.
- Replication lag can lead to data loss during leader
failover, but there are still no conflicts.
- Secondaries can be used for reads.
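A small sketch of replication lag and the data loss it can cause, assuming the primary acknowledges a write before shipping it and a secondary is modeled as a plain dict. `LazyPrimary` and `ship` are hypothetical names.

```python
from collections import deque


class LazyPrimary:
    def __init__(self):
        self.data = {}
        self.backlog = deque()   # acknowledged updates not yet shipped

    def write(self, key, value):
        self.data[key] = value
        self.backlog.append((key, value))
        return True              # acknowledged before any secondary has it

    def ship(self, secondary, max_updates):
        """Asynchronously push at most max_updates to one secondary."""
        for _ in range(min(max_updates, len(self.backlog))):
            k, v = self.backlog.popleft()
            secondary[k] = v


primary = LazyPrimary()
secondary = {}
primary.write("a", 1)
primary.write("b", 2)
primary.ship(secondary, max_updates=1)

lag = len(primary.backlog)   # updates the secondary is still missing
# If the primary crashes now and the secondary is promoted,
# the acknowledged write to "b" is lost.
new_primary_data = dict(secondary)
```

The promoted secondary never saw a conflicting write, so nothing needs to be merged; it is simply missing the tail of the primary's updates.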
- W + R > N
- W = 3, R = 1, N = 3
- W = 1, R = 3, N = 3
- W = 2, R = 2, N = 3
- If W or R is not met, consistency may be broken.
- Sloppy quorums and hinted handoff.
- Even if W and R are met, consistency can still be broken.
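The quorum condition can be checked by brute force. This sketch assumes N replicas numbered from 0 and enumerates every possible write set of size W and read set of size R; the function names are hypothetical.

```python
from itertools import combinations


def quorums_overlap(w, r, n):
    """The arithmetic rule from the list above."""
    return w + r > n


def every_read_meets_every_write(w, r, n):
    """True if each possible read set shares a replica with each write set."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


configs = [(3, 1, 3), (1, 3, 3), (2, 2, 3), (1, 1, 3)]
results = {
    c: (quorums_overlap(*c), every_read_meets_every_write(*c)) for c in configs
}
```

The two checks agree for every configuration: the three settings listed above always overlap, while W = 1, R = 1, N = 3 does not. Overlap only guarantees that a read contacts a replica holding the latest acknowledged write; it does not rule out the anomalies mentioned in the last bullet.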
Conflict-free replicated data types (CRDTs)
- Special data types that achieve strong
eventual consistency and monotonicity 
- No conflicts
- Merge function has 3 properties:
- Commutative: A + B = B + A
- Associative: A + (B + C) = (A + B) + C
- Idempotent: A + A = A
- Riak Data Types 
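One of the simplest CRDTs is a grow-only counter (G-Counter). The sketch below assumes the state is a dict mapping a node id to that node's local count and that merging takes the per-node maximum; the function names are hypothetical.

```python
def increment(state, node, amount=1):
    """Return a new state in which `node` has counted `amount` more."""
    new_state = dict(state)
    new_state[node] = new_state.get(node, 0) + amount
    return new_state


def merge(a, b):
    """Per-node maximum: commutative, associative, and idempotent."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def value(state):
    """The counter value is the sum of all per-node counts."""
    return sum(state.values())


a = increment({}, "n1", 2)                        # {"n1": 2}
b = increment({}, "n2", 3)                        # {"n2": 3}
c = increment(increment({}, "n1"), "n3", 4)       # {"n1": 1, "n3": 4}

commutative = merge(a, b) == merge(b, a)
associative = merge(a, merge(b, c)) == merge(merge(a, b), c)
idempotent = merge(a, a) == a
total = value(merge(a, merge(b, c)))
```

Because the merge has these three properties, replicas can exchange states in any order, any number of times, and still converge to the same value.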
DISCARDING CONFLICTS: LAST WRITE WINS
- When 2 updates are concurrent, define an
arbitrary order among them.
- i.e., pretend that one of them is more recent.
- Attach a timestamp to each write.
- Cassandra uses physical timestamps.
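A minimal sketch of a last-write-wins register, assuming each write is a `(timestamp, replica_id, value)` tuple. Breaking ties by replica id is an assumption made here for illustration, not a description of any particular database.

```python
def lww_merge(a, b):
    """Keep the write with the higher timestamp; replica id breaks ties."""
    return max(a, b, key=lambda write: (write[0], write[1]))


w1 = (100, "replica-a", "blue")
w2 = (105, "replica-b", "green")   # concurrent with w1, slightly later clock

winner = lww_merge(w1, w2)
same_either_way = lww_merge(w1, w2) == lww_merge(w2, w1)

# Equal timestamps: the arbitrary tie-breaker decides.
tie_winner = lww_merge((100, "replica-a", "blue"), (100, "replica-b", "red"))
```

All replicas converge on `w2`, but the value `"blue"` is silently discarded even though the client that wrote it received an acknowledgement.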
DETECTING CONFLICTS: VECTOR CLOCKS
- In the Dynamo paper, each update is made
against a particular version of a data entry.
- Multiple versions of a data entry can exist together.
- Vector clocks are used to track causality.
- The system can determine the authoritative version: syntactic reconciliation.
- The system cannot reconcile multiple versions: semantic reconciliation by the client.
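A sketch of how two vector clocks can be compared, assuming a clock is a dict from node id to counter and a missing entry counts as 0. The function name `compare` and its return labels are hypothetical; the node names follow the style of the example in the Dynamo paper.

```python
def compare(a, b):
    """Classify clock `a` relative to clock `b`."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # b descends from a, so b is authoritative
    if b_le_a:
        return "after"       # a descends from b, so a is authoritative
    return "concurrent"      # neither descends from the other: a conflict


d1 = {"Sx": 1}
d2 = {"Sx": 2}
d3 = {"Sx": 2, "Sy": 1}
d4 = {"Sx": 2, "Sz": 1}

ancestor = compare(d1, d2)
conflict = compare(d3, d4)
```

`d2` supersedes `d1`, so the older version can be dropped automatically. `d3` and `d4` are concurrent, so both versions must be kept until something reconciles them.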
RESOLVING CONFLICTS AND EVENTUAL CONVERGENCE
- Write repair
- Read repair
- Merkle trees
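A sketch of how Merkle trees help two replicas find out where they differ, assuming both hold the same number of entries in the same key order. `merkle_root` and `diff` are hypothetical names, and this version recomputes hashes on each call rather than storing the tree, which a real system would avoid.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves):
    """Hash pairs of nodes level by level until a single root remains."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])   # duplicate the last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def diff(a, b, lo=0, hi=None):
    """Return indices where a and b differ, descending only into
    subranges whose root hashes do not match."""
    if hi is None:
        hi = len(a)
    if merkle_root(a[lo:hi]) == merkle_root(b[lo:hi]):
        return []
    if hi - lo == 1:
        return [lo]
    mid = (lo + hi) // 2
    return diff(a, b, lo, mid) + diff(a, b, mid, hi)


replica_a = [b"k1=v1", b"k2=v2", b"k3=v3", b"k4=v4"]
replica_b = [b"k1=v1", b"k2=v2", b"k3=STALE", b"k4=v4"]

out_of_sync = diff(replica_a, replica_b)
in_sync = diff(replica_a, list(replica_a))
```

When the roots match, the replicas exchange a single hash and stop. When they differ, only the mismatching subranges are explored, so only the stale entry has to be transferred.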
- We apply replication to make our systems
performant and fault tolerant.
- Replication suffers from core problems of distributed systems.
- We can build many replication protocols that
vary on the 2 dimensions we discussed.
- No silver bullet. It is a world of trade-offs.
Gray, Jim, et al. "The dangers of replication and a solution." ACM SIGMOD Record 25.2 (1996): 173-182.
Shapiro, Marc, et al. "Conflict-free replicated data types." Symposium on Self-Stabilizing Systems. Springer, Berlin, Heidelberg, 2011.
DeCandia, Giuseppe, et al. "Dynamo: Amazon's highly available key-value store." ACM SIGOPS Operating Systems Review 41.6 (2007): 205-220.
Lamport, Leslie. "Paxos made simple." ACM SIGACT News 32.4 (2001): 18-25.
Ongaro, Diego, and John K. Ousterhout. "In Search of an Understandable Consensus Algorithm." USENIX Annual Technical Conference. 2014.
Hunt, Patrick, et al. "ZooKeeper: Wait-free Coordination for Internet-scale Systems." USENIX Annual Technical Conference. Vol. 8. 2010.
Raynal, Michel, and Mukesh Singhal. "Logical time: Capturing causality in distributed systems." Computer 29.2 (1996): 49-56.