3. - Leading open-source Java in-memory data grid (IMDG)
- Distributed Java collections, JCache, HD store, …
- Distributed computations and messaging
- Embedded or client-server deployment
- Integration modules & cloud friendly
- Highly available, scalable, elastic
4. REPLICATION
- Placing copies of a data set on multiple nodes.
- Each replica has a full copy.
- A few reasons for replication:
- Performance
- Availability
5. REPLICATION + PARTITIONING
- Replication is mostly used together with partitioning.
- Two partitions: P1, P2
- Two replicas for each partition.
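A minimal sketch of the layout above, assuming a CRC32-based key hash and a round-robin placement rule. Both are illustrative choices, not what any particular product does, and the node names and helper functions are hypothetical.

```python
import zlib


def partition_of(key, partition_count):
    """Map a key to a partition id with a stable hash (CRC32 is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % partition_count


def replica_nodes(partition_id, nodes, replica_count):
    """Pick replica_count distinct nodes for a partition, round-robin from its id."""
    if replica_count > len(nodes):
        raise ValueError("not enough nodes for the requested replica count")
    return [nodes[(partition_id + i) % len(nodes)] for i in range(replica_count)]


# Hypothetical three-node cluster.
nodes = ["node-a", "node-b", "node-c"]

# Two partitions (P1 and P2 on the slide, ids 0 and 1 here), two replicas each.
assignment = {p: replica_nodes(p, nodes, 2) for p in range(2)}
```

Every key lands in exactly one partition, and each partition lives on two different nodes, so losing one node never removes the only copy of a partition.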
6. NOTHING FOR FREE!
- Very easy to do when the data is immutable.
- Two main difficulties:
- Handling updates,
- Handling failures.
7. THE DANGERS OF REPLICATION AND A SOLUTION
- Gray et al. [1] classify replication models by two parameters:
- Where to make updates: primary copy or update anywhere
- When to make updates: eagerly or lazily
8. WHERE: PRIMARY COPY
- There is a single replica managing the updates.
- No conflicts and no conflict-handling logic.
- Implies sticky availability.
- When the primary fails, a new primary is elected.
9. WHERE: UPDATE ANYWHERE
- Each replica can initiate a transaction to make an update.
- Complex concurrency control.
- Deadlocks or conflicts are possible.
- In practice, there is also multi-leader replication.
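To see why conflicts are possible under update anywhere, here is a small sketch in which two replicas accept writes to the same key independently. The `Replica` class and the `conflicting_keys` helper are hypothetical names used only for this illustration. The sketch detects the conflict and deliberately does not resolve it.

```python
class Replica:
    """A replica that accepts updates locally, with no coordination."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value):
        self.data[key] = value


def conflicting_keys(a, b):
    """Return the keys that both replicas hold with different values."""
    shared = a.data.keys() & b.data.keys()
    return sorted(k for k in shared if a.data[k] != b.data[k])


r1 = Replica("r1")
r2 = Replica("r2")

# Two clients update the same key through different replicas.
r1.write("x", 1)
r2.write("x", 2)
r1.write("y", 10)

conflicts = conflicting_keys(r1, r2)
```

With a primary copy, both writes to `x` would go through the same replica and be ordered there, so this situation could not arise.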
10. WHEN: EAGER REPLICATION
- Synchronously updates all replicas as part of one atomic transaction.
- Strong consistency.
- Level of availability can degrade on node failures.
- Consensus algorithms
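The following sketch shows the all-or-nothing behaviour of eager replication in its simplest form. It is not 2PC or a consensus algorithm: the liveness check stands in for a real prepare phase, and `Replica`, `ReplicaDown`, and `eager_write` are hypothetical names.

```python
class ReplicaDown(Exception):
    """Raised when an eager write cannot reach every replica."""


class Replica:
    def __init__(self, name, alive=True):
        self.name = name
        self.alive = alive
        self.data = {}


def eager_write(replicas, key, value):
    """Apply the update to every replica, or to none if any replica is down."""
    down = [r.name for r in replicas if not r.alive]
    if down:
        # Simplified stand-in for an aborted prepare phase.
        raise ReplicaDown("unreachable replicas: " + ", ".join(down))
    for r in replicas:
        r.data[key] = value


replicas = [Replica("a"), Replica("b"), Replica("c")]
eager_write(replicas, "x", 1)  # all three replicas now hold x = 1

replicas[2].alive = False
try:
    eager_write(replicas, "x", 2)
    rejected = False
except ReplicaDown:
    rejected = True  # the write is refused, so no replica diverges
```

The copies never disagree, but a single failed node is enough to block writes, which is the availability cost noted above.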
11. WHEN: LAZY REPLICATION
- Updates each replica with a separate transaction.
- Updates can execute quite fast.
- High availability.
- Data copies can diverge.
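A small sketch of lazy replication under simplifying assumptions: one primary, secondaries modelled as plain dictionaries, and an in-memory backlog in place of a network. The `LazyPrimary` class and its method names are hypothetical.

```python
from collections import deque


class LazyPrimary:
    """Applies updates locally and replicates them to secondaries later."""

    def __init__(self, secondaries):
        self.data = {}
        self.secondaries = secondaries  # each secondary is a plain dict
        self.backlog = deque()          # updates not yet sent to secondaries

    def write(self, key, value):
        # The caller gets control back right after the local update.
        self.data[key] = value
        self.backlog.append((key, value))

    def replicate_pending(self):
        # Each update reaches the secondaries separately, in order.
        while self.backlog:
            key, value = self.backlog.popleft()
            for secondary in self.secondaries:
                secondary[key] = value


secondary = {}
primary = LazyPrimary([secondary])

primary.write("x", 1)
diverged = secondary != primary.data   # the copies differ at this point

primary.replicate_pending()
converged = secondary == primary.data  # the secondary has caught up
```

The write path never waits for a secondary, which is where the speed and availability come from, and also why the copies can differ between `write` and `replicate_pending`.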
12. WHERE? WHEN?
- EAGER, PRIMARY COPY: 2PC [24], Multi-Paxos [5], etcd and Consul (Raft) [6], ZooKeeper (Zab) [7], Kafka
- EAGER, UPDATE ANYWHERE: 2PC [24], Paxos [5], Hazelcast Cluster State Change [12], MySQL 5.7 Group Replication [23]
- LAZY, PRIMARY COPY: Hazelcast, MongoDB, ElasticSearch, Redis, Kafka
- LAZY, UPDATE ANYWHERE: Dynamo [4], Cassandra, Riak, Hazelcast Active-Active WAN Replication [22]
13. PRIMARY COPY + EAGER REPLICATION
- When the primary fails, secondaries are guaranteed to be up to date.
- Majority approach in consensus algorithms.
- Expensive. Mostly used for storing metadata.
- In Kafka, in-sync-replica set [11] is maintained.
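The majority approach comes down to simple arithmetic, sketched here with hypothetical helper names. An update counts as committed once more than half of the replicas have acknowledged it, so the group keeps working as long as a majority is alive.

```python
def majority(replica_count):
    """Smallest number of replicas that is more than half of the group."""
    return replica_count // 2 + 1


def is_committed(acks, replica_count):
    """An update is committed once a majority has acknowledged it."""
    return acks >= majority(replica_count)


def tolerated_failures(replica_count):
    """Number of replicas that can fail while a majority is still reachable."""
    return replica_count - majority(replica_count)
```

For example, `majority(5)` is 3 and `tolerated_failures(5)` is 2, while a group of 4 still tolerates only 1 failure, which is why odd group sizes are the usual choice.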
14. UPDATE ANYWHERE + EAGER REPLICATION
- Each replica can initiate a new transaction.
- Concurrent transactions can compete with each other.
- Possibility of races and deadlocks.
- Hazelcast Cluster State Change [12]
15. PRIMARY COPY + LAZY REPLICATION
- Hazelcast, Redis, ElasticSearch, Kafka ...
- The primary copy can execute updates fast.
- Secondaries can fall behind the primary. This is called replication lag.
- Replication lag can lead to data loss during leader failover, but there are still no conflicts.
- Secondaries can be used for reads.
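The sketch below illustrates replication lag and the data loss it can cause on failover. It assumes that each replica keeps an ordered log of updates and that the secondary's log is always a prefix of the primary's. The function names are hypothetical.

```python
def replication_lag(primary_log, secondary_log):
    """Number of updates the secondary has not received yet."""
    return len(primary_log) - len(secondary_log)


def fail_over(primary_log, secondary_log):
    """Promote the secondary and report the updates that are lost.

    Assumes secondary_log is a prefix of primary_log.
    """
    lost = primary_log[len(secondary_log):]
    new_primary_log = list(secondary_log)
    return new_primary_log, lost


primary_log = [("x", 1), ("y", 2), ("z", 3)]
secondary_log = primary_log[:2]  # the last update has not been replicated

lag = replication_lag(primary_log, secondary_log)
new_primary_log, lost = fail_over(primary_log, secondary_log)
```

The promoted secondary holds a consistent but older history. The update to `z` is gone, yet nothing conflicts, because only the primary ever accepted writes.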
16. Hazelcast: PRIMARY COPY + LAZY REPLICATION
- PRIMARY COPY: strong consistency on a stable cluster, sticky availability
- LAZY REPLICATION: high throughput replication log
22. Recap
- We apply replication to make distributed systems performant, available and fault tolerant.
- Various replication protocols are built based on when and where to make updates.
- No silver bullet. It is a world of trade-offs.
24. REFERENCES
[1] Gray, Jim, et al. "The dangers of replication and a solution." ACM SIGMOD Record 25.2 (1996): 173-182.
[2] Shapiro, Marc, et al. "Conflict-free replicated data types." Symposium on Self-Stabilizing Systems. Springer, Berlin, Heidelberg, 2011.
[3] http://docs.basho.com/riak/kv/2.2.0/learn/concepts/crdts/
[4] DeCandia, Giuseppe, et al. "Dynamo: Amazon's highly available key-value store." ACM SIGOPS Operating Systems Review 41.6 (2007): 205-220.
[5] Lamport, Leslie. "Paxos made simple." ACM SIGACT News 32.4 (2001): 18-25.
[6] Ongaro, Diego, and John K. Ousterhout. "In Search of an Understandable Consensus Algorithm." USENIX Annual Technical Conference. 2014.
[7] Hunt, Patrick, et al. "ZooKeeper: Wait-free Coordination for Internet-scale Systems." USENIX annual technical conference. Vol. 8. 2010.
[8] http://www.datastax.com/dev/blog/why-cassandra-doesnt-need-vector-clocks
[9] https://aphyr.com/posts/299-the-trouble-with-timestamps
[10] Raynal, Michel, and Mukesh Singhal. "Logical time: Capturing causality in distributed systems." Computer 29.2 (1996): 49-56.
[11] http://kafka.apache.org/documentation.html#replication
[12] http://docs.hazelcast.org/docs/latest/manual/html-single/index.html#managing-cluster-and-member-states
[13] Brewer, Eric. "Towards robust distributed systems." Proceedings of the 19th Annual ACM Symposium on Principles of Distributed Computing (PODC 00). ACM, 2000: 7-10.
[14] https://codahale.com/you-cant-sacrifice-partition-tolerance/
[15] http://blog.nahurst.com/visual-guide-to-nosql-systems
[16] http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
[17] https://www.somethingsimilar.com/2013/01/14/notes-on-distributed-systems-for-young-bloods/
[18] https://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed
[19] Gilbert, Seth, and Nancy Lynch. "Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services." ACM SIGACT News 33.2 (2002): 51-59.
[20] https://martin.kleppmann.com/2015/05/11/please-stop-calling-databases-cp-or-ap.html
[21] https://henryr.github.io/cap-faq/
[22] http://docs.hazelcast.org/docs/3.7/manual/html-single/index.html#wan-replication
[23] https://dev.mysql.com/doc/refman/5.7/en/group-replication.html
[24] Gray, Jim N. "Notes on data base operating systems." Operating Systems: An Advanced Course. Springer, 1978.