Hermes Reliable Replication Protocol - ASPLOS'20 Presentation

  1. 1. Hermes: A Fast, Fault-tolerant and Linearizable Replication Protocol. Antonios Katsarakis, V. Gavrielatos, S. Katebzadeh, A. Joshi*, B. Grot, V. Nagarajan, A. Dragojevic† (University of Edinburgh, *Intel, †Microsoft Research). hermes-protocol.com
  7. 7. Distributed datastores: in-memory stores with a read/write API that form the backbone of online services. They need high performance and fault tolerance, which mandates data replication.
  12. 12. Replication 101: typically 3 to 7 replicas. Consistency: weak consistency gives performance but nasty surprises; strong consistency is programmable and intuitive. Reliable replication protocols provide strong consistency even under faults, and they define the actions that execute reads and writes → these determine a datastore's performance. Can reliable protocols provide high performance?
  16. 16. Paxos: the gold standard for strong consistency and fault tolerance, but with low performance: reads → inter-replica communication; writes → multiple RTTs over the network. Common-case performance (i.e., no faults) is as bad as worst-case performance (under faults). State-of-the-art reliable protocols exploit failure-free operation for performance.
  23. 23. Performance of state-of-the-art protocols. ZAB (leader plus replicas): local reads from all replicas → fast; writes serialize on the leader → low throughput. CRAQ (chain from head to tail): local reads from all replicas → fast; writes traverse the length of the chain → high latency. Both give fast reads but poor write performance. [Diagram legend: write, read, bcast (broadcast), ucast (unicast).]
  29. 29. Key protocol features for high performance. Goal: low latency + high throughput. Reads: local from all replicas. Writes: fast (minimize network hops, avoiding the long latencies of a head-to-tail chain); decentralized (no serialization points, avoiding write serialization at a leader); fully concurrent (any replica can service a write). Measured against local reads from all replicas and fast, decentralized, fully concurrent writes, existing replication protocols are deficient.
  38. 38. Enter Hermes: a broadcast-based, invalidating replication protocol inspired by multiprocessor cache-coherence protocols. An object (here A) is in one of two states: Valid (V) or Invalid (I). Fault-free operation for write(A=3): (1) the coordinator, i.e., the replica servicing the write, broadcasts Invalidations; at this point no stale reads can be served → strong consistency; (2) followers acknowledge the Invalidation (Ack); (3) the write commits at the coordinator, which broadcasts Validations, after which all replicas can serve reads for this object. The result is the strongest consistency model, linearizability, with local reads from all replicas: valid objects = latest value. What about concurrent writes?
  44. 44. Concurrent writes = challenge. Challenge: how to efficiently order concurrent writes to an object, e.g., write(A=3) racing with write(A=1)? Solution: store a logical timestamp (TS) along with each object. Upon a write, the coordinator increments the TS and sends it with its Invalidations; upon receiving an Invalidation, a follower updates the object's TS; when two writes to the same object race, the node ID is used to order them. [Diagram: the two racing writes send Inv(TS1) and Inv(TS4).] Broadcast + Invalidations + TS → high-performance writes.
  49. 49. Writes in Hermes (Broadcast + Invalidations + TS). (1) Decentralized: fully distributed write ordering at endpoints. (2) Fully concurrent: any replica can coordinate a write; writes to different objects proceed in parallel. (3) Fast: writes commit in 1 RTT; writes never abort. Awesome! But what about fault tolerance?
  62. 62. Handling faults in Hermes. Problem: a failure in the middle of a write can permanently leave a replica in the Invalid state. Example: the coordinator of write(A=3) broadcasts Inv(TS) and then fails, so the followers stay Invalid when read(A) arrives. Idea: allow any Invalidated replica to replay the write and unblock. How? Insight: replaying a write needs the write's original TS (for ordering) and the write value. The TS is sent with the Invalidation, but the write value is not. Solution: send the write value with the Invalidation → early value propagation. The Invalidation becomes Inv(3,TS); when the coordinator fails, a follower that receives read(A) replays the write by broadcasting Inv(3,TS), and on completion the followers are Valid again. Early value propagation enables write replays.
  64. 64. Hermes recap (Broadcast + Invalidations + TS + early value propagation). Strong consistency: through cache-coherence-inspired (CC-inspired) Invalidations. Fault tolerance: write replays via early value propagation. High performance: local reads at all replicas; high-performance writes that are fast, decentralized, and fully distributed. In the paper: protocol details, RMWs, other goodies.
  65. 65. Evaluation. State-of-the-art hardware testbed: 5 servers; 2x 10-core Intel Xeon E5-2630v4 per server; 56 Gb/s InfiniBand NICs. KVS workload: uniform access distribution; Million KV pairs: <8B keys, 32B values>. Evaluated protocols: ZAB, CRAQ, Hermes.
  71. 71. Performance. [Plot: throughput in million requests/sec; series labeled "high-perf. writes + local reads", "conc. writes + local reads", and "local reads"; annotations: 4x, 40%.] Write performance matters even at low write ratios. [Plot: write latency (normalized to Hermes); labels: 5% write ratio, 6x.] Hermes: highest throughput & lowest latency.
  75. 75. Conclusion. Hermes = Broadcast + Invalidations + TS + early value propagation. Strong consistency. Fault tolerance via write replays. High performance: local reads from all replicas; high-performance writes (fast, decentralized, fully concurrent). hermes-protocol.com: code available, TLA+ verification. Need reliability and performance? Choose Hermes! Q&A.
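
The protocol steps described in the slides (Invalidation, Ack, Validation, timestamp ordering with node-ID tie-breaks, and write replay via early value propagation) can be illustrated with a toy, single-process sketch. This is not the authors' implementation: the names `Replica`, `Timestamp`, `on_inv`, `on_val`, and `replay` are hypothetical, messages are modeled as direct method calls, the coordinator is simply marked Invalid while its own write is in flight, and membership changes after a failure are assumed to be handled elsewhere.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    VALID = "V"
    INVALID = "I"


@dataclass(frozen=True, order=True)
class Timestamp:
    version: int  # logical counter, incremented by the coordinator per write
    node_id: int  # breaks ties when two writes to the same object race


@dataclass(eq=False)
class Replica:
    node_id: int
    peers: list = field(default_factory=list)  # other live replicas (assumed)
    store: dict = field(default_factory=dict)  # key -> (value, ts, state)

    def _entry(self, key):
        return self.store.get(key, (None, Timestamp(0, 0), State.VALID))

    def read(self, key):
        """Local read: served only while the object is Valid."""
        value, _, state = self._entry(key)
        if state is not State.VALID:
            raise RuntimeError(f"{key!r} is Invalid; the read must wait")
        return value

    def on_inv(self, key, value, ts):
        """Invalidation handler: adopt the write if its TS is newer, then Ack."""
        _, cur_ts, _ = self._entry(key)
        if ts > cur_ts:
            self.store[key] = (value, ts, State.INVALID)
        return True  # Ack

    def on_val(self, key, ts):
        """Validation handler: revalidate only if the TS still matches."""
        value, cur_ts, _ = self._entry(key)
        if ts == cur_ts:
            self.store[key] = (value, ts, State.VALID)

    def _run_round(self, key, value, ts):
        # Steps 1-2: broadcast Invalidations (value included) and collect Acks.
        self.on_inv(key, value, ts)
        acks = [peer.on_inv(key, value, ts) for peer in self.peers]
        # Step 3: with all Acks in, broadcast Validations.
        if all(acks):
            for replica in [self, *self.peers]:
                replica.on_val(key, ts)

    def write(self, key, value):
        """Coordinate a write; any replica may do so."""
        _, cur_ts, _ = self._entry(key)
        ts = Timestamp(cur_ts.version + 1, self.node_id)
        self._run_round(key, value, ts)
        return ts

    def replay(self, key):
        """Replay a stuck write using the value and TS from its Invalidation."""
        value, ts, state = self._entry(key)
        if state is State.INVALID:
            self._run_round(key, value, ts)


if __name__ == "__main__":
    a, b, c = Replica(1), Replica(2), Replica(3)
    a.peers, b.peers, c.peers = [b, c], [a, c], [a, b]
    a.write("A", 3)
    print(b.read("A"))  # local read at a follower
    # Coordinator a sends Inv(7, TS) for a second write, then fails.
    ts = Timestamp(2, a.node_id)
    b.on_inv("A", 7, ts)
    c.on_inv("A", 7, ts)
    b.peers, c.peers = [c], [b]  # a dropped from membership (assumed)
    b.replay("A")  # b finishes the write on a's behalf
    print(c.read("A"))
```

Because the Invalidation carries both the value and the original timestamp, the replaying follower needs nothing from the failed coordinator. The tuple ordering on `Timestamp` compares the version first and the node ID second, which is one simple way to realize the tie-break the slides describe.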