Hermes
A Fast, Fault-tolerant and Linearizable Replication Protocol

Antonios Katsarakis, V. Gavrielatos, S. Katebzadeh, A. Joshi*, B. Grot, V. Nagarajan, A. Dragojevic†
University of Edinburgh, *Intel, †Microsoft Research
hermes-protocol.com
Distributed Datastore
In-memory with read/write API
Backbone of online services
Need:
- High performance
- Fault tolerance
Mandates data replication
Replication 101
Typically 3 to 7 replicas
Consistency
- Weak: performance but nasty surprises
- Strong: programmable and intuitive
Reliable replication protocols
- Strong consistency even under faults
- Define the actions to execute reads & writes → these determine a datastore's performance
Can reliable protocols provide high performance?
Paxos
Gold standard: strong consistency and fault tolerance
Low performance
- reads → inter-replica communication
- writes → multiple RTTs over the network
Common-case performance (i.e., no faults) is as bad as worst-case (under faults)

State-of-the-art reliable protocols exploit failure-free operation for performance
Performance of state-of-the-art protocols
ZAB (leader-based)
- Local reads from all replicas → fast
- Writes serialize on the leader → low throughput
CRAQ (chain replication: head → … → tail)
- Local reads from all replicas → fast
- Writes traverse the length of the chain → high latency
Fast reads but poor write performance
Key protocol features for high performance
Goal: low latency + high throughput
Reads
- Local from all replicas
Writes
- Fast: minimize network hops (avoid long chain latencies)
- Decentralized: no serialization points (avoid write serialization at a leader)
- Fully concurrent: any replica can service a write
Existing replication protocols are deficient
Enter Hermes
Broadcast-based, invalidating replication protocol, inspired by multiprocessor cache-coherence protocols
Each replica tracks a per-object state: Valid or Invalid
Fault-free write operation (the coordinator is the replica servicing the write):
1. Coordinator broadcasts Invalidations
   - Once a replica is Invalidated, it serves no stale reads → strong consistency!
2. Followers acknowledge the Invalidation
3. Coordinator broadcasts Validations (commit)
   - All replicas can now serve reads for this object
Strongest consistency: linearizability
- Local reads from all replicas → a Valid object holds the latest value
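To make the three steps concrete, here is a minimal synchronous sketch of the fault-free write path. This is an illustration, not the paper's implementation: the class and message names are invented, ACKs are implicit in the synchronous calls, and delivering the value with the Validation is a simplifying assumption (the deck adds timestamps and early value propagation next).

```python
# Minimal sketch of Hermes' fault-free write path (illustrative only).
from dataclasses import dataclass
from enum import Enum

class State(Enum):
    VALID = "V"
    INVALID = "I"

@dataclass
class Entry:
    value: object
    state: State = State.VALID

class Replica:
    def __init__(self, node_id: int):
        self.node_id = node_id
        self.peers: list["Replica"] = []
        self.store: dict[str, Entry] = {}

    # --- coordinator side: any replica may coordinate a write ---
    def write(self, key: str, value) -> None:
        # 1. Invalidate the object everywhere; an Invalid copy serves
        #    no stale reads.
        self.store[key] = Entry(value, State.INVALID)
        acks = [p.on_invalidation(key) for p in self.peers]
        # 2. In this synchronous sketch every call returns an ACK;
        #    a real coordinator waits for ACKs from *all* replicas.
        assert all(acks)
        # 3. Validate (commit): ship the value and re-enable local reads.
        self.store[key].state = State.VALID
        for p in self.peers:
            p.on_validation(key, value)

    # --- follower side ---
    def on_invalidation(self, key: str) -> bool:
        old = self.store.get(key)
        self.store[key] = Entry(old.value if old else None, State.INVALID)
        return True  # ACK

    def on_validation(self, key: str, value) -> None:
        self.store[key] = Entry(value, State.VALID)

    def read(self, key: str):
        e = self.store.get(key)
        # A real replica would block/retry while Invalid; we return None.
        return e.value if e and e.state is State.VALID else None

# Three replicas; any of them can coordinate a write.
a, b, c = Replica(1), Replica(2), Replica(3)
a.peers, b.peers, c.peers = [b, c], [a, c], [a, b]
a.write("A", 3)
assert b.read("A") == 3 and c.read("A") == 3   # local reads everywhere
```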
What about concurrent writes?
Concurrent writes = challenge
Challenge
How to efficiently order concurrent writes to an object?
Solution
Store a logical timestamp (TS) along with each object
- Upon a write: the coordinator increments the TS and sends it with its Invalidations
- Upon receiving an Invalidation: a follower updates the object's TS
- When two writes to the same object race: use the node ID to order them
Broadcast + Invalidations + TS → high-performance writes
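As a sketch of that ordering rule: treat the TS as a (version, node-ID) pair compared lexicographically, so every replica resolves a race between two coordinators identically. The names below are illustrative, not from the paper's code.

```python
# Sketch of Hermes-style write ordering with per-object logical timestamps.
from dataclasses import dataclass

@dataclass(frozen=True, order=True)
class Timestamp:
    version: int   # incremented by the coordinator on each write
    node_id: int   # coordinator's ID; deterministic tie-breaker

def next_ts(current: Timestamp, my_node_id: int) -> Timestamp:
    """Coordinator: bump the version and stamp it with our node ID."""
    return Timestamp(current.version + 1, my_node_id)

def accept_invalidation(local_ts: Timestamp, inv_ts: Timestamp) -> bool:
    """Follower: advance the object's TS only if the incoming TS is higher.
    Comparison is lexicographic: version first, then node ID."""
    return inv_ts > local_ts

# Two replicas race to write object A (both saw version 4):
base = Timestamp(4, 0)
ts_n1 = next_ts(base, my_node_id=1)   # (5, 1)
ts_n2 = next_ts(base, my_node_id=2)   # (5, 2)
# Every replica resolves the race the same way: node 2's write wins.
assert accept_invalidation(ts_n1, ts_n2) and not accept_invalidation(ts_n2, ts_n1)
```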
Writes in Hermes
Broadcast + Invalidations + TS
1. Decentralized
- Fully distributed write ordering at the endpoints
2. Fully concurrent
- Any replica can coordinate a write
- Writes to different objects proceed in parallel
3. Fast
- Writes commit in 1 RTT
- Writes never abort
Awesome! But what about fault tolerance?
Handling faults in Hermes
Problem
A failure in the middle of a write can permanently leave a replica in the Invalid state
- e.g., the coordinator fails after Invalidating the followers: reads of the object block indefinitely
Idea
Allow any Invalidated replica to replay the write and unblock
How?
Insight: to replay a write, a replica needs
- the write's original TS (for ordering)
- the write value
The TS is sent with the Invalidation, but the write value is not
Solution: send the write value with the Invalidation, e.g., Inv(3, TS) for write(A=3) → early value propagation
Early value propagation enables write replays
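A minimal sketch of a write replay under early value propagation; failure detection and membership changes are assumed to be handled elsewhere, and every name here is hypothetical rather than the paper's API.

```python
# Sketch of a Hermes-style write replay (illustrative only). Because every
# Invalidation now carries the write's TS *and* its value (early value
# propagation), an Invalidated follower has everything it needs to take
# over a write whose coordinator died.
from dataclasses import dataclass

@dataclass(frozen=True, order=True)
class Timestamp:
    version: int
    node_id: int

@dataclass
class PendingWrite:
    value: object
    ts: Timestamp      # the write's *original* TS, kept for ordering

class Follower:
    def __init__(self):
        self.pending: dict[str, PendingWrite] = {}   # Invalid objects

    def on_invalidation(self, key, value, ts):
        # Stash TS and value on every Invalidation: enough to replay later.
        self.pending[key] = PendingWrite(value, ts)

    def replay(self, key, broadcast_invalidation):
        """Suspecting the coordinator has failed, act as the new coordinator:
        re-broadcast the Invalidation with the original TS so the replayed
        write orders exactly like the original one (replays are idempotent),
        then proceed with ACKs and Validations as in the fault-free path."""
        w = self.pending[key]
        broadcast_invalidation(key, w.value, w.ts)

# Usage: the follower re-coordinates with the stashed (value, TS).
f = Follower()
f.on_invalidation("A", 3, Timestamp(5, 1))
f.replay("A", broadcast_invalidation=lambda k, v, ts: print("re-INV", k, v, ts))
```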
Hermes recap
Broadcast + Invalidations + TS + early value propagation
Strong consistency
- through cache-coherence-inspired Invalidations
Fault tolerance
- write replays via early value propagation
High performance
- Local reads at all replicas
- High-performance writes: fast, decentralized, fully concurrent
In the paper: protocol details, RMWs, other goodies
Evaluation
State-of-the-art hardware testbed
- 5 servers
- 2x 10-core Intel Xeon E5-2630 v4 per server
- 56 Gb/s InfiniBand NICs
KVS workload
- Uniform access distribution
- A million KV pairs (8 B keys, 32 B values)
Evaluated protocols: ZAB, CRAQ, Hermes
Performance
[Throughput figure (million requests/sec) vs. write ratio: Hermes (high-perf. writes + local reads), CRAQ (conc. writes + local reads), ZAB (local reads); annotated gaps: 40% and 4x]
Write performance matters even at low write ratios
[Write-latency figure at a 5% write ratio, normalized to Hermes: the annotated gap reaches 6x]
Hermes: highest throughput & lowest latency
Conclusion
Hermes
Broadcast + Invalidations + TS + early value propagation
- Strong consistency
- Fault tolerance via write replays
- High performance
  - Local reads from all replicas
  - High-performance writes: fast, decentralized, fully concurrent
hermes-protocol.com: code available, TLA+ verification
Need reliability and performance? Choose Hermes!
Q&A
