1. Hermes Protocol Overview
Motivation
Results
Hermes: A Fast, Fault-tolerant and Linearizable Replication Protocol
A. Katsarakis, V. Gavrielatos, S. Katebzadeh, A. Joshi*, A. Dragojevic†, B. Grot, V. Nagarajan
University of Edinburgh, *Intel, †Microsoft Research
hermes-protocol.com
State-of-art write performance Hermes
State-of-the-Art Protocols
Exploit failure-free operation for performance
• Local reads from all replicas
• Poor write throughput and latency
Writes can block local reads, hurting performance even at low write ratios
Linearizability
Reads are served locally when key is Valid
Writes commit after invalidating all replicas of a key
Fault tolerance
After a fault, any replica can replay writes to unblock
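The read, write and replay rules above can be sketched as a single-process toy model. This is only an illustration under stated assumptions, not the authors' implementation: the names `Replica`, `write` and `replay` are hypothetical, messages are modelled as direct method calls, timestamps are assumed to be (version, node id) pairs, and a read of an Invalid key returns `None` where a real replica would presumably stall.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    VALID = "V"
    INVALID = "I"


@dataclass
class Replica:
    node_id: int
    # key -> (value, timestamp, state); timestamp is an assumed (version, node_id) pair
    store: dict = field(default_factory=dict)

    def read(self, key):
        """Serve the read locally, but only while the key is Valid."""
        value, _ts, state = self.store[key]
        return value if state is State.VALID else None

    def on_invalidation(self, key, value, ts):
        """Invalidations carry the new value and timestamp (early value propagation)."""
        _, cur_ts, _ = self.store.get(key, (None, (0, 0), State.VALID))
        if ts > cur_ts:
            self.store[key] = (value, ts, State.INVALID)
        return "ack"

    def on_validation(self, key, ts):
        """Return the key to Valid if the validation matches the stored write."""
        value, cur_ts, _ = self.store[key]
        if ts == cur_ts:
            self.store[key] = (value, ts, State.VALID)


def write(coordinator, replicas, key, value):
    """Invalidate every replica, commit once all have acked, then validate."""
    _, (version, _), _ = coordinator.store.get(key, (None, (0, 0), State.VALID))
    ts = (version + 1, coordinator.node_id)
    acks = [r.on_invalidation(key, value, ts) for r in replicas]
    assert all(a == "ack" for a in acks)  # commit point: all replicas invalidated
    for r in replicas:
        r.on_validation(key, ts)
    return ts


def replay(replica, replicas, key):
    """A replica stuck in Invalid re-drives the write from its local copy."""
    value, ts, state = replica.store[key]
    if state is State.INVALID:
        for r in replicas:
            r.on_invalidation(key, value, ts)
        for r in replicas:
            r.on_validation(key, ts)


if __name__ == "__main__":
    nodes = [Replica(i) for i in range(5)]
    write(nodes[0], nodes, "A", 3)
    print([n.read("A") for n in nodes])
```

In this sketch the replay works only because the invalidation already delivered the value and timestamp to the replica, so it needs nothing from the failed coordinator.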
5 nodes (replicas), 56 Gbit RDMA NICs, 1M keys accessed uniformly
Linearizability & Fault-tolerance with High-Performance
[Figure: Throughput (million requests/sec) vs. % write ratio. Legend: high-perf. writes + local reads; conc. writes + local reads; local reads. Callouts: 4x, 40% @ 5% write ratio.]
[Figure: Write Latency (normalized to Hermes) vs. % write ratio. Callout: 6x.]
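[Figure: Hermes write example, write(A=3). Message labels: Invalidation (3, TS), Ack, Validation; the write is marked "completion" once the Acks arrive. Replica states of A shown as V and I.]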
States of A: Valid or Invalid
ZAB (Multi-Paxos) [diagram label: Leader]
• Reduces an RTT from traditional Paxos
• All writes serialize on leader = low concurrency
CRAQ (Chain Replication) [diagram labels: Head, Tail]
• Writes flow concurrently in the chain
• Must traverse the length of chain = slow
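To see why traversing the chain is slow, a back-of-the-envelope count of sequential one-way message delays helps. The model below is an assumption made for illustration only: it counts a write travelling from head to tail plus acknowledgements returning from tail to head, and compares that with a single broadcast round trip (the 1 RTT the poster quotes for Hermes writes). It ignores processing time, batching and client hops, and the function names are hypothetical.

```python
def chain_write_delays(n_replicas: int) -> int:
    """One-way delays for head -> tail propagation plus tail -> head acks (assumed model)."""
    return 2 * (n_replicas - 1)


def broadcast_write_delays() -> int:
    """One round trip: a broadcast out and the acks back, independent of replica count."""
    return 2


if __name__ == "__main__":
    for n in (3, 5, 7):
        print(n, chain_write_delays(n), broadcast_write_delays())
```

Under these assumptions the chain cost grows linearly with the number of replicas, while the broadcast cost stays constant.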
Broadcast-based, invalidating reliable protocol inspired by multiprocessor cache coherence
• Fast local reads from all replicas.
• High-performance writes: fast (1 RTT), decentralized, fully concurrent, never need to abort
Distributed Datastores
• Read/write API
• Backbone of modern online services
Reliable Replication Protocols
• Keep replicas strongly consistent despite faults
• Define actions to execute reads and writes, which determines the datastore's performance
replicas to keep consistent
[Diagram legend: Local Read; Write; Unicast; Mcast to Replicas]
Available
Data replication for fault tolerance
Consistent
Programmability through strongly consistent replicas
Performant
Exploit replicas for low-latency & high-throughput
Logical Timestamp (TS)
Broadcast + Invalidations + early value propagation + TS
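One way logical timestamps can let concurrent writes proceed without aborting is for every replica to keep whichever write carries the higher timestamp. The sketch below assumes a timestamp is a (version, node id) pair compared lexicographically; the poster only states that timestamps are logical, so this layout and the names `Timestamp`, `next_timestamp` and `apply_write` are illustrative assumptions.

```python
from typing import NamedTuple


class Timestamp(NamedTuple):
    version: int  # bumped on every write to the key
    node_id: int  # assumed tie-breaker between concurrent writers


def next_timestamp(current: Timestamp, node_id: int) -> Timestamp:
    """Timestamp a writer would attach to a new write of a key it last saw at `current`."""
    return Timestamp(current.version + 1, node_id)


def apply_write(current, incoming):
    """`current` and `incoming` are (value, Timestamp) pairs; keep the higher timestamp."""
    return incoming if incoming[1] > current[1] else current


if __name__ == "__main__":
    base = (0, Timestamp(4, 0))
    w1 = (10, next_timestamp(base[1], node_id=1))  # concurrent write from node 1
    w2 = (20, next_timestamp(base[1], node_id=2))  # concurrent write from node 2
    # Replicas that see the two writes in different orders still agree.
    print(apply_write(apply_write(base, w1), w2))
    print(apply_write(apply_write(base, w2), w1))
```

Because the comparison is a total order, replicas converge on the same winning write regardless of message arrival order, so neither writer has to abort in this model.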