SlideShare a Scribd company logo
Zeus
Locality-aware distributed transactions
A. Katsarakis*
, Y. Ma†
, Z. Tan§
, A. Bainbridge, M. Balkwill,
A. Dragojevic, B. Grot*, B. Radunovic, Y. Zhang
*University of Edinburgh, †Fudan University, §UCLA, Microsoft Research
zeus-protocol.com
Thanks to:
Keep data
in-memory, replicated, sharded
across nodes of a datacenter
Backbone of transactional cloud applications
Demand
distributed reliable transactions (txs)
strongly-consistent and fault-tolerant
high performance
Modern distributed datastores
2
distributed datastore
Keep data
in-memory, replicated, sharded
across nodes of a datacenter
Backbone of transactional cloud applications
Demand
distributed reliable transactions (txs)
strongly-consistent and fault-tolerant
high performance
Modern distributed datastores
3
distributed datastore
Keep data
in-memory, replicated, sharded
across nodes of a datacenter
Backbone of transactional cloud applications
Demand
distributed reliable transactions (txs)
strongly-consistent and fault-tolerant
high performance
Modern distributed datastores
4
distributed datastore
Keep data
in-memory, replicated, sharded
across nodes of a datacenter
Backbone of transactional cloud applications
Demand
distributed reliable transactions (txs)
strongly-consistent and fault-tolerant
high performance
Modern distributed datastores
5
distributed datastore
Keep data
in-memory, replicated, sharded
across nodes of a datacenter
Backbone of transactional cloud applications
Demand
distributed reliable transactions (txs)
strongly-consistent and fault-tolerant
high performance
Modern distributed datastores
6
distributed datastore
Traditional distributed txs well-known as expensive
Many tx applications exhibit dynamic locality
network functions, peer-to-peer payments …
Example: cellular control plane
manages phone connectivity and
handovers among base stations
Locality
every phone user repeats txs:
same phone & nearest base-station
But locality is dynamic
changes at run-time
e.g., user commutes à base-station changes
Observation
7
base-station A
base-station B
Many tx applications exhibit dynamic locality
network functions, peer-to-peer payments …
Example: cellular control plane
manages phone connectivity and
handovers among base stations
Locality
every phone user repeats txs:
same phone & nearest base-station
But locality is dynamic
changes at run-time
e.g., user commutes à base-station changes
Observation
8
Many tx applications exhibit dynamic locality
network functions, peer-to-peer payments …
Example: cellular control plane
manages phone connectivity and
handovers among base stations
Locality
every phone user repeats txs:
same phone & nearest base-station
But locality is dynamic
changes at run-time
e.g., user commutes à base-station changes
Observation
9
base-station A
base-station B
base-station A
base-station B
Many tx applications exhibit dynamic locality
network functions, peer-to-peer payments …
Example: cellular control plane
manages phone connectivity and
handovers among base stations
Locality
every phone user repeats txs:
same phone & nearest base-station
But locality is dynamic
changes at run-time
e.g., user commutes à base-station changes
Observation
10
handover
base-station A
base-station B
Many tx applications exhibit dynamic locality
network functions, peer-to-peer payments …
Example: cellular control plane
manages phone connectivity and
handovers among base stations
Locality
every phone user repeats txs:
same phone & nearest base-station
But locality is dynamic
changes at run-time
e.g., user commutes à base-station changes
Observation
11
handover
Can state-of-the-art datastores exploit dynamic locality?
State-of-the-art reliable datastores
12
Static sharding (e.g., consistent hashing)
Objects placed randomly on fixed nodes
easy to locate and access objects
reliable txs regardless of access pattern
expensive reliable txs
mostly remote accesses
some blocking (control flow, pοinter chasing)
related objects on different shards
costly distributed commit
tx coordinator
State-of-the-art reliable datastores
13
Static sharding (e.g., consistent hashing)
Objects placed randomly on fixed nodes
easy to locate and access objects
reliable txs regardless of access pattern
expensive reliable txs
mostly remote accesses
some blocking (control flow, pοinter chasing)
related objects on different shards
costly distributed commit
tx coordinator
distributed commit
1. tx: if (p) b++;
remote accesses
Adapted from FaSST [OSDI’16]
Adapted from FaSST [OSDI’16]
distributed commit
1. tx: if (p) b++;
remote accesses
Adapted from FaSST [OSDI’16]
State-of-the-art reliable datastores
14
Static sharding (e.g., consistent hashing)
Objects placed randomly on fixed nodes
easy to locate and access objects
reliable txs regardless of access pattern
expensive reliable txs
mostly remote accesses
some blocking (control flow, pοinter chasing)
related objects on different shards
costly distributed commit
tx coordinator
distributed commit
1. tx: if (p) b++;
remote accesses
Adapted from FaSST [OSDI’16]
Adapted from FaSST [OSDI’16]
distributed commit
1. tx: if (p) b++;
remote accesses
Adapted from FaSST [OSDI’16]
State-of-the-art reliable datastores
15
Static sharding (e.g., consistent hashing)
Objects placed randomly on fixed nodes
easy to locate and access objects
reliable txs regardless of access pattern
expensive reliable txs
mostly remote accesses
some blocking (control flow, pοinter chasing)
related objects on different shards
costly distributed commit
tx coordinator
distributed commit
1. tx: if (p) b++;
remote accesses
Adapted from FaSST [OSDI’16]
Adapted from FaSST [OSDI’16]
Cannot exploit locality à expensive reliable txs
Enter Zeus
16
Distributed datastore: exploits locality for fast reliable txs
Inspired by multiprocessor’s hardware transactional memory
Basic idea
Each object has a single node owner = data + exclusive write access
the owner changes dynamically and is tracked by replicated directory
Coordinator executes a tx by acquiring ownership of all its objects
à single-node commit
Ownership stays with coordinator
à future txs on these objects enjoy local accesses
Enter Zeus
17
Distributed datastore: exploits locality for fast reliable txs
Inspired by multiprocessor’s hardware transactional memory
Basic idea
Each object has a single node owner = data + exclusive write access
the owner changes dynamically and is tracked by replicated directory
Coordinator executes a tx by acquiring ownership of all its objects
à single-node commit
Ownership stays with coordinator
à future txs on these objects enjoy local accesses
Enter Zeus
18
Distributed datastore: exploits locality for fast reliable txs
Inspired by multiprocessor’s hardware transactional memory
Basic idea
Each object has a single node owner = data + exclusive write access
the owner changes dynamically and is tracked by replicated directory
Coordinator executes a tx by acquiring ownership of all its objects
à single-node commit
Ownership stays with coordinator
à future txs on these objects enjoy local accesses
Enter Zeus
19
Distributed datastore: exploits locality for fast reliable txs
Inspired by multiprocessor’s hardware transactional memory
Basic idea
Each object has a single node owner = data + exclusive write access
the owner changes dynamically and is tracked by replicated directory
Coordinator executes a tx by acquiring ownership of all its objects
à single-node commit
Ownership stays with coordinator
à future txs on these objects enjoy local accesses
Enter Zeus
20
Distributed datastore: exploits locality for fast reliable txs
Inspired by multiprocessor’s hardware transactional memory
Basic idea
Each object has a single node owner = data + exclusive write access
the owner changes dynamically and is tracked by replicated directory
Coordinator executes a tx by acquiring ownership of all its objects
à single-node commit
Ownership stays with coordinator
à future txs on these objects enjoy local accesses
What are the exact steps?
1. Execute as the owner
a) at object access: if (not owner) get ownership
b) local access
2. Local commit
commits tx: traditional single-node commit
(updates not yet replicated)
3. Reliable commit
completes tx: updating replicas for availability
1.
Locality-aware txs in Zeus
21
1. Execute as the owner
a) at object access: if (not owner) get ownership
b) local access
2. Local commit
commits tx: traditional single-node commit
(updates not yet replicated)
3. Reliable commit
completes tx: updating replicas for availability
1.
Locality-aware txs in Zeus
22
1. Execute as the owner
a) at object access: if (not owner) get ownership
b) local access
2. Local commit
commits tx: traditional single-node commit
(updates not yet replicated)
3. Reliable commit
completes tx: updating replicas for availability
1.
Locality-aware txs in Zeus
23
How to get ownership reliably?
Ownership protocol
24
1) Coordinator gets ownership from current owner
2) Keeps consistent directory replicas
1. Coordinator sends object ownership invalidations
(through a directory replica) to all arbiters
2. Arbiters acknowledge the coordinator directly
3. Coordinator sends validations informing arbiters for acquisition
Conflicts: logical timestamps, fault tolerance: idempotent replays
as in Hermes [ASPLOS’20]
Ownership protocol
25
1) Coordinator gets ownership from current owner
2) Keeps consistent directory replicas
1. Coordinator sends object ownership invalidations
(through a directory replica) to all arbiters
2. Arbiters acknowledge the coordinator directly
3. Coordinator sends validations informing arbiters for acquisition
Conflicts: logical timestamps, fault tolerance: idempotent replays
as in Hermes [ASPLOS’20]
arbiters
Ownership protocol
26
1) Coordinator gets ownership from current owner
2) Keeps consistent directory replicas
1. Coordinator sends object ownership invalidations
(through a directory replica) to all arbiters
2. Arbiters acknowledge the coordinator directly
3. Coordinator sends validations informing arbiters for acquisition
Conflicts: logical timestamps, fault tolerance: idempotent replays
as in Hermes [ASPLOS’20]
arbiters
Ownership protocol
27
1) Coordinator gets ownership from current owner
2) Keeps consistent directory replicas
1. Coordinator sends object ownership invalidations
(through a directory replica) to all arbiters
2. Arbiters acknowledge the coordinator directly
3. Coordinator sends validations informing arbiters for acquisition
Conflicts: logical timestamps, fault tolerance: idempotent replays
as in Hermes [ASPLOS’20]
arbiters
Ownership protocol
28
1) Coordinator gets ownership from current owner
2) Keeps consistent directory replicas
1. Coordinator sends object ownership invalidations
(through a directory replica) to all arbiters
2. Arbiters acknowledge the coordinator directly
3. Coordinator sends validations informing arbiters for acquisition
Conflicts: logical timestamps, fault tolerance: idempotent replays
as in Hermes [ASPLOS’20]
arbiters
Ownership is acquired & coordinator proceeds with tx
Ownership protocol
29
1) Coordinator gets ownership from current owner
2) Keeps consistent directory replicas
1. Coordinator sends object ownership invalidations
(through a directory replica) to all arbiters
2. Arbiters acknowledge the coordinator directly
3. Coordinator sends validations informing arbiters for acquisition
Conflicts: logical timestamps, fault tolerance: idempotent replays
as in Hermes [ASPLOS’20]
arbiters
Ownership protocol
30
1) Coordinator gets ownership from current owner
2) Keeps consistent directory replicas
1. Coordinator sends object ownership invalidations
(through a directory replica) to all arbiters
2. Arbiters acknowledge the coordinator directly
3. Coordinator sends validations informing arbiters for acquisition
Conflicts: logical timestamps, fault tolerance: idempotent replays
as in Hermes [ASPLOS’20]
arbiters
Ownership protocol
31
1) Coordinator gets ownership from current owner
2) Keeps consistent directory replicas
1. Coordinator sends object ownership invalidations
(through a directory replica) to all arbiters
2. Arbiters acknowledge the coordinator directly
3. Coordinator sends validations informing arbiters for acquisition
Conflicts: logical timestamps, fault tolerance: idempotent replays
as in Hermes [ASPLOS’20]
Correctness verified under conflicts and faults
arbiters
32
1. Execute as the owner
a) at object access: if (not owner) get ownership
b) local access
2. Local commit
commits tx: traditional single-node commit
3. Reliable commit
completes tx: updating replicas for availability
Locality + ownership stays with coordinator
Locality-aware txs in Zeus
33
1. Execute as the owner
a) at object access: if (not owner) get ownership
b) local access
2. Local commit
commits tx: traditional single-node commit
3. Reliable commit
completes tx: updating replicas for availability
common
case
Locality + ownership stays with coordinator
Locality-aware txs in Zeus
34
1. Execute as the owner
a) at object access: if (not owner) get ownership
b) local access
2. Local commit
commits tx: traditional single-node commit
3. Reliable commit
completes tx: updating replicas for availability
common
case
Locality-aware txs in Zeus
35
1. Execute as the owner
a) at object access: if (not owner) get ownership
b) local access
2. Local commit
commits tx: traditional single-node commit
3. Reliable commit
completes tx: updating replicas for availability
common
case
Locality-aware txs in Zeus
36
1. Execute as the owner
a) at object access: if (not owner) get ownership
b) local access
2. Local commit
commits tx: traditional single-node commit
3. Reliable commit
completes tx: updating replicas for availability
common
case
Locality-aware txs in Zeus
37
1. Execute as the owner
a) at object access: if (not owner) get ownership
b) local access
2. Local commit
commits tx: traditional single-node commit
3. Reliable commit
completes tx: updating replicas for availability
common
case
Locality-aware txs in Zeus
Great! But how efficient is reliable commit?
Reliable commit
38
1. Committed tx à no conflicts à fast tx completion
- coordinator sends updates to replicas and waits for ACKs
- read-only txs: no updates à no reliable commit
2. No conflicts à no aborts à pipelined txs (no waiting for replication)
- subsequent txs use local state with certainty & issue updates
- coordinator sequences updates, which replicas apply in order
Fault tolerance: idempotent replays
Reliable commit
39
1. Committed tx à no conflicts à fast tx completion
- coordinator sends updates to replicas and waits for ACKs
- read-only txs: no updates à no reliable commit
2. No conflicts à no aborts à pipelined txs (no waiting for replication)
- subsequent txs use local state with certainty & issue updates
- coordinator sequences updates, which replicas apply in order
Fault tolerance: idempotent replays
Reliable commit
40
1. Committed tx à no conflicts à fast tx completion
- coordinator sends updates to replicas and waits for ACKs
- read-only txs: no updates à no reliable commit
2. No conflicts à no aborts à pipelined txs (no waiting for replication)
- subsequent txs use local state with certainty & issue updates
- coordinator sequences updates, which replicas apply in order
Fault tolerance: idempotent replays
Reliable commit
41
1. Committed tx à no conflicts à fast tx completion
- coordinator sends updates to replicas and waits for ACKs
- read-only txs: no updates à no reliable commit
2. No conflicts à no aborts à pipelined txs (no waiting for replication)
- subsequent txs use local state with certainty & issue updates
- coordinator sequences updates, which replicas apply in order
Fault tolerance: idempotent replays
Very efficient! Correctness verified under faults
Recap: txs in Zeus
42
Locality-aware, distributed and reliable
Recap: txs in Zeus
43
Locality-aware, distributed and reliable
Recap: txs in Zeus
44
Locality-aware, distributed and reliable
Recap: txs in Zeus
45
Locality-aware, distributed and reliable
Recap: txs in Zeus
46
Locality-aware, distributed and reliable
Recap: txs in Zeus
47
Locality-aware, distributed and reliable
Awesome! Does it translate into performance?
Performance
48
6 nodes, 3-way replication,
Zeus 40Gb (no RDMA)
Performance
49
Million
txs
/
sec
Ideal
(all local)
Zeus
(real-world locality)
Handovers
6 nodes, 3-way replication,
Zeus 40Gb (no RDMA)
Performance
50
Zeus: within 9% of ideal
Million
txs
/
sec
Ideal
(all local)
Zeus
(real-world locality)
Handovers
6 nodes, 3-way replication,
Zeus 40Gb (no RDMA)
9%
Performance
51
Zeus: within 9% of ideal
Million
txs
/
sec
Ideal
(all local)
Zeus
(real-world locality)
Handovers
6 nodes, 3-way replication,
Zeus 40Gb (no RDMA)
% write txs needing ownership
TATP
9%
Performance
52
Zeus: within 9% of ideal
Million
txs
/
sec
Ideal
(all local)
Zeus
(real-world locality)
Handovers
6 nodes, 3-way replication,
Zeus 40Gb (no RDMA)
% write txs needing ownership
TATP
2x
9%
Up to 40M.tx/s and 2x state-of-the-art
FaSST [OSDI’16], FaRM [SOSP’15]
which use 56Gb RDMA
Performance
53
Zeus: within 9% of ideal
Million
txs
/
sec
Ideal
(all local)
Zeus
(real-world locality)
Handovers
6 nodes, 3-way replication,
Zeus 40Gb (no RDMA)
% write txs needing ownership
TATP
2x
9%
Up to 40M.tx/s and 2x state-of-the-art
FaSST [OSDI’16], FaRM [SOSP’15]
which use 56Gb RDMA
Paper: more benchmarks, ownership, latency ...
State-of-the-art reliable txs over static sharding:
cannot exploit dynamic locality
remote accesses
costly distributed commit
Zeus’ reliable txs exploit locality via dynamic ownership:
local accesses in the common case
single-node commit
- local for read-only txs
- fast and pipelined for write txs
Performance 10s millions txs/second
up to 2x state-of-the-art
Bonus: programmability!
Conclusion
54
State-of-the-art reliable txs over static sharding:
cannot exploit dynamic locality
remote accesses
costly distributed commit
Zeus’ reliable txs exploit locality via dynamic ownership:
local accesses in the common case
single-node commit
- local for read-only txs
- fast and pipelined for write txs
Performance 10s millions txs/second
up to 2x state-of-the-art
Bonus: programmability!
Conclusion
55
zeus-protocol.com
TLA+ specification, Q&A …
State-of-the-art reliable txs over static sharding:
cannot exploit dynamic locality
remote accesses
costly distributed commit
Zeus’ reliable txs exploit locality via dynamic ownership:
local accesses in the common case
single-node commit
- local for read-only txs
- fast and pipelined for write txs
Performance 10s millions txs/second
up to 2x state-of-the-art
Bonus: programmability!
Conclusion
56
zeus-protocol.com
TLA+ specification, Q&A …
State-of-the-art reliable txs over static sharding:
cannot exploit dynamic locality
remote accesses
costly distributed commit
Zeus’ reliable txs exploit locality via dynamic ownership:
local accesses in the common case
single-node commit
- local for read-only txs
- fast and pipelined for write txs
Performance 10s millions txs/second
up to 2x state-of-the-art
Bonus: programmability!
Conclusion
57
zeus-protocol.com
TLA+ specification, Q&A …
State-of-the-art reliable txs over static sharding:
cannot exploit dynamic locality
remote accesses
costly distributed commit
Zeus’ reliable txs exploit locality via dynamic ownership:
local accesses in the common case
single-node commit
- local for read-only txs
- fast and pipelined for write txs
Performance 10s millions txs/second
up to 2x state-of-the-art
Bonus: programmability!
Conclusion
58
zeus-protocol.com
TLA+ specification, Q&A …
Reliable txs with locality? Use Zeus!

More Related Content

What's hot

Chapter 14 replication
Chapter 14 replicationChapter 14 replication
Chapter 14 replication
AbDul ThaYyal
 
The Google File System (GFS)
The Google File System (GFS)The Google File System (GFS)
The Google File System (GFS)
Romain Jacotin
 
Database replication
Database replicationDatabase replication
Database replication
Arslan111
 
GFS
GFSGFS
Cache Coherence.pptx
Cache Coherence.pptxCache Coherence.pptx
Cache Coherence.pptx
SamyakJain710491
 
Sensor node hardware and network architecture
Sensor node hardware and network architectureSensor node hardware and network architecture
Sensor node hardware and network architecture
Vidhi603146
 
Google File System
Google File SystemGoogle File System
Google File System
guest2cb4689
 
Google File System
Google File SystemGoogle File System
Google File System
nadikari123
 
Chapter 14
Chapter 14Chapter 14
Chapter 14
Ali Broumandnia
 
Models of Distributed System
Models of Distributed SystemModels of Distributed System
Models of Distributed System
Ashish KC
 
Cloud File System with GFS and HDFS
Cloud File System with GFS and HDFS  Cloud File System with GFS and HDFS
Cloud File System with GFS and HDFS
Dr Neelesh Jain
 
Introduction to distributed database
Introduction to distributed databaseIntroduction to distributed database
Introduction to distributed database
Sonia Panesar
 
The Google Chubby lock service for loosely-coupled distributed systems
The Google Chubby lock service for loosely-coupled distributed systemsThe Google Chubby lock service for loosely-coupled distributed systems
The Google Chubby lock service for loosely-coupled distributed systems
Romain Jacotin
 
DISTRIBUTED DATABASE WITH RECOVERY TECHNIQUES
DISTRIBUTED DATABASE WITH RECOVERY TECHNIQUESDISTRIBUTED DATABASE WITH RECOVERY TECHNIQUES
DISTRIBUTED DATABASE WITH RECOVERY TECHNIQUES
AAKANKSHA JAIN
 
Linux fundamentals
Linux fundamentalsLinux fundamentals
Linux fundamentals
Deepak Upadhyay
 
Mobile agents
Mobile agentsMobile agents
Mobile agents
Santosh Pandey
 
Distributed Shared Memory Systems
Distributed Shared Memory SystemsDistributed Shared Memory Systems
Distributed Shared Memory Systems
Arush Nagpal
 
Distributed operating system
Distributed operating systemDistributed operating system
Distributed operating system
udaya khanal
 
05 (1).pptx
05 (1).pptx05 (1).pptx
IT6601 MOBILE COMPUTING
IT6601 MOBILE COMPUTINGIT6601 MOBILE COMPUTING
IT6601 MOBILE COMPUTING
Kathirvel Ayyaswamy
 

What's hot (20)

Chapter 14 replication
Chapter 14 replicationChapter 14 replication
Chapter 14 replication
 
The Google File System (GFS)
The Google File System (GFS)The Google File System (GFS)
The Google File System (GFS)
 
Database replication
Database replicationDatabase replication
Database replication
 
GFS
GFSGFS
GFS
 
Cache Coherence.pptx
Cache Coherence.pptxCache Coherence.pptx
Cache Coherence.pptx
 
Sensor node hardware and network architecture
Sensor node hardware and network architectureSensor node hardware and network architecture
Sensor node hardware and network architecture
 
Google File System
Google File SystemGoogle File System
Google File System
 
Google File System
Google File SystemGoogle File System
Google File System
 
Chapter 14
Chapter 14Chapter 14
Chapter 14
 
Models of Distributed System
Models of Distributed SystemModels of Distributed System
Models of Distributed System
 
Cloud File System with GFS and HDFS
Cloud File System with GFS and HDFS  Cloud File System with GFS and HDFS
Cloud File System with GFS and HDFS
 
Introduction to distributed database
Introduction to distributed databaseIntroduction to distributed database
Introduction to distributed database
 
The Google Chubby lock service for loosely-coupled distributed systems
The Google Chubby lock service for loosely-coupled distributed systemsThe Google Chubby lock service for loosely-coupled distributed systems
The Google Chubby lock service for loosely-coupled distributed systems
 
DISTRIBUTED DATABASE WITH RECOVERY TECHNIQUES
DISTRIBUTED DATABASE WITH RECOVERY TECHNIQUESDISTRIBUTED DATABASE WITH RECOVERY TECHNIQUES
DISTRIBUTED DATABASE WITH RECOVERY TECHNIQUES
 
Linux fundamentals
Linux fundamentalsLinux fundamentals
Linux fundamentals
 
Mobile agents
Mobile agentsMobile agents
Mobile agents
 
Distributed Shared Memory Systems
Distributed Shared Memory SystemsDistributed Shared Memory Systems
Distributed Shared Memory Systems
 
Distributed operating system
Distributed operating systemDistributed operating system
Distributed operating system
 
05 (1).pptx
05 (1).pptx05 (1).pptx
05 (1).pptx
 
IT6601 MOBILE COMPUTING
IT6601 MOBILE COMPUTINGIT6601 MOBILE COMPUTING
IT6601 MOBILE COMPUTING
 

Similar to Zeus: Locality-aware Distributed Transactions [Eurosys '21 presentation]

tezos_hands-on-training.pdf
tezos_hands-on-training.pdftezos_hands-on-training.pdf
tezos_hands-on-training.pdf
Neven6
 
Blockene
BlockeneBlockene
Blockene
YongraeJo
 
Designing Quality-Driven Blockchain Networks
Designing Quality-Driven Blockchain NetworksDesigning Quality-Driven Blockchain Networks
Designing Quality-Driven Blockchain Networks
CREST
 
Zeus Locality aware distributed transaction upload.pptx
Zeus Locality aware distributed transaction upload.pptxZeus Locality aware distributed transaction upload.pptx
Zeus Locality aware distributed transaction upload.pptx
YongraeJo
 
Hashgraph as Code
Hashgraph as CodeHashgraph as Code
Hashgraph as Code
Calvin Cheng
 
Blockchain 101
Blockchain 101Blockchain 101
Blockchain 101
Paul O'Hagan
 
Distributed dataintelligence
Distributed dataintelligenceDistributed dataintelligence
Distributed dataintelligence
www.ixxo.io
 
Big Data London 2019 v.10 I 'Using data to evaluate the health of crypto netw...
Big Data London 2019 v.10 I 'Using data to evaluate the health of crypto netw...Big Data London 2019 v.10 I 'Using data to evaluate the health of crypto netw...
Big Data London 2019 v.10 I 'Using data to evaluate the health of crypto netw...
Dataconomy Media
 
Using data to evaluate the health of cryptonetworks
Using data to evaluate the health of cryptonetworksUsing data to evaluate the health of cryptonetworks
Using data to evaluate the health of cryptonetworks
Elias Simos
 
Blockchain - Things you need to know
Blockchain - Things you need to knowBlockchain - Things you need to know
Blockchain - Things you need to know
NAAPBOOKS
 
BigchainDB: A Scalable Blockchain Database, In Python
  BigchainDB: A Scalable Blockchain Database, In Python   BigchainDB: A Scalable Blockchain Database, In Python
BigchainDB: A Scalable Blockchain Database, In Python
BigchainDB
 
BigchainDB: A Scalable Blockchain Database, In Python
BigchainDB: A Scalable Blockchain Database, In PythonBigchainDB: A Scalable Blockchain Database, In Python
BigchainDB: A Scalable Blockchain Database, In Python
Trent McConaghy
 
Trent McConaghy- BigchainDB
Trent McConaghy- BigchainDBTrent McConaghy- BigchainDB
Trent McConaghy- BigchainDB
PyData
 
Connecting The Block Cointelligence Academy by Dr Vince Ming
Connecting The Block   Cointelligence Academy by Dr Vince MingConnecting The Block   Cointelligence Academy by Dr Vince Ming
Connecting The Block Cointelligence Academy by Dr Vince Ming
Cointelligence
 
Methods for Securing Spacecraft Tasking and Control via an Enterprise Ethereu...
Methods for Securing Spacecraft Tasking and Control via an Enterprise Ethereu...Methods for Securing Spacecraft Tasking and Control via an Enterprise Ethereu...
Methods for Securing Spacecraft Tasking and Control via an Enterprise Ethereu...
David Wood
 
Ethereum Devcon1 Report (summary writing)
Ethereum Devcon1 Report (summary writing)Ethereum Devcon1 Report (summary writing)
Ethereum Devcon1 Report (summary writing)
Tomoaki Sato
 
Deployablockchainwebappwithhyperledgerfabricpresentation 190820170703
Deployablockchainwebappwithhyperledgerfabricpresentation 190820170703Deployablockchainwebappwithhyperledgerfabricpresentation 190820170703
Deployablockchainwebappwithhyperledgerfabricpresentation 190820170703
Nevruz Mesut Sahin
 
Deploy a blockchain web-app with Hyperledger Fabric 1.4 - Concepts & Code
Deploy a blockchain web-app with Hyperledger Fabric 1.4 - Concepts & CodeDeploy a blockchain web-app with Hyperledger Fabric 1.4 - Concepts & Code
Deploy a blockchain web-app with Hyperledger Fabric 1.4 - Concepts & Code
Horea Porutiu
 
Understanding Blockchain
Understanding BlockchainUnderstanding Blockchain
Understanding Blockchain
Tony Willenberg
 
The Basic Theories of Blockchain
The Basic Theories of BlockchainThe Basic Theories of Blockchain
The Basic Theories of Blockchain
Sota Watanabe
 

Similar to Zeus: Locality-aware Distributed Transactions [Eurosys '21 presentation] (20)

tezos_hands-on-training.pdf
tezos_hands-on-training.pdftezos_hands-on-training.pdf
tezos_hands-on-training.pdf
 
Blockene
BlockeneBlockene
Blockene
 
Designing Quality-Driven Blockchain Networks
Designing Quality-Driven Blockchain NetworksDesigning Quality-Driven Blockchain Networks
Designing Quality-Driven Blockchain Networks
 
Zeus Locality aware distributed transaction upload.pptx
Zeus Locality aware distributed transaction upload.pptxZeus Locality aware distributed transaction upload.pptx
Zeus Locality aware distributed transaction upload.pptx
 
Hashgraph as Code
Hashgraph as CodeHashgraph as Code
Hashgraph as Code
 
Blockchain 101
Blockchain 101Blockchain 101
Blockchain 101
 
Distributed dataintelligence
Distributed dataintelligenceDistributed dataintelligence
Distributed dataintelligence
 
Big Data London 2019 v.10 I 'Using data to evaluate the health of crypto netw...
Big Data London 2019 v.10 I 'Using data to evaluate the health of crypto netw...Big Data London 2019 v.10 I 'Using data to evaluate the health of crypto netw...
Big Data London 2019 v.10 I 'Using data to evaluate the health of crypto netw...
 
Using data to evaluate the health of cryptonetworks
Using data to evaluate the health of cryptonetworksUsing data to evaluate the health of cryptonetworks
Using data to evaluate the health of cryptonetworks
 
Blockchain - Things you need to know
Blockchain - Things you need to knowBlockchain - Things you need to know
Blockchain - Things you need to know
 
BigchainDB: A Scalable Blockchain Database, In Python
  BigchainDB: A Scalable Blockchain Database, In Python   BigchainDB: A Scalable Blockchain Database, In Python
BigchainDB: A Scalable Blockchain Database, In Python
 
BigchainDB: A Scalable Blockchain Database, In Python
BigchainDB: A Scalable Blockchain Database, In PythonBigchainDB: A Scalable Blockchain Database, In Python
BigchainDB: A Scalable Blockchain Database, In Python
 
Trent McConaghy- BigchainDB
Trent McConaghy- BigchainDBTrent McConaghy- BigchainDB
Trent McConaghy- BigchainDB
 
Connecting The Block Cointelligence Academy by Dr Vince Ming
Connecting The Block   Cointelligence Academy by Dr Vince MingConnecting The Block   Cointelligence Academy by Dr Vince Ming
Connecting The Block Cointelligence Academy by Dr Vince Ming
 
Methods for Securing Spacecraft Tasking and Control via an Enterprise Ethereu...
Methods for Securing Spacecraft Tasking and Control via an Enterprise Ethereu...Methods for Securing Spacecraft Tasking and Control via an Enterprise Ethereu...
Methods for Securing Spacecraft Tasking and Control via an Enterprise Ethereu...
 
Ethereum Devcon1 Report (summary writing)
Ethereum Devcon1 Report (summary writing)Ethereum Devcon1 Report (summary writing)
Ethereum Devcon1 Report (summary writing)
 
Deployablockchainwebappwithhyperledgerfabricpresentation 190820170703
Deployablockchainwebappwithhyperledgerfabricpresentation 190820170703Deployablockchainwebappwithhyperledgerfabricpresentation 190820170703
Deployablockchainwebappwithhyperledgerfabricpresentation 190820170703
 
Deploy a blockchain web-app with Hyperledger Fabric 1.4 - Concepts & Code
Deploy a blockchain web-app with Hyperledger Fabric 1.4 - Concepts & CodeDeploy a blockchain web-app with Hyperledger Fabric 1.4 - Concepts & Code
Deploy a blockchain web-app with Hyperledger Fabric 1.4 - Concepts & Code
 
Understanding Blockchain
Understanding BlockchainUnderstanding Blockchain
Understanding Blockchain
 
The Basic Theories of Blockchain
The Basic Theories of BlockchainThe Basic Theories of Blockchain
The Basic Theories of Blockchain
 

More from Antonios Katsarakis

Dandelion: Hundreds of Millions of Distributed Replicated Transactions with F...
Dandelion: Hundreds of Millions of Distributed Replicated Transactions with F...Dandelion: Hundreds of Millions of Distributed Replicated Transactions with F...
Dandelion: Hundreds of Millions of Distributed Replicated Transactions with F...
Antonios Katsarakis
 
Dandelion Hashtable: beyond billion requests per second on a commodity server...
Dandelion Hashtable: beyond billion requests per second on a commodity server...Dandelion Hashtable: beyond billion requests per second on a commodity server...
Dandelion Hashtable: beyond billion requests per second on a commodity server...
Antonios Katsarakis
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
Antonios Katsarakis
 
The L2AW theorem
The L2AW theoremThe L2AW theorem
The L2AW theorem
Antonios Katsarakis
 
Invalidation-Based Protocols for Replicated Datastores
Invalidation-Based Protocols for Replicated DatastoresInvalidation-Based Protocols for Replicated Datastores
Invalidation-Based Protocols for Replicated Datastores
Antonios Katsarakis
 
Scale-out ccNUMA - Eurosys'18
Scale-out ccNUMA - Eurosys'18Scale-out ccNUMA - Eurosys'18
Scale-out ccNUMA - Eurosys'18
Antonios Katsarakis
 
Tensor Processing Unit (TPU)
Tensor Processing Unit (TPU)Tensor Processing Unit (TPU)
Tensor Processing Unit (TPU)
Antonios Katsarakis
 
Distributed Processing Frameworks
Distributed Processing FrameworksDistributed Processing Frameworks
Distributed Processing Frameworks
Antonios Katsarakis
 
Spark Overview and Performance Issues
Spark Overview and Performance IssuesSpark Overview and Performance Issues
Spark Overview and Performance Issues
Antonios Katsarakis
 

More from Antonios Katsarakis (9)

Dandelion: Hundreds of Millions of Distributed Replicated Transactions with F...
Dandelion: Hundreds of Millions of Distributed Replicated Transactions with F...Dandelion: Hundreds of Millions of Distributed Replicated Transactions with F...
Dandelion: Hundreds of Millions of Distributed Replicated Transactions with F...
 
Dandelion Hashtable: beyond billion requests per second on a commodity server...
Dandelion Hashtable: beyond billion requests per second on a commodity server...Dandelion Hashtable: beyond billion requests per second on a commodity server...
Dandelion Hashtable: beyond billion requests per second on a commodity server...
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
 
The L2AW theorem
The L2AW theoremThe L2AW theorem
The L2AW theorem
 
Invalidation-Based Protocols for Replicated Datastores
Invalidation-Based Protocols for Replicated DatastoresInvalidation-Based Protocols for Replicated Datastores
Invalidation-Based Protocols for Replicated Datastores
 
Scale-out ccNUMA - Eurosys'18
Scale-out ccNUMA - Eurosys'18Scale-out ccNUMA - Eurosys'18
Scale-out ccNUMA - Eurosys'18
 
Tensor Processing Unit (TPU)
Tensor Processing Unit (TPU)Tensor Processing Unit (TPU)
Tensor Processing Unit (TPU)
 
Distributed Processing Frameworks
Distributed Processing FrameworksDistributed Processing Frameworks
Distributed Processing Frameworks
 
Spark Overview and Performance Issues
Spark Overview and Performance IssuesSpark Overview and Performance Issues
Spark Overview and Performance Issues
 

Recently uploaded

SEO Cheat Sheet with Learning Resources by Balti Bloggers.pdf
SEO Cheat Sheet with Learning Resources by Balti Bloggers.pdfSEO Cheat Sheet with Learning Resources by Balti Bloggers.pdf
SEO Cheat Sheet with Learning Resources by Balti Bloggers.pdf
Balti Bloggers
 
VVIP Girls Call Mumbai 9910780858 Provide Best And Top Girl Service And No1 i...
VVIP Girls Call Mumbai 9910780858 Provide Best And Top Girl Service And No1 i...VVIP Girls Call Mumbai 9910780858 Provide Best And Top Girl Service And No1 i...
VVIP Girls Call Mumbai 9910780858 Provide Best And Top Girl Service And No1 i...
jealousviolet
 
Crafting highly scalable and performant Modern Data Platforms
Crafting highly scalable and performant Modern Data PlatformsCrafting highly scalable and performant Modern Data Platforms
Crafting highly scalable and performant Modern Data Platforms
Sameer Paradkar
 
SAP implementation steps PDF - Zyple Software
SAP implementation steps PDF - Zyple SoftwareSAP implementation steps PDF - Zyple Software
SAP implementation steps PDF - Zyple Software
Zyple Software
 
AWS DevOps-Tutorial CHANAKYA SRIYAN DUKKA.
AWS DevOps-Tutorial CHANAKYA SRIYAN DUKKA.AWS DevOps-Tutorial CHANAKYA SRIYAN DUKKA.
AWS DevOps-Tutorial CHANAKYA SRIYAN DUKKA.
Srinivas Dukka
 
Girls Call Mysore 000XX00000 Provide Best And Top Girl Service And No1 in City
Girls Call Mysore 000XX00000 Provide Best And Top Girl Service And No1 in CityGirls Call Mysore 000XX00000 Provide Best And Top Girl Service And No1 in City
Girls Call Mysore 000XX00000 Provide Best And Top Girl Service And No1 in City
neshakor5152
 
Mumbai Girls Call Mumbai 🎈🔥9930687706 🔥💋🎈 Provide Best And Top Girl Service A...
Mumbai Girls Call Mumbai 🎈🔥9930687706 🔥💋🎈 Provide Best And Top Girl Service A...Mumbai Girls Call Mumbai 🎈🔥9930687706 🔥💋🎈 Provide Best And Top Girl Service A...
Mumbai Girls Call Mumbai 🎈🔥9930687706 🔥💋🎈 Provide Best And Top Girl Service A...
3610stuck
 
How To Fill Timesheet in TaskSprint: Quick Guide 2024
How To Fill Timesheet in TaskSprint: Quick Guide 2024How To Fill Timesheet in TaskSprint: Quick Guide 2024
How To Fill Timesheet in TaskSprint: Quick Guide 2024
TaskSprint | Employee Efficiency Software
 
Maximizing Efficiency and Profitability: Optimizing Data Systems, Enhancing C...
Maximizing Efficiency and Profitability: Optimizing Data Systems, Enhancing C...Maximizing Efficiency and Profitability: Optimizing Data Systems, Enhancing C...
Maximizing Efficiency and Profitability: Optimizing Data Systems, Enhancing C...
OnePlan Solutions
 
Il Data Streaming per un’AI real-time di nuova generazione
Il Data Streaming per un’AI real-time di nuova generazioneIl Data Streaming per un’AI real-time di nuova generazione
Il Data Streaming per un’AI real-time di nuova generazione
confluent
 
Tour and travel website management in odoo,
Tour and travel website management in odoo,Tour and travel website management in odoo,
Tour and travel website management in odoo,
Axis Technolabs
 
Agra Girls Call Agra 0X0000000X Unlimited Short Providing Girls Service Avail...
Agra Girls Call Agra 0X0000000X Unlimited Short Providing Girls Service Avail...Agra Girls Call Agra 0X0000000X Unlimited Short Providing Girls Service Avail...
Agra Girls Call Agra 0X0000000X Unlimited Short Providing Girls Service Avail...
rachitkumar09887
 
Fantastic Design Patterns and Where to use them No Notes.pdf
Fantastic Design Patterns and Where to use them No Notes.pdfFantastic Design Patterns and Where to use them No Notes.pdf
Fantastic Design Patterns and Where to use them No Notes.pdf
6m9p7qnjj8
 
Predicting Test Results without Execution (FSE 2024)
Predicting Test Results without Execution (FSE 2024)Predicting Test Results without Execution (FSE 2024)
Predicting Test Results without Execution (FSE 2024)
andrehoraa
 
welcome to presentation on Google Apps
welcome to   presentation on Google Appswelcome to   presentation on Google Apps
welcome to presentation on Google Apps
AsifKarimJim
 
Independent Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class H...
Independent Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class H...Independent Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class H...
Independent Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class H...
aslasdfmkhan4750
 
Mobile App Development Company in Noida - Drona Infotech.
Mobile App Development Company in Noida - Drona Infotech.Mobile App Development Company in Noida - Drona Infotech.
Mobile App Development Company in Noida - Drona Infotech.
Mobile App Development Company in Noida - Drona Infotech
 
Amadeus Travel API, Amadeus Booking API, Amadeus GDS
Amadeus Travel API, Amadeus Booking API, Amadeus GDSAmadeus Travel API, Amadeus Booking API, Amadeus GDS
Amadeus Travel API, Amadeus Booking API, Amadeus GDS
aadhiyaeliza
 
Celebrity Girls Call Mumbai 🛵🚡9910780858 💃 Choose Best And Top Girl Service A...
Celebrity Girls Call Mumbai 🛵🚡9910780858 💃 Choose Best And Top Girl Service A...Celebrity Girls Call Mumbai 🛵🚡9910780858 💃 Choose Best And Top Girl Service A...
Celebrity Girls Call Mumbai 🛵🚡9910780858 💃 Choose Best And Top Girl Service A...
norina2645
 
TEQnation 2024: Sustainable Software: May the Green Code Be with You
TEQnation 2024: Sustainable Software: May the Green Code Be with YouTEQnation 2024: Sustainable Software: May the Green Code Be with You
TEQnation 2024: Sustainable Software: May the Green Code Be with You
marcofolio
 

Recently uploaded (20)

SEO Cheat Sheet with Learning Resources by Balti Bloggers.pdf
SEO Cheat Sheet with Learning Resources by Balti Bloggers.pdfSEO Cheat Sheet with Learning Resources by Balti Bloggers.pdf
SEO Cheat Sheet with Learning Resources by Balti Bloggers.pdf
 
VVIP Girls Call Mumbai 9910780858 Provide Best And Top Girl Service And No1 i...
VVIP Girls Call Mumbai 9910780858 Provide Best And Top Girl Service And No1 i...VVIP Girls Call Mumbai 9910780858 Provide Best And Top Girl Service And No1 i...
VVIP Girls Call Mumbai 9910780858 Provide Best And Top Girl Service And No1 i...
 
Crafting highly scalable and performant Modern Data Platforms
Crafting highly scalable and performant Modern Data PlatformsCrafting highly scalable and performant Modern Data Platforms
Crafting highly scalable and performant Modern Data Platforms
 
SAP implementation steps PDF - Zyple Software
SAP implementation steps PDF - Zyple SoftwareSAP implementation steps PDF - Zyple Software
SAP implementation steps PDF - Zyple Software
 
AWS DevOps-Tutorial CHANAKYA SRIYAN DUKKA.
AWS DevOps-Tutorial CHANAKYA SRIYAN DUKKA.AWS DevOps-Tutorial CHANAKYA SRIYAN DUKKA.
AWS DevOps-Tutorial CHANAKYA SRIYAN DUKKA.
 
Girls Call Mysore 000XX00000 Provide Best And Top Girl Service And No1 in City
Girls Call Mysore 000XX00000 Provide Best And Top Girl Service And No1 in CityGirls Call Mysore 000XX00000 Provide Best And Top Girl Service And No1 in City
Girls Call Mysore 000XX00000 Provide Best And Top Girl Service And No1 in City
 
Mumbai Girls Call Mumbai 🎈🔥9930687706 🔥💋🎈 Provide Best And Top Girl Service A...
Mumbai Girls Call Mumbai 🎈🔥9930687706 🔥💋🎈 Provide Best And Top Girl Service A...Mumbai Girls Call Mumbai 🎈🔥9930687706 🔥💋🎈 Provide Best And Top Girl Service A...
Mumbai Girls Call Mumbai 🎈🔥9930687706 🔥💋🎈 Provide Best And Top Girl Service A...
 
How To Fill Timesheet in TaskSprint: Quick Guide 2024
How To Fill Timesheet in TaskSprint: Quick Guide 2024How To Fill Timesheet in TaskSprint: Quick Guide 2024
How To Fill Timesheet in TaskSprint: Quick Guide 2024
 
Maximizing Efficiency and Profitability: Optimizing Data Systems, Enhancing C...
Maximizing Efficiency and Profitability: Optimizing Data Systems, Enhancing C...Maximizing Efficiency and Profitability: Optimizing Data Systems, Enhancing C...
Maximizing Efficiency and Profitability: Optimizing Data Systems, Enhancing C...
 
Il Data Streaming per un’AI real-time di nuova generazione
Il Data Streaming per un’AI real-time di nuova generazioneIl Data Streaming per un’AI real-time di nuova generazione
Il Data Streaming per un’AI real-time di nuova generazione
 
Tour and travel website management in odoo,
Tour and travel website management in odoo,Tour and travel website management in odoo,
Tour and travel website management in odoo,
 
Agra Girls Call Agra 0X0000000X Unlimited Short Providing Girls Service Avail...
Agra Girls Call Agra 0X0000000X Unlimited Short Providing Girls Service Avail...Agra Girls Call Agra 0X0000000X Unlimited Short Providing Girls Service Avail...
Agra Girls Call Agra 0X0000000X Unlimited Short Providing Girls Service Avail...
 
Fantastic Design Patterns and Where to use them No Notes.pdf
Fantastic Design Patterns and Where to use them No Notes.pdfFantastic Design Patterns and Where to use them No Notes.pdf
Fantastic Design Patterns and Where to use them No Notes.pdf
 
Predicting Test Results without Execution (FSE 2024)
Predicting Test Results without Execution (FSE 2024)Predicting Test Results without Execution (FSE 2024)
Predicting Test Results without Execution (FSE 2024)
 
welcome to presentation on Google Apps
welcome to   presentation on Google Appswelcome to   presentation on Google Apps
welcome to presentation on Google Apps
 
Independent Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class H...
Independent Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class H...Independent Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class H...
Independent Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class H...
 
Mobile App Development Company in Noida - Drona Infotech.
Mobile App Development Company in Noida - Drona Infotech.Mobile App Development Company in Noida - Drona Infotech.
Mobile App Development Company in Noida - Drona Infotech.
 
Amadeus Travel API, Amadeus Booking API, Amadeus GDS
Amadeus Travel API, Amadeus Booking API, Amadeus GDSAmadeus Travel API, Amadeus Booking API, Amadeus GDS
Amadeus Travel API, Amadeus Booking API, Amadeus GDS
 
Celebrity Girls Call Mumbai 🛵🚡9910780858 💃 Choose Best And Top Girl Service A...
Celebrity Girls Call Mumbai 🛵🚡9910780858 💃 Choose Best And Top Girl Service A...Celebrity Girls Call Mumbai 🛵🚡9910780858 💃 Choose Best And Top Girl Service A...
Celebrity Girls Call Mumbai 🛵🚡9910780858 💃 Choose Best And Top Girl Service A...
 
TEQnation 2024: Sustainable Software: May the Green Code Be with You
TEQnation 2024: Sustainable Software: May the Green Code Be with YouTEQnation 2024: Sustainable Software: May the Green Code Be with You
TEQnation 2024: Sustainable Software: May the Green Code Be with You
 

Zeus: Locality-aware Distributed Transactions [Eurosys '21 presentation]

  • 1. Zeus Locality-aware distributed transactions A. Katsarakis* , Y. Ma† , Z. Tan§ , A. Bainbridge, M. Balkwill, A. Dragojevic, B. Grot*, B. Radunovic, Y. Zhang *University of Edinburgh, †Fudan University, §UCLA, Microsoft Research zeus-protocol.com Thanks to:
  • 2. Keep data in-memory, replicated, sharded across nodes of a datacenter Backbone of transactional cloud applications Demand distributed reliable transactions (txs) strongly-consistent and fault-tolerant high performance Modern distributed datastores 2 distributed datastore
  • 3. Keep data in-memory, replicated, sharded across nodes of a datacenter Backbone of transactional cloud applications Demand distributed reliable transactions (txs) strongly-consistent and fault-tolerant high performance Modern distributed datastores 3 distributed datastore
  • 4. Keep data in-memory, replicated, sharded across nodes of a datacenter Backbone of transactional cloud applications Demand distributed reliable transactions (txs) strongly-consistent and fault-tolerant high performance Modern distributed datastores 4 distributed datastore
  • 5. Keep data in-memory, replicated, sharded across nodes of a datacenter Backbone of transactional cloud applications Demand distributed reliable transactions (txs) strongly-consistent and fault-tolerant high performance Modern distributed datastores 5 distributed datastore
  • 6. Keep data in-memory, replicated, sharded across nodes of a datacenter Backbone of transactional cloud applications Demand distributed reliable transactions (txs) strongly-consistent and fault-tolerant high performance Modern distributed datastores 6 distributed datastore Traditional distributed txs well-known as expensive
  • 7. Many tx applications exhibit dynamic locality network functions, peer-to-peer payments … Example: cellular control plane manages phone connectivity and handovers among base stations Locality every phone user repeats txs: same phone & nearest base-station But locality is dynamic changes at run-time e.g., user commutes à base-station changes Observation 7
  • 8. base-station A base-station B Many tx applications exhibit dynamic locality network functions, peer-to-peer payments … Example: cellular control plane manages phone connectivity and handovers among base stations Locality every phone user repeats txs: same phone & nearest base-station But locality is dynamic changes at run-time e.g., user commutes à base-station changes Observation 8
  • 9. Many tx applications exhibit dynamic locality network functions, peer-to-peer payments … Example: cellular control plane manages phone connectivity and handovers among base stations Locality every phone user repeats txs: same phone & nearest base-station But locality is dynamic changes at run-time e.g., user commutes à base-station changes Observation 9 base-station A base-station B
  • 10. base-station A base-station B Many tx applications exhibit dynamic locality network functions, peer-to-peer payments … Example: cellular control plane manages phone connectivity and handovers among base stations Locality every phone user repeats txs: same phone & nearest base-station But locality is dynamic changes at run-time e.g., user commutes à base-station changes Observation 10 handover
  • 11. base-station A base-station B Many tx applications exhibit dynamic locality network functions, peer-to-peer payments … Example: cellular control plane manages phone connectivity and handovers among base stations Locality every phone user repeats txs: same phone & nearest base-station But locality is dynamic changes at run-time e.g., user commutes à base-station changes Observation 11 handover Can state-of-the-art datastores exploit dynamic locality?
  • 12. State-of-the-art reliable datastores 12 Static sharding (e.g., consistent hashing) Objects placed randomly on fixed nodes easy to locate and access objects reliable txs regardless of access pattern expensive reliable txs mostly remote accesses some blocking (control flow, pοinter chasing) related objects on different shards costly distributed commit tx coordinator
  • 13. State-of-the-art reliable datastores 13 Static sharding (e.g., consistent hashing) Objects placed randomly on fixed nodes easy to locate and access objects reliable txs regardless of access pattern expensive reliable txs mostly remote accesses some blocking (control flow, pοinter chasing) related objects on different shards costly distributed commit tx coordinator distributed commit 1. tx: if (p) b++; remote accesses Adapted from FaSST [OSDI’16] Adapted from FaSST [OSDI’16]
  • 14. distributed commit 1. tx: if (p) b++; remote accesses Adapted from FaSST [OSDI’16] State-of-the-art reliable datastores 14 Static sharding (e.g., consistent hashing) Objects placed randomly on fixed nodes easy to locate and access objects reliable txs regardless of access pattern expensive reliable txs mostly remote accesses some blocking (control flow, pοinter chasing) related objects on different shards costly distributed commit tx coordinator distributed commit 1. tx: if (p) b++; remote accesses Adapted from FaSST [OSDI’16] Adapted from FaSST [OSDI’16]
  • 15. distributed commit 1. tx: if (p) b++; remote accesses Adapted from FaSST [OSDI’16] State-of-the-art reliable datastores 15 Static sharding (e.g., consistent hashing) Objects placed randomly on fixed nodes easy to locate and access objects reliable txs regardless of access pattern expensive reliable txs mostly remote accesses some blocking (control flow, pοinter chasing) related objects on different shards costly distributed commit tx coordinator distributed commit 1. tx: if (p) b++; remote accesses Adapted from FaSST [OSDI’16] Adapted from FaSST [OSDI’16] Cannot exploit locality à expensive reliable txs
  • 16. Enter Zeus 16 Distributed datastore: exploits locality for fast reliable txs Inspired by multiprocessor’s hardware transactional memory Basic idea Each object has a single node owner = data + exclusive write access the owner changes dynamically and is tracked by replicated directory Coordinator executes a tx by acquiring ownership of all its objects à single-node commit Ownership stays with coordinator à future txs on these objects enjoy local accesses
  • 17. Enter Zeus 17 Distributed datastore: exploits locality for fast reliable txs Inspired by multiprocessor’s hardware transactional memory Basic idea Each object has a single node owner = data + exclusive write access the owner changes dynamically and is tracked by replicated directory Coordinator executes a tx by acquiring ownership of all its objects à single-node commit Ownership stays with coordinator à future txs on these objects enjoy local accesses
  • 18. Enter Zeus 18 Distributed datastore: exploits locality for fast reliable txs Inspired by multiprocessor’s hardware transactional memory Basic idea Each object has a single node owner = data + exclusive write access the owner changes dynamically and is tracked by replicated directory Coordinator executes a tx by acquiring ownership of all its objects à single-node commit Ownership stays with coordinator à future txs on these objects enjoy local accesses
  • 19. Enter Zeus 19 Distributed datastore: exploits locality for fast reliable txs Inspired by multiprocessor’s hardware transactional memory Basic idea Each object has a single node owner = data + exclusive write access the owner changes dynamically and is tracked by replicated directory Coordinator executes a tx by acquiring ownership of all its objects à single-node commit Ownership stays with coordinator à future txs on these objects enjoy local accesses
  • 20. Enter Zeus 20 Distributed datastore: exploits locality for fast reliable txs Inspired by multiprocessor’s hardware transactional memory Basic idea Each object has a single node owner = data + exclusive write access the owner changes dynamically and is tracked by replicated directory Coordinator executes a tx by acquiring ownership of all its objects à single-node commit Ownership stays with coordinator à future txs on these objects enjoy local accesses What are the exact steps?
  • 21. 1. Execute as the owner a) at object access: if (not owner) get ownership b) local access 2. Local commit commits tx: traditional single-node commit (updates not yet replicated) 3. Reliable commit completes tx: updating replicas for availability 1. Locality-aware txs in Zeus 21
  • 22. 1. Execute as the owner a) at object access: if (not owner) get ownership b) local access 2. Local commit commits tx: traditional single-node commit (updates not yet replicated) 3. Reliable commit completes tx: updating replicas for availability 1. Locality-aware txs in Zeus 22
  • 23. 1. Execute as the owner a) at object access: if (not owner) get ownership b) local access 2. Local commit commits tx: traditional single-node commit (updates not yet replicated) 3. Reliable commit completes tx: updating replicas for availability 1. Locality-aware txs in Zeus 23 How to get ownership reliably?
  • 24. Ownership protocol 24 1) Coordinator gets ownership from current owner 2) Keeps consistent directory replicas 1. Coordinator sends object ownership invalidations (through a directory replica) to all arbiters 2. Arbiters acknowledge the coordinator directly 3. Coordinator sends validations informing arbiters for acquisition Conflicts: logical timestamps, fault tolerance: idempotent replays as in Hermes [ASPLOS’20]
  • 25. Ownership protocol 25 1) Coordinator gets ownership from current owner 2) Keeps consistent directory replicas 1. Coordinator sends object ownership invalidations (through a directory replica) to all arbiters 2. Arbiters acknowledge the coordinator directly 3. Coordinator sends validations informing arbiters for acquisition Conflicts: logical timestamps, fault tolerance: idempotent replays as in Hermes [ASPLOS’20] arbiters
  • 26. Ownership protocol 26 1) Coordinator gets ownership from current owner 2) Keeps consistent directory replicas 1. Coordinator sends object ownership invalidations (through a directory replica) to all arbiters 2. Arbiters acknowledge the coordinator directly 3. Coordinator sends validations informing arbiters for acquisition Conflicts: logical timestamps, fault tolerance: idempotent replays as in Hermes [ASPLOS’20] arbiters
  • 27. Ownership protocol 27 1) Coordinator gets ownership from current owner 2) Keeps consistent directory replicas 1. Coordinator sends object ownership invalidations (through a directory replica) to all arbiters 2. Arbiters acknowledge the coordinator directly 3. Coordinator sends validations informing arbiters for acquisition Conflicts: logical timestamps, fault tolerance: idempotent replays as in Hermes [ASPLOS’20] arbiters
  • 28. Ownership protocol 28 1) Coordinator gets ownership from current owner 2) Keeps consistent directory replicas 1. Coordinator sends object ownership invalidations (through a directory replica) to all arbiters 2. Arbiters acknowledge the coordinator directly 3. Coordinator sends validations informing arbiters for acquisition Conflicts: logical timestamps, fault tolerance: idempotent replays as in Hermes [ASPLOS’20] arbiters Ownership is acquired & coordinator proceeds with tx
  • 29. Ownership protocol 29 1) Coordinator gets ownership from current owner 2) Keeps consistent directory replicas 1. Coordinator sends object ownership invalidations (through a directory replica) to all arbiters 2. Arbiters acknowledge the coordinator directly 3. Coordinator sends validations informing arbiters for acquisition Conflicts: logical timestamps, fault tolerance: idempotent replays as in Hermes [ASPLOS’20] arbiters
  • 30. Ownership protocol 30 1) Coordinator gets ownership from current owner 2) Keeps consistent directory replicas 1. Coordinator sends object ownership invalidations (through a directory replica) to all arbiters 2. Arbiters acknowledge the coordinator directly 3. Coordinator sends validations informing arbiters for acquisition Conflicts: logical timestamps, fault tolerance: idempotent replays as in Hermes [ASPLOS’20] arbiters
  • 31. Ownership protocol 31 1) Coordinator gets ownership from current owner 2) Keeps consistent directory replicas 1. Coordinator sends object ownership invalidations (through a directory replica) to all arbiters 2. Arbiters acknowledge the coordinator directly 3. Coordinator sends validations informing arbiters for acquisition Conflicts: logical timestamps, fault tolerance: idempotent replays as in Hermes [ASPLOS’20] Correctness verified under conflicts and faults arbiters
  • 32. 32 1. Execute as the owner a) at object access: if (not owner) get ownership b) local access 2. Local commit commits tx: traditional single-node commit 3. Reliable commit completes tx: updating replicas for availability Locality + ownership stays with coordinator Locality-aware txs in Zeus
  • 33. 33 1. Execute as the owner a) at object access: if (not owner) get ownership b) local access 2. Local commit commits tx: traditional single-node commit 3. Reliable commit completes tx: updating replicas for availability common case Locality + ownership stays with coordinator Locality-aware txs in Zeus
  • 34. 34 1. Execute as the owner a) at object access: if (not owner) get ownership b) local access 2. Local commit commits tx: traditional single-node commit 3. Reliable commit completes tx: updating replicas for availability common case Locality-aware txs in Zeus
  • 35. 35 1. Execute as the owner a) at object access: if (not owner) get ownership b) local access 2. Local commit commits tx: traditional single-node commit 3. Reliable commit completes tx: updating replicas for availability common case Locality-aware txs in Zeus
  • 36. 36 1. Execute as the owner a) at object access: if (not owner) get ownership b) local access 2. Local commit commits tx: traditional single-node commit 3. Reliable commit completes tx: updating replicas for availability common case Locality-aware txs in Zeus
  • 37. 37 1. Execute as the owner a) at object access: if (not owner) get ownership b) local access 2. Local commit commits tx: traditional single-node commit 3. Reliable commit completes tx: updating replicas for availability common case Locality-aware txs in Zeus Great! But how efficient is reliable commit?
  • 38. Reliable commit 38 1. Committed tx à no conflicts à fast tx completion - coordinator sends updates to replicas and waits for ACKs - read-only txs: no updates à no reliable commit 2. No conflicts à no aborts à pipelined txs (no waiting for replication) - subsequent txs use local state with certainty & issue updates - coordinator sequences updates, which replicas apply in order Fault tolerance: idempotent replays
  • 39. Reliable commit 39 1. Committed tx à no conflicts à fast tx completion - coordinator sends updates to replicas and waits for ACKs - read-only txs: no updates à no reliable commit 2. No conflicts à no aborts à pipelined txs (no waiting for replication) - subsequent txs use local state with certainty & issue updates - coordinator sequences updates, which replicas apply in order Fault tolerance: idempotent replays
  • 40. Reliable commit 40 1. Committed tx à no conflicts à fast tx completion - coordinator sends updates to replicas and waits for ACKs - read-only txs: no updates à no reliable commit 2. No conflicts à no aborts à pipelined txs (no waiting for replication) - subsequent txs use local state with certainty & issue updates - coordinator sequences updates, which replicas apply in order Fault tolerance: idempotent replays
  • 41. Reliable commit 41 1. Committed tx à no conflicts à fast tx completion - coordinator sends updates to replicas and waits for ACKs - read-only txs: no updates à no reliable commit 2. No conflicts à no aborts à pipelined txs (no waiting for replication) - subsequent txs use local state with certainty & issue updates - coordinator sequences updates, which replicas apply in order Fault tolerance: idempotent replays Very efficient! Correctness verified under faults
  • 42. Recap: txs in Zeus 42 Locality-aware, distributed and reliable
  • 43. Recap: txs in Zeus 43 Locality-aware, distributed and reliable
  • 44. Recap: txs in Zeus 44 Locality-aware, distributed and reliable
  • 45. Recap: txs in Zeus 45 Locality-aware, distributed and reliable
  • 46. Recap: txs in Zeus 46 Locality-aware, distributed and reliable
  • 47. Recap: txs in Zeus 47 Locality-aware, distributed and reliable Awesome! Does it translate into performance?
  • 48. Performance 48 6 nodes, 3-way replication, Zeus 40Gb (no RDMA)
  • 50. Performance 50 Zeus: within 9% of ideal Million txs / sec Ideal (all local) Zeus (real-world locality) Handovers 6 nodes, 3-way replication, Zeus 40Gb (no RDMA) 9%
  • 51. Performance 51 Zeus: within 9% of ideal Million txs / sec Ideal (all local) Zeus (real-world locality) Handovers 6 nodes, 3-way replication, Zeus 40Gb (no RDMA) % write txs needing ownership TATP 9%
  • 52. Performance 52 Zeus: within 9% of ideal Million txs / sec Ideal (all local) Zeus (real-world locality) Handovers 6 nodes, 3-way replication, Zeus 40Gb (no RDMA) % write txs needing ownership TATP 2x 9% Up to 40M.tx/s and 2x state-of-the-art FaSST [OSDI’16], FaRM [SOSP’15] which use 56Gb RDMA
  • 53. Performance 53 Zeus: within 9% of ideal Million txs / sec Ideal (all local) Zeus (real-world locality) Handovers 6 nodes, 3-way replication, Zeus 40Gb (no RDMA) % write txs needing ownership TATP 2x 9% Up to 40M.tx/s and 2x state-of-the-art FaSST [OSDI’16], FaRM [SOSP’15] which use 56Gb RDMA Paper: more benchmarks, ownership, latency ...
  • 54. State-of-the-art reliable txs over static sharding: cannot exploit dynamic locality remote accesses costly distributed commit Zeus’ reliable txs exploit locality via dynamic ownership: local accesses in the common case single-node commit - local for read-only txs - fast and pipelined for write txs Performance 10s millions txs/second up to 2x state-of-the-art Bonus: programmability! Conclusion 54
  • 55. State-of-the-art reliable txs over static sharding: cannot exploit dynamic locality remote accesses costly distributed commit Zeus’ reliable txs exploit locality via dynamic ownership: local accesses in the common case single-node commit - local for read-only txs - fast and pipelined for write txs Performance 10s millions txs/second up to 2x state-of-the-art Bonus: programmability! Conclusion 55 zeus-protocol.com TLA+ specification, Q&A …
  • 56. State-of-the-art reliable txs over static sharding: cannot exploit dynamic locality remote accesses costly distributed commit Zeus’ reliable txs exploit locality via dynamic ownership: local accesses in the common case single-node commit - local for read-only txs - fast and pipelined for write txs Performance 10s millions txs/second up to 2x state-of-the-art Bonus: programmability! Conclusion 56 zeus-protocol.com TLA+ specification, Q&A …
  • 57. State-of-the-art reliable txs over static sharding: cannot exploit dynamic locality remote accesses costly distributed commit Zeus’ reliable txs exploit locality via dynamic ownership: local accesses in the common case single-node commit - local for read-only txs - fast and pipelined for write txs Performance 10s millions txs/second up to 2x state-of-the-art Bonus: programmability! Conclusion 57 zeus-protocol.com TLA+ specification, Q&A …
  • 58. State-of-the-art reliable txs over static sharding: cannot exploit dynamic locality remote accesses costly distributed commit Zeus’ reliable txs exploit locality via dynamic ownership: local accesses in the common case single-node commit - local for read-only txs - fast and pipelined for write txs Performance 10s millions txs/second up to 2x state-of-the-art Bonus: programmability! Conclusion 58 zeus-protocol.com TLA+ specification, Q&A … Reliable txs with locality? Use Zeus!