Kite: Efficient and Available Release Consistency for the Datacenter
Vasilis Gavrielatos, Antonios Katsarakis, Vijay Nagarajan, Boris Grot, Arpit Joshi*
University of Edinburgh, *Intel
Key-Value Stores
Replicated KVS
Characteristics
● Read-Write-RMW API
● Highly Available
○ Replicated for fault tolerance
Availability ≅ Nonblocking
Replication ⇒ Performance vs Consistency
Consistency vs Performance
Weak Consistency: Performance
Strong Consistency: Programmability
???
Existing Solution: Multiple Consistency Levels (MCL)
MCL Replicated KVS: each operation picks its own level, e.g. Weak Write, Strong Read
Examples: Amazon DB, App Engine, PNUTS, Manhattan, Pileus
What about programming patterns?
The problem
MCL Replicated KVS
Alice creates a player and then sets a flag:
void CreatePlayer() {
    Write(age = 32);
    Write(name = “Leo”);
    Write(surname = “Messi”);
    /// Player created
    Write(player_created = true);
}
“The flag”: if you can read the flag, then you must be able to see the player!

Reordering the player writes among themselves is fine by Alice:
void CreatePlayer() {
    Write(name = “Leo”);
    Write(surname = “Messi”);
    Write(age = 32);
    /// Player created
    Write(player_created = true);
}

Letting a player write move past the flag is not (“No way!”):
void CreatePlayer() {
    Write(name = “Leo”);
    Write(surname = “Messi”);
    /// Player created
    Write(player_created = true);
    Write(age = 32);
}
There is no way to capture this with MCLs: a consistency level applies to a single operation, whereas Alice's requirement is about the order between operations.
MCL solution
MCL Replicated KVS
Alice:
void CreatePlayer() {
    Strong_Write(age = 32);
    Strong_Write(name = “Leo”);
    Strong_Write(surname = “Messi”);
    /// Player created
    Strong_Write(player_created = true);
}
Seems like overkill: every write pays for strong consistency, although only the order relative to the flag matters. Missed performance opportunity.
Shared Memory World
Sweet spot in the performance-vs-consistency trade-off? DRF-SC!
Programming paradigm: DRF-SC
Alice programs a multiprocessor using the DRF-SC (data-race-free ⇒ sequential consistency) programming paradigm.
Annotated synchronization ⇒ SC: if all synchronization accesses are annotated (making the program data-race-free), its executions appear sequentially consistent.
Under the hood: Release Consistency
On the multiprocessor, the DRF-SC programming paradigm is backed by Release Consistency, a DRF-compliant memory model.

Alice:
void CreatePlayer() {
    Write(age = 32);
    Write(name = “Leo”);
    Write(surname = “Messi”);
    /// Player created
    Release(player_created = true);
}
Invariant: all prior writes appear to complete before the Release

Bob:
void ReadPlayer() {
    Acquire(player_created);
    Read(age);
    Read(name);
    Read(surname);
}
Invariant: all subsequent reads appear to complete after the Acquire
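On a real multiprocessor, the same pattern maps onto C++'s `std::atomic` release/acquire operations. The sketch below is our own illustration, not code from the talk: the `Player` struct, the globals, and the spin-wait in `ReadPlayer` are assumptions chosen to mirror Alice's CreatePlayer and Bob's ReadPlayer.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <thread>

// Hypothetical player record: plain, non-atomic data shared by Alice and Bob.
struct Player {
    int age = 0;
    std::string name;
    std::string surname;
};

Player player;                            // the data
std::atomic<bool> player_created{false};  // "the flag"

// Alice: ordinary writes, then a release store of the flag.
void CreatePlayer() {
    player.age = 32;           // these three writes may be reordered
    player.name = "Leo";       // among themselves, but none of them may
    player.surname = "Messi";  // become visible after the release below
    player_created.store(true, std::memory_order_release);
}

// Bob: spin on an acquire load of the flag, then read the data.
Player ReadPlayer() {
    while (!player_created.load(std::memory_order_acquire)) {
        std::this_thread::yield();  // wait until Alice's release is visible
    }
    // The acquire synchronizes with the release, so every write Alice made
    // before it is visible here; the program is data-race-free.
    return player;
}
```

Run `CreatePlayer` on one `std::thread` and `ReadPlayer` on another: whenever Bob sees the flag, he also sees age 32, “Leo”, and “Messi”. Weakening both calls to `std::memory_order_relaxed` would drop that guarantee and turn the accesses to `player` into a data race.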
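RC Semantics
RC API         Ordering
Read & Write   none
Acquire        acquire ⇒ all
Release        all ⇒ release
An Acquire is ordered before all later operations, a Release is ordered after all earlier operations, and plain Reads and Writes impose no ordering of their own.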
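The table can be read as a rule about which pairs of operations, taken in program order, must stay ordered. This sketch encodes exactly those three rows; `Kind`, `Op`, `must_order`, and `respects_rc` are hypothetical names introduced for illustration. It checks the two reorderings from Alice's CreatePlayer example, with the flag write issued as a Release.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class Kind { Read, Write, Acquire, Release };

struct Op {
    Kind kind;
    std::string key;
};

// True if one of the three RC rules forces `earlier` (in program order)
// to take effect before `later`:
//   Acquire:      acquire => all  (nothing after it may move above it)
//   Release:      all => release  (nothing before it may move below it)
//   Read & Write: none            (plain accesses may be reordered freely)
// The table says nothing about a Release followed by an Acquire (that
// depends on the RC variant), so this sketch reports such pairs as unordered.
bool must_order(const Op& earlier, const Op& later) {
    return earlier.kind == Kind::Acquire || later.kind == Kind::Release;
}

// `program` lists one machine's operations in program order; `observed`
// is a permutation of their indices giving the order in which they take
// effect. Returns true if no required ordering is violated.
bool respects_rc(const std::vector<Op>& program,
                 const std::vector<std::size_t>& observed) {
    assert(observed.size() == program.size());
    std::vector<std::size_t> pos(program.size());
    for (std::size_t p = 0; p < observed.size(); ++p) pos[observed[p]] = p;
    for (std::size_t i = 0; i < program.size(); ++i)
        for (std::size_t j = i + 1; j < program.size(); ++j)
            if (must_order(program[i], program[j]) && pos[i] > pos[j])
                return false;
    return true;
}
```

Moving `age` after `surname` is allowed (Alice's “fine by me” case); moving it after the Release is rejected (the “No way!” case), which is exactly what the MCL API could not express.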
The DRF-SC programming paradigm is implemented by Release Consistency, a DRF-compliant memory model.
Can we do the same for replicated KVSes?
Kite
A Replicated KVS with
➢ Release Consistency
➢ High Availability
Our approach to building Kite
Steps
1. API mappings
2. RC Semantics
Kite: API Mappings
API                        Protocol          Overhead       Consistency            Availability
Reads                      Eventual Store¹   Zero           Eventual Consistency   High
Writes                     Eventual Store¹   1 Broadcast    Eventual Consistency   High
Acquires                   ABD²              1 Broadcast*   Linearizability        High
Releases                   ABD²              2 Broadcasts   Linearizability        High
Read-Modify-Writes (RMWs)  Paxos³            3 Broadcasts   Consensus              High

Reads and writes are the common case, acquires and releases are synchronization, and RMWs are heavy synchronization.
¹ Sebastian Burckhardt. 2014. Principles of Eventual Consistency.
² N. A. Lynch and A. A. Shvartsman. 1997. Robust emulation of shared memory using dynamic quorum-acknowledged broadcasts.
³ Leslie Lamport. 1998. The part-time parliament.
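The talk specifies the Eventual Store only by its costs: local reads and one broadcast per write. As a minimal sketch under our own assumptions, each replica can keep a (version, writer id) stamp per key and apply incoming writes last-writer-wins, so replicas converge whatever order writes arrive in. `Stamp`, `Versioned`, `Replica`, and `broadcast_write` are hypothetical names, and the broadcast is simulated by a loop over in-memory replicas.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical write stamp: higher version wins; ties go to the higher writer id.
struct Stamp {
    int version = 0;
    int writer = -1;
    bool operator<(const Stamp& o) const {
        return std::tie(version, writer) < std::tie(o.version, o.writer);
    }
};

struct Versioned {
    Stamp stamp;
    std::string value;
};

struct Replica {
    int id;  // assumed equal to this replica's index in the cluster vector
    std::map<std::string, Versioned> store;

    // Reads are served from the local copy: no messages at all.
    std::string read(const std::string& key) const {
        auto it = store.find(key);
        return it == store.end() ? std::string() : it->second.value;
    }

    // Apply a local or remote write only if it is newer (last-writer-wins),
    // so replicas converge regardless of delivery order.
    void apply(const std::string& key, const Versioned& w) {
        Versioned& cur = store[key];
        if (cur.stamp < w.stamp) cur = w;
    }
};

// A write: stamp it at the writer, then one broadcast to every replica
// (simulated by a loop; real delivery would be asynchronous).
void broadcast_write(std::vector<Replica>& replicas, int writer,
                     const std::string& key, const std::string& value) {
    Replica& self = replicas[writer];
    Stamp s{self.store[key].stamp.version + 1, self.id};
    for (Replica& r : replicas) r.apply(key, Versioned{s, value});
}
```

Two concurrent writes with the same version, delivered to different replicas in opposite orders, still leave every replica with the same value, which is the convergence that eventual consistency asks for.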
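Acquires and releases go through ABD. In this rough single-process sketch (our simplification, not Kite's code), a release first learns the highest stamp from a majority and then writes a larger one (two broadcast rounds). An acquire reads a majority and finishes after one round if every reply carries the same stamp; otherwise it writes the newest value back to a majority, which is the standard ABD read. `AbdReplica`, `abd_release`, `abd_acquire`, and passing each quorum as an explicit index list are assumptions for illustration; Kite's additional rule that a release waits for the acks of earlier writes is not modelled here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// (timestamp, writer id), compared lexicographically.
using Stamp = std::pair<int, int>;

// One replica's copy of a single synchronization key (e.g. player_created).
struct AbdReplica {
    Stamp stamp{0, -1};
    std::string value;
};

struct AcquireResult {
    std::string value;
    int rounds;  // broadcast rounds used
};

// Quorums must be majorities so that any two of them intersect.
bool is_majority(const std::vector<AbdReplica>& rs, const std::vector<std::size_t>& q) {
    return q.size() > rs.size() / 2;
}

// Store (s, v) at every quorum member that holds an older stamp.
void write_phase(std::vector<AbdReplica>& rs, const std::vector<std::size_t>& q,
                 Stamp s, const std::string& v) {
    for (std::size_t i : q)
        if (rs[i].stamp < s) { rs[i].stamp = s; rs[i].value = v; }
}

// Release: round 1 learns the highest stamp, round 2 writes a larger one.
int abd_release(std::vector<AbdReplica>& rs, const std::vector<std::size_t>& q,
                int writer, const std::string& v) {
    assert(is_majority(rs, q));
    Stamp highest{0, -1};
    for (std::size_t i : q) highest = std::max(highest, rs[i].stamp);
    write_phase(rs, q, Stamp{highest.first + 1, writer}, v);
    return 2;
}

// Acquire: one read round; a write-back round only if the replies disagree.
AcquireResult abd_acquire(std::vector<AbdReplica>& rs, const std::vector<std::size_t>& q) {
    assert(is_majority(rs, q));
    std::size_t best = q.front();
    bool all_equal = true;
    for (std::size_t i : q) {
        if (rs[i].stamp != rs[q.front()].stamp) all_equal = false;
        if (rs[best].stamp < rs[i].stamp) best = i;
    }
    const Stamp s = rs[best].stamp;
    const std::string v = rs[best].value;
    if (all_equal) return {v, 1};
    write_phase(rs, q, s, v);  // make the returned value durable at a majority
    return {v, 2};
}
```

With five replicas, a release through replicas {0, 1, 2} followed by an acquire through {2, 3, 4} still sees the new value, because the two majorities overlap at replica 2.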
Kite: Fast-path/Slow-path
Enforcing the RC Semantics (Read & Write: none; Acquire: acquire ⇒ all; Release: all ⇒ release):
● Fast-Path: common operation
● Slow-Path: when some machines are slow
Fast-path
Alice:
void CreatePlayer() {
    Write(age = 32);
    Write(name = “Leo”);
    Write(surname = “Messi”);
    Release(player_created = true);
}
➢ Alice broadcasts her writes (age = 32, name = “Leo”, …) to the other machines.
➢ All four machines acknowledge them: ←(ack) ←(ack) ←(ack) ←(ack).
➢ Only then does Alice broadcast Release(player_created = true).
Before a release, gather all acks for prior writes.
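A sketch of the release-side rule, under our own assumptions: the writer tracks per-machine acks for its outstanding writes, and a release may be broadcast only once every other machine has acked every prior write. If the wait times out instead, the machines still missing acks are the delinquent ones, which the slow path below handles. `ReleaseBarrier` and its methods are hypothetical, acks are treated as cumulative, and the timer itself is left to the caller.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Tracks, for one writer, which machines have acked its writes so far.
// Machine ids run from 1 to num_machines, as on the slides.
class ReleaseBarrier {
public:
    ReleaseBarrier(int num_machines, int self)
        : self_(self), acked_(num_machines + 1, 0) {}

    // Call when a write is broadcast; returns its sequence number.
    std::uint64_t on_write_broadcast() { return ++writes_sent_; }

    // Call when `machine` acks write `seq`. Treated as a cumulative ack
    // that also covers all earlier writes (an assumption of this sketch).
    void on_ack(int machine, std::uint64_t seq) {
        if (seq > acked_[machine]) acked_[machine] = seq;
    }

    // Machines that have not yet acked every prior write.
    std::set<int> missing_acks() const {
        std::set<int> missing;
        for (int m = 1; m < static_cast<int>(acked_.size()); ++m)
            if (m != self_ && acked_[m] < writes_sent_) missing.insert(m);
        return missing;
    }

    // Fast path: the release may be broadcast once nothing is missing.
    // If a timeout fires first, missing_acks() names the delinquent machines.
    bool ready_for_release() const { return missing_acks().empty(); }

private:
    int self_;
    std::uint64_t writes_sent_ = 0;
    std::vector<std::uint64_t> acked_;  // highest acked sequence per machine
};
```

In the usage below, Alice (machine 1) gets acks from machines 2 to 5 for her three writes and may release; on her next write Node-5 stays silent, so after a timeout it would be declared delinquent.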
Bob:
void ReadPlayer() {
    Acquire(player_created);
    Read(age);
    Read(name);
    Read(surname);
}
➢ Bob broadcasts Acquire(player_created).
➢ All four machines reply (true).
➢ Bob's subsequent reads are served locally.
What if we cannot gather all acks before a release?
Fast-path ⇒ Slow-path
Alice runs CreatePlayer as before:
void CreatePlayer() {
    Write(age = 32);
    Write(name = “Leo”);
    Write(surname = “Messi”);
    Release(player_created = true);
}
➢ Alice broadcasts her writes (age = 32, name = “Leo”, …).
➢ Only three acks arrive: Node-5 does not respond in time.
➢ On timing out, Alice broadcasts “Node-5 is delinquent!”; the three responsive machines record 5 = delinquent and acknowledge.
➢ Alice then broadcasts Release(player_created = true).

Bob, on Node-5, runs ReadPlayer:
void ReadPlayer() {
    Acquire(player_created);
    Read(age);
    Read(name);
    Read(surname);
}
➢ Bob broadcasts Acquire(player_created).
➢ The replies are (true, delinquent): Bob learns that his machine missed writes (“I am delinquent!”).
➢ Slow-path: Bob broadcasts Reset Delinquency.
➢ Bob's reads are now broadcast (Read age, name, …) instead of served locally, and the machines reply (32, “Leo”, …).

Slow-path ⇒ Fast-path
➢ With up-to-date values in hand, Bob returns to the fast path (“I am not delinquent anymore!”).
Kite: Fast-path/Slow-path Recap
Fast path / Slow path mechanism
Before a Release: gather all acks
➢ On timing out, broadcast the delinquent machines
On an Acquire: discover delinquency
➢ Slow-path if delinquent
Slow-path read / write
➢ Add a broadcast round
➢ Restore the key to the fast path
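The recap can be condensed into a single-process state sketch. This is our own illustration, not Kite's implementation: `Machine`, `Cluster`, and their methods are hypothetical, `Cluster` loops over every machine in place of real broadcasts, per-key restoration is modelled with a simple set, and slow-path writes are omitted. Machines are numbered from 0 here, so the slides' Node-5 is machine 4.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<int, std::string>;  // (version, value)

struct Machine {
    bool reachable = true;             // false while the machine misses messages
    std::map<std::string, Entry> kvs;  // local replica
    std::set<int> delinquent;          // machines this one has marked delinquent
    bool slow = false;                 // learned on an acquire that it missed writes
    std::set<std::string> restored;    // keys already back on the fast path
};

struct ReadResult {
    std::string value;
    int extra_rounds;  // broadcast rounds beyond a local read
};

struct Cluster {
    std::vector<Machine> m;
    explicit Cluster(int n) : m(n) {}

    // Broadcast a write; returns the machines that never acked it.
    std::set<int> write(int from, const std::string& k, const std::string& v) {
        Entry e{m[from].kvs[k].first + 1, v};
        std::set<int> missing;
        for (int i = 0; i < static_cast<int>(m.size()); ++i) {
            if (i == from || m[i].reachable) m[i].kvs[k] = e;
            else missing.insert(i);
        }
        return missing;
    }

    // Release: after a timeout, first broadcast the delinquent machines,
    // then broadcast the release itself.
    void release(int from, const std::set<int>& timed_out,
                 const std::string& k, const std::string& v) {
        for (Machine& x : m)
            if (x.reachable) x.delinquent.insert(timed_out.begin(), timed_out.end());
        write(from, k, v);
    }

    // Newest copy of `k` among the reader and every reachable machine.
    Entry newest(int reader, const std::string& k) {
        Entry best = m[reader].kvs[k];
        for (Machine& x : m)
            if (x.reachable && best < x.kvs[k]) best = x.kvs[k];
        return best;
    }

    // Acquire: any reply marking the reader as delinquent sends it to the
    // slow path. (With quorums, the delinquency broadcast and the acquire
    // both reach a majority, so some reply would carry the mark; this
    // sketch simply asks every reachable machine.)
    std::string acquire(int reader, const std::string& k) {
        for (int i = 0; i < static_cast<int>(m.size()); ++i)
            if (i != reader && m[i].reachable && m[i].delinquent.count(reader))
                m[reader].slow = true;
        m[reader].kvs[k] = newest(reader, k);
        if (m[reader].slow) {
            for (Machine& x : m) x.delinquent.erase(reader);  // "Reset Delinquency"
            m[reader].restored.clear();
            m[reader].restored.insert(k);  // the acquired key is now up to date
        }
        return m[reader].kvs[k].second;
    }

    // Read: local on the fast path; on the slow path, one extra broadcast
    // round per key until that key has been restored.
    ReadResult read(int reader, const std::string& k) {
        Machine& me = m[reader];
        if (me.slow && !me.restored.count(k)) {
            me.kvs[k] = newest(reader, k);
            me.restored.insert(k);
            return {me.kvs[k].second, 1};
        }
        return {me.kvs[k].second, 0};
    }
};
```

Replaying the slides: Alice (machine 0) writes while machine 4 is unreachable, declares it delinquent, and releases; Bob's acquire on machine 4 then detects the mark, his first read of `age` costs one extra round, and the second is local again.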
Experimental Setup
Kite’s Implementation:
● RDMA-enabled
● Multi-threaded
● Asynchronous API
Infrastructure:
● Servers: 5 x Intel Xeon E5-2630 v4, with 64 GB memory
● Network: 5 x 56 Gbit/s InfiniBand NICs; 1 x 12-port InfiniBand switch
Baseline:
● In-house ZAB implementation
○ RDMA-enabled, multi-threaded
Workloads:
1. Microbenchmarks
2. Lock-free data structures
Microbenchmarks
[Figure: microbenchmark results for Kite; panels labelled “5% Sync” and “20% Sync & 5% RMW”]
Lock-free Data Structures
[Figure: lock-free data structure results; y-axis: operations per second, normalized to ZAB]
Conclusion
Kite: a replicated Key-Value Store with
● High availability &
● Release Consistency
Components:
● API mappings: Eventual Store, ABD & Paxos
● RC barrier semantics: Fast / Slow path
○ the paper contains the proof
Implementation features:
● Heavily multi-threaded
● RDMA-enabled
● Asynchronous API
● https://github.com/icsa-caps/Kite
Thank you! Questions?
Back-up slides
Running Code on Kite
Write-only Throughput
Write-only Throughput with All-Aboard
Kite: efficient and available release consistency for the datacenter
1 of 90

Recommended

Porting and Optimization of Numerical Libraries for ARM SVE by
Porting and Optimization of Numerical Libraries for ARM SVEPorting and Optimization of Numerical Libraries for ARM SVE
Porting and Optimization of Numerical Libraries for ARM SVELinaro
2.7K views15 slides
HCQC : HPC Compiler Quality Checker by
HCQC : HPC Compiler Quality CheckerHCQC : HPC Compiler Quality Checker
HCQC : HPC Compiler Quality CheckerLinaro
2.8K views15 slides
Unleash the Power of Redis with Amazon ElastiCache by
Unleash the Power of Redis with Amazon ElastiCacheUnleash the Power of Redis with Amazon ElastiCache
Unleash the Power of Redis with Amazon ElastiCacheAmazon Web Services
514 views70 slides
Virtual Machine for Regular Expressions by
Virtual Machine for Regular ExpressionsVirtual Machine for Regular Expressions
Virtual Machine for Regular ExpressionsAlexander Yakushev
432 views102 slides
Performance evaluation with Arm HPC tools for SVE by
Performance evaluation with Arm HPC tools for SVEPerformance evaluation with Arm HPC tools for SVE
Performance evaluation with Arm HPC tools for SVELinaro
2.8K views8 slides
Applying CAP theorem to build distributed robust Phoenix API app by
Applying CAP theorem to build distributed robust Phoenix API appApplying CAP theorem to build distributed robust Phoenix API app
Applying CAP theorem to build distributed robust Phoenix API appYeong Sheng Tan
83 views24 slides

More Related Content

Similar to Kite: efficient and available release consistency for the datacenter

Redis overview for Software Architecture Forum by
Redis overview for Software Architecture ForumRedis overview for Software Architecture Forum
Redis overview for Software Architecture ForumChristopher Spring
5.1K views50 slides
Opal compiler by
Opal compilerOpal compiler
Opal compilerJorge Ressia
1.8K views88 slides
A world to win: WebAssembly for the rest of us by
A world to win: WebAssembly for the rest of usA world to win: WebAssembly for the rest of us
A world to win: WebAssembly for the rest of usIgalia
15 views35 slides
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them All by
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them AllScylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them All
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them AllScyllaDB
690 views24 slides
Spark After Dark 2.0 - Apache Big Data Conf - Vancouver - May 11, 2016 by
Spark After Dark 2.0 - Apache Big Data Conf - Vancouver - May 11, 2016Spark After Dark 2.0 - Apache Big Data Conf - Vancouver - May 11, 2016
Spark After Dark 2.0 - Apache Big Data Conf - Vancouver - May 11, 2016Chris Fregly
887 views117 slides
Hermes Reliable Replication Protocol - ASPLOS'20 Presentation by
Hermes Reliable Replication Protocol -  ASPLOS'20 PresentationHermes Reliable Replication Protocol -  ASPLOS'20 Presentation
Hermes Reliable Replication Protocol - ASPLOS'20 PresentationAntonios Katsarakis
1.2K views75 slides

Similar to Kite: efficient and available release consistency for the datacenter(20)

Redis overview for Software Architecture Forum by Christopher Spring
Redis overview for Software Architecture ForumRedis overview for Software Architecture Forum
Redis overview for Software Architecture Forum
Christopher Spring5.1K views
A world to win: WebAssembly for the rest of us by Igalia
A world to win: WebAssembly for the rest of usA world to win: WebAssembly for the rest of us
A world to win: WebAssembly for the rest of us
Igalia15 views
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them All by ScyllaDB
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them AllScylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them All
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them All
ScyllaDB690 views
Spark After Dark 2.0 - Apache Big Data Conf - Vancouver - May 11, 2016 by Chris Fregly
Spark After Dark 2.0 - Apache Big Data Conf - Vancouver - May 11, 2016Spark After Dark 2.0 - Apache Big Data Conf - Vancouver - May 11, 2016
Spark After Dark 2.0 - Apache Big Data Conf - Vancouver - May 11, 2016
Chris Fregly887 views
Hermes Reliable Replication Protocol - ASPLOS'20 Presentation by Antonios Katsarakis
Hermes Reliable Replication Protocol -  ASPLOS'20 PresentationHermes Reliable Replication Protocol -  ASPLOS'20 Presentation
Hermes Reliable Replication Protocol - ASPLOS'20 Presentation
Antonios Katsarakis1.2K views
Sending a for ahuh. win32 exploit development old school by Nahidul Kibria
Sending a for ahuh. win32 exploit development old schoolSending a for ahuh. win32 exploit development old school
Sending a for ahuh. win32 exploit development old school
Nahidul Kibria1.6K views
FortranCon2020: Highly Parallel Fortran and OpenACC Directives by Jeff Larkin
FortranCon2020: Highly Parallel Fortran and OpenACC DirectivesFortranCon2020: Highly Parallel Fortran and OpenACC Directives
FortranCon2020: Highly Parallel Fortran and OpenACC Directives
Jeff Larkin397 views
Redis — memcached on steroids by Robert Lehmann
Redis — memcached on steroidsRedis — memcached on steroids
Redis — memcached on steroids
Robert Lehmann1.8K views
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C... by Odinot Stanislas
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Odinot Stanislas23.5K views
When OLAP Meets Real-Time, What Happens in eBay? by DataWorks Summit
When OLAP Meets Real-Time, What Happens in eBay?When OLAP Meets Real-Time, What Happens in eBay?
When OLAP Meets Real-Time, What Happens in eBay?
DataWorks Summit954 views
Kick-R: Get your own R instance with 36 cores on AWS by Kiwamu Okabe
Kick-R: Get your own R instance with 36 cores on AWSKick-R: Get your own R instance with 36 cores on AWS
Kick-R: Get your own R instance with 36 cores on AWS
Kiwamu Okabe1.3K views
Experiences building a distributed shared log on RADOS - Noah Watkins by Ceph Community
Experiences building a distributed shared log on RADOS - Noah WatkinsExperiences building a distributed shared log on RADOS - Noah Watkins
Experiences building a distributed shared log on RADOS - Noah Watkins
Ceph Community 98 views
DockerCon 2017 - Cilium - Network and Application Security with BPF and XDP by Thomas Graf
DockerCon 2017 - Cilium - Network and Application Security with BPF and XDPDockerCon 2017 - Cilium - Network and Application Security with BPF and XDP
DockerCon 2017 - Cilium - Network and Application Security with BPF and XDP
Thomas Graf10.1K views
Cilium - Network and Application Security with BPF and XDP Thomas Graf, Cova... by Docker, Inc.
Cilium - Network and Application Security with BPF and XDP  Thomas Graf, Cova...Cilium - Network and Application Security with BPF and XDP  Thomas Graf, Cova...
Cilium - Network and Application Security with BPF and XDP Thomas Graf, Cova...
Docker, Inc.10.8K views

Recently uploaded

.NET Deserialization Attacks by
.NET Deserialization Attacks.NET Deserialization Attacks
.NET Deserialization AttacksDharmalingam Ganesan
5 views50 slides
What is API by
What is APIWhat is API
What is APIartembondar5
13 views15 slides
Dev-HRE-Ops - Addressing the _Last Mile DevOps Challenge_ in Highly Regulated... by
Dev-HRE-Ops - Addressing the _Last Mile DevOps Challenge_ in Highly Regulated...Dev-HRE-Ops - Addressing the _Last Mile DevOps Challenge_ in Highly Regulated...
Dev-HRE-Ops - Addressing the _Last Mile DevOps Challenge_ in Highly Regulated...TomHalpin9
6 views29 slides
predicting-m3-devopsconMunich-2023-v2.pptx by
predicting-m3-devopsconMunich-2023-v2.pptxpredicting-m3-devopsconMunich-2023-v2.pptx
predicting-m3-devopsconMunich-2023-v2.pptxTier1 app
12 views33 slides
tecnologia18.docx by
tecnologia18.docxtecnologia18.docx
tecnologia18.docxnosi6702
5 views5 slides
Understanding HTML terminology by
Understanding HTML terminologyUnderstanding HTML terminology
Understanding HTML terminologyartembondar5
7 views8 slides

Recently uploaded(20)

Dev-HRE-Ops - Addressing the _Last Mile DevOps Challenge_ in Highly Regulated... by TomHalpin9
Dev-HRE-Ops - Addressing the _Last Mile DevOps Challenge_ in Highly Regulated...Dev-HRE-Ops - Addressing the _Last Mile DevOps Challenge_ in Highly Regulated...
Dev-HRE-Ops - Addressing the _Last Mile DevOps Challenge_ in Highly Regulated...
TomHalpin96 views
predicting-m3-devopsconMunich-2023-v2.pptx by Tier1 app
predicting-m3-devopsconMunich-2023-v2.pptxpredicting-m3-devopsconMunich-2023-v2.pptx
predicting-m3-devopsconMunich-2023-v2.pptx
Tier1 app12 views
tecnologia18.docx by nosi6702
tecnologia18.docxtecnologia18.docx
tecnologia18.docx
nosi67025 views
Understanding HTML terminology by artembondar5
Understanding HTML terminologyUnderstanding HTML terminology
Understanding HTML terminology
artembondar57 views
How To Make Your Plans Suck Less — Maarten Dalmijn at the 57th Hands-on Agile... by Stefan Wolpers
How To Make Your Plans Suck Less — Maarten Dalmijn at the 57th Hands-on Agile...How To Make Your Plans Suck Less — Maarten Dalmijn at the 57th Hands-on Agile...
How To Make Your Plans Suck Less — Maarten Dalmijn at the 57th Hands-on Agile...
Stefan Wolpers42 views
AI and Ml presentation .pptx by FayazAli87
AI and Ml presentation .pptxAI and Ml presentation .pptx
AI and Ml presentation .pptx
FayazAli8714 views
Navigating container technology for enhanced security by Niklas Saari by Metosin Oy
Navigating container technology for enhanced security by Niklas SaariNavigating container technology for enhanced security by Niklas Saari
Navigating container technology for enhanced security by Niklas Saari
Metosin Oy15 views
Electronic AWB - Electronic Air Waybill by Freightoscope
Electronic AWB - Electronic Air Waybill Electronic AWB - Electronic Air Waybill
Electronic AWB - Electronic Air Waybill
Freightoscope 5 views
Bootstrapping vs Venture Capital.pptx by Zeljko Svedic
Bootstrapping vs Venture Capital.pptxBootstrapping vs Venture Capital.pptx
Bootstrapping vs Venture Capital.pptx
Zeljko Svedic15 views
How to build dyanmic dashboards and ensure they always work by Wiiisdom
How to build dyanmic dashboards and ensure they always workHow to build dyanmic dashboards and ensure they always work
How to build dyanmic dashboards and ensure they always work
Wiiisdom14 views
aATP - New Correlation Confirmation Feature.pptx by EsatEsenek1
aATP - New Correlation Confirmation Feature.pptxaATP - New Correlation Confirmation Feature.pptx
aATP - New Correlation Confirmation Feature.pptx
EsatEsenek1205 views
Ports-and-Adapters Architecture for Embedded HMI by Burkhard Stubert
Ports-and-Adapters Architecture for Embedded HMIPorts-and-Adapters Architecture for Embedded HMI
Ports-and-Adapters Architecture for Embedded HMI
Burkhard Stubert33 views
360 graden fabriek by info33492
360 graden fabriek360 graden fabriek
360 graden fabriek
info33492165 views
DRYiCE™ iAutomate: AI-enhanced Intelligent Runbook Automation by HCLSoftware
DRYiCE™ iAutomate: AI-enhanced Intelligent Runbook AutomationDRYiCE™ iAutomate: AI-enhanced Intelligent Runbook Automation
DRYiCE™ iAutomate: AI-enhanced Intelligent Runbook Automation
HCLSoftware6 views

Kite: efficient and available release consistency for the datacenter

  • 1. Kite: Efficient and Available Release Consistency for the Datacenter Vasilis Gavrielatos, Antonios Katsarakis, Vijay Nagarajan, Boris Grot, Arpit Joshi* University of Edinburgh, *Intel Thanks to:
  • 2. Key-Value Stores Replicated KVS Characteristics ● Read-Write-RMW API ● Highly Available 2
  • 3. Key-Value Stores Replicated KVS Availability ≅ Nonblocking Characteristics ● Read-Write-RMW API ● Highly Available 3
  • 4. Key-Value Stores Replicated KVS Characteristics ● Read-Write-RMW API ● Highly Available ○ Replicated for fault tolerance 4
  • 5. Key-Value Stores Replicated KVS Replication ⇒ Performance vs Consistency Characteristics ● Read-Write-RMW API ● Highly Available ○ Replicated for fault tolerance 5
  • 8. Existing Solution: Multiple Consistency Levels (MCL) MCL Replicated KVS Weak Write Strong ReadAmazon DB App Engine PNUTS Manhattan Pileus 8
  • 9. Amazon DB App Engine PNUTS Manhattan Pileus Existing Solution: Multiple Consistency Levels (MCL) MCL Replicated KVS What about programming patterns? Weak Write Strong Read 9
  • 11. The problem MCL Replicated KVS Alice void CreatePlayer( ) { Write(age = 32); Write(name = “Leo”); Write(surname = “Messi”); /// Player created Write(player_created = true); } 11
  • 12. The problem MCL Replicated KVS Alice If you can read the flag, then you must be able to see the player! void CreatePlayer( ) { Write(age = 32); Write(name = “Leo”); Write(surname = “Messi”); /// Player created Write(player_created = true); } “the flag” 12
  • 13. The problem MCL Replicated KVS void CreatePlayer( ) { Write(name = “Leo”); Write(surname = “Messi”); Write(age = 32); /// Player created Write(player_created = true); } Alice Fine by me! 13
  • 14. The problem MCL Replicated KVS void CreatePlayer( ) { Write(name = “Leo”); Write(surname = “Messi”); /// Player created Write(player_created = true); Write(age = 32); } Alice No way! 14
  • 15. The problem MCL Replicated KVS void CreatePlayer( ) { Write(name = “Leo”); Write(surname = “Messi”); /// Player created Write(player_created = true); Write(age = 32); } Alice There is no way to capture this with MCLs! No way! 15
  • 16. MCL Replicated KVS Alice MCL solution void CreatePlayer( ) { Strong_Write(age = 32); Strong_Write(name = “Leo”); Strong_Write(surname = “Messi”); /// Player created Strong_Write(player_created = true); } 16
  • 17. MCL Replicated KVS Alice Seems like an overkill... MCL solution void CreatePlayer( ) { Strong_Write(age = 32); Strong_Write(name = “Leo”); Strong_Write(surname = “Messi”); /// Player created Strong_Write(player_created = true); } Missed performance opportunity 17
  • 18. Shared Memory World Sweet-spot in the Performance-vs-Consistency? 18
  • 19. Sweet-spot in the Performance-vs-Consistency? Shared Memory World DRF-SC! 19
  • 23. Under the hood: Release Consistency Alice Releaseaa Consistency DRF-compliant memory model DRF-SC Programming Paradigm Multiprocessor 23
  • 24. Under the hood: Release Consistency Alice void CreatePlayer( ) { Write(age = 32); Write(name = “Leo”); Write(surname = “Messi”); /// Player created Release(player_created = true); } Multiprocessor 24
  • 25. Under the hood: Release Consistency Alice Multiprocessor void CreatePlayer( ) { Write(age = 32); Write(name = “Leo”); Write(surname = “Messi”); /// Player created Release(player_created = true); } Invariant: Writes appear to complete before the Release 25
  • 26. Under the hood: Release Consistency Alice void ReadPlayer( ) { Acquire(player_created); Read(age); Read(name); Read(surname); } Bob Multiprocessor 26
  • 27. Under the hood: Release Consistency Alice Multiprocessor void ReadPlayer( ) { Acquire(player_created); Read(age); Read(name); Read(surname); } Invariant: Reads appear to complete after the Acquire Bob 27
  • 28. RC Semantics RC API Ordering Read & Write none Acquire acquire ⇒ all Release all ⇒ release 28
  • 30. Kite Kite Replicated KVS A Replicated KVS with ➢ Release Consistency ➢ High Availability 30
  • 31. Our approach to building Kite Steps 1 API mappings 2 31
  • 32. Our approach to building Kite Steps 1 API mappings 2 RC Semantics 32
  • 33. Kite: API - Mappings API Protocol Reads Writes Acquire Releases Read-Modify-Writes (RMWs) 33
  • 34. Kite: API - Mappings API Protocol Reads Writes 34
  • 35. Kite: API - Mappings API Protocol Overhead Consistency Availability Reads Zero Eventual Consistency High Writes 1 Broadcast 35
  • 36. Kite: API - Mappings API Protocol Overhead Consistency Availability Reads Eventual Store Zero Eventual Consistency High Writes 1 Broadcast * * Sebastian Burckhardt. 2014. Principles of Eventual Consistency 36
  • 37. Kite: API - Mappings API Protocol Overhead Consistency Availability Reads Eventual Store Zero Eventual Consistency High Writes 1 Broadcast Acquire Local Linearizability High Releases 1 Broadcast 37
  • 38. Kite: API - Mappings API Protocol Overhead Consistency Availability Reads Eventual Store Zero Eventual Consistency High Writes 1 Broadcast Acquire ABD* 1 Broadcast* Linearizability High Releases 2 Broadcasts *N. A. Lynch and A. A. Shvartsman. 1997. Robust emulation of shared memory using dynamic quorum-acknowledged broadcasts 38
  • 39. Kite: API - Mappings API Protocol Overhead Consistency Availability Reads Eventual Store Zero Eventual Consistency High Writes 1 Broadcast Acquire ABD 1 Broadcast* Linearizability High Releases 2 Broadcasts Read-Modify-Writes (RMWs) 1 Broadcast Consensus High 39
  • 40. Kite: API - Mappings API Protocol Overhead Consistency Availability Reads Eventual Store Zero Eventual Consistency High Writes 1 Broadcast Acquire ABD 1 Broadcast* Linearizability High Releases 2 Broadcasts Read-Modify-Writes (RMWs) Paxos* 3 Broadcasts Consensus High *Leslie Lamport. 1998. The part-time parliament. 40
  • 41. Kite: API - Mappings API Overhead Protocol Reads Zero Eventual Store Writes 1 Broadcast Acquire 1 Broadcast* ABD Releases 2 Broadcasts Read-Modify-Writes (RMWs) 3 Broadcasts Paxos Common Case Synchronization Heavy Synchronization 41
  • 42. Our approach to building Kite Steps 1 API mappings 2 RC Semantics 42
  • 43. Kite: Fast-path/Slow-path RC API Ordering Read & Write none Acquire acquire ⇒ all Release all ⇒ release RC Semantics
  • 44. Kite: Fast-path/Slow-path Fast-Path Common operation RC API Ordering Read & Write none Acquire acquire ⇒ all Release all ⇒ release RC Semantics
  • 45. Kite: Fast-path/Slow-path Fast-Path Slow-Path When slow Common operation RC API Ordering Read & Write none Acquire acquire ⇒ all Release all ⇒ release RC Semantics
  • 46. Fast-path Alice Bob void CreatePlayer( )( ) { Write(age = 32); Write(name = “Leo”); Write(surname = “Messi”); Release(player_created = true); } 46 void ReadPlayer( ) { Acquire(player_created); Read(age); Read(name); Read(surname); }
  • 47. Fast-path: Alice broadcasts her writes (age = 32, name = "Leo", ...).
  • 48. Fast-path: every replica replies with an ack (4 acks).
  • 49. Fast-path: Alice broadcasts Release(player_created = true).
  • 50. Fast-path: before a release, gather all acks for prior writes.
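The writer-side rule on slide 50 can be sketched as per-write ack bookkeeping. This is an illustrative sketch, not Kite's code: `WriterSession`, the integer write ids and the replica ids 2 to 5 are hypothetical, and no messages are actually sent.

```python
class WriterSession:
    """Hypothetical writer state: a release waits until every prior write is acked."""

    def __init__(self, replica_ids):
        self.replica_ids = frozenset(replica_ids)
        self.pending = {}  # write id -> replicas that have not yet acked it
        self.next_id = 0

    def broadcast_write(self, key, value):
        # A real system would send (key, value) to every replica here.
        wid = self.next_id
        self.next_id += 1
        self.pending[wid] = set(self.replica_ids)
        return wid

    def on_ack(self, wid, replica_id):
        waiting = self.pending.get(wid)
        if waiting is not None:
            waiting.discard(replica_id)
            if not waiting:
                del self.pending[wid]  # write acked by everyone

    def can_release(self):
        # Fast-path condition: all acks for all prior writes have been gathered.
        return not self.pending


alice = WriterSession(replica_ids=[2, 3, 4, 5])
writes = [alice.broadcast_write(k, v) for k, v in [("age", 32), ("name", "Leo"), ("surname", "Messi")]]
for wid in writes:
    for r in (2, 3, 4):
        alice.on_ack(wid, r)
print(alice.can_release())  # False: replica 5 has not acked yet
for wid in writes:
    alice.on_ack(wid, 5)
print(alice.can_release())  # True: the release may now be broadcast
```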
  • 51. Fast-path: Bob runs ReadPlayer().
  • 52. Fast-path: Bob broadcasts Acquire(player_created).
  • 53. Fast-path: every replica replies true (4 replies).
  • 54. Fast-path: Bob's subsequent reads of age, name and surname are local reads.
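On the reader side, only the acquire leaves the machine; the reads that follow are served from the local replica. A minimal sketch, assuming an ABD-style acquire that waits for replies from a majority and keeps the value with the highest version number; `Reply`, `acquire` and `LocalStore` are hypothetical names, and the message exchange itself is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Reply:
    value: object
    version: int  # ABD-style logical timestamp (assumed representation)


def acquire(replies, n_replicas):
    """ABD-style read of a sync variable: needs a majority, returns the freshest value."""
    quorum = n_replicas // 2 + 1
    if len(replies) < quorum:
        raise RuntimeError("acquire needs replies from a majority of replicas")
    return max(replies, key=lambda r: r.version).value


class LocalStore:
    """The replica's own copy of the data; fast-path reads never leave the machine."""

    def __init__(self, data):
        self.data = dict(data)

    def read(self, key):
        return self.data.get(key)


bob_replica = LocalStore({"age": 32, "name": "Leo", "surname": "Messi"})
if acquire([Reply(True, 1)] * 4, n_replicas=5):
    print(bob_replica.read("age"), bob_replica.read("name"), bob_replica.read("surname"))
# prints: 32 Leo Messi
```

The local reads are safe here only because, on the fast path, Alice issued her release after every replica, including Bob's, had acked her writes.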
  • 55. Fast-path: what if we cannot gather all acks before a release?
  • 56. Fast-path ⇒ Slow-path
  • 57. Fast-path ⇒ Slow-path: Alice runs CreatePlayer().
  • 58. Fast-path ⇒ Slow-path: Alice broadcasts her writes (age = 32, name = "Leo", ...).
  • 59. Fast-path ⇒ Slow-path: only 3 of the 4 acks arrive.
  • 60. Fast-path ⇒ Slow-path: Alice cannot gather all acks before the release.
  • 61. Fast-path ⇒ Slow-path: on timing out, Alice broadcasts "Node-5 is delinquent!"
  • 62. Fast-path ⇒ Slow-path: the 3 responsive replicas record "5 = delinquent" and ack.
  • 63. Fast-path ⇒ Slow-path: Alice broadcasts Release(player_created = true); the responsive replicas keep the record "5 = delinquent".
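Slides 57 to 63 change only what a release does when an ack is missing. A minimal sketch of the steps a release takes in each case; `missing_ackers` and `release_steps` are hypothetical helpers, `pending` maps a write id to the replicas that have not acked it, and the timeout itself is reduced to a boolean.

```python
def missing_ackers(pending):
    """Replicas that failed to ack at least one prior write."""
    missing = set()
    for waiting in pending.values():
        missing |= waiting
    return missing


def release_steps(pending, timed_out):
    """Steps a release performs, following the fast-path / slow-path slides."""
    if not pending:
        # Fast path: every prior write is acked, so the release goes out directly.
        return ["broadcast release"]
    if not timed_out:
        return ["keep waiting for acks"]
    # Slow path: announce the delinquent machines before releasing.
    delinquent = sorted(missing_ackers(pending))
    return [
        f"broadcast delinquent machines {delinquent}",
        "gather acks for the delinquency notice",
        "broadcast release",
    ]


# Node 5 never acked any of Alice's three writes.
pending = {0: {5}, 1: {5}, 2: {5}}
for step in release_steps(pending, timed_out=True):
    print(step)
```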
  • 64. Fast-path ⇒ Slow-path: Bob broadcasts Acquire(player_created); the replicas hold the record "5 = delinquent".
  • 65. Fast-path ⇒ Slow-path: the replies carry the delinquency record: (true, delinquent) × 3.
  • 66. Slow-path: Bob ("I am delinquent!") broadcasts Reset Delinquency.
  • 67. Slow-path ⇒ Fast-path: Bob ("I am delinquent!") broadcasts reads of age, name, ...
  • 68. Slow-path ⇒ Fast-path: the replicas reply with (32, "Leo", ...) × 3.
  • 69. Fast-path: Bob: "I am not delinquent anymore!"
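Slides 64 to 69 show the reader's side of the slow path. A minimal sketch of that logic, assuming (as the slides suggest) that Bob's replica is node 5, that acquire replies arrive as `(version, value, delinquent_ids)` tuples, and that the hypothetical `fetch_remote` callback stands in for the extra broadcast round of a slow-path read. `ReaderSession` and its fields are also hypothetical; restoring the fast path is tracked per key, following the "restore key to fast-path" item of the recap.

```python
class ReaderSession:
    """Hypothetical reader state for the fast-path / slow-path switch."""

    def __init__(self, my_id, local):
        self.my_id = my_id
        self.local = dict(local)  # key -> (version, value) held by this replica
        self.delinquent = False
        self.restored = set()     # keys refreshed since delinquency was discovered
        self.broadcasts = 0       # extra broadcast rounds issued, for illustration

    def acquire(self, replies):
        """replies: (version, value, delinquent_ids) tuples from other replicas."""
        if any(self.my_id in ids for _, _, ids in replies):
            self.delinquent = True    # "I am delinquent!"
            self.restored.clear()
            self.broadcasts += 1      # broadcast "reset delinquency"
        return max(replies, key=lambda r: r[0])[1]

    def read(self, key, fetch_remote):
        if self.delinquent and key not in self.restored:
            self.broadcasts += 1      # slow path: one extra broadcast round
            version, value = max(fetch_remote(key), key=lambda r: r[0])
            if version > self.local.get(key, (-1, None))[0]:
                self.local[key] = (version, value)
            self.restored.add(key)    # this key is back on the fast path
        return self.local[key][1]


remote = {"age": (1, 32), "name": (1, "Leo"), "surname": (1, "Messi")}
bob = ReaderSession(my_id=5, local={})  # Bob's replica missed Alice's writes
flag = bob.acquire([(1, True, {5})] * 3)
print(flag, bob.delinquent)           # True True
print(bob.read("age", lambda k: [remote[k]] * 3))  # 32, via the slow path
print(bob.read("age", lambda k: [remote[k]] * 3))  # 32, now a local read
print(bob.broadcasts)                 # 2: one reset, one slow-path read
```

Only the first read of each key after the acquire pays the extra round; a replica that was never marked delinquent keeps serving every read locally.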
  • 72. Kite: Fast-path/Slow-path recap
    Before a release: gather all acks.
      ➢ On timing out, broadcast the delinquent machines.
    On an acquire: discover delinquency.
      ➢ Take the slow path if delinquent.
    Slow-path read/write:
      ➢ Add a broadcast round.
      ➢ Restore the key to the fast path.
  • 73. Experimental setup
    Kite's implementation:
      ● RDMA-enabled
      ● Multi-threaded
      ● Asynchronous API
    Infrastructure:
      ● Servers: 5 × Intel Xeon E5-2630 v4 with 64 GB memory
      ● Network: 5 × 56 Gbit/s InfiniBand NICs; 1 × 12-port InfiniBand switch
    Baseline:
      ● In-house ZAB implementation (RDMA-enabled, multi-threaded)
    Workloads:
      1. Microbenchmarks
      2. Lock-free data structures
  • 79. Microbenchmarks [figure; panels: 5% Sync; 20% Sync & 5% RMW]
  • 80. Microbenchmarks [figure; panels: 5% Sync; 20% Sync & 5% RMW]
  • 82. Conclusion
    Kite: a replicated key-value store with
      ● high availability and
      ● release consistency.
    Components:
      ● API mappings: Eventual Store, ABD and Paxos
      ● RC barrier semantics: fast/slow path
        ○ the paper contains a proof
    Implementation features:
      ● Heavily multi-threaded
      ● RDMA-enabled
      ● Asynchronous API
      ● https://github.com/icsa-caps/Kite
  • 83. Thank you! Questions?
  • 87. Running Code on Kite
  • 89. Write-only Throughput with All-Aboard