SlideShare a Scribd company logo
Distributed Consensus:
Making Impossible
Possible
Heidi Howard
PhD Student @ University of Cambridge
heidi.howard@cl.cam.ac.uk
@heidiann360
What is Consensus?
“The process by which we reach agreement over
system state between unreliable machines connected
by asynchronous networks”
Why?
• Distributed locking
• Banking
• Safety critical systems
• Distributed scheduling and coordination
Anything which requires guaranteed agreement
A walk through history
We are going to take a journey through the
developments in distributed consensus, spanning 3
decades.
We are going to search for answers to questions like:
• how do we reach consensus?
• what is the best method for reaching consensus?
• can we even reach consensus?
• what’s next in the field?
FLP Result
off to a slippery start
Impossibility of distributed
consensus with one faulty process
Michael Fischer, Nancy Lynch
and Michael Paterson
ACM SIGACT-SIGMOD
Symposium on Principles of
Database Systems
1983
FLP
We cannot guarantee agreement in an asynchronous
system where even one host might fail.
Why?
We cannot reliably detect failures. We cannot know
for sure the difference between a slow host/network
and a failed host
NB: We can still guarantee safety, the issue limited to
guaranteeing liveness.
Solution to FLP
In practice:
We accept that sometimes the system will not be
available. We mitigate this using timers and backoffs.
In theory:
We make weaker assumptions about the synchrony
of the system e.g. messages arrive within a year.
Paxos
Lamport’s original consensus algorithm
The Part-Time Parliament
Leslie Lamport
ACM Transactions on Computer Systems
May 1998
Paxos
The original consensus algorithm for reaching
agreement on a single value.
• two phase process: promise and commit
• majority agreement
• monotonically increasing numbers
Paxos Example -
Failure Free
1 2
3
P:
C:
P:
C:
P:
C:
1 2
3
P:
C:
P:
C:
P:
C:
B
Incoming request from Bob
1 2
3
P:
C:
P: 13
C:
P:
C:
B
Promise (13) ?
Phase 1
Promise (13) ?
1 2
3
P: 13
C:
OKOK
P: 13
C:
P: 13
C:
Phase 1
1 2
3
P: 13
C: 13, B
P: 13
C:
P: 13
C:
Phase 2
Commit (13, ) ?B Commit (13, ) ?B
1 2
3
P: 13
C: 13, B
P: 13
C: 13,
P: 13
C: 13,
Phase 2
B B
OKOK
1 2
3
P: 13
C: 13, B
P: 13
C: 13,
P: 13
C: 13, B B
OK
Bob is granted the lock
Paxos Example -
Node Failure
1 2
3
P:
C:
P:
C:
P:
C:
1 2
3
P:
C:
P: 13
C:
P:
C:
Promise (13) ?
Phase 1
B
Incoming request from Bob
Promise (13) ?
1 2
3
P: 13
C:
P: 13
C:
P: 13
C:
Phase 1
B
OK
OK
1 2
3
P: 13
C:
P: 13
C: 13,
P: 13
C:
Phase 2
Commit (13, ) ?B
B
1 2
3
P: 13
C:
P: 13
C: 13,
P: 13
C: 13,
Phase 2
B
B
1 2
3
P: 13
C:
P: 13
C: 13,
P: 13
C: 13,
Alice would also like the lock
B
B
A
1 2
3
P: 13
C:
P: 13
C: 13,
P: 13
C: 13,
Alice would also like the lock
B
B
A
1 2
3
P: 22
C:
P: 13
C: 13,
P: 13
C: 13,
Phase 1
B
B
A
Promise (22) ?
1 2
3
P: 22
C:
P: 13
C: 13,
P: 22
C: 13,
Phase 1
B
B
A
OK(13, )B
1 2
3
P: 22
C: 22,
P: 13
C: 13,
P: 22
C: 13,
Phase 2
B
B
A
Commit (22, ) ?B
B
1 2
3
P: 22
C: 22,
P: 13
C: 13,
P: 22
C: 22,
Phase 2
B
B
OK
B
NO
Paxos Summary
Clients must wait two round trips (2 RTT) to the
majority of nodes. Sometimes longer.
The system will continue as long as a majority of
nodes are up
Multi-Paxos
Lamport’s leader-driven consensus algorithm
Paxos Made Moderately Complex
Robbert van Renesse and Deniz
Altinbuken
ACM Computing Surveys
April 2015
Not the original, but highly recommended
Multi-Paxos
Lamport’s insight:
Phase 1 is not specific to the request so can be done
before the request arrives and can be reused.
Implication:
Bob now only has to wait one RTT
State Machine
Replication
fault-tolerant services using consensus
Implementing Fault-Tolerant
Services Using the State Machine
Approach: A Tutorial
Fred Schneider
ACM Computing Surveys
1990
State Machine Replication
A general technique for making a service, such as a
database, fault-tolerant.
Application
Client Client
Application
Application
Application
Client
Client
Network
Consensus
Consensus
Consensus
Consensus
Consensus
CAP Theorem
You cannot have your cake and eat it
CAP Theorem
Eric Brewer
Presented at Symposium on
Principles of Distributed
Computing, 2000
Consistency, Availability &
Partition Tolerance - Pick Two
1 2
3 4
B C
Paxos Made Live
How google uses Paxos
Paxos Made Live - An Engineering
Perspective
Tushar Chandra, Robert Griesemer
and Joshua Redstone
ACM Symposium on Principles of
Distributed Computing
2007
Paxos Made Live
Paxos made live documents the challenges in
constructing Chubby, a distributed coordination
service, built using Multi-Paxos and SMR.
Isn’t this a solved problem?
“There are significant gaps between the description
of the Paxos algorithm and the needs of a real-world
system.
In order to build a real-world system, an expert needs
to use numerous ideas scattered in the literature and
make several relatively small protocol extensions.
The cumulative effort will be substantial and the final
system will be based on an unproven protocol.”
Challenges
• Handling disk failure and corruption
• Dealing with limited storage capacity
• Effectively handling read-only requests
• Dynamic membership & reconfiguration
• Supporting transactions
• Verifying safety of the implementation
Fast Paxos
Like Multi-Paxos, but faster
Fast Paxos
Leslie Lamport
Microsoft Research Tech Report
MSR-TR-2005-112
Fast Paxos
Paxos: Any node can commit a value in 2 RTTs
Multi-Paxos: The leader node can commit a value in
1 RTT
But, what about any node committing a value in 1
RTT?
Fast Paxos
We can bypass the leader node for many operations,
so any node can commit a value in 1 RTT.
However, we must either:
• reduce the number of failures we guarantee to
tolerance, or
• increase the size of the quorum, or
• a combination of both
Egalitarian Paxos
Don’t restrict yourself unnecessarily
There Is More Consensus in
Egalitarian Parliaments
Iulian Moraru, David G. Andersen,
Michael Kaminsky
SOSP 2013
also see Generalized Consensus and Paxos
Egalitarian Paxos
The basis of SMR is that every replica of an
application receives the same commands in the
same order.
However, sometimes the ordering can be relaxed…
C=1 B? C=C+1 C? B=0 B=C
C=1 B?
C=C+1
C?
B=0
B=C
Partial Ordering
Total Ordering
C=1 B? C=C+1 C? B=0 B=C
Many possible orderings
B? C=C+1 C?B=0 B=CC=1
B?C=C+1 C? B=0 B=CC=1
B? C=C+1 C? B=0 B=CC=1
Egalitarian Paxos
Allow requests to be out-of-order if they are
commutative.
Conflict becomes much less common.
Works well in combination with Fast Paxos.
Viewstamped
Replication Revisited
the forgotten algorithm
Viewstamped Replication Revisited
Barbara Liskov and James
Cowling
MIT Tech Report
MIT-CSAIL-TR-2012-021
Viewstamped Replication
Revisited (VRR)
Interesting and well explained variant of SMR + Multi-
Paxos.
Key features:
• Round robin leader election
• Dynamic Membership
Raft Consensus
Paxos made understandable
In Search of an Understandable
Consensus Algorithm
Diego Ongaro and John
Ousterhout
USENIX Annual Technical
Conference
2014
Raft
Raft has taken the wider community by storm. Due to
its understandable description.
It’s another variant of SMR with Multi-Paxos.
Key features:
• Really strong leadership - all other nodes are
passive
• Dynamic membership and log compaction
Follower Candidate Leader
Startup/
Restart
Timeout Win
Timeout
Step down
Step down
Ios
Why do things yourself, when you can delegate it?
to appear
Ios
The issue with leader-driven algorithms like Multi-
Paxos, Raft and VRR is that throughput is limited to
one node.
Ios allows a leader to safely and dynamically
delegate their responsibilities to other nodes in the
system.
Hydra
consensus for geo-replication
to appear
Hydra
Distributed consensus for systems which span
multiple datacenters.
We use Ios for replication within the datacenter and a
Egalitarian Paxos like protocol for across datacenters.
The system has a clear leader but most requests
simply bypass the leader.
1 2
3
4 5
6
7 8
9
Tokyo
West Coast
East Coast
B
1 2
3
4 5
6
7 8
9
Tokyo
West Coast
East Coast
B
1 2
3
4 5
6
7 8
9
Tokyo
West Coast
East Coast
B
The road we travelled
• 2 impossibility results: CAP & FLP
• 1 replication method: State machine Replication
• 6 consensus algorithms: Paxos, Multi-Paxos, Fast
Paxos, Egalitarian Paxos, Viewstamped Replication
Revisited & Raft
• 2 future algorithms: Ios & Hydra
How strong is the
leadership?
Strong
Leadership Leaderless
Paxos
Egalitarian
Paxos
Raft VRR Ios Hydra
Multi-Paxos Fast Paxos
Leader with
Delegation
Leader only
when needed
Leader driven
Who is the winner?
Depends on the award:
• Best for minimum latency: VRR
• Easier to understand: Raft
• Best for WANs (conflicts rare): Egalitarian Paxos
• Best for WANs (conflicts common): Fast Paxos
Future
1. More algorithms offering a compromise between
strong leadership and leaderless
2. More understandable consensus algorithms
3. Achieving consensus is getting cheaper, even in
challenging settings
4. Deployment with micro-services and unikernels
5. Self-scaling replication - adapting resources to
maintain resilience level.
Stops we drove passed
We have seen one path through history, but many
more exist.
• Alternative replication techniques e.g. chain
replication and primary backup replication
• Alternative failure models e.g. nodes acting
maliciously
• Alternative domains e.g. sensor networks, mobile
networks, between cores
Summary
Do not be discouraged by
impossibility results and dense
abstract academic papers.
Consensus is useful and achievable.
Find the right algorithm for your
specific domain.
heidi.howard@cl.cam.ac.uk
@heidiann360

More Related Content

What's hot

Getting started with quorum -101
Getting started with quorum -101  Getting started with quorum -101
Getting started with quorum -101
Chainstack
 
How do private transactions work on Quorum
How do private transactions work on QuorumHow do private transactions work on Quorum
How do private transactions work on Quorum
Chainstack
 
Overview of Blockchain Consensus Mechanisms
Overview of Blockchain Consensus MechanismsOverview of Blockchain Consensus Mechanisms
Overview of Blockchain Consensus Mechanisms
Johannes Ahlmann
 
Hyperchains
HyperchainsHyperchains
Hyperchains
Grzegorz Uriasz
 
Defrag X Keynote: Deploying and managing Global Blockchain Network
Defrag X Keynote: Deploying and managing Global Blockchain NetworkDefrag X Keynote: Deploying and managing Global Blockchain Network
Defrag X Keynote: Deploying and managing Global Blockchain Network
Duncan Johnston-Watt
 
Webinar: Enterprise Blockchain Radically Simplified with Truffle and Kaleido
Webinar: Enterprise Blockchain Radically Simplified with Truffle and KaleidoWebinar: Enterprise Blockchain Radically Simplified with Truffle and Kaleido
Webinar: Enterprise Blockchain Radically Simplified with Truffle and Kaleido
Kaleido
 
Consensus Algorithms - Nakov @ jProfessionals - Jan 2018
Consensus Algorithms - Nakov @ jProfessionals - Jan 2018Consensus Algorithms - Nakov @ jProfessionals - Jan 2018
Consensus Algorithms - Nakov @ jProfessionals - Jan 2018
Svetlin Nakov
 
Enterprise Blockchain
Enterprise BlockchainEnterprise Blockchain
Enterprise Blockchain
Aalok Singh
 
Collective Authorities
Collective AuthoritiesCollective Authorities
Collective Authorities
Ismail Khoffi
 
Understanding blockchains
Understanding blockchainsUnderstanding blockchains
Understanding blockchains
Len Bass
 
Vilnius blockchain club 20170413 consensus
Vilnius blockchain club 20170413 consensusVilnius blockchain club 20170413 consensus
Vilnius blockchain club 20170413 consensus
Audrius Ramoska
 
Hyperledger Fabric Application Development 20190618
Hyperledger Fabric Application Development 20190618Hyperledger Fabric Application Development 20190618
Hyperledger Fabric Application Development 20190618
Arnaud Le Hors
 
Introduction to Ethereum Blockchain & Smart Contract
Introduction to Ethereum Blockchain & Smart ContractIntroduction to Ethereum Blockchain & Smart Contract
Introduction to Ethereum Blockchain & Smart Contract
Thanh Nguyen
 
Hyperledger Fabric
Hyperledger FabricHyperledger Fabric
Hyperledger Fabric
Murughan Palaniachari
 
Ethcon seoul 2019 presentation final
Ethcon seoul 2019 presentation finalEthcon seoul 2019 presentation final
Ethcon seoul 2019 presentation final
Heung-No Lee
 
Blockchain - HyperLedger Fabric
Blockchain - HyperLedger FabricBlockchain - HyperLedger Fabric
Blockchain - HyperLedger Fabric
Araf Karsh Hamid
 
Blockchain, 
Hyperledger fabric & Hyperledger cello
Blockchain, 
Hyperledger fabric & Hyperledger celloBlockchain, 
Hyperledger fabric & Hyperledger cello
Blockchain, 
Hyperledger fabric & Hyperledger cello
Sahdev Zala
 
Introduction to Consensus techniques
Introduction to Consensus techniques Introduction to Consensus techniques
Introduction to Consensus techniques
Vasiliy Suvorov
 
Demystify blockchain development with hyperledger fabric
Demystify blockchain development with hyperledger fabricDemystify blockchain development with hyperledger fabric
Demystify blockchain development with hyperledger fabric
Benjamin Fuentes
 
EUIPO DPM knowledge share: Blockchain and IP
EUIPO DPM knowledge share: Blockchain and IPEUIPO DPM knowledge share: Blockchain and IP
EUIPO DPM knowledge share: Blockchain and IP
Audrius Ramoska
 

What's hot (20)

Getting started with quorum -101
Getting started with quorum -101  Getting started with quorum -101
Getting started with quorum -101
 
How do private transactions work on Quorum
How do private transactions work on QuorumHow do private transactions work on Quorum
How do private transactions work on Quorum
 
Overview of Blockchain Consensus Mechanisms
Overview of Blockchain Consensus MechanismsOverview of Blockchain Consensus Mechanisms
Overview of Blockchain Consensus Mechanisms
 
Hyperchains
HyperchainsHyperchains
Hyperchains
 
Defrag X Keynote: Deploying and managing Global Blockchain Network
Defrag X Keynote: Deploying and managing Global Blockchain NetworkDefrag X Keynote: Deploying and managing Global Blockchain Network
Defrag X Keynote: Deploying and managing Global Blockchain Network
 
Webinar: Enterprise Blockchain Radically Simplified with Truffle and Kaleido
Webinar: Enterprise Blockchain Radically Simplified with Truffle and KaleidoWebinar: Enterprise Blockchain Radically Simplified with Truffle and Kaleido
Webinar: Enterprise Blockchain Radically Simplified with Truffle and Kaleido
 
Consensus Algorithms - Nakov @ jProfessionals - Jan 2018
Consensus Algorithms - Nakov @ jProfessionals - Jan 2018Consensus Algorithms - Nakov @ jProfessionals - Jan 2018
Consensus Algorithms - Nakov @ jProfessionals - Jan 2018
 
Enterprise Blockchain
Enterprise BlockchainEnterprise Blockchain
Enterprise Blockchain
 
Collective Authorities
Collective AuthoritiesCollective Authorities
Collective Authorities
 
Understanding blockchains
Understanding blockchainsUnderstanding blockchains
Understanding blockchains
 
Vilnius blockchain club 20170413 consensus
Vilnius blockchain club 20170413 consensusVilnius blockchain club 20170413 consensus
Vilnius blockchain club 20170413 consensus
 
Hyperledger Fabric Application Development 20190618
Hyperledger Fabric Application Development 20190618Hyperledger Fabric Application Development 20190618
Hyperledger Fabric Application Development 20190618
 
Introduction to Ethereum Blockchain & Smart Contract
Introduction to Ethereum Blockchain & Smart ContractIntroduction to Ethereum Blockchain & Smart Contract
Introduction to Ethereum Blockchain & Smart Contract
 
Hyperledger Fabric
Hyperledger FabricHyperledger Fabric
Hyperledger Fabric
 
Ethcon seoul 2019 presentation final
Ethcon seoul 2019 presentation finalEthcon seoul 2019 presentation final
Ethcon seoul 2019 presentation final
 
Blockchain - HyperLedger Fabric
Blockchain - HyperLedger FabricBlockchain - HyperLedger Fabric
Blockchain - HyperLedger Fabric
 
Blockchain, 
Hyperledger fabric & Hyperledger cello
Blockchain, 
Hyperledger fabric & Hyperledger celloBlockchain, 
Hyperledger fabric & Hyperledger cello
Blockchain, 
Hyperledger fabric & Hyperledger cello
 
Introduction to Consensus techniques
Introduction to Consensus techniques Introduction to Consensus techniques
Introduction to Consensus techniques
 
Demystify blockchain development with hyperledger fabric
Demystify blockchain development with hyperledger fabricDemystify blockchain development with hyperledger fabric
Demystify blockchain development with hyperledger fabric
 
EUIPO DPM knowledge share: Blockchain and IP
EUIPO DPM knowledge share: Blockchain and IPEUIPO DPM knowledge share: Blockchain and IP
EUIPO DPM knowledge share: Blockchain and IP
 

Similar to Distributed Consensus: Making Impossible Possible by Heidi howard

Distributed Consensus: Making Impossible Possible [Revised]
Distributed Consensus: Making Impossible Possible [Revised]Distributed Consensus: Making Impossible Possible [Revised]
Distributed Consensus: Making Impossible Possible [Revised]
Heidi Howard
 
Beyond Off the-Shelf Consensus
Beyond Off the-Shelf ConsensusBeyond Off the-Shelf Consensus
Beyond Off the-Shelf Consensus
Rebecca Bilbro
 
Hashgraph as Code
Hashgraph as CodeHashgraph as Code
Hashgraph as Code
Calvin Cheng
 
CAP Theorem and Split Brain Syndrome
CAP Theorem and Split Brain SyndromeCAP Theorem and Split Brain Syndrome
CAP Theorem and Split Brain Syndrome
Dilum Bandara
 
Reaching reliable agreement in an unreliable world
Reaching reliable agreement in an unreliable worldReaching reliable agreement in an unreliable world
Reaching reliable agreement in an unreliable world
Heidi Howard
 
osi-oss-dbs.pptx
osi-oss-dbs.pptxosi-oss-dbs.pptx
osi-oss-dbs.pptx
Shivji Kumar Jha
 
unit3consesence.pptx
unit3consesence.pptxunit3consesence.pptx
unit3consesence.pptx
GopalSB
 
Crossing the Boundaries: Development Strategies for (P)SoCs
Crossing the Boundaries: Development Strategies for (P)SoCsCrossing the Boundaries: Development Strategies for (P)SoCs
Crossing the Boundaries: Development Strategies for (P)SoCs
Andreas Koschak
 
A quick introduction to Consensus Models
A quick introduction to Consensus ModelsA quick introduction to Consensus Models
A quick introduction to Consensus Models
Oded Noam
 
Towards a low carbon proof-of-work blockchain
Towards a low carbon proof-of-work blockchainTowards a low carbon proof-of-work blockchain
Towards a low carbon proof-of-work blockchain
IJNSA Journal
 
BlockChain_Chap3_RP _Consensus.pptx
BlockChain_Chap3_RP _Consensus.pptxBlockChain_Chap3_RP _Consensus.pptx
BlockChain_Chap3_RP _Consensus.pptx
RajChauhan226834
 
Building on quicksand microservices indicthreads
Building on quicksand microservices  indicthreadsBuilding on quicksand microservices  indicthreads
Building on quicksand microservices indicthreads
IndicThreads
 
V SYSTEMS - SPoS Whitepaper_EN
V SYSTEMS - SPoS Whitepaper_ENV SYSTEMS - SPoS Whitepaper_EN
V SYSTEMS - SPoS Whitepaper_EN
V SYSTEMS
 
The need for blockchain technology
The need for blockchain technologyThe need for blockchain technology
The need for blockchain technology
Tommy Koens
 
A Tale of Contemporary Software
A Tale of Contemporary SoftwareA Tale of Contemporary Software
A Tale of Contemporary Software
Yun Zhi Lin
 
01 what is blockchain
01 what is blockchain01 what is blockchain
01 what is blockchain
BastianBlankenburg
 
Cs704 d distributedmutualexcclusion&memory
Cs704 d distributedmutualexcclusion&memoryCs704 d distributedmutualexcclusion&memory
Cs704 d distributedmutualexcclusion&memory
Debasis Das
 
FIWARE Data Management in High Availability
FIWARE Data Management in High AvailabilityFIWARE Data Management in High Availability
FIWARE Data Management in High Availability
Federico Michele Facca
 

Similar to Distributed Consensus: Making Impossible Possible by Heidi howard (20)

Distributed Consensus: Making Impossible Possible [Revised]
Distributed Consensus: Making Impossible Possible [Revised]Distributed Consensus: Making Impossible Possible [Revised]
Distributed Consensus: Making Impossible Possible [Revised]
 
Beyond Off the-Shelf Consensus
Beyond Off the-Shelf ConsensusBeyond Off the-Shelf Consensus
Beyond Off the-Shelf Consensus
 
9X5u87KWa267pP7aGX3K
9X5u87KWa267pP7aGX3K9X5u87KWa267pP7aGX3K
9X5u87KWa267pP7aGX3K
 
Hashgraph as Code
Hashgraph as CodeHashgraph as Code
Hashgraph as Code
 
CAP Theorem and Split Brain Syndrome
CAP Theorem and Split Brain SyndromeCAP Theorem and Split Brain Syndrome
CAP Theorem and Split Brain Syndrome
 
Reaching reliable agreement in an unreliable world
Reaching reliable agreement in an unreliable worldReaching reliable agreement in an unreliable world
Reaching reliable agreement in an unreliable world
 
osi-oss-dbs.pptx
osi-oss-dbs.pptxosi-oss-dbs.pptx
osi-oss-dbs.pptx
 
unit3consesence.pptx
unit3consesence.pptxunit3consesence.pptx
unit3consesence.pptx
 
Crossing the Boundaries: Development Strategies for (P)SoCs
Crossing the Boundaries: Development Strategies for (P)SoCsCrossing the Boundaries: Development Strategies for (P)SoCs
Crossing the Boundaries: Development Strategies for (P)SoCs
 
A quick introduction to Consensus Models
A quick introduction to Consensus ModelsA quick introduction to Consensus Models
A quick introduction to Consensus Models
 
Towards a low carbon proof-of-work blockchain
Towards a low carbon proof-of-work blockchainTowards a low carbon proof-of-work blockchain
Towards a low carbon proof-of-work blockchain
 
BlockChain_Chap3_RP _Consensus.pptx
BlockChain_Chap3_RP _Consensus.pptxBlockChain_Chap3_RP _Consensus.pptx
BlockChain_Chap3_RP _Consensus.pptx
 
Building on quicksand microservices indicthreads
Building on quicksand microservices  indicthreadsBuilding on quicksand microservices  indicthreads
Building on quicksand microservices indicthreads
 
V SYSTEMS - SPoS Whitepaper_EN
V SYSTEMS - SPoS Whitepaper_ENV SYSTEMS - SPoS Whitepaper_EN
V SYSTEMS - SPoS Whitepaper_EN
 
The need for blockchain technology
The need for blockchain technologyThe need for blockchain technology
The need for blockchain technology
 
Cap in depth
Cap in depthCap in depth
Cap in depth
 
A Tale of Contemporary Software
A Tale of Contemporary SoftwareA Tale of Contemporary Software
A Tale of Contemporary Software
 
01 what is blockchain
01 what is blockchain01 what is blockchain
01 what is blockchain
 
Cs704 d distributedmutualexcclusion&memory
Cs704 d distributedmutualexcclusion&memoryCs704 d distributedmutualexcclusion&memory
Cs704 d distributedmutualexcclusion&memory
 
FIWARE Data Management in High Availability
FIWARE Data Management in High AvailabilityFIWARE Data Management in High Availability
FIWARE Data Management in High Availability
 

More from J On The Beach

Massively scalable ETL in real world applications: the hard way
Massively scalable ETL in real world applications: the hard wayMassively scalable ETL in real world applications: the hard way
Massively scalable ETL in real world applications: the hard way
J On The Beach
 
Big Data On Data You Don’t Have
Big Data On Data You Don’t HaveBig Data On Data You Don’t Have
Big Data On Data You Don’t Have
J On The Beach
 
Acoustic Time Series in Industry 4.0: Improved Reliability and Cyber-Security...
Acoustic Time Series in Industry 4.0: Improved Reliability and Cyber-Security...Acoustic Time Series in Industry 4.0: Improved Reliability and Cyber-Security...
Acoustic Time Series in Industry 4.0: Improved Reliability and Cyber-Security...
J On The Beach
 
Pushing it to the edge in IoT
Pushing it to the edge in IoTPushing it to the edge in IoT
Pushing it to the edge in IoT
J On The Beach
 
Drinking from the firehose, with virtual streams and virtual actors
Drinking from the firehose, with virtual streams and virtual actorsDrinking from the firehose, with virtual streams and virtual actors
Drinking from the firehose, with virtual streams and virtual actors
J On The Beach
 
How do we deploy? From Punched cards to Immutable server pattern
How do we deploy? From Punched cards to Immutable server patternHow do we deploy? From Punched cards to Immutable server pattern
How do we deploy? From Punched cards to Immutable server pattern
J On The Beach
 
Java, Turbocharged
Java, TurbochargedJava, Turbocharged
Java, Turbocharged
J On The Beach
 
When Cloud Native meets the Financial Sector
When Cloud Native meets the Financial SectorWhen Cloud Native meets the Financial Sector
When Cloud Native meets the Financial Sector
J On The Beach
 
The big data Universe. Literally.
The big data Universe. Literally.The big data Universe. Literally.
The big data Universe. Literally.
J On The Beach
 
Streaming to a New Jakarta EE
Streaming to a New Jakarta EEStreaming to a New Jakarta EE
Streaming to a New Jakarta EE
J On The Beach
 
The TIPPSS Imperative for IoT - Ensuring Trust, Identity, Privacy, Protection...
The TIPPSS Imperative for IoT - Ensuring Trust, Identity, Privacy, Protection...The TIPPSS Imperative for IoT - Ensuring Trust, Identity, Privacy, Protection...
The TIPPSS Imperative for IoT - Ensuring Trust, Identity, Privacy, Protection...
J On The Beach
 
Pushing AI to the Client with WebAssembly and Blazor
Pushing AI to the Client with WebAssembly and BlazorPushing AI to the Client with WebAssembly and Blazor
Pushing AI to the Client with WebAssembly and Blazor
J On The Beach
 
Axon Server went RAFTing
Axon Server went RAFTingAxon Server went RAFTing
Axon Server went RAFTing
J On The Beach
 
The Six Pitfalls of building a Microservices Architecture (and how to avoid t...
The Six Pitfalls of building a Microservices Architecture (and how to avoid t...The Six Pitfalls of building a Microservices Architecture (and how to avoid t...
The Six Pitfalls of building a Microservices Architecture (and how to avoid t...
J On The Beach
 
Madaari : Ordering For The Monkeys
Madaari : Ordering For The MonkeysMadaari : Ordering For The Monkeys
Madaari : Ordering For The Monkeys
J On The Beach
 
Servers are doomed to fail
Servers are doomed to failServers are doomed to fail
Servers are doomed to fail
J On The Beach
 
Interaction Protocols: It's all about good manners
Interaction Protocols: It's all about good mannersInteraction Protocols: It's all about good manners
Interaction Protocols: It's all about good manners
J On The Beach
 
A race of two compilers: GraalVM JIT versus HotSpot JIT C2. Which one offers ...
A race of two compilers: GraalVM JIT versus HotSpot JIT C2. Which one offers ...A race of two compilers: GraalVM JIT versus HotSpot JIT C2. Which one offers ...
A race of two compilers: GraalVM JIT versus HotSpot JIT C2. Which one offers ...
J On The Beach
 
Leadership at every level
Leadership at every levelLeadership at every level
Leadership at every level
J On The Beach
 
Machine Learning: The Bare Math Behind Libraries
Machine Learning: The Bare Math Behind LibrariesMachine Learning: The Bare Math Behind Libraries
Machine Learning: The Bare Math Behind Libraries
J On The Beach
 

More from J On The Beach (20)

Massively scalable ETL in real world applications: the hard way
Massively scalable ETL in real world applications: the hard wayMassively scalable ETL in real world applications: the hard way
Massively scalable ETL in real world applications: the hard way
 
Big Data On Data You Don’t Have
Big Data On Data You Don’t HaveBig Data On Data You Don’t Have
Big Data On Data You Don’t Have
 
Acoustic Time Series in Industry 4.0: Improved Reliability and Cyber-Security...
Acoustic Time Series in Industry 4.0: Improved Reliability and Cyber-Security...Acoustic Time Series in Industry 4.0: Improved Reliability and Cyber-Security...
Acoustic Time Series in Industry 4.0: Improved Reliability and Cyber-Security...
 
Pushing it to the edge in IoT
Pushing it to the edge in IoTPushing it to the edge in IoT
Pushing it to the edge in IoT
 
Drinking from the firehose, with virtual streams and virtual actors
Drinking from the firehose, with virtual streams and virtual actorsDrinking from the firehose, with virtual streams and virtual actors
Drinking from the firehose, with virtual streams and virtual actors
 
How do we deploy? From Punched cards to Immutable server pattern
How do we deploy? From Punched cards to Immutable server patternHow do we deploy? From Punched cards to Immutable server pattern
How do we deploy? From Punched cards to Immutable server pattern
 
Java, Turbocharged
Java, TurbochargedJava, Turbocharged
Java, Turbocharged
 
When Cloud Native meets the Financial Sector
When Cloud Native meets the Financial SectorWhen Cloud Native meets the Financial Sector
When Cloud Native meets the Financial Sector
 
The big data Universe. Literally.
The big data Universe. Literally.The big data Universe. Literally.
The big data Universe. Literally.
 
Streaming to a New Jakarta EE
Streaming to a New Jakarta EEStreaming to a New Jakarta EE
Streaming to a New Jakarta EE
 
The TIPPSS Imperative for IoT - Ensuring Trust, Identity, Privacy, Protection...
The TIPPSS Imperative for IoT - Ensuring Trust, Identity, Privacy, Protection...The TIPPSS Imperative for IoT - Ensuring Trust, Identity, Privacy, Protection...
The TIPPSS Imperative for IoT - Ensuring Trust, Identity, Privacy, Protection...
 
Pushing AI to the Client with WebAssembly and Blazor
Pushing AI to the Client with WebAssembly and BlazorPushing AI to the Client with WebAssembly and Blazor
Pushing AI to the Client with WebAssembly and Blazor
 
Axon Server went RAFTing
Axon Server went RAFTingAxon Server went RAFTing
Axon Server went RAFTing
 
The Six Pitfalls of building a Microservices Architecture (and how to avoid t...
The Six Pitfalls of building a Microservices Architecture (and how to avoid t...The Six Pitfalls of building a Microservices Architecture (and how to avoid t...
The Six Pitfalls of building a Microservices Architecture (and how to avoid t...
 
Madaari : Ordering For The Monkeys
Madaari : Ordering For The MonkeysMadaari : Ordering For The Monkeys
Madaari : Ordering For The Monkeys
 
Servers are doomed to fail
Servers are doomed to failServers are doomed to fail
Servers are doomed to fail
 
Interaction Protocols: It's all about good manners
Interaction Protocols: It's all about good mannersInteraction Protocols: It's all about good manners
Interaction Protocols: It's all about good manners
 
A race of two compilers: GraalVM JIT versus HotSpot JIT C2. Which one offers ...
A race of two compilers: GraalVM JIT versus HotSpot JIT C2. Which one offers ...A race of two compilers: GraalVM JIT versus HotSpot JIT C2. Which one offers ...
A race of two compilers: GraalVM JIT versus HotSpot JIT C2. Which one offers ...
 
Leadership at every level
Leadership at every levelLeadership at every level
Leadership at every level
 
Machine Learning: The Bare Math Behind Libraries
Machine Learning: The Bare Math Behind LibrariesMachine Learning: The Bare Math Behind Libraries
Machine Learning: The Bare Math Behind Libraries
 

Recently uploaded

2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
Georgi Kodinov
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
informapgpstrackings
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
Juraj Vysvader
 
Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdf
Cyanic lab
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Mind IT Systems
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
Boni García
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
Globus
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
Max Andersen
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
Globus
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Globus
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
NYGGS Automation Suite
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
Globus
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
Globus
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Globus
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
wottaspaceseo
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
Globus
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
AMB-Review
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
Fermin Galan
 

Recently uploaded (20)

2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
 
Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdf
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
 

Distributed Consensus: Making Impossible Possible by Heidi howard

  • 1. Distributed Consensus: Making Impossible Possible Heidi Howard PhD Student @ University of Cambridge heidi.howard@cl.cam.ac.uk @heidiann360
  • 2.
  • 3. What is Consensus? “The process by which we reach agreement over system state between unreliable machines connected by asynchronous networks”
  • 4. Why? • Distributed locking • Banking • Safety critical systems • Distributed scheduling and coordination Anything which requires guaranteed agreement
  • 5. A walk through history We are going to take a journey through the developments in distributed consensus, spanning 3 decades. We are going to search for answers to questions like: • how do we reach consensus? • what is the best method for reaching consensus? • can we even reach consensus? • what’s next in the field?
  • 6. FLP Result off to a slippery start Impossibility of distributed consensus with one faulty process Michael Fischer, Nancy Lynch and Michael Paterson ACM SIGACT-SIGMOD Symposium on Principles of Database Systems 1983
  • 7. FLP We cannot guarantee agreement in an asynchronous system where even one host might fail. Why? We cannot reliably detect failures. We cannot know for sure the difference between a slow host/network and a failed host NB: We can still guarantee safety, the issue limited to guaranteeing liveness.
  • 8. Solution to FLP In practice: We accept that sometimes the system will not be available. We mitigate this using timers and backoffs. In theory: We make weaker assumptions about the synchrony of the system e.g. messages arrive within a year.
  • 9. Paxos Lamport’s original consensus algorithm The Part-Time Parliament Leslie Lamport ACM Transactions on Computer Systems May 1998
  • 10. Paxos The original consensus algorithm for reaching agreement on a single value. • two phase process: promise and commit • majority agreement • monotonically increasing numbers
  • 14. 1 2 3 P: C: P: 13 C: P: C: B Promise (13) ? Phase 1 Promise (13) ?
  • 15. 1 2 3 P: 13 C: OKOK P: 13 C: P: 13 C: Phase 1
  • 16. 1 2 3 P: 13 C: 13, B P: 13 C: P: 13 C: Phase 2 Commit (13, ) ?B Commit (13, ) ?B
  • 17. 1 2 3 P: 13 C: 13, B P: 13 C: 13, P: 13 C: 13, Phase 2 B B OKOK
  • 18. 1 2 3 P: 13 C: 13, B P: 13 C: 13, P: 13 C: 13, B B OK Bob is granted the lock
  • 21. 1 2 3 P: C: P: 13 C: P: C: Promise (13) ? Phase 1 B Incoming request from Bob Promise (13) ?
  • 22. 1 2 3 P: 13 C: P: 13 C: P: 13 C: Phase 1 B OK OK
  • 23. 1 2 3 P: 13 C: P: 13 C: 13, P: 13 C: Phase 2 Commit (13, ) ?B B
  • 24. 1 2 3 P: 13 C: P: 13 C: 13, P: 13 C: 13, Phase 2 B B
  • 25. 1 2 3 P: 13 C: P: 13 C: 13, P: 13 C: 13, Alice would also like the lock B B A
  • 26. 1 2 3 P: 13 C: P: 13 C: 13, P: 13 C: 13, Alice would also like the lock B B A
  • 27. 1 2 3 P: 22 C: P: 13 C: 13, P: 13 C: 13, Phase 1 B B A Promise (22) ?
  • 28. 1 2 3 P: 22 C: P: 13 C: 13, P: 22 C: 13, Phase 1 B B A OK(13, )B
  • 29. 1 2 3 P: 22 C: 22, P: 13 C: 13, P: 22 C: 13, Phase 2 B B A Commit (22, ) ?B B
  • 30. 1 2 3 P: 22 C: 22, P: 13 C: 13, P: 22 C: 22, Phase 2 B B OK B NO
  • 31. Paxos Summary Clients must wait two round trips (2 RTT) to the majority of nodes. Sometimes longer. The system will continue as long as a majority of nodes are up
  • 32. Multi-Paxos Lamport’s leader-driven consensus algorithm Paxos Made Moderately Complex Robbert van Renesse and Deniz Altinbuken ACM Computing Surveys April 2015 Not the original, but highly recommended
  • 33. Multi-Paxos Lamport’s insight: Phase 1 is not specific to the request so can be done before the request arrives and can be reused. Implication: Bob now only has to wait one RTT
  • 34. State Machine Replication fault-tolerant services using consensus Implementing Fault-Tolerant Services Using the State Machine Approach: A Tutorial Fred Schneider ACM Computing Surveys 1990
  • 35. State Machine Replication A general technique for making a service, such as a database, fault-tolerant. Application Client Client
  • 36.
  • 38.
  • 39. CAP Theorem You cannot have your cake and eat it CAP Theorem Eric Brewer Presented at Symposium on Principles of Distributed Computing, 2000
  • 40. Consistency, Availability & Partition Tolerance - Pick Two 1 2 3 4 B C
  • 41. Paxos Made Live How google uses Paxos Paxos Made Live - An Engineering Perspective Tushar Chandra, Robert Griesemer and Joshua Redstone ACM Symposium on Principles of Distributed Computing 2007
  • 42. Paxos Made Live Paxos made live documents the challenges in constructing Chubby, a distributed coordination service, built using Multi-Paxos and SMR.
  • 43. Isn’t this a solved problem? “There are significant gaps between the description of the Paxos algorithm and the needs of a real-world system. In order to build a real-world system, an expert needs to use numerous ideas scattered in the literature and make several relatively small protocol extensions. The cumulative effort will be substantial and the final system will be based on an unproven protocol.”
  • 44. Challenges • Handling disk failure and corruption • Dealing with limited storage capacity • Effectively handling read-only requests • Dynamic membership & reconfiguration • Supporting transactions • Verifying safety of the implementation
  • 45. Fast Paxos Like Multi-Paxos, but faster Fast Paxos Leslie Lamport Microsoft Research Tech Report MSR-TR-2005-112
  • 46. Fast Paxos Paxos: Any node can commit a value in 2 RTTs Multi-Paxos: The leader node can commit a value in 1 RTT But, what about any node committing a value in 1 RTT?
  • 47. Fast Paxos We can bypass the leader node for many operations, so any node can commit a value in 1 RTT. However, we must either: • reduce the number of failures we guarantee to tolerance, or • increase the size of the quorum, or • a combination of both
  • 48. Egalitarian Paxos Don’t restrict yourself unnecessarily There Is More Consensus in Egalitarian Parliaments Iulian Moraru, David G. Andersen, Michael Kaminsky SOSP 2013 also see Generalized Consensus and Paxos
  • 49. Egalitarian Paxos The basis of SMR is that every replica of an application receives the same commands in the same order. However, sometimes the ordering can be relaxed…
  • 50. C=1 B? C=C+1 C? B=0 B=C C=1 B? C=C+1 C? B=0 B=C Partial Ordering Total Ordering
  • 51. C=1 B? C=C+1 C? B=0 B=C Many possible orderings B? C=C+1 C?B=0 B=CC=1 B?C=C+1 C? B=0 B=CC=1 B? C=C+1 C? B=0 B=CC=1
  • 52. Egalitarian Paxos Allow requests to be out-of-order if they are commutative. Conflict becomes much less common. Works well in combination with Fast Paxos.
  • 53. Viewstamped Replication Revisited the forgotten algorithm Viewstamped Replication Revisited Barbara Liskov and James Cowling MIT Tech Report MIT-CSAIL-TR-2012-021
  • 54. Viewstamped Replication Revisited (VRR) Interesting and well explained variant of SMR + Multi- Paxos. Key features: • Round robin leader election • Dynamic Membership
  • 55. Raft Consensus Paxos made understandable In Search of an Understandable Consensus Algorithm Diego Ongaro and John Ousterhout USENIX Annual Technical Conference 2014
  • 56. Raft Raft has taken the wider community by storm. Due to its understandable description. It’s another variant of SMR with Multi-Paxos. Key features: • Really strong leadership - all other nodes are passive • Dynamic membership and log compaction
  • 57. Follower Candidate Leader Startup/ Restart Timeout Win Timeout Step down Step down
  • 58. Ios Why do things yourself, when you can delegate it? to appear
  • 59. Ios The issue with leader-driven algorithms like Multi- Paxos, Raft and VRR is that throughput is limited to one node. Ios allows a leader to safely and dynamically delegate their responsibilities to other nodes in the system.
  • 61. Hydra Distributed consensus for systems which span multiple datacenters. We use Ios for replication within the datacenter and a Egalitarian Paxos like protocol for across datacenters. The system has a clear leader but most requests simply bypass the leader.
  • 62. 1 2 3 4 5 6 7 8 9 Tokyo West Coast East Coast B
  • 63. 1 2 3 4 5 6 7 8 9 Tokyo West Coast East Coast B
  • 64. 1 2 3 4 5 6 7 8 9 Tokyo West Coast East Coast B
  • 65. The road we travelled • 2 impossibility results: CAP & FLP • 1 replication method: State machine Replication • 6 consensus algorithms: Paxos, Multi-Paxos, Fast Paxos, Egalitarian Paxos, Viewstamped Replication Revisited & Raft • 2 future algorithms: Ios & Hydra
  • 66. How strong is the leadership? Strong Leadership Leaderless Paxos Egalitarian Paxos Raft VRR Ios Hydra Multi-Paxos Fast Paxos Leader with Delegation Leader only when needed Leader driven
  • 67. Who is the winner? Depends on the award: • Best for minimum latency: VRR • Easier to understand: Raft • Best for WANs (conflicts rare): Egalitarian Paxos • Best for WANs (conflicts common): Fast Paxos
  • 68. Future 1. More algorithms offering a compromise between strong leadership and leaderless 2. More understandable consensus algorithms 3. Achieving consensus is getting cheaper, even in challenging settings 4. Deployment with micro-services and unikernels 5. Self-scaling replication - adapting resources to maintain resilience level.
  • 69. Stops we drove passed We have seen one path through history, but many more exist. • Alternative replication techniques e.g. chain replication and primary backup replication • Alternative failure models e.g. nodes acting maliciously • Alternative domains e.g. sensor networks, mobile networks, between cores
  • 70. Summary Do not be discouraged by impossibility results and dense abstract academic papers. Consensus is useful and achievable. Find the right algorithm for your specific domain. heidi.howard@cl.cam.ac.uk @heidiann360