Distributed consensus algorithms allow different computer systems to agree on data values in an unreliable distributed network. The document summarizes the history and developments of consensus algorithms over three decades, including early impossibility results, Paxos, Multi-Paxos, Fast Paxos, Egalitarian Paxos, and Raft. It also discusses challenges in implementing consensus algorithms and potential future directions like Ios and Hydra that improve throughput and allow consensus across geographically distributed data centers.
Reaching reliable agreement in an unreliable worldHeidi Howard
In this lecture, we explore how to construct resilient distributed systems on top of unreliable components. Starting, almost two decades ago, with Leslie Lamport’s work on organising parliament for a Greek island. We will take a journey to today’s datacenter and the systems powering companies like Google, Amazon and Microsoft. Along the way, we will face interesting impossibility results, machines acting maliciously and the complexity to today’s networks. We will discover how to reach agreement between many parties and from this, how we construct the fault-tolerance systems that we depend upon everyday.
This lecture was given on October 13th 2015 at the University of Cambridge, as part of the Research Students Lecture Series.
Flexible Paxos: Reaching agreement without majorities Heidi Howard
The Paxos algorithm is a widely adopted approach to achieving distributed consensus. Over decades it has been extensively researched, optimized and deployed in popular systems such as Raft, Zookeeper and Chubby. At its foundation, Paxos uses two phases, each requiring agreement from a majority of participants (known as quorums) to reliably reach consensus. In this talk, I will share the simple yet powerful result that each of the phases of Paxos may use non-intersecting quorums. This means that majorities are no longer necessary and that Paxos is a single point on a broad spectrum of possibilities for safely reaching consensus. This result therefore opens the door for a new breed of scalable and resilient consensus algorithms for performant production system.
Distributed Consensus: Making Impossible Possible [Revised]Heidi Howard
In this talk, we explore how to construct resilient distributed systems on top of unreliable components. Starting, almost two decades ago, with Leslie Lamport’s work on organising parliament for a Greek island. We will take a journey to today’s datacenters and the systems powering companies like Google, Amazon and Microsoft. Along the way, we will face interesting impossibility results, machines acting maliciously and the complexity of today’s networks. Ultimately, we will discover how to reach agreement between many parties and from this, how to construct new fault-tolerance systems that we can depend upon everyday.
Distributed Consensus: Making the Impossible PossibleC4Media
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/29faOI5.
Heidi Howard explores how to construct resilient distributed systems on top of unreliable components. Howard discusses today’s datacenters and the systems powering companies like Google, Amazon and Microsoft, as well as which algorithms are best suited to different situations. Filmed at qconlondon.com.
Heidi Howard is currently studying towards PhD at the Cambridge University, Computer Lab. Her research interest is fault-tolerance, consistency and consensus in modern distributed systems. Heidi has also previously worked as research assistant and undergraduate researcher on topics such as middlebox traversal, DNS, privacy preserving systems and wireless community networks.
There have never been more commercial tools available for building distributed data apps — from cloud hosting services, to cloud-native databases, to cloud-based analytics platforms. So why is it still so hard to make a successful app with a global user base?
One of the toughest challenges cloud offerings take on is the problem of consensus, abstracting away most of the complexity. That's no small feat, given that this is a hard enough problem that people spend years getting a PhD just to understand it! Unfortunately, while buying off-the-shelf cloud services can accelerate the path to an MVP, it also makes optimization tough. How will we scale during a period of rapid user growth? How do we do I18n and l10n or guarantee a good UX for users on the other side of the world? How do we prevent replication that might get us into legal trouble?
In this talk, we'll consider several case studies of global apps (both successful and otherwise!), talk about the limitations of off-the-shelf consensus, and consider a future where everyday developers can use open source tools to build distributed data apps that are easier to reason about, maintain, and tune.
Reaching reliable agreement in an unreliable worldHeidi Howard
In this lecture, we explore how to construct resilient distributed systems on top of unreliable components. Starting, almost two decades ago, with Leslie Lamport’s work on organising parliament for a Greek island. We will take a journey to today’s datacenter and the systems powering companies like Google, Amazon and Microsoft. Along the way, we will face interesting impossibility results, machines acting maliciously and the complexity to today’s networks. We will discover how to reach agreement between many parties and from this, how we construct the fault-tolerance systems that we depend upon everyday.
This lecture was given on October 13th 2015 at the University of Cambridge, as part of the Research Students Lecture Series.
Flexible Paxos: Reaching agreement without majorities Heidi Howard
The Paxos algorithm is a widely adopted approach to achieving distributed consensus. Over decades it has been extensively researched, optimized and deployed in popular systems such as Raft, Zookeeper and Chubby. At its foundation, Paxos uses two phases, each requiring agreement from a majority of participants (known as quorums) to reliably reach consensus. In this talk, I will share the simple yet powerful result that each of the phases of Paxos may use non-intersecting quorums. This means that majorities are no longer necessary and that Paxos is a single point on a broad spectrum of possibilities for safely reaching consensus. This result therefore opens the door for a new breed of scalable and resilient consensus algorithms for performant production system.
Distributed Consensus: Making Impossible Possible [Revised]Heidi Howard
In this talk, we explore how to construct resilient distributed systems on top of unreliable components. Starting, almost two decades ago, with Leslie Lamport’s work on organising parliament for a Greek island. We will take a journey to today’s datacenters and the systems powering companies like Google, Amazon and Microsoft. Along the way, we will face interesting impossibility results, machines acting maliciously and the complexity of today’s networks. Ultimately, we will discover how to reach agreement between many parties and from this, how to construct new fault-tolerance systems that we can depend upon everyday.
Distributed Consensus: Making the Impossible PossibleC4Media
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/29faOI5.
Heidi Howard explores how to construct resilient distributed systems on top of unreliable components. Howard discusses today’s datacenters and the systems powering companies like Google, Amazon and Microsoft, as well as which algorithms are best suited to different situations. Filmed at qconlondon.com.
Heidi Howard is currently studying towards PhD at the Cambridge University, Computer Lab. Her research interest is fault-tolerance, consistency and consensus in modern distributed systems. Heidi has also previously worked as research assistant and undergraduate researcher on topics such as middlebox traversal, DNS, privacy preserving systems and wireless community networks.
There have never been more commercial tools available for building distributed data apps — from cloud hosting services, to cloud-native databases, to cloud-based analytics platforms. So why is it still so hard to make a successful app with a global user base?
One of the toughest challenges cloud offerings take on is the problem of consensus, abstracting away most of the complexity. That's no small feat, given that this is a hard enough problem that people spend years getting a PhD just to understand it! Unfortunately, while buying off-the-shelf cloud services can accelerate the path to an MVP, it also makes optimization tough. How will we scale during a period of rapid user growth? How do we do I18n and l10n or guarantee a good UX for users on the other side of the world? How do we prevent replication that might get us into legal trouble?
In this talk, we'll consider several case studies of global apps (both successful and otherwise!), talk about the limitations of off-the-shelf consensus, and consider a future where everyday developers can use open source tools to build distributed data apps that are easier to reason about, maintain, and tune.
Shared on 5th Dec at SGInnovate with Swirlds Mance Harmon, Jordan Fried and Edgar Seah.
Hashgraph consensus, demo apps in Swirlds Java SDK, babble (unofficial golang implementation of Hashgraph) and their implications for distributed ledger technology.
In this slide deck, we go exploring the database landscape today and the common lego blocks that are used to build these different falvours of databses. We will dive through internals of a database, explore some choices and towards the end also explore some real world database architectures in view of the concepts (legos) we explored earlier.
Crossing the Boundaries: Development Strategies for (P)SoCsAndreas Koschak
Not many years ago, FPGAs designed to be used in quantities contained a number of logic cells that were easy to fill using pure RTL or even schematic design tools. Now the boundaries between hardware and software have started to blur. With the advent of large scale FPGAs, structured ASICs and programmable SoCs, the massively increased complexity requires higher level design tools and better controlled development strategies, such as the ones used in ASIC design. Other approaches can be taken from the software development world. This talk highlights some strategies that allow designers to increase the quality of their designs without high costs and still be technology neutral and flexible during the course of the project.
Towards a low carbon proof-of-work blockchainIJNSA Journal
Proof of Work (PoW) blockchains burn a lot of energy. Proof-of-work algorithms are expensive by design and often only serve to compute blockchains. In some sense, carbon-based and non-carbon based regional electric power is fungible. So the total carbon and non-carbon electric power mix plays a role. Thus, generally PoW algorithms have large CO2 footprints solely for computing blockchains. A proof of technology is described towards replacing hashcash or other PoW methods with a lottery and proof-of-VM (PoVM) emulation. PoVM emulation is a form of PoW where an autonomous blockchain miner gets a lottery ticket in exchange for providing a VM (virtual Machine) for a specified period. These VMs get their jobs from a job queue. Managing and ensuring, by concensus, that autonomous PoVMs are properly configured and running as expected gives several gaps for a complete practical system. These gaps are discussed. Our system is similar to a number of other blockchain systems. We briefly survey these systems. This paper along with our proof of technology was done as a senior design project.
This presentation overviews basic principles of high availability architectures and presents how to deploy in high availability FIWARE data management services.
Cryptomania! The Byzantine Generals Leave Their GardenDallas Kennedy
May 2019 ● Last part of a survey of distributed consensus, proof of work versus proof of stake, the Byzantine Generals Problem, and steps beyond block chains towards more comprehensive network solutions
For further details contact:
N.RAJASEKARAN B.E M.S 9841091117,9840103301.
IMPULSE TECHNOLOGIES,
Old No 251, New No 304,
2nd Floor,
Arcot road ,
Vadapalani ,
Chennai-26.
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2BhEnpC.
Nathan Sobo talks about a new library called Tachyon that draws from the latest CRDT research to enable real-time collaborative text editing in a fully distributed setting. Filmed at qconsf.com.
Nathan Sobo is a founding member of the Atom Editor team at GitHub.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
More Related Content
Similar to Distributed Consensus: Making Impossible Possible
Shared on 5th Dec at SGInnovate with Swirlds Mance Harmon, Jordan Fried and Edgar Seah.
Hashgraph consensus, demo apps in Swirlds Java SDK, babble (unofficial golang implementation of Hashgraph) and their implications for distributed ledger technology.
In this slide deck, we go exploring the database landscape today and the common lego blocks that are used to build these different falvours of databses. We will dive through internals of a database, explore some choices and towards the end also explore some real world database architectures in view of the concepts (legos) we explored earlier.
Crossing the Boundaries: Development Strategies for (P)SoCsAndreas Koschak
Not many years ago, FPGAs designed to be used in quantities contained a number of logic cells that were easy to fill using pure RTL or even schematic design tools. Now the boundaries between hardware and software have started to blur. With the advent of large scale FPGAs, structured ASICs and programmable SoCs, the massively increased complexity requires higher level design tools and better controlled development strategies, such as the ones used in ASIC design. Other approaches can be taken from the software development world. This talk highlights some strategies that allow designers to increase the quality of their designs without high costs and still be technology neutral and flexible during the course of the project.
Towards a low carbon proof-of-work blockchainIJNSA Journal
Proof of Work (PoW) blockchains burn a lot of energy. Proof-of-work algorithms are expensive by design and often only serve to compute blockchains. In some sense, carbon-based and non-carbon based regional electric power is fungible. So the total carbon and non-carbon electric power mix plays a role. Thus, generally PoW algorithms have large CO2 footprints solely for computing blockchains. A proof of technology is described towards replacing hashcash or other PoW methods with a lottery and proof-of-VM (PoVM) emulation. PoVM emulation is a form of PoW where an autonomous blockchain miner gets a lottery ticket in exchange for providing a VM (virtual Machine) for a specified period. These VMs get their jobs from a job queue. Managing and ensuring, by concensus, that autonomous PoVMs are properly configured and running as expected gives several gaps for a complete practical system. These gaps are discussed. Our system is similar to a number of other blockchain systems. We briefly survey these systems. This paper along with our proof of technology was done as a senior design project.
This presentation overviews basic principles of high availability architectures and presents how to deploy in high availability FIWARE data management services.
Cryptomania! The Byzantine Generals Leave Their GardenDallas Kennedy
May 2019 ● Last part of a survey of distributed consensus, proof of work versus proof of stake, the Byzantine Generals Problem, and steps beyond block chains towards more comprehensive network solutions
For further details contact:
N.RAJASEKARAN B.E M.S 9841091117,9840103301.
IMPULSE TECHNOLOGIES,
Old No 251, New No 304,
2nd Floor,
Arcot road ,
Vadapalani ,
Chennai-26.
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2BhEnpC.
Nathan Sobo talks about a new library called Tachyon that draws from the latest CRDT research to enable real-time collaborative text editing in a fully distributed setting. Filmed at qconsf.com.
Nathan Sobo is a founding member of the Atom Editor team at GitHub.
Similar to Distributed Consensus: Making Impossible Possible (20)
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
A tale of scale & speed: How the US Navy is enabling software delivery from l...sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™UiPathCommunity
In questo evento online gratuito, organizzato dalla Community Italiana di UiPath, potrai esplorare le nuove funzionalità di Autopilot, il tool che integra l'Intelligenza Artificiale nei processi di sviluppo e utilizzo delle Automazioni.
📕 Vedremo insieme alcuni esempi dell'utilizzo di Autopilot in diversi tool della Suite UiPath:
Autopilot per Studio Web
Autopilot per Studio
Autopilot per Apps
Clipboard AI
GenAI applicata alla Document Understanding
👨🏫👨💻 Speakers:
Stefano Negro, UiPath MVPx3, RPA Tech Lead @ BSP Consultant
Flavio Martinelli, UiPath MVP 2023, Technical Account Manager @UiPath
Andrei Tasca, RPA Solutions Team Lead @NTT Data
3. What is Consensus?
“The process by which we reach agreement over
system state between unreliable machines connected
by asynchronous networks”
4. Why?
• Distributed locking
• Banking
• Safety critical systems
• Distributed scheduling and coordination
Anything which requires guaranteed agreement
5. A walk through history
We are going to take a journey through the
developments in distributed consensus, spanning 3
decades.
We are going to search for answers to questions like:
• how do we reach consensus?
• what is the best method for reaching consensus?
• can we even reach consensus?
• what’s next in the field?
6. FLP Result
off to a slippery start
Impossibility of distributed
consensus with one faulty process
Michael Fischer, Nancy Lynch
and Michael Paterson
ACM SIGACT-SIGMOD
Symposium on Principles of
Database Systems
1983
7. FLP
We cannot guarantee agreement in an asynchronous
system where even one host might fail.
Why?
We cannot reliably detect failures. We cannot know
for sure the difference between a slow host/network
and a failed host
NB: We can still guarantee safety, the issue limited to
guaranteeing liveness.
8. Solution to FLP
In practice:
We accept that sometimes the system will not be
available. We mitigate this using timers and backoffs.
In theory:
We make weaker assumptions about the synchrony
of the system e.g. messages arrive within a year.
10. Paxos
The original consensus algorithm for reaching
agreement on a single value.
• two phase process: promise and commit
• majority agreement
• monotonically increasing numbers
29. 1 2
3
P: 22
C: 22,
P: 13
C: 13,
P: 22
C: 13,
Phase 2
B
B
A
Commit (22, ) ?B
B
30. 1 2
3
P: 22
C: 22,
P: 13
C: 13,
P: 22
C: 22,
Phase 2
B
B
OK
B
NO
31. Paxos Summary
Clients must wait two round trips (2 RTT) to the
majority of nodes. Sometimes longer.
The system will continue as long as a majority of
nodes are up
32. Multi-Paxos
Lamport’s leader-driven consensus algorithm
Paxos Made Moderately Complex
Robbert van Renesse and Deniz
Altinbuken
ACM Computing Surveys
April 2015
Not the original, but highly recommended
33. Multi-Paxos
Lamport’s insight:
Phase 1 is not specific to the request so can be done
before the request arrives and can be reused.
Implication:
Bob now only has to wait one RTT
41. Paxos Made Live
How google uses Paxos
Paxos Made Live - An Engineering
Perspective
Tushar Chandra, Robert Griesemer
and Joshua Redstone
ACM Symposium on Principles of
Distributed Computing
2007
42. Paxos Made Live
Paxos made live documents the challenges in
constructing Chubby, a distributed coordination
service, built using Multi-Paxos and SMR.
43. Isn’t this a solved problem?
“There are significant gaps between the description
of the Paxos algorithm and the needs of a real-world
system.
In order to build a real-world system, an expert needs
to use numerous ideas scattered in the literature and
make several relatively small protocol extensions.
The cumulative effort will be substantial and the final
system will be based on an unproven protocol.”
44. Challenges
• Handling disk failure and corruption
• Dealing with limited storage capacity
• Effectively handling read-only requests
• Dynamic membership & reconfiguration
• Supporting transactions
• Verifying safety of the implementation
45. Fast Paxos
Like Multi-Paxos, but faster
Fast Paxos
Leslie Lamport
Microsoft Research Tech Report
MSR-TR-2005-112
46. Fast Paxos
Paxos: Any node can commit a value in 2 RTTs
Multi-Paxos: The leader node can commit a value in
1 RTT
But, what about any node committing a value in 1
RTT?
47. Fast Paxos
We can bypass the leader node for many operations,
so any node can commit a value in 1 RTT.
However, we must either:
• reduce the number of failures we guarantee to
tolerance, or
• increase the size of the quorum, or
• a combination of both
48. Egalitarian Paxos
Don’t restrict yourself unnecessarily
There Is More Consensus in
Egalitarian Parliaments
Iulian Moraru, David G. Andersen,
Michael Kaminsky
SOSP 2013
also see Generalized Consensus and Paxos
49. Egalitarian Paxos
The basis of SMR is that every replica of an
application receives the same commands in the
same order.
However, sometimes the ordering can be relaxed…
52. Egalitarian Paxos
Allow requests to be out-of-order if they are
commutative.
Conflict becomes much less common.
Works well in combination with Fast Paxos.
55. Raft Consensus
Paxos made understandable
In Search of an Understandable
Consensus Algorithm
Diego Ongaro and John
Ousterhout
USENIX Annual Technical
Conference
2014
56. Raft
Raft has taken the wider community by storm. Due to
its understandable description.
It’s another variant of SMR with Multi-Paxos.
Key features:
• Really strong leadership - all other nodes are
passive
• Dynamic membership and log compaction
59. Ios
The issue with leader-driven algorithms like Multi-
Paxos, Raft and VRR is that throughput is limited to
one node.
Ios allows a leader to safely and dynamically
delegate their responsibilities to other nodes in the
system.
61. Hydra
Distributed consensus for systems which span
multiple datacenters.
We use Ios for replication within the datacenter and a
Egalitarian Paxos like protocol for across datacenters.
The system has a clear leader but most requests
simply bypass the leader.
65. The road we travelled
• 2 impossibility results: CAP & FLP
• 1 replication method: State machine Replication
• 6 consensus algorithms: Paxos, Multi-Paxos, Fast
Paxos, Egalitarian Paxos, Viewstamped Replication
Revisited & Raft
• 2 future algorithms: Ios & Hydra
66. How strong is the
leadership?
Strong
Leadership Leaderless
Paxos
Egalitarian
Paxos
Raft VRR Ios Hydra
Multi-Paxos Fast Paxos
Leader with
Delegation
Leader only
when needed
Leader driven
67. Who is the winner?
Depends on the award:
• Best for minimum latency: VRR
• Easier to understand: Raft
• Best for WANs (conflicts rare): Egalitarian Paxos
• Best for WANs (conflicts common): Fast Paxos
68. Future
1. More algorithms offering a compromise between
strong leadership and leaderless
2. More understandable consensus algorithms
3. Achieving consensus is getting cheaper, even in
challenging settings
4. Deployment with micro-services and unikernels
5. Self-scaling replication - adapting resources to
maintain resilience level.
69. Stops we drove passed
We have seen one path through history, but many
more exist.
• Alternative replication techniques e.g. chain
replication and primary backup replication
• Alternative failure models e.g. nodes acting
maliciously
• Alternative domains e.g. sensor networks, mobile
networks, between cores
70. Summary
Do not be discouraged by
impossibility results and dense
abstract academic papers.
Consensus is useful and achievable.
Find the right algorithm for your
specific domain.
heidi.howard@cl.cam.ac.uk
@heidiann360