Raft
Understandable Consensus Algorithm
Presentation for Seminar
Muayyad Saleh Alsadi
(20158043)
Based on the Raft paper
“In Search of an Understandable Consensus Algorithm”
Diego Ongaro and John Ousterhout, Stanford University
Introduction
Raft is a consensus algorithm for managing a
replicated log. It produces a result equivalent
to multi-Paxos and is as efficient as Paxos,
but its structure is completely different:
understandability was the primary design
objective.
Introduction to consensus algorithms
What is a consensus algorithm?
“Consensus algorithms allow a collection of
machines to work as a coherent group that
can survive the failures of some of its
members. Because of this, they play a
key role in building reliable large-scale
software systems.”
Replicated State Machine
A collection of servers computes identical copies
of the same state and can continue operating even
if some of the servers are down. Replicated state
machines are used to solve a variety of fault
tolerance problems in distributed systems.
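A minimal sketch of the idea, assuming a toy key-value state machine (the class name, command format, and sample log are illustrative and not taken from the slides): if every server applies the same log in the same order to a deterministic state machine, every server ends up with the same state.

```python
from dataclasses import dataclass, field


@dataclass
class KeyValueStateMachine:
    """A deterministic state machine: same commands in, same state out."""
    state: dict = field(default_factory=dict)

    def apply(self, command: tuple) -> None:
        # Hypothetical command format: (operation, key, value).
        op, key, value = command
        if op == "set":
            self.state[key] = value
        elif op == "delete":
            self.state.pop(key, None)


def replay(log: list) -> dict:
    """Apply every command in the log, in order, to a fresh state machine."""
    machine = KeyValueStateMachine()
    for command in log:
        machine.apply(command)
    return machine.state


# The replicated log that the consensus algorithm keeps identical on every server.
log = [("set", "x", 1), ("set", "y", 2), ("delete", "x", None), ("set", "y", 3)]

# Three servers replay the same log independently and reach the same state.
replicas = [replay(log) for _ in range(3)]
```

The consensus algorithm's job is only to keep the log identical; determinism of the state machine does the rest.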
Why not Paxos?
“There are significant gaps between the
description of the Paxos algorithm and the
needs of a real-world system… the final
system will be based on an unproven
protocol”
Quoted from Google's Chubby implementers
#1 Single-decree Paxos is dense and subtle:
it is divided into two stages that do not have
simple intuitive explanations and cannot be
understood independently. Because of this, it
is difficult to develop intuitions about why the
single-decree protocol works. The
composition rules for multi-Paxos add
significant additional complexity and subtlety.
#2 Paxos does not provide a good foundation for
building practical implementations. One
reason is that there is no widely agreed-upon
algorithm for multi-Paxos. Lamport’s
descriptions are mostly about single-decree
Paxos; he sketched possible approaches to
multi-Paxos, but many details are missing.
Properties and key features
Novel features of Raft:
● Strong leader: Log entries only flow from the leader to
other servers. This simplifies the management of the
replicated log and makes Raft easier to understand.
● Leader election: on top of the heartbeats already required by any
consensus algorithm, Raft uses randomized timers to elect
leaders. This resolves conflicts simply and rapidly.
● Membership changes: changing the set of servers in the
cluster uses a new joint consensus approach where the
majorities of two different configurations overlap during
transitions. This allows the cluster to continue operating
normally during configuration changes.
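The joint consensus rule can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: server names and function names are hypothetical, and configurations are modeled as plain sets. During the transition, a decision (an election or a commit) needs a majority of the old configuration and a majority of the new one.

```python
def has_majority(votes: set, config: set) -> bool:
    """True if the voters cover a strict majority of the given configuration."""
    return len(votes & config) > len(config) // 2


def joint_agreement(votes: set, old_config: set, new_config: set) -> bool:
    """During the transition, agreement needs a majority of BOTH configurations."""
    return has_majority(votes, old_config) and has_majority(votes, new_config)


# Hypothetical membership change: s1 and s2 leave, s4 and s5 join.
old_config = {"s1", "s2", "s3"}
new_config = {"s3", "s4", "s5"}
```

For example, `joint_agreement({"s3", "s4", "s5"}, old_config, new_config)` is false: the new servers alone cannot decide anything until a majority of the old configuration also agrees, which is what prevents two disjoint majorities from forming.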
Consensus algorithm properties
● Safety: never return an incorrect result under any non-Byzantine
condition, including network delays, partitions, and packet loss,
duplication, and reordering.
● Fault tolerance: the system stays available and fully functional
as long as a majority of the nodes are operational and can communicate.
● No dependence on timing for consistency: faulty clocks and extreme
message delays can, at worst, cause availability problems.
● Performance is not affected by a minority of slow nodes.
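The fault-tolerance property comes down to majority arithmetic. A small sketch (the function names are illustrative):

```python
def quorum_size(cluster_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """How many servers can fail while a majority is still available."""
    return cluster_size - quorum_size(cluster_size)


# Cluster size -> (quorum, failures tolerated)
table = {n: (quorum_size(n), tolerated_failures(n)) for n in (3, 4, 5, 7)}
```

A 5-server cluster needs 3 servers to agree and survives 2 failures. A 4-server cluster also needs 3 and survives only 1, which is why odd cluster sizes are the usual choice.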
Sub-problems
● Leader election: the system should continue to work even if
its leader fails, by electing a new leader.
● Log replication: the leader accepts operations from clients and
replicates them to all nodes, forcing their logs to agree with its own.
● Safety: ensure that the 5 guarantees (next slide) hold
at all times.
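One way to picture log replication is how a leader decides that an entry is committed. The sketch below is an assumption-laden illustration (the function name, argument layout, and 1-based indexing are choices made here): the leader tracks the highest log index known to be stored on each server, and an entry counts as committed once a majority stores it and it belongs to the leader's current term.

```python
def commit_index(match_index: list, log_terms: list,
                 current_term: int, current_commit: int) -> int:
    """Return the highest index the leader may mark as committed.

    match_index: highest replicated index per server, leader included
                 (indices are 1-based; 0 means nothing replicated).
    log_terms:   log_terms[i - 1] is the term of the leader's entry at index i.
    """
    # After sorting in descending order, the value at the middle position
    # is stored on a strict majority of servers.
    ranked = sorted(match_index, reverse=True)
    majority_replicated = ranked[len(ranked) // 2]

    # Only entries from the leader's current term are committed by counting
    # replicas; earlier entries become committed indirectly along with them.
    for n in range(majority_replicated, current_commit, -1):
        if log_terms[n - 1] == current_term:
            return n
    return current_commit
```

With `match_index = [5, 5, 5, 0, 0]` the two lagging servers do not hold anything back: index 5 is already on a majority.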
The 5 guarantees
1) Election Safety: at most one leader can be elected in a
given term.
2) Leader Append-Only: a leader never overwrites or
deletes entries in its log; it only appends new entries.
3) Log Matching: if two logs contain an entry with the same
index and term, then the logs are identical in all entries up
through the given index.
4) Leader Completeness: if a log entry is committed in a
given term, then that entry will be present in the logs of
the leaders for all higher-numbered terms.
5) State Machine Safety: if a server has applied a log
entry at a given index to its state machine, no other server
will ever apply a different log entry for the same index.
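The Log Matching guarantee can be written as a check over two logs. This is only a test-style sketch, assuming a log is a list of `(term, command)` pairs whose list position plus one is the index; it is not part of the protocol itself.

```python
def satisfies_log_matching(log_a: list, log_b: list) -> bool:
    """Check the Log Matching property for two logs of (term, command) pairs."""
    shared = min(len(log_a), len(log_b))
    # Find the highest index at which both logs hold an entry of the same term.
    for i in range(shared - 1, -1, -1):
        if log_a[i][0] == log_b[i][0]:
            # The property requires identical logs up through that index;
            # lower matching indices are then covered automatically.
            return log_a[: i + 1] == log_b[: i + 1]
    # No index with matching terms: the property holds vacuously.
    return True


log_a = [(1, "x=1"), (1, "y=2"), (2, "x=3")]
log_b = [(1, "x=1"), (1, "y=2"), (3, "z=9")]   # diverges only after index 2
log_c = [(1, "x=1"), (1, "y=7"), (2, "x=3")]   # same term at index 3, different history
```

Here `log_a` and `log_b` satisfy the property, while `log_a` and `log_c` violate it because they agree on the term at index 3 but differ at index 2.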
The Basics: The 3 node states (follower, candidate, leader)
The Basics: Terms
The Basics: A term can end with no leader emerging (split vote)
The Basics: The heartbeat RPC (AppendEntries) is also used to append entries
The Basics: The RequestVote RPC
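A hedged sketch of how a server might answer a vote request, tying together terms, node states, and the vote. The `Server` class and its field names are assumptions made for illustration, and the log is reduced to a list of entry terms.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Server:
    current_term: int = 0
    voted_for: Optional[str] = None
    state: str = "follower"                  # "follower", "candidate" or "leader"
    log: list = field(default_factory=list)  # term of each entry, oldest first

    def last_log(self) -> tuple:
        """(term of last entry, index of last entry); (0, 0) for an empty log."""
        return (self.log[-1] if self.log else 0, len(self.log))

    def handle_request_vote(self, term: int, candidate_id: str,
                            last_log_term: int, last_log_index: int) -> tuple:
        """Return (current_term, vote_granted)."""
        # Requests from an older term are rejected outright.
        if term < self.current_term:
            return self.current_term, False
        # A newer term forces any server back to follower with a fresh vote.
        if term > self.current_term:
            self.current_term = term
            self.voted_for = None
            self.state = "follower"
        # The candidate's log must be at least as up to date as ours:
        # a later last term wins, and with equal terms the longer log wins.
        up_to_date = (last_log_term, last_log_index) >= self.last_log()
        if self.voted_for in (None, candidate_id) and up_to_date:
            self.voted_for = candidate_id
            return self.current_term, True
        return self.current_term, False
```

Each server grants at most one vote per term, which is what makes two leaders in the same term impossible.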
Partition tolerance
Node B still considers itself the leader (from an old term), but it
cannot reach consensus, so it does not affect consistency.
Follower Cases
A follower might be missing some entries,
as in cases (a)-(b),
or have extra uncommitted entries,
as in cases (c)-(f), which must be removed.
Server (f) was the leader of term 2: it appended
some entries to its log, then failed before
committing them.
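Both follower cases are repaired by the same consistency check on the append RPC. The sketch below is a simplified illustration: the function name and the `(term, command)` log format are assumptions, indices are 1-based, and networking and persistence are left out.

```python
def append_entries(log: list, prev_index: int, prev_term: int, entries: list) -> tuple:
    """Follower-side handling of an append request.

    log and entries are lists of (term, command).
    prev_index and prev_term describe the entry just before the new ones.
    Returns (success, new_log).
    """
    # Missing entries: the follower has nothing at prev_index, so it rejects
    # and the leader retries with an earlier prev_index.
    if prev_index > len(log):
        return False, log
    # Diverging history: the entry at prev_index has a different term.
    if prev_index > 0 and log[prev_index - 1][0] != prev_term:
        return False, log

    new_log = list(log)
    for offset, entry in enumerate(entries):
        index = prev_index + 1 + offset
        if index <= len(new_log):
            if new_log[index - 1][0] != entry[0]:
                # Extra uncommitted entries: drop the conflict and all that follow.
                del new_log[index - 1:]
                new_log.append(entry)
            # Otherwise the entry is already present and is left alone.
        else:
            new_log.append(entry)
    return True, new_log


leader_log = [(1, "a"), (1, "b"), (3, "c")]
behind = [(1, "a")]                      # missing entries
ahead = [(1, "a"), (2, "x"), (2, "y")]   # extra uncommitted entries from term 2
```

Calling `append_entries(behind, 2, 1, [(3, "c")])` fails because index 2 is missing. Retrying from index 1 with `[(1, "b"), (3, "c")]` succeeds, and the same call on `ahead` truncates the term-2 entries first.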
Working conditions
broadcastTime ≪ electionTimeout ≪ MTBF
where MTBF is the mean time between failures of a single server
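The inequality can be turned into a rough configuration check. This is a sketch under a loud assumption: "≪" is read here as "smaller by at least a factor of 10", and both the `factor` parameter and the sample values are illustrative rather than taken from the slides.

```python
def timing_is_reasonable(broadcast_time: float, election_timeout: float,
                         mtbf: float, factor: float = 10.0) -> bool:
    """Check broadcastTime << electionTimeout << MTBF (all values in seconds).

    `factor` is an assumed reading of "much less than": each quantity must be
    at least `factor` times smaller than the next one.
    """
    return (broadcast_time * factor <= election_timeout
            and election_timeout * factor <= mtbf)


# Illustrative values: 5 ms broadcast, 150 ms election timeout, one month MTBF.
one_month = 30 * 24 * 3600
ok = timing_is_reasonable(0.005, 0.150, one_month)
too_tight = timing_is_reasonable(0.100, 0.150, one_month)
```

If the broadcast time approaches the election timeout, followers time out while heartbeats are still in flight and start needless elections, which is what the second example flags.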
Tuning for time without leader
[Figure: time without a leader for different random timeout ranges]
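A minimal sketch of the randomized timeout being tuned here. The 150-300 ms default range, the server names, and the function names are illustrative assumptions: each server draws its own timeout, so one of them usually times out first and can win the election before the others become candidates.

```python
import random


def election_timeout(rng: random.Random, low_ms: float = 150, high_ms: float = 300) -> float:
    """Draw a fresh election timeout uniformly from [low_ms, high_ms]."""
    return rng.uniform(low_ms, high_ms)


def first_to_time_out(timeouts: dict) -> str:
    """Name of the server whose timer fires first and becomes a candidate."""
    return min(timeouts, key=timeouts.get)


rng = random.Random(42)  # seeded only to make the sketch repeatable
timeouts = {name: election_timeout(rng) for name in ("s1", "s2", "s3", "s4", "s5")}
candidate = first_to_time_out(timeouts)
```

Widening the range makes it less likely that two timers fire close together (fewer split votes), but raises the average time a cluster waits before noticing a failed leader.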
How Understandable?
Resources
● The Paper
● Visualization of Raft operations
● Etcd - an implementation
● Consul - another implementation
● http://raftconsensus.github.io/
● CoreOS/Fleet - uses Etcd
● Google Kubernetes - a distributed system
● Mesos / Mesosphere - a distributed system

Introduction to Raft algorithm

  • 1. Raft Understandable Consensus Algorithm Presentation for Seminar Muayyad Saleh Alsadi (20158043) Based on Raft Paper “In Search of an Understandable Consensus Algorithm” Diego Ongaro and John Ousterhout, Stanford University
  • 2. Raft Introduction Introduction Raft is a consensus algorithm for managing a replicated log. It produces a result equivalent to multi-Paxos and is as efficient as Paxos, but it has a completely different structure, designed with understandability as its primary objective.
  • 3. Raft Introduction to consensus algorithms What is a consensus algorithm? “Consensus algorithms allow a collection of machines to work as a coherent group that can survive the failures of some of its members. Because of this, they play a key role in building reliable large-scale software systems.”
  • 4. Raft Introduction to consensus algorithms Replicated State Machine A collection of servers computes identical copies of the same state and can continue operating even if some of the servers are down. Replicated state machines are used to solve a variety of fault tolerance problems in distributed systems.
  • 5. Raft Introduction to consensus algorithms Why not Paxos? “There are significant gaps between the description of the Paxos algorithm and the needs of a real-world system… the final system will be based on an unproven protocol” Quoted from Google's Chubby implementers
  • 6. Raft Introduction to consensus algorithms Why not Paxos? #1 Single-decree Paxos is dense and subtle: it is divided into two stages that do not have simple intuitive explanations and cannot be understood independently. Because of this, it is difficult to develop intuitions about why the single-decree protocol works. The composition rules for multi-Paxos add significant additional complexity and subtlety.
  • 7. Raft Introduction to consensus algorithms Why not Paxos? #2 it does not provide a good foundation for building practical implementations. One reason is that there is no widely agreed-upon algorithm for multi-Paxos. Lamport’s descriptions are mostly about single-decree Paxos; he sketched possible approaches to multi-Paxos, but many details are missing.
  • 8. Raft Properties and key features Novel features of Raft: ● Strong leader: log entries only flow from the leader to other servers. This simplifies the management of the replicated log and makes Raft easier to understand. ● Leader election: besides the heartbeats required in any consensus algorithm, Raft uses randomized timers to elect leaders. This helps resolve conflicts simply and rapidly. ● Membership changes: changing the set of servers in the cluster uses a new joint consensus approach in which the majorities of two different configurations overlap during transitions. This allows the cluster to continue operating normally during configuration changes.
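The joint consensus idea on the slide above can be illustrated with a small quorum check. This is a minimal sketch, assuming the rule from the Raft paper that, during a transition, agreement needs separate majorities from both the old and the new configuration. The function name `joint_quorum` and the server names are hypothetical.

```python
from typing import Set


def joint_quorum(acks: Set[str], old_config: Set[str], new_config: Set[str]) -> bool:
    """Return True if `acks` contains a majority of BOTH configurations."""

    def has_majority(config: Set[str]) -> bool:
        # Strict majority of the servers listed in this configuration.
        return len(acks & config) > len(config) // 2

    return has_majority(old_config) and has_majority(new_config)


# Hypothetical transition from servers {A, B, C} to servers {C, D, E}.
old = {"A", "B", "C"}
new = {"C", "D", "E"}
only_old_side = joint_quorum({"A", "B", "C"}, old, new)   # majority of old only
both_sides = joint_quorum({"B", "C", "D"}, old, new)      # majority of old and new
```

Requiring both majorities means neither configuration can make a decision on its own while the change is in progress.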
  • 9. Raft Properties and key features Consensus algorithm properties ● Safety: never returns an incorrect result under any non-Byzantine condition, including network delays, partitions, and packet loss, duplication, and reordering ● Fault tolerance: the system remains available and fully functional as long as a majority of the nodes are operational ● Does not depend on timing for consistency: faulty clocks and extreme message delays can, at worst, cause availability problems ● Performance is not affected by a minority of slow nodes
  • 10. Raft Sub-problems ● Leader election: the system should continue to work even if its leader fails, by electing a new leader ● Log replication: the leader takes operations from clients and replicates them to all nodes, forcing their logs to agree with its own ● Safety: ensure that the 5 guarantees (next slide) are all satisfied at all times
  • 11. Raft The 5 guarantees 1) Election Safety: at most one leader can be elected in a given term. 2) Leader Append-Only: a leader never overwrites or deletes entries in its log; it only appends new entries. 3) Log Matching: if two logs contain an entry with the same index and term, then the logs are identical in all entries up through the given index. 4) Leader Completeness: if a log entry is committed in a given term, then that entry will be present in the logs of the leaders for all higher-numbered terms. 5) State Machine Safety: if a server has applied a log entry at a given index to its state machine, no other server will ever apply a different log entry for the same index.
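The Log Matching guarantee can be stated as a small checker. The following is a sketch for illustration only, assuming a log is modelled as a Python list of `(term, command)` tuples. The name `log_matching_holds` is hypothetical and not part of any Raft implementation.

```python
from typing import List, Tuple

Entry = Tuple[int, str]  # (term, command); an assumed, simplified entry format


def log_matching_holds(log_a: List[Entry], log_b: List[Entry]) -> bool:
    """Check the Log Matching property for a pair of logs.

    Wherever both logs hold an entry with the same index and term,
    the logs must be identical in all entries up through that index.
    """
    for i in range(min(len(log_a), len(log_b))):
        same_term = log_a[i][0] == log_b[i][0]
        if same_term and log_a[: i + 1] != log_b[: i + 1]:
            return False
    return True


# The logs diverge only at the last index, where the terms differ: allowed.
ok = log_matching_holds(
    [(1, "x"), (1, "y"), (2, "z")],
    [(1, "x"), (1, "y"), (3, "w")],
)
# Same index and term at the first entry, but different commands: a violation.
bad = log_matching_holds([(1, "x"), (2, "y")], [(1, "q"), (2, "y")])
```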
  • 12. Raft The Basics: The 3 node states
  • 14. Raft The Basics: no emerging leader term
  • 15. Raft The Basics: the heartbeat RPC (AppendEntries) is also used to append log entries
  • 17. Raft Partition tolerance Node B considers itself a leader (from an old term), but it can't reach consensus, so it won't affect consistency
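A leader stranded on the minority side of a partition cannot commit entries, because it cannot collect acknowledgements from a majority of the cluster. A minimal sketch of that majority test follows. The cluster size of 5 and the 2/3 split are assumptions for illustration, and `can_commit` is a hypothetical name.

```python
def can_commit(reachable_nodes: int, cluster_size: int) -> bool:
    """Return True if a leader that reaches `reachable_nodes` servers
    (counting itself) holds a strict majority of the cluster."""
    return reachable_nodes > cluster_size // 2


# Assumed 5-node cluster split into partitions of 2 and 3 nodes.
stale_leader_side = can_commit(2, 5)   # old leader plus one follower
majority_side = can_commit(3, 5)       # can elect a new leader and commit
```

Because two disjoint partitions can never both hold a strict majority, at most one side can commit new entries.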
  • 18. Raft Follower Cases A follower might be missing some entries, as in a-b, or have extra “uncommitted” entries, as in c-f (these should be removed). Follower f was the leader for term 2; it appended several entries to its log and then crashed before committing them
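Both follower cases are repaired by the consistency check that the Raft paper attaches to AppendEntries. The sketch below is a simplified illustration under stated assumptions: logs are lists of `(term, command)` tuples, `prev_index` is 1-based with 0 meaning "no preceding entry", and term comparison of the RPC itself and commit-index handling are left out. The function name `append_entries` is hypothetical.

```python
from typing import List, Tuple

Entry = Tuple[int, str]  # (term, command); an assumed, simplified entry format


def append_entries(log: List[Entry], prev_index: int, prev_term: int,
                   entries: List[Entry]) -> bool:
    """Apply the leader's entries to a follower log (modified in place).

    Returns False when the follower has no entry matching
    (prev_index, prev_term); the leader would then retry further back.
    """
    if prev_index > len(log):
        return False  # follower is missing entries before prev_index
    if prev_index > 0 and log[prev_index - 1][0] != prev_term:
        return False  # follower holds a conflicting entry at prev_index
    for offset, entry in enumerate(entries):
        pos = prev_index + offset  # 0-based slot for this entry
        if pos < len(log):
            if log[pos][0] != entry[0]:
                del log[pos:]      # drop the conflicting entry and all after it
                log.append(entry)
        else:
            log.append(entry)      # follower was missing this entry
    return True


# Follower that missed entries.
behind = [(1, "a")]
behind_ok = append_entries(behind, 1, 1, [(1, "b"), (2, "c")])

# Follower with extra uncommitted entries from an old term.
extra = [(1, "a"), (2, "x"), (2, "y")]
extra_ok = append_entries(extra, 1, 1, [(1, "b"), (3, "c")])
```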
  • 19. Raft Working conditions broadcastTime ≪ electionTimeout ≪ MTBF, where MTBF is the mean time between failures
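The timing requirement can be turned into a simple configuration check. This is only a sketch: reading "≪" as "at least a factor of 10 smaller" is an assumption made here, and the function name `timing_ok` and the sample values are hypothetical.

```python
def timing_ok(broadcast_ms: float, election_timeout_ms: float,
              mtbf_ms: float, factor: float = 10.0) -> bool:
    """Check broadcastTime << electionTimeout << MTBF.

    `factor` is the assumed minimum ratio that counts as "much less than".
    """
    return (broadcast_ms * factor <= election_timeout_ms
            and election_timeout_ms * factor <= mtbf_ms)


ONE_MONTH_MS = 30 * 24 * 3600 * 1000  # illustrative MTBF of roughly one month

good = timing_ok(broadcast_ms=10, election_timeout_ms=200, mtbf_ms=ONE_MONTH_MS)
too_tight = timing_ok(broadcast_ms=100, election_timeout_ms=200, mtbf_ms=ONE_MONTH_MS)
```

If the election timeout is too close to the broadcast time, followers may time out between heartbeats and start unnecessary elections.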
  • 20. Raft Tuning for time without leader The time the cluster spends without a leader can be reduced by tuning the range of the randomized election timeout
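The effect of the random timeout range can be explored with a small sketch. It assumes each node draws its election timeout uniformly from a fixed range. The 150-300 ms defaults are example values, and the function names `draw_timeouts` and `first_candidate` are hypothetical.

```python
import random
from typing import List, Optional, Tuple


def draw_timeouts(num_nodes: int, low_ms: float = 150.0, high_ms: float = 300.0,
                  rng: Optional[random.Random] = None) -> List[float]:
    """Draw one randomized election timeout per node, uniformly in [low, high]."""
    rng = rng or random.Random()
    return [rng.uniform(low_ms, high_ms) for _ in range(num_nodes)]


def first_candidate(timeouts: List[float]) -> Tuple[int, float]:
    """Return the node whose timer fires first and its lead over the next node.

    A larger lead leaves more time to collect votes before another node
    also times out, which makes a split vote less likely.
    """
    winner = min(range(len(timeouts)), key=timeouts.__getitem__)
    others = [t for i, t in enumerate(timeouts) if i != winner]
    margin = min(others) - timeouts[winner] if others else float("inf")
    return winner, margin


sample = draw_timeouts(5, rng=random.Random(0))
winner, margin = first_candidate([200.0, 160.0, 290.0])
```

A wider range tends to spread the timers further apart, at the cost of waiting longer, on average, before the first node notices a failed leader.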
  • 22. Raft Resources ● The Paper ● Visualization of Raft operations ● Etcd - implementation ● Consul - another implementation ● http://raftconsensus.github.io/ ● CoreOS/Fleet uses Etcd ● Google Kubernetes - a distributed system ● Mesos / Mesosphere - a distributed system