CONSENSUS ALGORITHMS
YIFAN XING - 2018
DISTRIBUTED
SYSTEMS
Yifan Xing
Consensus Algorithms
@yifan_xing_e
Poll
Kafka: A distributed streaming platform
Consensus Algorithm, e.g. Raft, Paxos
Distributed System
What is a Distributed System?
          - Components
          - Located on different networked computers
          - Communicate and coordinate
          - Passing messages to each other
Modern Examples
Web, DNS, BitTorrent
Fun Fact
American Airlines, 1920s
Central Office
Travel Agents
Cards for each flight
Mark seats sold on cards
Challenges
Significant challenges/characteristics of distributed systems are:
Lack of global knowledge
Time
Concurrency/Security
Failures
Challenge 1: Lack of Global Knowledge
No host has global knowledge
How to exchange StateMachines' state information?
Is the information up-to-date?
How to detect inconsistency?
Challenge 2: Time
Clock skew
Delayed/duplicate messages
How to determine what happened first?
Challenge 3: Consistency/ Security
Often will have concurrent operations on a single object
How to ensure the object is in a consistent state?
Conflicts
Challenge 4: Failures
Can affect each other
How to tolerate failures of components?
Tolerate: detect, handle, recover
"A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable."
- Leslie Lamport
Consensus Algorithms
Achieve consensus between machines
Fault-tolerance => integrity, availability, etc.
Reliably process and execute client commands
Raft, Paxos
Paxos
Paxos Background
Paxos Made Simple: Basic Paxos
Lynch & Liskov
Leslie Lamport Proved
The Part-Time Parliament
Multi-Paxos: Paxos + Complexity
Rejected
Published
No mathematical proof
Paxos Made Simple
Basic Paxos
Consensus: agree on one value
Proposer: propose a value
Prepare: try to propose
Acceptor: accept/reject value
Basic Paxos
Proposer: choose a proposal number (n)
Proposer: broadcast Prepare(n) to all servers
Acceptor: if n > maxProposal: maxProposal = n
Acceptor: respond with a promise (won't accept proposals with n' < n)
Proposer: if a majority promised, broadcast Accept(n, value)
Acceptor: if n >= maxProposal: acceptedProposal = maxProposal = n, acceptedValue = value
Acceptor: respond
Proposer: once a majority has responded:
if any rejection => n is not the largest; repeat from the beginning
else: value chosen
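The Prepare/Accept exchange above can be sketched in a few lines of Python. This is a minimal single-process illustration, not a networked implementation: the class `Acceptor` and the function `propose` are hypothetical names, messages are plain method calls, and nothing is persisted. One rule is taken from the standard Paxos protocol rather than from the slides: if any promise reports a previously accepted value, the proposer adopts the value of the highest-numbered accepted proposal instead of its own.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    """Hypothetical in-memory acceptor; field names follow the slides."""
    max_proposal: int = 0
    accepted_proposal: Optional[int] = None
    accepted_value: Optional[str] = None

    def prepare(self, n: int) -> Tuple[bool, Optional[int], Optional[str]]:
        # Promise only if n is larger than every proposal number seen so far.
        if n > self.max_proposal:
            self.max_proposal = n
            return True, self.accepted_proposal, self.accepted_value
        return False, self.accepted_proposal, self.accepted_value

    def accept(self, n: int, value: str) -> bool:
        # Accept unless a higher-numbered Prepare was promised in the meantime.
        if n >= self.max_proposal:
            self.accepted_proposal = self.max_proposal = n
            self.accepted_value = value
            return True
        return False


def propose(acceptors: List[Acceptor], n: int, value: str) -> Optional[str]:
    """Run one round; return the chosen value, or None if the caller must retry."""
    majority = len(acceptors) // 2 + 1

    # Phase 1: broadcast Prepare(n) and collect promises.
    replies = [a.prepare(n) for a in acceptors]
    promised = [(p, v) for ok, p, v in replies if ok]
    if len(promised) < majority:
        return None  # n was not the largest; retry with a higher number

    # Standard Paxos rule (assumption, not on the slides): adopt the value of
    # the highest-numbered proposal that some acceptor has already accepted.
    prior = [(p, v) for p, v in promised if p is not None]
    if prior:
        value = max(prior, key=lambda pv: pv[0])[1]

    # Phase 2: broadcast Accept(n, value) and count acknowledgements.
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= majority else None
```

With three fresh acceptors, `propose(acceptors, 1, "x")` chooses "x"; a later `propose(acceptors, 2, "y")` also returns "x", because the value that was already chosen must be preserved.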
Paxos: Proposal Number
- server id: unique
- round number:
incremented over time, shared among all servers
servers keep track of the highest round number seen in msgs (maxRound)
Generate new proposal number:
increment maxRound
concatenate with server id
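One way to read "increment maxRound, concatenate with server id" is to pack the round number into the high bits of an integer and the server id into the low bits. The sketch below assumes that encoding; the class name `ProposalNumberGenerator` and the 16-bit width for the server id are illustrative choices, not something the slides specify.

```python
class ProposalNumberGenerator:
    """Hypothetical helper: proposal number = (round << 16) | server_id."""

    SERVER_ID_BITS = 16  # assumption: at most 65536 servers

    def __init__(self, server_id: int) -> None:
        if not 0 <= server_id < (1 << self.SERVER_ID_BITS):
            raise ValueError("server_id does not fit in SERVER_ID_BITS")
        self.server_id = server_id
        self.max_round = 0  # highest round number seen so far

    def observe(self, proposal_number: int) -> None:
        # Track the highest round number carried by any message we receive.
        seen_round = proposal_number >> self.SERVER_ID_BITS
        self.max_round = max(self.max_round, seen_round)

    def next(self) -> int:
        # Increment maxRound, then concatenate with the unique server id.
        self.max_round += 1
        return (self.max_round << self.SERVER_ID_BITS) | self.server_id
```

Because the round number sits in the high bits, a number generated after observing another server's proposal always compares greater than it, and two servers never generate the same number since their ids differ.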
Multi-Paxos
A sequence of instances of Basic Paxos
Each instance of Basic Paxos is a log entry
Implementation of Basic Paxos concepts
Client sends cmd
Basic Paxos: choose cmds (values) in log entries
Apply cmds (log entries)
Log: 1 cmd | 1 cmd | 1 cmd | 2 cmd | 2 cmd | 3 cmd | 3 cmd | 4 cmd
Return result to client
Paxos: Leader vs. Non-Leader
Leader/ Distinguished Proposer:
Server with highest ID
Sends a heartbeat every T ms
Accepts requests from clients
Acts as proposer
If a server didn't receive a heartbeat from a higher ID for >= 2T ms: it acts as leader
Non-leader:
Redirects client requests to leader
Acts as acceptor
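The heartbeat rule above reduces to a small predicate. The sketch below assumes each server keeps a table of when it last heard a heartbeat from every other server; the function name `acts_as_leader` and the millisecond timestamps are hypothetical.

```python
from typing import Dict


def acts_as_leader(
    my_id: int,
    last_heartbeat_ms: Dict[int, float],
    now_ms: float,
    t_ms: float,
) -> bool:
    """True if no server with a higher ID sent a heartbeat in the last 2T ms."""
    for server_id, seen_at in last_heartbeat_ms.items():
        # Only heartbeats from higher IDs can prevent this server from leading.
        if server_id > my_id and now_ms - seen_at < 2 * t_ms:
            return False
    return True
```

For example, with T = 50 ms, server 2 acts as leader once server 3 has been silent for 100 ms or more, regardless of heartbeats from server 1.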
Paxos: Leader
Unlikely to have two leaders at the same time
Can handle multiple leaders; however, it won't work as efficiently because of conflicts
Industry Examples
Raft
Raft: Background
Designed by Diego Ongaro and John Ousterhout at Stanford
Simpler version of Paxos
Equivalent: performance & fault-tolerance
Understandability
Consistency, Conciseness, Correctness
Why understandability: implemented -> useful, extended/adapted to the environment, esp. in distributed systems
Raft
Raft: agree on a log, not a single value
Raft: Phases
Leader Election
1. Select one machine to be a leader
2. Detect crashes, reelection
Log Replication
1. Leader processes commands from clients
2. Replicates logs (consistency and consensus among servers)
Safety
Keeps log consistent
Only servers with up-to-date logs can be leader
Raft: Leader Election
Timeout => Become Candidate:
- CurrentTerm++
- Vote for itself
- Send RequestVoteRPC to other servers
Majority votes => Become Leader:
- Send heartbeats
- Handle requests
RPC from Leader => Become Follower:
- Redirect requests
Raft: How to ensure election works?
At most one leader per term:
 - Each server: one vote per term
 - Receive majority to win election (floor(N / 2) + 1)
 - Example: 5 servers (S0-S4); votes cast: S0, S0, S0, S1, S1 => S0 has 3 of 5 votes and becomes Leader
Raft: How to ensure election works?
There will eventually be a leader:
- Random election timeout (range 100-300ms)
- Usually, one server times out first and wins a majority of the votes
- If two time out at the same time:
- Split vote -> election timeout -> re-enter election state (increment term, gather votes)
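The two election guarantees (at most one leader per term, and eventually some leader) can be illustrated with a short sketch. The function names are hypothetical; the 100-300 ms range is the one quoted on the slide. Storing votes as a mapping from voter to candidate enforces "one vote per server per term" by construction.

```python
import random
from collections import Counter
from typing import Dict, Optional


def election_timeout_ms(rng: random.Random, low: int = 100, high: int = 300) -> int:
    """Pick a randomized timeout so servers rarely become candidates together."""
    return rng.randint(low, high)  # inclusive on both ends


def majority(n_servers: int) -> int:
    """Votes needed to win: floor(N / 2) + 1."""
    return n_servers // 2 + 1


def election_winner(votes: Dict[str, str], n_servers: int) -> Optional[str]:
    """votes maps voter -> candidate for one term; None means a split vote."""
    if not votes:
        return None
    candidate, count = Counter(votes.values()).most_common(1)[0]
    return candidate if count >= majority(n_servers) else None
```

With the five-server example from the slides, S0 collects three votes and wins. A 2-2 split returns None, which corresponds to waiting for the next election timeout and starting a new term.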
Raft: Log Replication
Client sends cmd to the leader (S0)
Leader appends entry to log
Leader sends AppendEntryRPC to followers
Followers append the entry and send ACKs
If majority:
 - Entry committed
 - Execute
 - Return result to client
Leader notifies followers of committed entry
If not majority:
 - Leader retries until it succeeds
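The replication round trip above can be sketched as follows. This is a simplified, synchronous model: each follower is represented by a callable that returns True when it acknowledges the entry, `make_follower` is a hypothetical stand-in for a network call that may fail a few times, and retries are bounded by `max_rounds` (an assumption; the slides say the leader retries until it succeeds).

```python
from typing import Callable, List


def make_follower(log: List[str], fail_first: int = 0) -> Callable[[str], bool]:
    """Hypothetical follower stub that drops the first `fail_first` requests."""
    state = {"fails": fail_first}

    def send(entry: str) -> bool:
        if state["fails"] > 0:
            state["fails"] -= 1
            return False  # request lost or follower unavailable
        log.append(entry)
        return True  # ACK

    return send


def replicate(
    entry: str,
    leader_log: List[str],
    followers: List[Callable[[str], bool]],
    max_rounds: int = 3,
) -> bool:
    """Append on the leader, then retry followers until a majority has the entry."""
    leader_log.append(entry)
    needed = (len(followers) + 1) // 2 + 1  # majority of the whole cluster
    acked = set()
    for _ in range(max_rounds):
        for i, send in enumerate(followers):
            if i not in acked and send(entry):
                acked.add(i)
        if len(acked) + 1 >= needed:  # the leader counts as one copy
            return True  # committed: safe to execute and reply to the client
    return False
```

In a five-server cluster where one follower acknowledges immediately and another on the second attempt, `replicate` returns True in the second round even though the remaining two followers never respond.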
Raft: Log Entry
Index:
 - position of the entry in the log
Term:
 - current term when the cmd was received
Command:
 - cmd to execute
Raft: Consistency
Server crashes => log inconsistency
Goal: log consistency. But how?
Logs (term of each entry):
S0: 1 1 1 2 2 3 3 4 4 4 4
S1: 1 1 1 2 2
S2: 1 1 1 2 2 3 3 4 4 4 4
Log Consistency
Always trust leader’s log
No “holes” in log
Repair inconsistency during log replication process
If a given entry is committed, then all preceding entries are also committed
Log Matching Property
For each log entry:
    compare index and term
If both index and term match:
    1. the two entries store the same cmd
    2. all previous entries are identical
Logs (term of each entry):
Index: 1 2 3 4 5
S0:    1 1 2 2 2
S1:    1 1 2 2 2
S2:    1 1 2 3 3 (entries 4 and 5 store a different cmd: xxx)
Log Replication: Ensure Consistency
Check for consistency when sending AppendEntryRPC
AppendEntryRPC:
CurrentEntry: index, term, cmd
PrecedingEntry: index, term
Log (term of each entry):
Index: 1 2 3 4 5
S0:    1 1 2 2 2
Log Replication: Ensure Consistency
Goal: append a new entry to log
AppendEntryRPC:
CurrentEntry: index, term, cmd
PrecedingEntry: index, term
Receiver checks its own preceding index and term
If match => append entry
else => rejects request
Example 1 (term of each entry):
Index: 1 2 3 4 5
S0:    1 1 2 2 2
S1:    1 1 2 2
PrecedingEntry (index 4, term 2) matches => S1 appends entry 5: 1 1 2 2 2
Example 2 (term of each entry):
Index: 1 2 3 4 5
S0:    1 1 1 2 2
S1:    1 1 1 1
PrecedingEntry (index 4, term 2) does not match S1's entry 4 (term 1): REJECT
Leader resends AppendEntryRPC for the earlier entry, with PrecedingEntry (index 3, term 1)
Match: Replicate Log Entry => S1: 1 1 1 2 2
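The consistency check and the back-off on rejection can be sketched as below. The names `Entry`, `append_entry` and `sync_follower` are hypothetical, indices are 1-based as on the slides, and a preceding index of 0 stands for "no preceding entry" (an assumption made here for the first log position). The RPC is modelled as a direct function call carrying a single entry.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Entry:
    term: int
    cmd: str


def append_entry(log: List[Entry], prev_index: int, prev_term: int, entry: Entry) -> bool:
    """Follower side: accept only if the preceding (index, term) matches."""
    if prev_index > len(log):
        return False  # follower's log is too short: a hole would be created
    if prev_index > 0 and log[prev_index - 1].term != prev_term:
        return False  # preceding entry differs: reject
    if prev_index < len(log) and log[prev_index].term == entry.term:
        return True  # same index and term means the entry is already stored
    del log[prev_index:]  # drop the conflicting suffix; the leader's log wins
    log.append(entry)
    return True


def sync_follower(leader_log: List[Entry], follower_log: List[Entry]) -> None:
    """Leader side: step back on REJECT, step forward on success."""
    i = len(leader_log)  # 1-based index of the entry being sent
    while 1 <= i <= len(leader_log):
        prev_index = i - 1
        prev_term = leader_log[prev_index - 1].term if prev_index > 0 else 0
        if append_entry(follower_log, prev_index, prev_term, leader_log[i - 1]):
            i += 1
        else:
            i -= 1
```

Running `sync_follower` on the second example (leader terms 1 1 1 2 2, follower terms 1 1 1 1) first gets a rejection for entry 5, then succeeds for entry 4 and entry 5, leaving both logs identical.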
Leader's Log Completeness
How to guarantee completeness of leader’s log?
If an entry is committed, all leaders (current + future) must store the entry
Check candidate (potential leader)’s log completeness
Servers with incomplete logs => will not be elected
RequestVoteRPC:
term - candidate’s term
candidateId - candidate requesting vote
lastLogIndex - index of candidate’s last log entry
lastLogTerm - term of candidate’s last log entry
(In)complete Log Example - 1
How to guarantee completeness of leader’s log?
RequestVoteRPC (Candidate):
term - 2
candidateId - 00001
lastLogIndex - 3
lastLogTerm - 1
CurrentStateMachine Info:
term - 3
candidateId - 00002
lastLogIndex - 3
lastLogTerm - 1
REJECT VOTE
Candidate.term < my.term => candidate's term is out of date
(In)complete Log Example - 2
How to guarantee completeness of leader’s log?
RequestVoteRPC (Candidate):
term - 3
candidateId - 00001
lastLogIndex - 3
lastLogTerm - 2
CurrentStateMachine Info:
term - 3
candidateId - 00002
lastLogIndex - 4
lastLogTerm - 2
REJECT VOTE
Candidate.lastLogIndex < my.lastLogIndex => my log is longer
(In)complete Log Example - 3
How to guarantee completeness of leader’s log?
RequestVoteRPC (Candidate):
term - 3
candidateId - 00001
lastLogIndex - 5
lastLogTerm - 2
CurrentStateMachine Info:
term - 3
candidateId - 00002
lastLogIndex - 4
lastLogTerm - 2
ACCEPT VOTE REQUEST
candidate.term >= my.term && candidate.lastLogIdx >= my.lastLogIdx => complete
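The vote decision in the three examples can be sketched as a single function. The names `VoteRequest`, `ServerState` and `grant_vote` are hypothetical. The comparison follows Raft's usual "at least as up-to-date" rule: a higher lastLogTerm wins, and on equal lastLogTerm the longer log wins. The sketch deliberately leaves out the "one vote per term" bookkeeping.

```python
from dataclasses import dataclass


@dataclass
class VoteRequest:
    term: int
    candidate_id: str
    last_log_index: int
    last_log_term: int


@dataclass
class ServerState:
    current_term: int
    last_log_index: int
    last_log_term: int


def grant_vote(me: ServerState, req: VoteRequest) -> bool:
    """Grant the vote only if the candidate's log is at least as complete as mine."""
    if req.term < me.current_term:
        return False  # request comes from an older term
    if req.last_log_term != me.last_log_term:
        # The log whose last entry has the later term is more up-to-date.
        return req.last_log_term > me.last_log_term
    # Same last term: the longer log is more up-to-date.
    return req.last_log_index >= me.last_log_index
```

Fed with the numbers from the three examples, the function rejects the first two requests and accepts the third.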
Problems
Messages
Rely on messages
High overhead
Messages:
Duplicate
Out of order
Lost
Network latency
Failures
Failures:
Availability
Leader failure
Follower(s) failure
Partition
Security
Confidentiality (who can read/write?)
Test
Reliable
Testing and debugging are hard
Other
Server problems: e.g. crash loop
Application-level problem: e.g. corrupted data
Issues
Partitions
Network delays
Packet loss
Duplication
Ordering
Byzantine Generals Problem
Byzantine Failure
Malicious & Misbehaving Peers
Not resilient against malicious or misbehaving peers.
Correctness is guaranteed ONLY if each peer adheres to the protocol requirements.
May not be capable of destroying the whole consensus protocol
Abused:
Corrupting the application-level values
Injecting malicious values
Confidentiality? (only intended parties can read)
Integrity? (messages are authentic)
Byzantine Fault Tolerant Algorithms
Byzantine Paxos algorithms
Byzantine Raft algorithms
Take Away
Implementing Distributed Systems Protocols/ Consensus Algorithms:
Don't
Designing Distributed Systems Protocols/ Consensus Algorithms:
Understandability
Consistency
Correctness, etc.
Reliability / Complexity
Resilient against issues?
Worth it to be resilient?
THANK YOU
@yifan_xing_e

More Related Content

Similar to Consensus Algorithms

That Conference 2017: Refactoring your Monitoring
That Conference 2017: Refactoring your MonitoringThat Conference 2017: Refactoring your Monitoring
That Conference 2017: Refactoring your MonitoringJamie Riedesel
 
OPTIMIZING THE TICK STACK
OPTIMIZING THE TICK STACKOPTIMIZING THE TICK STACK
OPTIMIZING THE TICK STACKInfluxData
 
Watchguard presentation tech day el salvador
Watchguard presentation tech day el salvadorWatchguard presentation tech day el salvador
Watchguard presentation tech day el salvadorJose Molina
 
Using Scylla for Order Capture at Fanatics
Using Scylla for Order Capture at FanaticsUsing Scylla for Order Capture at Fanatics
Using Scylla for Order Capture at FanaticsScyllaDB
 
InfluxDB 101 – Concepts and Architecture by Michael DeSa, Software Engineer |...
InfluxDB 101 – Concepts and Architecture by Michael DeSa, Software Engineer |...InfluxDB 101 – Concepts and Architecture by Michael DeSa, Software Engineer |...
InfluxDB 101 – Concepts and Architecture by Michael DeSa, Software Engineer |...InfluxData
 
Hacking The Trading Floor
Hacking The Trading FloorHacking The Trading Floor
Hacking The Trading Flooriffybird_099
 
APIdays Paris 2018 - Event-Driven APIs Eric Horesnyi, CEO, Streamdata.io
APIdays Paris 2018 - Event-Driven APIs Eric Horesnyi, CEO, Streamdata.ioAPIdays Paris 2018 - Event-Driven APIs Eric Horesnyi, CEO, Streamdata.io
APIdays Paris 2018 - Event-Driven APIs Eric Horesnyi, CEO, Streamdata.ioapidays
 
Flink Forward Berlin 2018: Stephan Ewen - Keynote: "Unlocking the next wave o...
Flink Forward Berlin 2018: Stephan Ewen - Keynote: "Unlocking the next wave o...Flink Forward Berlin 2018: Stephan Ewen - Keynote: "Unlocking the next wave o...
Flink Forward Berlin 2018: Stephan Ewen - Keynote: "Unlocking the next wave o...Flink Forward
 

Similar to Consensus Algorithms (8)

That Conference 2017: Refactoring your Monitoring
That Conference 2017: Refactoring your MonitoringThat Conference 2017: Refactoring your Monitoring
That Conference 2017: Refactoring your Monitoring
 
OPTIMIZING THE TICK STACK
OPTIMIZING THE TICK STACKOPTIMIZING THE TICK STACK
OPTIMIZING THE TICK STACK
 
Watchguard presentation tech day el salvador
Watchguard presentation tech day el salvadorWatchguard presentation tech day el salvador
Watchguard presentation tech day el salvador
 
Using Scylla for Order Capture at Fanatics
Using Scylla for Order Capture at FanaticsUsing Scylla for Order Capture at Fanatics
Using Scylla for Order Capture at Fanatics
 
InfluxDB 101 – Concepts and Architecture by Michael DeSa, Software Engineer |...
InfluxDB 101 – Concepts and Architecture by Michael DeSa, Software Engineer |...InfluxDB 101 – Concepts and Architecture by Michael DeSa, Software Engineer |...
InfluxDB 101 – Concepts and Architecture by Michael DeSa, Software Engineer |...
 
Hacking The Trading Floor
Hacking The Trading FloorHacking The Trading Floor
Hacking The Trading Floor
 
APIdays Paris 2018 - Event-Driven APIs Eric Horesnyi, CEO, Streamdata.io
APIdays Paris 2018 - Event-Driven APIs Eric Horesnyi, CEO, Streamdata.ioAPIdays Paris 2018 - Event-Driven APIs Eric Horesnyi, CEO, Streamdata.io
APIdays Paris 2018 - Event-Driven APIs Eric Horesnyi, CEO, Streamdata.io
 
Flink Forward Berlin 2018: Stephan Ewen - Keynote: "Unlocking the next wave o...
Flink Forward Berlin 2018: Stephan Ewen - Keynote: "Unlocking the next wave o...Flink Forward Berlin 2018: Stephan Ewen - Keynote: "Unlocking the next wave o...
Flink Forward Berlin 2018: Stephan Ewen - Keynote: "Unlocking the next wave o...
 

Recently uploaded

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 

Consensus Algorithms

  • 1. CONSENSUS ALGORITHMS YIFAN XING - 2018 DISTRIBUTED SYSTEMS Yifan Xing Consensus Algorithms @yifan_xing_e
  • 2. Poll. Kafka: a distributed streaming platform. Consensus algorithm: e.g. Raft, Paxos.
  • 3. Distributed System. What is a distributed system? Components, located on different networked computers, that communicate and coordinate by passing messages to each other.
  • 4. Modern Examples: Web, DNS, BitTorrent.
  • 5. Fun Fact: American Airlines, 1920s. Central office and travel agents. Cards for each flight; seats sold were marked on the cards.
  • 6. Challenges. Significant challenges/characteristics of distributed systems are: lack of global knowledge; time; concurrency/security; failures.
  • 7. Challenge 1: Lack of Global Knowledge. No host has global knowledge. How to exchange state machines' state information? Is the information up to date? How to detect inconsistency?
  • 8. Challenge 2: Time. Clock skew. Delayed/duplicate messages. How to determine what happened first?
  • 9. Challenge 3: Consistency/Security. There will often be concurrent operations on a single object. How to ensure the object is in a consistent state? Conflicts.
  • 10. Challenge 4: Failures. Components can affect each other. How to tolerate failures of components? Tolerate: detect, handle, recover. "A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable." (Leslie Lamport)
  • 11. Consensus Algorithms. Achieve consensus between machines. Fault-tolerance => integrity, availability, etc. Reliably process and execute client commands. Examples: Raft, Paxos.
  • 12. Paxos
  • 13. Paxos Background: Paxos Made Simple: Basic Paxos; Lynch & Liskov; Leslie Lamport; Proved; The Part-Time Parliament; Multi-Paxos: Paxos + Complexity; Rejected; Published; No mathematical proof.
  • 14. Paxos Made Simple
  • 15. Basic Paxos. Consensus: agree on one value. Proposer: proposes a value. Prepare: try to propose. Acceptor: accepts/rejects a value.
  • 16. Basic Paxos. Proposer: choose a proposal number (n).
  • 17. Basic Paxos. Proposer: choose a proposal number (n); broadcast Prepare(n) to all servers.
  • 18. Basic Paxos. Proposer: choose a proposal number (n); broadcast Prepare(n) to all servers. Acceptor: if n > maxProposal: maxProposal = n; respond with a promise (won't accept a proposal with n' < n).
  • 19. Basic Paxos. Proposer: choose a proposal number (n); broadcast Prepare(n) to all servers. Acceptor: respond with a promise (won't accept a proposal with n' < n). Proposer: if majority: Accept(n, )
  • 20. Basic Paxos: Leader Election. Proposer: broadcast Accept(n, value).
  • 21. Basic Paxos. Proposer: broadcast Accept(n, value). Acceptor: if n >= maxProposal: acceptedProposal = maxProposal = n; acceptedValue = value; respond.
  • 22. Basic Paxos. Proposer: broadcast Accept(n, value). Acceptor: if n >= maxProposal: acceptedProposal = maxProposal = n; acceptedValue = value; respond. Proposer: if majority: if any rejection => n is not the largest, repeat from the beginning; else: value chosen.
  • 23. Basic Paxos. Proposer: broadcast Accept(n, value). Acceptor: if n >= maxProposal: acceptedProposal = maxProposal = n; acceptedValue = value; respond. Proposer: if majority: if any rejection => n is not the largest, repeat from the beginning; else: value chosen.
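The acceptor rules on the Basic Paxos slides can be sketched in Python roughly as follows. This is a minimal illustration, not the presenter's code: the class name, method names, and return shapes are assumptions. Returning any previously accepted proposal with the promise is part of standard Basic Paxos but is not shown on the slides.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass
class Acceptor:
    """Hypothetical single-value Basic Paxos acceptor (illustrative only)."""
    max_proposal: int = 0        # highest proposal number promised so far
    accepted_proposal: int = 0   # number of the proposal accepted, 0 if none
    accepted_value: Any = None   # value accepted, None if none

    def on_prepare(self, n: int) -> Tuple[bool, int, Any]:
        """Handle Prepare(n): promise only if n is the highest number seen."""
        promised = n > self.max_proposal
        if promised:
            self.max_proposal = n
        # Assumption beyond the slides: report any previously accepted
        # proposal so that the proposer can adopt its value.
        return promised, self.accepted_proposal, self.accepted_value

    def on_accept(self, n: int, value: Any) -> int:
        """Handle Accept(n, value): accept unless a higher n was promised."""
        if n >= self.max_proposal:
            self.accepted_proposal = self.max_proposal = n
            self.accepted_value = value
        # A reply larger than n tells the proposer it was rejected.
        return self.max_proposal
```

A proposer would treat any reply from `on_accept` that is greater than its own `n` as a rejection and start over with a larger proposal number.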
  • 24. Paxos: Proposal Number. Server id: unique. Round number: increments over time and is shared among all servers; servers keep track of the highest round number seen in messages. Generate a new proposal number: increment maxRound, then concatenate it with the server id.
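One way to read "increment maxRound, concatenate with server id" is to pack the round number into the high bits of an integer and the server id into the low bits. The sketch below assumes an 8-bit server id field; that width, and all names, are hypothetical choices made for illustration.

```python
SERVER_ID_BITS = 8  # assumption: at most 256 servers


class ProposalNumberGenerator:
    """Hypothetical generator of unique, increasing proposal numbers."""

    def __init__(self, server_id: int) -> None:
        if not 0 <= server_id < (1 << SERVER_ID_BITS):
            raise ValueError("server_id does not fit in SERVER_ID_BITS")
        self.server_id = server_id
        self.max_round = 0  # highest round number seen so far

    def observe(self, proposal_number: int) -> None:
        """Track the highest round number carried by incoming messages."""
        self.max_round = max(self.max_round, proposal_number >> SERVER_ID_BITS)

    def next(self) -> int:
        """Increment maxRound and concatenate it with the server id."""
        self.max_round += 1
        return (self.max_round << SERVER_ID_BITS) | self.server_id
```

Because the round occupies the high bits, a number generated after observing another server's proposal always compares higher, and the low bits keep numbers from different servers distinct.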
  • 25. Multi-Paxos. A sequence of instances of Basic Paxos. Each instance of Basic Paxos is a log entry. Implementation of Basic Paxos concepts.
  • 26. Multi-Paxos. Client sends cmd.
  • 27. Multi-Paxos. Client sends cmd. Basic Paxos: choose cmds (values) in log entries.
  • 28. Multi-Paxos. Apply cmds (log entries). [Log figure: 1 cmd, 1 cmd, 1 cmd, 2 cmd, 2 cmd, 3 cmd, 3 cmd, 4 cmd]
  • 29. Multi-Paxos. Client receives result.
  • 30. Paxos: Leader vs. Non-Leader. Leader/Distinguished Proposer: server with the highest ID; sends a heartbeat every T ms; accepts requests from clients. If a server has not received a heartbeat from a higher ID for >= 2T ms: act as leader, act as proposer.
  • 31. Paxos: Leader vs. Non-Leader. Leader/Proposer: server with the highest ID; sends a heartbeat every T ms; accepts requests from clients. Non-leader: redirect client requests to the leader; act as acceptor.
  • 32. Paxos: Leader. Unlikely to have two leaders at the same time. Can handle multiple leaders; however, it won't work as efficiently because of conflicts.
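The heartbeat rule on the leader slides (act as leader if no heartbeat from a higher ID has arrived for at least 2T ms) can be sketched as below. The class, its method names, and the explicit millisecond timestamps passed in by the caller are illustrative assumptions.

```python
class LeaderTracker:
    """Hypothetical tracker for the 'highest ID leads' heartbeat rule."""

    def __init__(self, server_id: int, heartbeat_interval_ms: int,
                 start_ms: int = 0) -> None:
        self.server_id = server_id
        self.heartbeat_interval_ms = heartbeat_interval_ms  # T on the slide
        # Assumption: treat start-up as if a higher-ID heartbeat just arrived.
        self.last_higher_heartbeat_ms = start_ms

    def on_heartbeat(self, sender_id: int, now_ms: int) -> None:
        """Record heartbeats, but only those from servers with a higher ID."""
        if sender_id > self.server_id:
            self.last_higher_heartbeat_ms = now_ms

    def acts_as_leader(self, now_ms: int) -> bool:
        """True once no higher-ID heartbeat has been seen for >= 2T ms."""
        silence_ms = now_ms - self.last_higher_heartbeat_ms
        return silence_ms >= 2 * self.heartbeat_interval_ms
```

Two servers can briefly both believe they lead under this rule, which matches the slide's remark that multiple leaders are tolerated but cause conflicts.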
  • 33. Industry Examples
  • 34. Raft
  • 35. Raft: Background. Designed by Diego Ongaro and John Ousterhout at Stanford. Simpler version of Paxos. Equivalent: performance & fault-tolerance. Understandability: consistency, conciseness, correctness. Why understandability: implemented -> useful, extended/adapted to the environment, esp. in distributed systems.
  • 36. Raft: log (not a single value)
  • 37. Raft: Phases. Leader Election: 1. Select one machine to be a leader. 2. Detect crashes, reelection.
  • 38. Raft: Phases. Leader Election: 1. Select one machine to be a leader. 2. Detect crashes, reelection. Log Replication: 1. Leader processes commands from clients. 2. Replicates logs (consistency and consensus among servers).
  • 39. Raft: Phases. Leader Election: 1. Select one machine to be a leader. 2. Detect crashes, reelection. Log Replication: 1. Leader processes commands from clients. 2. Replicates logs (consistency and consensus among servers). Safety: keeps the log consistent; only servers with up-to-date logs can be leader.
  • 40. Raft: Leader Election. Timeout -> Become Candidate: CurrentTerm++; vote for itself; send RequestVoteRPC to other servers. Majority votes -> Become Leader: send heartbeats; handle requests. RPC from leader -> Become Follower: redirect requests.
  • 41. Raft: How to ensure election works? At most one leader per term: each server casts one vote per term; a candidate must receive a majority to win the election (N / 2 + 1). Example: servers S0 to S4; three votes for S0, two votes for S1.
  • 42. Raft: How to ensure election works? At most one leader per term: each server casts one vote per term; a candidate must receive a majority to win the election (N / 2 + 1). Example: servers S0 to S4; three votes for S0, two votes for S1; S0 becomes leader.
  • 43. Raft: How to ensure election works? There will eventually be a leader: random election timeout (range 100-300 ms). Usually, one server times out first and wins the majority of votes. If two time out at the same time: split vote -> election timeout -> re-enter election state (increment term, gather votes).
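The election rules on the slides above (one vote per server per term, a majority of N / 2 + 1, and a randomized timeout in the 100-300 ms range) can be sketched as follows. All names are hypothetical, and the sketch leaves out the log up-to-date check that the slides cover later under log completeness.

```python
import random
from typing import Optional


def majority(cluster_size: int) -> int:
    """Votes needed to win: N / 2 + 1 with integer division."""
    return cluster_size // 2 + 1


def random_election_timeout_ms(rng: random.Random,
                               low: float = 100.0,
                               high: float = 300.0) -> float:
    """Pick a random timeout so that servers rarely time out together."""
    return rng.uniform(low, high)


class VoteState:
    """Hypothetical per-server voting state: at most one vote per term."""

    def __init__(self) -> None:
        self.current_term = 0
        self.voted_for: Optional[str] = None

    def grant_vote(self, term: int, candidate_id: str) -> bool:
        if term < self.current_term:
            return False              # request from an older term
        if term > self.current_term:
            self.current_term = term  # newer term: forget the old vote
            self.voted_for = None
        if self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False                  # already voted for someone else
```

In the five-server example on the slides, S0 collects three votes, which meets `majority(5)`, while S1's two votes do not.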
  • 44. Raft: Log Replication. [Figure: Client, S0, Log]
  • 45. Raft: Leader Appends Entry to Log. [Figure: Client, S0, Log: cmd]
  • 46. Raft: Leader Sends AppendEntryRPC. [Figure: Client, S0, Log: cmd, AppendEntryRPC]
  • 47. Raft: Followers Send ACKs. [Figure: Client, S0, logs each holding cmd, ACK]
  • 48. Raft: Leader Replies to Client. If majority: entry committed; execute; return result.
  • 49. Raft: Leader Notifies Followers of Committed Entry. [Figure: Client, S0, logs each holding cmd]
  • 50. Raft: Not Majority? If not majority: leader retries until it succeeds.
  • 51. Raft: Log Entry. Index. Term: the current term when the cmd is received. Command: the cmd to execute. [Figure: entry "1 cmd" in S0's log]
  • 52. Raft: Consistency. Server crashes => log inconsistency. Goal: log consistency. But how? [Log figure, entries shown as term cmd: S0: 1 cmd, 1 cmd, 1 cmd, 2 cmd, 2 cmd, 3 cmd, 3 cmd, 4 cmd, 4 cmd, 4 cmd, 4 cmd; S2: 1 cmd, 1 cmd, 1 cmd, 2 cmd, 2 cmd, 3 cmd, 3 cmd, 4 cmd, 4 cmd, 4 cmd, 4 cmd; S1: 1 cmd, 1 cmd, 1 cmd, 2 cmd, 2 cmd]
  • 53. Log Consistency. Always trust the leader's log. No "holes" in the log. Repair inconsistency during the log replication process. If a given entry is committed, then all preceding entries are also committed.
  • 54. Log Matching Property. For each log entry: compare index and term. If both index and term match: 1. the two entries store the same cmd; 2. all previous entries are identical. [Log figure, indices 1-5, entries shown as term cmd: S0: 1 cmd, 1 cmd, 2 cmd, 2 cmd, 2 cmd; S1: 1 cmd, 1 cmd, 2 cmd, 2 cmd, 2 cmd; S2: 1 cmd, 1 cmd, 2 cmd, 3 xxx, 3 xxx]
  • 55. Log Replication: Ensure Consistency. Check for consistency when sending AppendEntryRPC. AppendEntryRPC: CurrentEntry: index, term, cmd; PrecedingEntry: index, term. [Log figure, indices 1-5, terms: S0: 1 1 2 2 2]
  • 56. Log Replication: Ensure Consistency. Goal: append a new entry to the log. AppendEntryRPC: CurrentEntry: index, term, cmd; PrecedingEntry: index, term. Receiver checks its own preceding index and term. [Log figure, indices 1-5, terms: S0: 1 1 2 2 2; S1: 1 1 2 2]
  • 57. Log Replication: Ensure Consistency. Goal: append a new entry to the log. If match => append entry; else => reject request. [Log figure, indices 1-5, terms: S0: 1 1 2 2 2; S1: 1 1 2 2 2]
  • 58. Log Replication: Ensure Consistency. Goal: append a new entry to the log. AppendEntryRPC: CurrentEntry: index, term, cmd; PrecedingEntry: index, term. [Log figure, indices 1-5, terms: S0: 1 1 1 2 2; S1: 1 1 1 1]
  • 59. Log Replication: Ensure Consistency. Goal: append a new entry to the log. Do not match: REJECT. [Log figure, indices 1-5, terms: S0: 1 1 1 2 2; S1: 1 1 1 1]
  • 60. Log Replication: Ensure Consistency. Goal: append a new entry to the log. AppendEntryRPC: CurrentEntry: index, term, cmd; PrecedingEntry: index, term. [Log figure, indices 1-5, terms: S0: 1 1 1 2 2; S1: 1 1 1 1]
  • 61. Log Replication: Ensure Consistency. Goal: append a new entry to the log. Match: replicate log entry. [Log figure, indices 1-5, terms: S0: 1 1 1 2 2; S1: 1 1 1 2 2]
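The preceding-entry check walked through on these slides can be sketched as below: the follower accepts an entry only if its own log matches the leader's at the preceding index and term, and the leader backs up one entry at a time after a rejection. Function and type names are hypothetical, and the logs are plain in-memory lists rather than RPC messages.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    term: int
    cmd: str


def append_entry(log: List[Entry], prev_index: int, prev_term: int,
                 entry: Entry) -> bool:
    """Follower side. Indices are 1-based; prev_index 0 means 'log start'."""
    if prev_index > len(log):
        return False                        # follower is missing entries
    if prev_index > 0 and log[prev_index - 1].term != prev_term:
        return False                        # preceding entry does not match
    new_index = prev_index + 1
    if len(log) >= new_index:
        if log[new_index - 1].term == entry.term:
            return True                     # entry is already present
        del log[new_index - 1:]             # drop the conflicting suffix
    log.append(entry)
    return True


def replicate(leader_log: List[Entry], follower_log: List[Entry]) -> None:
    """Leader side: back up until the follower accepts, then catch it up."""
    next_index = len(leader_log)
    while next_index >= 1:
        prev_index = next_index - 1
        prev_term = leader_log[prev_index - 1].term if prev_index > 0 else 0
        if append_entry(follower_log, prev_index, prev_term,
                        leader_log[next_index - 1]):
            break
        next_index -= 1
    for index in range(next_index + 1, len(leader_log) + 1):
        append_entry(follower_log, index - 1, leader_log[index - 2].term,
                     leader_log[index - 1])
```

With the slide's example (leader terms 1 1 1 2 2, follower terms 1 1 1 1), the first attempt at index 5 is rejected, the retry at index 4 matches on the preceding entry, and the follower ends with terms 1 1 1 2 2.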
  • 62. Leader's Log Completeness. How to guarantee completeness of the leader's log? If an entry is committed, all leaders (current + future) must store the entry.
  • 63. Leader's Log Completeness. How to guarantee completeness of the leader's log? Check the completeness of the candidate's (potential leader's) log. Servers with incomplete logs => will not be elected.
  • 64. Leader's Log Completeness. How to guarantee completeness of the leader's log? RequestVoteRPC: term: candidate's term; candidateId: candidate requesting the vote; lastLogIndex: index of the candidate's last log entry; lastLogTerm: term of the candidate's last log entry.
  • 65. (In)complete Log Example - 1. How to guarantee completeness of the leader's log? Candidate's RequestVoteRPC: term = 2, candidateId = 00001, lastLogIndex = 3, lastLogTerm = 1. Current state machine's info: term = 3, candidateId = 00002, lastLogIndex = 3, lastLogTerm = 1.
  • 66. (In)complete Log Example - 1. How to guarantee completeness of the leader's log? Candidate's RequestVoteRPC: term = 2, candidateId = 00001, lastLogIndex = 3, lastLogTerm = 1. Current state machine's info: term = 3, candidateId = 00002, lastLogIndex = 3, lastLogTerm = 1. REJECT VOTE: candidate.term < my.term => the candidate's term is stale (the last log entries are equal in this example).
  • 67. (In)complete Log Example - 2. How to guarantee completeness of the leader's log? Candidate's RequestVoteRPC: term = 3, candidateId = 00001, lastLogIndex = 3, lastLogTerm = 2. Current state machine's info: term = 3, candidateId = 00002, lastLogIndex = 4, lastLogTerm = 2.
  • 68. (In)complete Log Example - 2. How to guarantee completeness of the leader's log? Candidate's RequestVoteRPC: term = 3, candidateId = 00001, lastLogIndex = 3, lastLogTerm = 2. Current state machine's info: term = 3, candidateId = 00002, lastLogIndex = 4, lastLogTerm = 2. REJECT VOTE: candidate.lastLogIndex < my.lastLogIndex => my log is longer.
  • 69. (In)complete Log Example - 3. How to guarantee completeness of the leader's log? Candidate's RequestVoteRPC: term = 3, candidateId = 00001, lastLogIndex = 5, lastLogTerm = 2. Current state machine's info: term = 3, candidateId = 00002, lastLogIndex = 4, lastLogTerm = 2.
  • 70. (In)complete Log Example - 3. How to guarantee completeness of the leader's log? Candidate's RequestVoteRPC: term = 3, candidateId = 00001, lastLogIndex = 5, lastLogTerm = 2. Current state machine's info: term = 3, candidateId = 00002, lastLogIndex = 4, lastLogTerm = 2. ACCEPT VOTE REQUEST: candidate.term >= my.term && candidate.lastLogIdx >= my.lastLogIdx => the candidate's log is complete.
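The three vote examples can be checked with a small sketch of the voter's decision. It follows the usual Raft comparison (reject a stale term; otherwise compare the last log term first, then the last log index). The names are hypothetical, and the sketch ignores whether the voter has already voted in this term.

```python
from dataclasses import dataclass


@dataclass
class RequestVote:
    term: int
    candidate_id: str
    last_log_index: int
    last_log_term: int


def should_grant_vote(req: RequestVote, my_term: int,
                      my_last_log_index: int, my_last_log_term: int) -> bool:
    """Voter side: grant only if the candidate is current and up to date."""
    if req.term < my_term:
        return False  # the candidate's term is stale
    if req.last_log_term != my_last_log_term:
        # The log whose last entry has the later term is more up to date.
        return req.last_log_term > my_last_log_term
    # Same last term: the longer (or equally long) log is up to date.
    return req.last_log_index >= my_last_log_index


# The three examples from the slides; the voter's values come second.
example_1 = should_grant_vote(RequestVote(2, "00001", 3, 1), 3, 3, 1)
example_2 = should_grant_vote(RequestVote(3, "00001", 3, 2), 3, 4, 2)
example_3 = should_grant_vote(RequestVote(3, "00001", 5, 2), 3, 4, 2)
```

Examples 1 and 2 are rejected (stale term and shorter log, respectively), and example 3 is granted.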
  • 71. Problems
  • 72. Messages. Rely on messages: high overhead. Messages can be duplicated, out of order, or lost. Network latency.
  • 73. Failures. Availability. Failures: leader failure; follower(s) failure; partition.
  • 74. Security. Confidentiality (who can read/write?)
  • 75. Test. Reliable. Testing and debugging are hard.
  • 76. Other. Server problems: e.g. crash loop. Application-level problems: e.g. corrupted data.
  • 77. Issues: partitions; network delays; packet loss; duplication; ordering.
  • 78. Byzantine Generals Problem
  • 79. Byzantine Failure
  • 80. Byzantine Failure
  • 81. Byzantine Failure
  • 82. Malicious & Misbehaving Peers. Not resilient against malicious or misbehaving peers. May not be capable of destroying the whole consensus protocol. Abused: Confidentiality? (only intended parties can read) Integrity? (messages are authentic). Correctness is guaranteed ONLY if each peer adheres to the protocol requirements. Corrupting the application-level values. Injecting malicious values.
  • 83. Byzantine Fault Tolerant Algorithms: Byzantine Paxos algorithms; Byzantine Raft algorithms.
  • 84. Byzantine Fault Tolerant Algorithms: Byzantine Paxos algorithms; Byzantine Raft algorithms.
  • 85. Take Away. Implementing Distributed Systems Protocols/Consensus Algorithms: Understandability, Consistency, Correctness, etc. Don't. Resilient against issues? Worth it to be resilient? Reliability. Complexity. Designing Distributed Systems Protocols/Consensus Algorithms:
  • 86. THANK YOU @yifan_xing_e