3. Distributed System
YIFAN XING - 2018
What is a Distributed System?
- Components
- Located on different networked computers
- Communicate and coordinate
- Passing messages to each other
@yifan_xing_e
CONSENSUS ALGORITHMS
5. Fun Fact
American Airlines
Central Office
Travel Agents
1920s
Cards for each flight
Mark seats sold on cards
6. Significant challenges/ characteristics of distributed systems are:
Lack of global Knowledge
Time
Concurrency/ Security
Failures
Challenges
7. Challenge 1: Lack of Global Knowledge
No host has global knowledge
How to exchange StateMachines' state information?
Information up-to-date?
How to detect inconsistency?
9. Challenge 3: Consistency/ Security
Often will have concurrent operations on a single object
How to ensure object is in consistent state?
Conflicts
10. Challenge 4: Failures
Component failures can affect each other
How to tolerate failures of components?
Tolerate: detect, handle, recover
A distributed system is one in which the failure of
a computer you didn't even know existed can
render your own computer unusable.
— Leslie Lamport
11. Achieve consensus between machines
Fault-tolerance => integrity, availability, etc.
Reliably process and execute client commands
Consensus Algorithms
Raft Paxos
15. Basic Paxos
Proposer: Propose a value
Prepare: Try to propose
Acceptor: accept/ reject value
Consensus: agree on one value
16. Basic Paxos
Proposer:
- Choose a Proposal Number (n)
- Broadcast the number to all servers: Prepare(n)
Acceptor:
- If n > maxProposal:
    maxProposal = n
    promise
- Respond Promise: won't accept proposal with n' < n
Proposer:
- If majority promise: broadcast Accept(n, value)
21. Basic Paxos
Proposer:
- Broadcast Accept(n, value)
Acceptor:
- If n >= maxProposal:
    acceptedProposal = maxProposal = n
    acceptedValue = value
- Respond
Proposer:
- If majority:
    if any rejection => n is not largest
        repeat from beginning
    else:
        value chosen
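The prepare and accept steps of Basic Paxos can be sketched in a few lines of Python. This is a minimal, single-process sketch and not a networked implementation: the names `Acceptor`, `on_prepare`, `on_accept` and `propose` are hypothetical, all acceptors are assumed to be reachable, and messages are plain method calls. One step goes beyond what the slides list: a promise also reports any value the acceptor has already accepted, and the proposer adopts it, which is the standard Paxos rule that keeps a chosen value from being overwritten.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    max_proposal: int = 0                    # highest proposal number promised so far
    accepted_proposal: Optional[int] = None  # number of the proposal accepted, if any
    accepted_value: Optional[str] = None     # value of the proposal accepted, if any

    def on_prepare(self, n: int) -> Tuple[bool, Optional[int], Optional[str]]:
        """Promise not to accept proposals numbered below n."""
        if n > self.max_proposal:
            self.max_proposal = n
            # Assumption beyond the slides: report what was accepted earlier.
            return True, self.accepted_proposal, self.accepted_value
        return False, None, None

    def on_accept(self, n: int, value: str) -> bool:
        """Accept unless a higher-numbered proposal has been promised."""
        if n >= self.max_proposal:
            self.accepted_proposal = self.max_proposal = n
            self.accepted_value = value
            return True
        return False


def propose(acceptors: List[Acceptor], n: int, value: str) -> Optional[str]:
    """Run one round; return the chosen value, or None if the round must be repeated."""
    majority = len(acceptors) // 2 + 1

    # Phase 1: broadcast Prepare(n) and collect promises.
    promises = [r for r in (a.on_prepare(n) for a in acceptors) if r[0]]
    if len(promises) < majority:
        return None

    # Adopt the value of the highest-numbered proposal already accepted, if any.
    prior = [(p, v) for _, p, v in promises if p is not None]
    if prior:
        value = max(prior, key=lambda pv: pv[0])[1]

    # Phase 2: broadcast Accept(n, value); any rejection means n is not the largest.
    acks = [a.on_accept(n, value) for a in acceptors]
    return value if all(acks) else None
```

With three fresh acceptors, `propose(acceptors, 1, "x")` chooses `"x"`; a later `propose(acceptors, 2, "y")` still returns `"x"` because the value was already accepted, and a retry with the stale number 1 returns `None`.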
24. Paxos: Proposal Number
- server id: unique
- round number:
  incremented over time, shared among all servers
  servers keep track of the highest round number seen in messages
Generate new proposal number:
  increment maxRound
  concatenate with server id
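One way to read "concatenate with server id" is to put the round number in the high bits of an integer and the server id in the low bits, so that numbers from different servers never collide and a higher round always wins. The sketch below assumes that encoding; the class name `ProposalNumberGenerator` and the `id_bits` width are illustrative choices, not part of the slides.

```python
class ProposalNumberGenerator:
    """Builds unique, increasing proposal numbers as (round << id_bits) | server_id."""

    def __init__(self, server_id: int, id_bits: int = 8):
        if not 0 <= server_id < (1 << id_bits):
            raise ValueError("server_id does not fit in id_bits")
        self.server_id = server_id
        self.id_bits = id_bits
        self.max_round = 0  # highest round number seen so far

    def observe(self, proposal_number: int) -> None:
        """Track the highest round number carried by any received message."""
        self.max_round = max(self.max_round, proposal_number >> self.id_bits)

    def next(self) -> int:
        """Increment maxRound and concatenate it with the server id."""
        self.max_round += 1
        return (self.max_round << self.id_bits) | self.server_id
```

A server that calls `observe` on every incoming proposal number will always generate a number larger than any it has seen.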
25. Multi-Paxos
A sequence of instances of Basic Paxos
Each instance of Basic Paxos is a log entry
Implementation of Basic Paxos Concepts
30. Paxos: Leader Vs. NonLeader
Leader/ Distinguished Proposer:
- Server with highest ID
- Sends a heartbeat every T ms
- Accepts requests from clients
- Acts as proposer
Non-leader:
- Redirects client requests to leader
- Acts as acceptor
- If it didn't receive a heartbeat from a higher ID for >= 2T ms: act as leader
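The heartbeat rule above can be sketched as a small class that remembers when it last heard from a higher-ID server. The names `Server`, `on_heartbeat` and `is_leader` are hypothetical, time is passed in explicitly as milliseconds to keep the sketch deterministic, and the choice to start the 2T window at boot time is an assumption.

```python
class Server:
    """Decides whether to act as leader from heartbeats of higher-ID servers."""

    def __init__(self, server_id: int, heartbeat_interval_ms: float, now_ms: float = 0.0):
        self.server_id = server_id
        self.t = heartbeat_interval_ms
        # Assumption: the silence window starts counting when the server boots.
        self.last_higher_heartbeat = now_ms

    def on_heartbeat(self, sender_id: int, now_ms: float) -> None:
        """Only heartbeats from servers with a higher ID reset the window."""
        if sender_id > self.server_id:
            self.last_higher_heartbeat = now_ms

    def is_leader(self, now_ms: float) -> bool:
        """Act as leader after at least 2T ms without a higher-ID heartbeat."""
        return now_ms - self.last_higher_heartbeat >= 2 * self.t
```

A server that answers `False` here would redirect client requests to the leader and act as an acceptor.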
32. Paxos: Leader
Unlikely to have two leaders at the same time
Can handle multiple leaders; however, it works less efficiently because of conflicts
35. Raft: Background
Designed as a more understandable alternative to Paxos
Equivalent to Paxos: performance & fault-tolerance
Consistency, Conciseness, Correctness
Understandability
Why understandability: an algorithm has to be understood to be implemented and useful,
and to be extended/ adapted to its environment, esp. in distributed systems
Designed by Diego Ongaro and John Ousterhout at Stanford
37. Raft: Phases
Leader Election
1. Select one machine to be a leader
2. Detect crashes, reelection
Log Replication
1. Leader processes commands from clients
2. Replicates logs (consistency and consensus among servers)
Safety
1. Keeps log consistent
2. Only servers with up-to-date logs can be leader
40. Raft: Leader Election
Timeout => Become Candidate:
- CurrentTerm++
- Vote for itself
- Send RequestVoteRPC to other servers
Majority Votes => Become Leader:
- Send heartbeats
- Handle requests
RPC from Leader => Become Follower:
- Redirect requests
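The three transitions in this diagram (timeout, majority votes, RPC from leader) can be sketched as a small state machine. This is a rough sketch only: `Role`, `Node` and the handler names are hypothetical, networking is left out, and the log checks a real Raft node performs are omitted.

```python
from enum import Enum
from typing import Optional


class Role(Enum):
    FOLLOWER = "follower"
    CANDIDATE = "candidate"
    LEADER = "leader"


class Node:
    def __init__(self, node_id: int, cluster_size: int):
        self.node_id = node_id
        self.cluster_size = cluster_size
        self.role = Role.FOLLOWER
        self.current_term = 0
        self.voted_for: Optional[int] = None
        self.votes = 0

    def on_election_timeout(self) -> None:
        """Timeout: become candidate, bump the term, vote for itself."""
        if self.role is Role.LEADER:
            return
        self.role = Role.CANDIDATE
        self.current_term += 1
        self.voted_for = self.node_id
        self.votes = 1
        # A real node would now send RequestVote RPCs to the other servers.

    def on_vote_granted(self) -> None:
        """Count a granted vote; a majority makes the candidate the leader."""
        if self.role is not Role.CANDIDATE:
            return
        self.votes += 1
        if self.votes >= self.cluster_size // 2 + 1:
            self.role = Role.LEADER

    def on_leader_rpc(self, leader_term: int) -> None:
        """An RPC from a leader with a current term turns the node into a follower."""
        if leader_term < self.current_term:
            return  # stale leader, ignore
        if leader_term > self.current_term:
            self.current_term = leader_term
            self.voted_for = None  # the vote is per term
        self.role = Role.FOLLOWER
```

In a five-node cluster, a candidate becomes leader after its own vote plus two granted votes.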
41. Raft: How to ensure election works?
At most one leader per term:
- Each server: one vote per term
- Receive majority to win election (N / 2 + 1)
- Example (5 servers, S0 to S4): votes cast are S1, S0, S0, S0, S1, so S0 has 3 of 5 votes and becomes Leader
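The two rules that give at most one leader per term, one vote per server per term and a majority of N / 2 + 1 to win, can be sketched as follows. `Voter` and `winner` are hypothetical names, and the sketch deliberately ignores the log and term checks that a real vote request also goes through.

```python
from collections import Counter
from typing import Dict, List, Optional


class Voter:
    """Grants at most one vote per term."""

    def __init__(self) -> None:
        self.voted_for: Dict[int, str] = {}  # term -> candidate voted for

    def request_vote(self, term: int, candidate: str) -> bool:
        # The first candidate to ask in a term gets the vote; repeats are idempotent.
        return self.voted_for.setdefault(term, candidate) == candidate


def winner(votes: List[str], cluster_size: int) -> Optional[str]:
    """Return the candidate holding a majority of the cluster, if any."""
    if not votes:
        return None
    majority = cluster_size // 2 + 1
    candidate, count = Counter(votes).most_common(1)[0]
    return candidate if count >= majority else None
```

With the votes from the example, `winner(["S1", "S0", "S0", "S0", "S1"], 5)` returns `"S0"`. Because two candidates cannot both collect 3 of 5 votes when each server votes once, a term can have only one winner.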
43. Raft: How to ensure election works?
There will eventually be a leader:
- Random election timeout (range 100-300ms)
- Usually, one server times out first and wins the majority of votes
- If two time out at the same time:
  - Split vote -> election timeout -> re-enter election state (increment term, gather votes)
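A small sketch of the randomized timeout, using the 100-300 ms range from the slide: each server draws its own timeout, and whichever expires first starts the election. The function names are hypothetical, and a seeded `random.Random` is passed in only to keep runs repeatable.

```python
import random
from typing import Dict, List, Tuple


def election_timeout_ms(rng: random.Random, low: float = 100.0, high: float = 300.0) -> float:
    """Draw a random election timeout in milliseconds."""
    return rng.uniform(low, high)


def first_to_time_out(server_ids: List[str], rng: random.Random) -> Tuple[List[str], Dict[str, float]]:
    """Return the server(s) with the earliest timeout, plus every drawn timeout.

    More than one server in the result models a simultaneous timeout,
    which can lead to a split vote and a new election round.
    """
    timeouts = {s: election_timeout_ms(rng) for s in server_ids}
    earliest = min(timeouts.values())
    return [s for s, t in timeouts.items() if t == earliest], timeouts
```

Because every server draws a fresh random value each round, repeated split votes become increasingly unlikely, which is why a leader is eventually elected.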
53. Log Consistency
Always trust leader’s log
No “holes” in log
Repair inconsistency during log replication process
If a given entry is committed, then all preceding entries are also committed
54. Log Matching Property
If both index and term match:
1. the two entries store the same cmd
2. all previous entries are identical
For each log entry:
compare index and term
Example logs (each entry: term, cmd):
index   1       2       3       4       5
S0      1 cmd   1 cmd   2 cmd   2 cmd   2 cmd
S1      1 cmd   1 cmd   2 cmd   2 cmd   2 cmd
S2      1 cmd   1 cmd   2 cmd   3 xxx   3 xxx
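The comparison of index and term can be sketched with the three example logs from this slide (terms 1, 1, 2, 2, 2 for S0 and S1, and 1, 1, 2, 3, 3 for S2). The helper names are hypothetical, a log is modelled as a plain list of `(term, cmd)` tuples, and `repair` is only a simplified stand-in for "always trust the leader's log": it keeps the matching prefix and copies the rest from the leader.

```python
from typing import List, Tuple

Entry = Tuple[int, str]  # (term, cmd); the list position is the log index


def matching_prefix_len(a: List[Entry], b: List[Entry]) -> int:
    """Count leading indexes at which both logs hold an entry with the same term."""
    n = 0
    for (term_a, _), (term_b, _) in zip(a, b):
        if term_a != term_b:
            break
        n += 1
    return n


def repair(follower: List[Entry], leader: List[Entry]) -> List[Entry]:
    """Keep the matching prefix, then overwrite the rest with the leader's entries."""
    k = matching_prefix_len(follower, leader)
    return follower[:k] + leader[k:]


S0 = [(1, "cmd"), (1, "cmd"), (2, "cmd"), (2, "cmd"), (2, "cmd")]
S1 = [(1, "cmd"), (1, "cmd"), (2, "cmd"), (2, "cmd"), (2, "cmd")]
S2 = [(1, "cmd"), (1, "cmd"), (2, "cmd"), (3, "xxx"), (3, "xxx")]
```

Here S0 and S1 match on all five entries, while S0 and S2 match only on the first three; if S0 is the leader, `repair(S2, S0)` yields a log equal to S0.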
62. Leader's Log Completeness
How to guarantee completeness of leader’s log?
If an entry is committed, all leaders (current + future) must
store the entry
Check candidate (potential leader)’s log completeness
Servers with incomplete logs => will not be elected
RequestVoteRPC:
term - candidate’s term
candidateId - candidate requesting vote
lastLogIndex - index of candidate’s last log entry
lastLogTerm - term of candidate’s last log entry
65. (In)complete Log Example - 1
RequestVoteRPC (from Candidate):
term - 2
candidateId - 00001
lastLogIndex - 3
lastLogTerm - 1
Current StateMachine Info:
term - 3
candidateId - 00002
lastLogIndex - 3
lastLogTerm - 1
REJECT VOTE
candidate.term < my.term => candidate's term is stale
67. (In)complete Log Example - 2
RequestVoteRPC (from Candidate):
term - 3
candidateId - 00001
lastLogIndex - 3
lastLogTerm - 2
Current StateMachine Info:
term - 3
candidateId - 00002
lastLogIndex - 4
lastLogTerm - 2
REJECT VOTE
Candidate.lastLogIndex < my.lastLogIndex => my log is longer
69. (In)complete Log Example - 3
RequestVoteRPC (from Candidate):
term - 3
candidateId - 00001
lastLogIndex - 5
lastLogTerm - 2
Current StateMachine Info:
term - 3
candidateId - 00002
lastLogIndex - 4
lastLogTerm - 2
ACCEPT VOTE REQUEST
candidate.term >= my.term && candidate.lastLogIdx >= my.lastLogIdx => complete
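The three examples can be checked against one small decision function. This is a sketch of the usual Raft comparison (stale term first, then last log term, then last log index); `LogState` and `grant_vote` are hypothetical names, and the "one vote per term" bookkeeping is left out to keep the focus on log completeness.

```python
from dataclasses import dataclass


@dataclass
class LogState:
    term: int            # current term of the server (or of the vote request)
    last_log_index: int  # index of the last log entry
    last_log_term: int   # term of the last log entry


def grant_vote(candidate: LogState, me: LogState) -> bool:
    """Grant the vote only if the candidate's log is at least as complete as mine."""
    if candidate.term < me.term:
        return False  # request comes from a stale term
    if candidate.last_log_term != me.last_log_term:
        # The log whose last entry has the later term is more up-to-date.
        return candidate.last_log_term > me.last_log_term
    # Same last term: the longer (or equally long) log wins.
    return candidate.last_log_index >= me.last_log_index


# The three examples above, as (candidate, current state machine) pairs.
example_1 = grant_vote(LogState(2, 3, 1), LogState(3, 3, 1))  # rejected
example_2 = grant_vote(LogState(3, 3, 2), LogState(3, 4, 2))  # rejected
example_3 = grant_vote(LogState(3, 5, 2), LogState(3, 4, 2))  # accepted
```

Since a candidate needs votes from a majority, and every committed entry is stored on a majority, at least one voter will refuse any candidate that is missing a committed entry.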
72. Messages
Rely on messages
High overhead
Messages:
Duplicate
Out of order
Lost
Network Latency
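One common way to cope with duplicate and out-of-order messages is to tag each message with a sequence number and deliver strictly in order. The slides do not prescribe this; the sketch below is an illustration under that assumption, and `InOrderReceiver` is a hypothetical name. Lost messages are not recovered here: a gap simply blocks delivery until the sender retransmits.

```python
from typing import Dict, List


class InOrderReceiver:
    """Drops duplicates and buffers early arrivals until the gap is filled."""

    def __init__(self) -> None:
        self.next_seq = 0                 # next sequence number to deliver
        self.pending: Dict[int, str] = {} # messages that arrived too early
        self.delivered: List[str] = []    # messages handed to the application

    def receive(self, seq: int, payload: str) -> None:
        if seq < self.next_seq or seq in self.pending:
            return  # duplicate: already delivered or already buffered
        self.pending[seq] = payload
        # Deliver every message that is now contiguous with what was delivered.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
```

Receiving `(1, "b")`, `(0, "a")`, `(0, "a")` and `(2, "c")` in that order delivers `"a"`, `"b"`, `"c"` exactly once each.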
82. Malicious & Misbehaving Peers
Not resilient against malicious or misbehaving peers.
Correctness is guaranteed ONLY if each peer adheres to the protocol requirements.
A peer may not be capable of destroying the whole consensus protocol, but the protocol can be abused:
- Corrupting the application-level values
- Injecting malicious values
Confidentiality? (only intended parties can read)
Integrity? (messages are authentic)
85. Implementing Distributed Systems Protocols/ Consensus Algorithms:
Understandability
Consistency
Correctness, etc.
Don't
Resilient against issues?
Worth it to be resilient?
Take Away
Reliability Complexity
Designing Distributed Systems Protocols/ Consensus Algorithms: