[Dr. TLA+ Series] Raft - Jin Li


[video] https://www.youtube.com/watch?v=6Kwx8zfGW0Y

In this talk, we discuss Raft and its TLA+ spec. Raft is a consensus algorithm for managing a replicated log. It produces a result equivalent to (multi-)Paxos. Raft's design separates the key elements of a consensus algorithm, such as leader election and log replication, which makes Raft easier to understand and implement. Raft has been widely taught and implemented; a partial list of implementations is available at https://raft.github.io/.
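
To make the leader election element mentioned above concrete, here is a minimal Python sketch of the vote-granting rule and the majority check described in the Raft paper. It is an illustration under stated assumptions, not the talk's TLA+ spec: the names `Server`, `grant_vote`, and `run_election` are hypothetical, messages are modeled as direct function calls, and timeouts, crashes, and message loss are left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Server:
    """Hypothetical in-memory server; every server starts in term 1."""
    name: str
    current_term: int = 1
    voted_for: Optional[str] = None
    log: List[Tuple[int, str]] = field(default_factory=list)  # (term, value)

    def last_log_term(self) -> int:
        return self.log[-1][0] if self.log else 0


def grant_vote(voter: Server, cand_name: str, cand_term: int,
               cand_last_term: int, cand_log_len: int) -> bool:
    """Decide whether `voter` grants its vote to the candidate."""
    if cand_term < voter.current_term:
        return False  # stale candidate
    if cand_term > voter.current_term:
        # A higher term resets the vote cast in the older term.
        voter.current_term = cand_term
        voter.voted_for = None
    # The candidate's log must be at least as up to date as the voter's.
    log_ok = (cand_last_term > voter.last_log_term()
              or (cand_last_term == voter.last_log_term()
                  and cand_log_len >= len(voter.log)))
    if log_ok and voter.voted_for in (None, cand_name):
        voter.voted_for = cand_name
        return True
    return False


def run_election(candidate: Server, peers: List[Server]) -> bool:
    """Candidate bumps its term, votes for itself, and asks every peer."""
    candidate.current_term += 1
    candidate.voted_for = candidate.name
    votes = 1
    for peer in peers:
        if grant_vote(peer, candidate.name, candidate.current_term,
                      candidate.last_log_term(), len(candidate.log)):
            votes += 1
    # A strict majority of the whole cluster is needed to become leader.
    return votes * 2 > len(peers) + 1


if __name__ == "__main__":
    s1, s2, s3 = Server("S1"), Server("S2"), Server("S3")
    print(run_election(s1, [s2, s3]), s1.current_term, s2.current_term)
```

Because each server casts at most one vote per term and a winner needs a strict majority, two candidates cannot both win the same term in this sketch.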

Paper and Spec (not required, but helpful to look at before the lecture)
- In Search of an Understandable Consensus Algorithm (https://raft.github.io/raft.pdf)
- TLA+ specification for the Raft consensus algorithm (https://github.com/ongardie/raft.tla)

Published in: Education


  1. 1. RAFT vs. Paxos
       Concept: RAFT: Replicated Log for Replicated State Machine. Paxos: Consensus Algorithm.
       Leader Election: RAFT: Key (First elected leader, then normal operation). Paxos: Optional (A form of operation).
       Log maintenance: RAFT: Leader’s Log is replicated to all Followers. Paxos: Node may log consensus operation, but it may have holes.
       TLA+ Spec: RAFT: Close to developer’s implementation & usage. Paxos: More abstract.
       State Space: RAFT: Large. Paxos: Small.
  2. 2. a b c d e f a b c e f g h i g
  3. 3. unique
  4. 4. S1: 1 S2: 1 S3: 1
  5. 5. S1: 2 S2: 1 S3: 1
  6. 6. S1: 2 S2: 1 S3: 1 RequestVoteRequest
  7. 7. S1: 2 S2: 2 S3: 2 RequestVoteResponse
  8. 8. S1: 2 S2: 2 S3: 2 1 2 a 2 2 b 3 2 c
  9. 9. S1: 2 S2: 2 S3: 2 1 2 a 2 2 b 3 2 c 4 2 d 5 2 e 6 2 f 1 2 a 2 2 b 1 2 a Committed Index
  10. 10. S1: 2 S2: 2 S3: 2 1 2 a 2 2 b 3 2 c 4 2 d 5 2 e 6 2 f 1 2 a 2 2 b 3 2 c 1 2 a 2 2 b 3 2 c 4 2 d Committed Index
  11. 11. one server per term
  12. 12. persisted
  13. 13. AppendEntries AppendEntries
  14. 14. 1 2 a 2 2 b 1 2 a 2 2 b 1 2 a 2 2 a 1 2 a 2 2 b 1 2 a 2 2 b Committed Index Leader S1: 2 Follower S2: 2 Follower S3: 2 Follower S4: 2 Follower S5: 2
  15. 15. 1 2 a 2 2 b 1 2 a 2 2 b 1 2 a 2 2 a 1 2 a 2 2 b 1 2 a 2 2 b Committed Index Crash S1: 2 Crash S2: 2 Follower S3: 3 Leader S4: 3 Follower S5: 3 3 3 c 4 3 d 3 3 c 4 3 d
  16. 16. 1 2 a 2 2 b 1 2 a 2 2 b 1 2 a 2 2 a 1 2 a 2 2 b 1 2 a 2 2 b Committed Index Follower S1: 4 Crash S2: 2 Crash S3: 3 Follower S4: 4 Leader S5: 4 3 3 c 4 3 d 3 3 c 4 3 d 5 4 e 6 4 f 7 4 g
  17. 17. 1 2 a 2 2 b 1 2 a 2 2 b 1 2 a 2 2 a 1 2 a 2 2 b 1 2 a 2 2 b Committed Index Follower S1: 5 Follower S2: 5 Leader S3: 5 Crash S4: 4 Crash S5: 4 3 3 c 4 3 d 3 3 c 4 3 d 5 4 e 6 4 f 7 4 g 3 5 h 3 5 h 3 5 h 4 5 i
  18. 18. 1 2 a 2 2 b 1 2 a 2 2 b 1 2 a 2 2 a 1 2 a 2 2 b 1 2 a 2 2 b Committed Index Leader S1: 6 Follower S2: 6 Crash S3: 5 Crash S4: 4 Follower S5: 6 3 3 c 4 3 d 3 3 c 4 3 d 5 4 e 6 4 f 7 4 g 3 5 h 3 5 h 3 5 h 4 5 i 4 6 j 5 6 k 6 6 l 4 6 j
  19. 19. 1 2 a 2 2 b 1 2 a 2 2 b 1 2 a 2 2 a 1 2 a 2 2 b 1 2 a 2 2 b Committed Index Crash S1: 6 Follower S2: 6 Follower S3: 5 Follower S4: 4 Follower S5: 6 3 3 c 4 3 d 3 3 c 4 3 d 5 4 e 6 4 f 7 4 g 3 5 h 3 5 h 3 5 h 4 5 i 4 6 j 5 6 k 6 6 l 4 6 j
  20. 20. 1 2 a 2 2 b 1 2 a 2 2 b 1 2 a 2 2 a 1 2 a 2 2 b 1 2 a 2 2 b Committed Index Crash S1: 6 Crash S2: 6 Follower S3: 5 Follower S4: 4 Follower S5: 6 3 3 c 4 3 d 3 3 c 4 3 d 5 4 e 6 4 f 7 4 g 3 5 h 3 5 h 3 5 h 4 5 i 4 6 j 5 6 k 6 6 l 4 6 j
  21. 21. 1 2 a 2 2 b 1 2 a 2 2 b 1 2 a 2 2 a 1 2 a 2 2 b 1 2 a 2 2 b Committed Index Crash S1: 6 Crash S2: 6 Leader S3: 8 Follower S4: 8 Follower S5: 8 3 3 c 4 3 d 3 3 c 4 3 d 5 4 e 6 4 f 7 4 g 3 5 h 3 5 h 3 5 h 4 5 i 4 6 j 5 6 k 6 6 l 4 6 j
  22. 22. 1 2 a 2 2 b 1 2 a 2 2 b 1 2 a 2 2 a 1 2 a 2 2 b 1 2 a 2 2 b Committed Index Crash S1: 6 Leader S2: 7 Follower S3: 7 Follower S4: 7 Follower S5: 7 3 3 c 4 3 d 3 3 c 4 3 d 5 4 e 6 4 f 7 4 g 3 5 h 3 5 h 3 5 h 4 5 i 4 6 j 5 6 k 6 6 l 4 6 j 5 7 m
  23. 23. 5 7 m 1 2 a 2 2 b 3 5 h 4 6 j 5 7 m 4 6 1 2 a 2 2 b Follower S5: 7 3 3 c 4 3 d 5 4 e 6 4 f 7 4 g Leader S2: 7
  24. 24. 5 7 m 1 2 a 2 2 b 3 5 h 4 6 j 5 7 m 1 2 a 2 2 b Follower S5: 7 3 3 c 4 3 d 5 4 e 6 4 f 7 4 g Leader S2: 7 4 6 j 3 5
  25. 25. 5 7 m 1 2 a 2 2 b 3 5 h 4 6 j 5 7 m 1 2 a 2 2 b Follower S5: 7 3 3 c 4 3 d 5 4 e 6 4 f 7 4 g Leader S2: 7 4 6 j 3 5 h 2 2
  26. 26. 5 7 m 1 2 a 2 2 b 3 5 h 4 6 j 5 7 m 1 2 a 2 2 b Follower S5: 7 Leader S2: 7 4 6 j 3 5 h 2 2 3 5 h 4 6 j 5 7 m
  27. 27. unique id
  28. 28. S1 S2 S3 S4 S5. S3, S4, S5 can be elected as new Leader; S1, S2 can be elected as old Leader
  29. 29. S1 S2 S3 S4 S5 Configuration Change to 3+5 Change configuration to 5
  30. 30. S1 S2 S3 S4 S5 Change Configuration 3+5 Change Configuration to 3 Step Down
  31. 31. • Server Variable: currentTerm (persisted); state: {Follower, Leader, Candidate}; votedFor (persisted); log (persisted); commitIndex
        • Candidate Variable: votesResponded; votesGranted
        • Leader Variable: nextIndex; matchIndex
        Log (index, term, command): 1 1 x←3; 2 1 y←1; 3 1 y←9; 4 2 x←2; 5 3 x←0; 6 3 y←7; 7 3 x←5; 8 3 x←4
  32. 32. States: Follower, Leader, Candidate
        Transitions: Init; Timeout, Start election; Timeout, New election; BecomeLeader; Discover leader with Higher term; Discover leader with Higher term; Restart
        Actions: RequestVote, ClientRequest, AdvanceCommitIndex, AppendEntries, HandleRequestVoteRequest, HandleRequestVoteResponse, HandleAppendEntriesRequest, HandleAppendEntriesResponse, DropStaleResponse, DuplicateMessage, DropMessage
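
The log repair walked through in slides 23 to 26 can be sketched in Python as follows. This is a simplified illustration, not the TLA+ spec: `handle_append_entries` and `replicate` are hypothetical names, the two logs are one reading of the slide residue (leader S2 and follower S5 in term 7), a log entry is a `(term, value)` pair whose list position plus one is its index, and the network is replaced by a direct function call.

```python
from typing import List, Tuple

Entry = Tuple[int, str]  # (term, value); list position + 1 is the log index


def handle_append_entries(log: List[Entry], prev_index: int, prev_term: int,
                          entries: List[Entry]) -> bool:
    """Follower side: accept only if the log matches at prev_index."""
    if prev_index > len(log):
        return False  # follower is missing the entry at prev_index
    if prev_index > 0 and log[prev_index - 1][0] != prev_term:
        return False  # entry at prev_index has a different term
    for offset, entry in enumerate(entries):
        pos = prev_index + offset  # 0-based position for this entry
        if pos < len(log) and log[pos][0] != entry[0]:
            del log[pos:]  # conflict: drop this entry and all that follow
        if pos >= len(log):
            log.append(entry)
    return True


def replicate(leader_log: List[Entry], follower_log: List[Entry]) -> int:
    """Leader side: back up next_index until the follower accepts.

    Returns the number of AppendEntries attempts that were needed.
    """
    next_index = len(leader_log) + 1
    attempts = 0
    while True:
        attempts += 1
        prev_index = next_index - 1
        prev_term = leader_log[prev_index - 1][0] if prev_index > 0 else 0
        entries = leader_log[prev_index:]
        if handle_append_entries(follower_log, prev_index, prev_term, entries):
            return attempts
        next_index -= 1  # rejected: retry one entry earlier


if __name__ == "__main__":
    # Assumed reading of slides 23 to 26.
    leader = [(2, "a"), (2, "b"), (5, "h"), (6, "j"), (7, "m")]
    follower = [(2, "a"), (2, "b"), (3, "c"), (3, "d"),
                (4, "e"), (4, "f"), (4, "g")]
    replicate(leader, follower)
    print(follower == leader)
```

The loop always terminates because a request with `prev_index` 0 carries the whole leader log and is never rejected. Only entries that conflict with the leader's are deleted, so a delayed request that carries fewer entries does not truncate a follower log that already matches.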
