How Raft Implements Distributed Consensus
- 峰云就她了
- xiaorui.cc
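What is a consensus protocol?
What are Raft's characteristics?
Raft vs. Paxos?
What are Raft's components and how do they work?
Assorted odd Raft scenarios?
How do you implement Raft?
Introduction
Single-node environment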
client
server
Is there a data consistency problem?
Multi-node environment
node 1
node 3
node 2
So how do we keep the data consistent?
Roles
Follower
Candidate
Leader
Keywords
Timer
Term (time slice)
Term ID
N/2 + 1
Heartbeats
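Keywords
To be elected leader, a candidate must present its term ID and log index
A leader never deletes its own log entries
Clients attach their own ID to each command to help Raft stay idempotent
If an entry is committed, every entry before it is committed too.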
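Keywords
If entries on two nodes have the same term and index, we consider the data consistent.
There is at most one leader in any given term
Each follower can vote for only one candidate per term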
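Keywords
currentTerm
Latest term the server has seen (initialized to 0, increases monotonically)
votedFor
ID of the candidate that received this server's vote in the current term
log[]
Log entries (state machine command plus term ID)
commitIndex
Highest log index known to be committed
nextIndex[]
For each follower, the index of the next log entry to send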
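Vote RPC
Arguments:
term
Candidate's term
candidateId
Candidate's ID
lastLogIndex
Index of the candidate's last log entry
lastLogTerm
Term of the candidate's last log entry
Results:
term
Current term, so the candidate can update itself
voteGranted
True or False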
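The Vote RPC fields above map fairly directly onto a receiver-side handler. The following is a minimal sketch, not a definitive implementation: the `Node` class, the `handle_request_vote` name, and the 1-based in-memory log are assumptions made for illustration, and persistence and networking are left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical per-server state; field names follow the slides."""
    current_term: int = 0
    voted_for: Optional[str] = None
    log: List[Tuple[int, str]] = field(default_factory=list)  # (term, command)


def handle_request_vote(node, term, candidate_id, last_log_index, last_log_term):
    """Return (term, voteGranted) for an incoming Vote RPC."""
    # A candidate with a stale term is rejected outright.
    if term < node.current_term:
        return node.current_term, False
    # A newer term resets this node's vote for the new term.
    if term > node.current_term:
        node.current_term = term
        node.voted_for = None
    # The candidate's log must be at least as up to date as ours:
    # compare last terms first, then last indexes.
    my_last_index = len(node.log)
    my_last_term = node.log[-1][0] if node.log else 0
    up_to_date = (last_log_term, last_log_index) >= (my_last_term, my_last_index)
    # One vote per term: grant only if we have not voted for someone else.
    if node.voted_for in (None, candidate_id) and up_to_date:
        node.voted_for = candidate_id
        return node.current_term, True
    return node.current_term, False
```

Comparing `(lastLogTerm, lastLogIndex)` as a tuple is one compact way to express the "at least as up to date" check; a second candidate asking in the same term is refused because `voted_for` is already set.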
Simplest election
vote for me
vote for me
OK !
OK !
C-1
simple election
F-2
F-1
vote for me
vote for me
NO
timer 155
Term 2
Timer 170
Term 3
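The candidate's term ID is lower than the follower's
This does not affect F's timer, which keeps running!
C learns of this, deliberately lets its vote time out, and waits for another node to start an election.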
Timer 183
Term 3
C-1
simple election
RequestVote(term=2)
voteGranted=true,
term=2
C-2
same term id
wait timeout!
NO! Term does not match
RequestVote(term=2)
hard election -1
vote for me
OK !
vote for me
term does not match
term conflict
fewer than n/2 + 1 votes
OK !
All nodes converge on the same term ID!
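Election summary
Process
The timer fires and the follower increments current_term_id by 1
It switches to the candidate state
It sends RequestVote RPC requests
Outcome
It wins the election
Another node is elected
The election is rerun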
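The process and its three outcomes can be sketched as a single function over the replies a candidate collects. This is an illustrative sketch under stated assumptions: `run_election_round` and `election_timeout_ms` are hypothetical names, the 150 to 300 ms range is an assumed default rather than a value from the slides, and "another node is elected" is approximated here by seeing a higher term in a reply (in practice it is also learned from a new leader's heartbeat).

```python
import random


def election_timeout_ms(low=150.0, high=300.0):
    """Randomized timeout so nodes rarely become candidates at the same instant."""
    return random.uniform(low, high)


def run_election_round(current_term, responses, cluster_size):
    """Run one round and return (term, role).

    responses: list of (term, vote_granted) replies from the other nodes.
    """
    term = current_term + 1  # timer fired: bump the term and become a candidate
    votes = 1                # a candidate votes for itself
    for reply_term, granted in responses:
        if reply_term > term:
            # Someone is ahead of us: step back to follower in that term.
            return reply_term, "follower"
        if granted:
            votes += 1
    if votes >= cluster_size // 2 + 1:
        return term, "leader"
    # No majority: stay a candidate and retry after a new random timeout.
    return term, "candidate"
```

For a three-node cluster, `run_election_round(1, [(2, True), (2, True)], 3)` yields `(2, "leader")`, while two refusals leave the node a candidate in term 2.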
Client
Works with the leader
The leader returns a response once it has committed the entry!
Assign a unique ID to every command; the leader stores the latest ID together with its response.
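The unique-ID rule above is what makes a client retry safe. A minimal sketch of the idea, assuming a hypothetical `DedupStateMachine` that keeps one `(command ID, response)` pair per client; the class and its list-based state are illustrative only.

```python
class DedupStateMachine:
    """Applies each (client_id, command_id) at most once."""

    def __init__(self):
        self.data = []   # applied values, in order
        self.last = {}   # client_id -> (command_id, response)

    def apply(self, client_id, command_id, value):
        seen = self.last.get(client_id)
        if seen is not None and seen[0] == command_id:
            # Retry of the latest command: return the stored response
            # instead of applying the command a second time.
            return seen[1]
        self.data.append(value)
        response = len(self.data)  # position of the value, used as the reply
        self.last[client_id] = (command_id, response)
        return response
```

If a client resends command 1 because the first reply was lost, the second call returns the cached response and the value appears in `data` only once.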
client process
Only log entry !
1 Hello
2 Raft
1 Hello
2 Raft
1 Hello
2 Raft
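Log Replication
Default heartbeat interval: 50 ms
Default heartbeat timeout: 300 ms
Log entries are committed as part of each heartbeat
An entry succeeds once at least n/2 + 1 nodes have it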
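The n/2 + 1 rule can be worked through numerically. The sketch below assumes the leader tracks, per follower, the highest index known to be replicated there (called `match_index` here, a name not used in the slides) and computes the highest index stored on a majority. It omits the additional Raft rule that a leader only commits entries from its own term by counting replicas.

```python
def quorum(cluster_size):
    """Smallest number of nodes that forms a majority: n/2 + 1."""
    return cluster_size // 2 + 1


def majority_match_index(match_index, leader_last_index):
    """Highest log index stored on at least a quorum of nodes.

    match_index: follower name -> highest index replicated on that follower.
    leader_last_index: the leader's own last log index (it counts as one node).
    """
    indexes = sorted(list(match_index.values()) + [leader_last_index], reverse=True)
    # After sorting in descending order, the quorum-th value is held
    # by at least `quorum` nodes.
    return indexes[quorum(len(indexes)) - 1]
```

With a leader at index 2 and followers at 2 and 1, two of three nodes hold index 2, so it can be committed. With four followers at 0, 0, 1, 1, only index 1 is on a majority.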
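Log RPC
Arguments:
term
Leader's term
leaderId
Leader's ID, so followers can redirect client requests
prevLogIndex
Index of the log entry immediately preceding the new ones
entries[]
Log entries to store (empty for a heartbeat; several may be sent at once for efficiency)
leaderCommit
Index of the last entry the leader has committed
Results:
term
Current term, so the leader can update itself
success
True if the follower holds an entry matching prevLogIndex and prevLogTerm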
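A follower-side handler for the Log RPC above might look like the following. This is a sketch under assumptions: `handle_append_entries` and the dictionary-based state are hypothetical, indexes are 1-based, and `prev_log_term` is taken as an argument because the success rule refers to it.

```python
def handle_append_entries(state, term, prev_log_index, prev_log_term,
                          entries, leader_commit):
    """Return (term, success).

    state: dict with 'current_term', 'log' (list of (term, command)),
    and 'commit_index'. An empty `entries` list is a heartbeat.
    """
    if term < state["current_term"]:
        return state["current_term"], False  # stale leader
    state["current_term"] = term
    log = state["log"]
    # Consistency check: we must already hold the entry just before the new ones.
    if prev_log_index > 0:
        if len(log) < prev_log_index or log[prev_log_index - 1][0] != prev_log_term:
            return state["current_term"], False
    for offset, entry in enumerate(entries):
        pos = prev_log_index + offset  # 0-based slot for this entry
        if pos < len(log):
            if log[pos][0] != entry[0]:
                del log[pos:]          # conflict: drop it and everything after
                log.append(entry)
        else:
            log.append(entry)
    # Advance the commit index, never past the last entry this call covered.
    if leader_commit > state["commit_index"]:
        last_new = prev_log_index + len(entries)
        state["commit_index"] = max(state["commit_index"],
                                    min(leader_commit, last_new))
    return state["current_term"], True
```

A heartbeat is simply the same call with `entries=[]`: it still runs the consistency check and can still advance `commit_index`.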
log replication - 1
Heartbeat & Append Entries
1 Hello
1 Hello
1 Hello
Heartbeat & Append Entries
Only log entry !
log replication - 2
OK !
1 Hello
1 Hello
1 Hello
OK !
Leader commit !
Le_1
log replication - 3
F_2
F_1
Heartbeat & commit
1 Hello
Heartbeat & commit
1 Hello
1 Hello
Follower commit !
Common tricky cases
Le_1
What if a node's reply times out?
F_2
F_1
Heartbeat & commit
1 Hello
1 Hello
1 Hello
timeout !!!
How does F_2 stay consistent? The leader retries!
Le_1
Leader crash
F_2
F_1
Log entry Ack
1 Hello
1 Hello
1 Hello
The leader crashes after committing locally but before telling the followers to commit!
Is Hello still there?
F_3
1 Hello
Le_1
Follower crash
F_2
F_1
prevLogIndex
1 Hello
2 Raft
How does F_3 catch up on data after it crashes and restarts?
F_3
1 Hello
2 Raft
1 Hello
2 Raft
1 Hello
2
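The restarted-follower case above is typically resolved by the leader walking prevLogIndex backwards until the follower's log matches, then resending everything after that point. The sketch below simulates that loop in memory; `catch_up` and `follower_accepts` are hypothetical names, logs are lists of `(term, command)` with 1-based indexes, and the leader resends its whole suffix in one step.

```python
def follower_accepts(follower_log, prev_index, prev_term):
    """The follower's consistency check on prevLogIndex and prevLogTerm."""
    if prev_index == 0:
        return True  # nothing precedes the first entry
    return (len(follower_log) >= prev_index
            and follower_log[prev_index - 1][0] == prev_term)


def catch_up(leader_log, follower_log):
    """Bring follower_log in line with leader_log.

    Returns the nextIndex value at which the two logs were found to agree.
    """
    next_index = len(leader_log) + 1  # optimistic start: follower is current
    while True:
        prev_index = next_index - 1
        prev_term = leader_log[prev_index - 1][0] if prev_index > 0 else 0
        if follower_accepts(follower_log, prev_index, prev_term):
            # Replace everything after the agreed point with the leader's suffix.
            del follower_log[prev_index:]
            follower_log.extend(leader_log[prev_index:])
            return next_index
        next_index -= 1  # rejected: back up one entry and retry
```

The loop always terminates because a `prev_index` of 0 is accepted unconditionally, which corresponds to resending the log from its first entry.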
Network Partition
Le_1
Normal operation
F_2
F_1
Heartbeat & commit
1 Hello
F_3
F_4
1 Hello
1 Hello
1 Hello
1 Hello
Le_1
Network partition
F_2
F_1
Request Vote
1 Hello
F_3
F_4
1 Hello
1 Hello
1 Hello
1 Hello
How can two nodes form a quorum?!
Vote Granted
Le_1
New cluster operating normally
F_2
F_1
Heartbeat & Log entry & commit
1 Hello
2 Tim
F_3
F_4
1 Hello
2 Ying
1 Hello
2 Ying
1 Hello
2 Tim
1 Hello
2 Ying
How can two nodes form a quorum?!
Le_1
Network restored
F_2
F_1
Heartbeat & Append Log Entries
1 Hello
Le_2
F_4
1 Hello
2 Ying
1 Hello
2 Ying
1 Hello
1 Hello
2 Ying
Once the network recovers, the two leaders compete for leadership
Le_1's term is lower than Le_2's!
Consistency
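The leader contest is settled by terms: any node that sees a higher term in a message reverts to follower. A minimal sketch of that rule, with `on_rpc_term` as a hypothetical helper name:

```python
def on_rpc_term(role, current_term, rpc_term):
    """Return the (role, term) a node should hold after seeing rpc_term."""
    if rpc_term > current_term:
        # A higher term always wins: adopt it and step down.
        return "follower", rpc_term
    # An equal or lower term changes nothing for this node.
    return role, current_term
```

In the scenario above, Le_1 receives a heartbeat carrying Le_2's higher term and becomes a follower, while Le_2 ignores Le_1's stale term and stays leader.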
F_2
F_1
Heartbeat & Log entry & commit
Le_2
F_4
1 Hello
2 Ying
1 Hello
2 Ying
1 Hello
2 Ying
F_5
1 Hello
2 Ying
1 Hello
2 Ying
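Conflict: split brain
If a partition reaches quorum and produces N entries, how does it stay consistent with the new cluster?
Overwrite vs. merge?
What if some nodes had not received the commit before the partition?
timer check
Preventing split brain
Unicast to designated nodes
Specify the quorum size explicitly; it must be changed whenever a node is added or removed
Increase the timeout and retry
A single entry point for clients, but …
Monitor for split brain by checking that every node reports the same leader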
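The last monitoring item can be sketched as a simple check over what each node reports. This assumes some way of asking every node who it believes the leader is; the `detect_split_brain` function and the shape of `reported` are hypothetical.

```python
def detect_split_brain(reported):
    """Return True if nodes disagree about who the leader is.

    reported: node name -> (leader_id or None, term) as seen by that node.
    """
    # Nodes that currently know of no leader are ignored.
    leaders = {leader for leader, _term in reported.values() if leader is not None}
    return len(leaders) > 1
```

For example, `{"F_1": ("Le_1", 1), "F_3": ("Le_2", 2)}` would be flagged, while a cluster where every node names `Le_2` would not. A brief disagreement during a normal election is expected, so an alert would reasonably fire only if the mismatch persists.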
Complex consistency
Each cell is a log entry, labeled with its term ID. Rows are hosts; columns are log indexes.
index   1   2   3   4   5   6   7   8   9  10
S1     44  44  55  66  77  80  89  90
S2     44  44  55  66  77  80  89
S3     44  44  55  66  77
S4     44  44  55  70  70  85  85
S5     44  44  55  70  70  85
Log compaction
1 2 3 4 5 6 7 8 9 10
S1 44 44 55 66 77 80 89 90
index
Snapshot
Last included index : 6
Last included term : 80
State machine state:
x <— 0
y <— 9
All committed!!!
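The snapshot slide can be reproduced with a few lines: everything up to the last included index is replaced by the state machine's state plus the index and term of the last entry covered. The `compact` function below is a hypothetical sketch that works on an in-memory list of `(term, command)` entries with 1-based indexes.

```python
def compact(log, state, last_included_index):
    """Return (snapshot, remaining_log) after compacting a committed prefix.

    log: list of (term, command); state: the state machine's current values.
    Only entries that are already committed should be compacted.
    """
    snapshot = {
        "last_included_index": last_included_index,
        "last_included_term": log[last_included_index - 1][0],
        "state": dict(state),  # copy, so later updates do not alter the snapshot
    }
    return snapshot, log[last_included_index:]
```

Run against S1's terms from the slide (44, 44, 55, 66, 77, 80, 89, 90) with a last included index of 6, the snapshot records term 80 and the remaining log holds the two entries from terms 89 and 90.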
Further study
Animated demo:
https://ongardie.github.io/raft-talk-archive/2015/buildstuff/raftscope-replay/
Documentation:
http://en.youscribe.com/catalogue/tous/professional-resources/it-systems/raft-in-search-of-an-understandable-consensus-algorithm-2088704
Google …
Q & A
