11. How Solr used to deal with index failures
Idea (SOLR-5495): leader-initiated recovery (LIR) process
1. Leader publishes the replica's state as DOWN
2. Leader requests the replica to do recovery
3. Replica does recovery
A. Replica publishes its state as RECOVERING
B. Replica syncs with the leader
C. Replica publishes its state as ACTIVE
[Diagram: leader (L), replica (R), and ZooKeeper (ZK), with arrows labeled 1, 2, 3a/3c, and 3b for the steps above]
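The steps above can be traced with a small simulation. This is only a sketch: `run_old_lir` and the in-memory `zk_state` dictionary are hypothetical stand-ins for illustration, not Solr or ZooKeeper APIs. It shows that the leader and the replica both write the same state entry.

```python
def run_old_lir(replica="R"):
    """Trace who publishes which state for one replica during old-style LIR."""
    zk_state = {}   # hypothetical stand-in for the replica state kept in ZK
    log = []        # (writer, published state) in order

    def publish(writer, state):
        zk_state[replica] = state
        log.append((writer, state))

    publish("leader", "DOWN")          # step 1
    # step 2: the leader asks the replica to recover (a direct request)
    publish("replica", "RECOVERING")   # step 3A
    # step 3B: the replica syncs with the leader (not modeled here)
    publish("replica", "ACTIVE")       # step 3C
    return log, zk_state


log, final_state = run_old_lir()
for writer, state in log:
    print(writer, "->", state)
```

Two different writers touch the same entry, so the outcome depends on the order in which their writes arrive.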
12. When replica's state is not enough
When a replica is out-of-sync
• It should not become leader
• It should not become ACTIVE without acknowledging the LIR process
• It should not skip recovery
Additional flag: LIR state {ACTIVE, RECOVERING, DOWN}
• Both leader and replica can change it
• Combined with its regular state, a replica has 9 different states in total
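The count of 9 can be reproduced by pairing each replica state with each LIR state. This sketch assumes the replica state ranges over the same three values used in the LIR process (ACTIVE, RECOVERING, DOWN); the variable names are illustrative only.

```python
from itertools import product

REPLICA_STATES = ["ACTIVE", "RECOVERING", "DOWN"]  # assumed set of replica states
LIR_STATES = ["ACTIVE", "RECOVERING", "DOWN"]      # from the slide

# Every (replica state, LIR state) pair a replica could be in.
combined_states = list(product(REPLICA_STATES, LIR_STATES))
print(len(combined_states))
```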
13. A failure case of the old design
Outcome: a replica stays in the DOWN state forever
[Sequence diagram between L, R, and ZK. Labels, in extraction order: START; update; LIR recovery on startup; publish DOWN; wait for leader to see RECOVERING state; timeout; R's state = RECOVERING; R's state = DOWN; wait to see that R's state is RECOVERING; "hmm, nope!"; failed to send an update]
14. Cons of the old design
• Replica states are shared resources
• LIR states are shared resources
• Its correctness cannot be proven
• Unable to handle all kinds of failures
15. Agenda
• Basics of SolrCloud indexing
• How Solr used to deal with index failures
• The new design
• Q&A
16. The new design (SOLR-11702)
• Each replica has an associated term (a non-negative number)
• The "term" terminology is borrowed from the Raft paper (https://goo.gl/9UaURg)
• Terms of all replicas of a shard are stored in ZK
• Path : /collections/collection1/terms/shard1
• Val : {"core1" : 2, "core2" : 2, "core3" : 0}
• Only replicas with the highest term can become leader
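As a rough illustration of the last rule, the sketch below parses the example value and lists the cores that hold the highest term. The helper name `leader_candidates` is hypothetical and is not a Solr API; reading the value from ZK is left out.

```python
import json

# Example value from the slide, as it might be stored at
# /collections/collection1/terms/shard1
raw = '{"core1": 2, "core2": 2, "core3": 0}'


def leader_candidates(terms):
    """Return the cores whose term equals the highest term in the shard."""
    highest = max(terms.values())
    return sorted(core for core, term in terms.items() if term == highest)


terms = json.loads(raw)
print(leader_candidates(terms))  # ['core1', 'core2']
```

Here core3 is excluded because its term (0) is below the highest term (2).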
17. Operations for changing terms
• Op1 : A replica sets its term equal to its leader's term
• Op2 : A leader increases its own term and the terms of some other replicas by 1
• from : {"core1" : 2, "core2" : 2, "core3" : 2}
• to : {"core1" : 3, "core2" : 3, "core3" : 2}
• Terms can only increase monotonically
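The two operations can be sketched as pure functions over a terms map. The function names are hypothetical, and the sketch assumes core1 is the leader in the slide's from/to example; it ignores how the map is written back to ZK.

```python
def catch_up_to_leader(terms, replica, leader):
    """Op1: the replica sets its term equal to the leader's term."""
    updated = dict(terms)
    # max() keeps the change monotonic: a term never goes down.
    updated[replica] = max(updated[replica], updated[leader])
    return updated


def bump_terms(terms, leader, also):
    """Op2: the leader raises its own term and the terms of selected replicas by 1."""
    updated = dict(terms)
    for core in {leader, *also}:
        updated[core] += 1
    return updated


before = {"core1": 2, "core2": 2, "core3": 2}
after = bump_terms(before, leader="core1", also=["core2"])
print(after)  # {'core1': 3, 'core2': 3, 'core3': 2}
print(catch_up_to_leader(after, replica="core3", leader="core1"))
```

Both functions return a new map instead of mutating the input, which makes it easy to compare the state before and after each operation.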
18. Rules
• The leader only forwards updates to replicas whose terms equal its own term
• {"core1" : 3, "core2" : 3, "core3" : 2}
• A replica watches the term values node and starts the recovery process whenever its term is less than its leader's term
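Both rules reduce to comparisons against the leader's term. The following is a minimal sketch with hypothetical helper names, assuming core1 is the leader in the example map; the ZK watch itself is not modeled.

```python
def forward_targets(terms, leader):
    """Replicas the leader forwards updates to: those whose term equals its own."""
    return sorted(
        core for core, term in terms.items()
        if core != leader and term == terms[leader]
    )


def needs_recovery(terms, replica, leader):
    """A replica must recover when its term is lower than its leader's term."""
    return terms[replica] < terms[leader]


terms = {"core1": 3, "core2": 3, "core3": 2}
print(forward_targets(terms, leader="core1"))            # ['core2']
print(needs_recovery(terms, "core3", leader="core1"))    # True
```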
19. How to deal with index failures
1. Leader (L) increases by 1 its own term and the terms of the other replicas that successfully responded to the update
• from : {"L":1, "R1":1, "R2":1}
• to : {"L":2, "R1":1, "R2":2}
2. Replica (R1) watches ZK and gets notified that it needs to do recovery
• The replica sets its term equal to the leader's term, then does recovery
• {"L":2, "R1":2, "R1_recovering":1, "R2":2}
[Diagram: L, R1, and ZK, with arrows labeled 1 and 2 for the steps above]
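The two steps can be replayed on the slide's example values. This is a sketch under stated assumptions: the function names are hypothetical, and the `_recovering` entry is assumed to hold the replica's previous term and to be removed once recovery finishes, which matches the example map but is not spelled out on the slide.

```python
def on_update_result(terms, leader, acked):
    """Step 1: the leader bumps its own term and the terms of replicas that acked."""
    updated = dict(terms)
    for core in {leader, *acked}:
        updated[core] += 1
    return updated


def start_recovering(terms, replica, leader):
    """Step 2: the replica adopts the leader's term and marks itself as recovering."""
    updated = dict(terms)
    updated[replica + "_recovering"] = updated[replica]  # assumed: previous term
    updated[replica] = updated[leader]
    return updated


def done_recovering(terms, replica):
    """Assumed cleanup: drop the recovering marker once recovery completes."""
    updated = dict(terms)
    updated.pop(replica + "_recovering", None)
    return updated


terms = {"L": 1, "R1": 1, "R2": 1}
after_update = on_update_result(terms, leader="L", acked=["R2"])  # R1 failed
during_recovery = start_recovering(after_update, replica="R1", leader="L")
after_recovery = done_recovering(during_recovery, replica="R1")
print(after_update)
print(during_recovery)
print(after_recovery)
```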
21. Consistency problem until 7.3
1. A shard has 3 replicas, and R1 is the leader
2. R2 and R3 go DOWN
[Diagram: replicas R1, R2, and R3]
22. Consistency problem until 7.3
1. A shard has 3 replicas, and R1 is the leader
2. R2 and R3 go DOWN
3. R1 receives updates u1, u2
[Diagram: replicas R1, R2, and R3, with updates u1, u2]
23. Consistency problem until 7.3
1. A shard has 3 replicas, and R1 is the leader
2. R2 and R3 go DOWN
3. R1 receives updates u1, u2
4. R1 goes DOWN
5. R2 and R3 come back
[Diagram: replicas R1, R2, and R3, with updates u1, u2]
24. Consistency problem until 7.3
1. A shard has 3 replicas, and R1 is the leader
2. R2 and R3 go DOWN
3. R1 receives updates u1, u2
4. R1 goes DOWN
5. R2 and R3 come back
6. R2 or R3 becomes leader without having u1, u2
[Diagram: replicas R1, R2, and R3, with updates u1, u2]
25. How the new design solved the consistency problem
1. A shard has 3 replicas, and R1 is the leader
[Diagram: replicas R1, R2, and R3]
Terms: {R1:1, R2:1, R3:1}
26. How the new design solved the consistency problem
1. A shard has 3 replicas, and R1 is the leader
2. R2 and R3 go DOWN
[Diagram: replicas R1, R2, and R3]
Terms: {R1:1, R2:1, R3:1}
27. How the new design solved the consistency problem
1. A shard has 3 replicas, and R1 is the leader
2. R2 and R3 go DOWN
3. R1 receives updates u1, u2
[Diagram: replicas R1, R2, and R3, with updates u1, u2]
Terms: {R1:2, R2:1, R3:1}
28. How the new design solved the consistency problem
1. A shard has 3 replicas, and R1 is the leader
2. R2 and R3 go DOWN
3. R1 receives updates u1, u2
4. R1 goes DOWN
5. R2 and R3 come back
6. R2 and R3 can't become leader since their terms are not the highest
[Diagram: replicas R1, R2, and R3, with updates u1, u2]
Terms: {R1:2, R2:1, R3:1}
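The scenario can be replayed in a few lines. This is a sketch with hypothetical function names; it assumes the leader bumps terms only when a failed replica still shares its term, which is one way to arrive at the slide's {R1:2, R2:1, R3:1} after two updates rather than after one.

```python
def apply_update(terms, leader, failed):
    """Leader handles one update; 'failed' are replicas that did not receive it."""
    leader_term = terms[leader]
    # Assumption: no bump is needed if every failed replica is already behind.
    if not any(terms[r] == leader_term for r in failed):
        return dict(terms)
    return {
        core: term + 1 if (core not in failed and term == leader_term) else term
        for core, term in terms.items()
    }


def can_become_leader(terms, candidate):
    """Only a replica holding the highest term is eligible."""
    return terms[candidate] == max(terms.values())


terms = {"R1": 1, "R2": 1, "R3": 1}
down = {"R2", "R3"}                      # step 2
for update in ["u1", "u2"]:              # step 3
    terms = apply_update(terms, leader="R1", failed=down)
live = ["R2", "R3"]                      # steps 4 and 5: R1 down, R2 and R3 back
eligible = [r for r in live if can_become_leader(terms, r)]
print(terms)     # {'R1': 2, 'R2': 1, 'R3': 1}
print(eligible)  # []
```

With no eligible replica among the live ones, the shard waits for R1 instead of electing a leader that lacks u1 and u2.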
29. Pros of the new design
• Proof of correctness
• Replica terms are a great hint for leader election
• {"core1" : 1, "core2" : 2, "core3" : 0}
• No need for a direct connection between leader and replica
• Only the replica updates its own state
• Solved long-standing issues
• Replica stays in DOWN state forever
• Leaderless shard
• Design document : https://goo.gl/ueSLFT
30. Note for system administrators
• The new LIR design was introduced in Solr 7.3
• A leader in 7.3 will
• use the old LIR process for replicas running 7.2 or earlier versions
• use the new LIR process for replicas running 7.3 or later versions
• The backward-compatibility support will be removed in Solr 8.0
• A leader in 8.0 can only use the new LIR process
• Leverage the new design in leader election
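The version rules above can be summarized as a small decision function. This is an illustrative sketch only: `lir_process` is a hypothetical name, versions are modeled as (major, minor) tuples, and the behavior for leaders older than 7.3 is an assumption based on the new design not existing before that release.

```python
def lir_process(leader_version, replica_version):
    """Return which LIR process a leader would use for one replica."""
    if leader_version >= (8, 0):
        return "new"   # 8.0 leaders only know the new process
    if leader_version >= (7, 3):
        # 7.3 leaders stay backward compatible with older replicas
        return "new" if replica_version >= (7, 3) else "old"
    return "old"       # assumption: leaders before 7.3 only have the old process


print(lir_process((7, 3), (7, 2)))
print(lir_process((7, 3), (7, 3)))
print(lir_process((8, 0), (7, 3)))
```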