How SolrCloud Solved Recovery Issues - Dat Cao Manh, Lucidworks

How SolrCloud Solved Recovery Issues
Cao Manh Dat
Lucene/Solr Committer, Lucidworks
@caomanhdat
#Activate18 #ActivateSearch

Agenda
• Basics of SolrCloud Indexing
• How Solr used to deal with indexing failures
• The new design
• Q&A

Basics of SolrCloud Indexing
[Diagram sequence: Shard 1 with leader L and replicas R1 and R2, coordinated through ZK]
1. An update u1 arrives at the leader L
2. L forwards u1 to R1 and R2
3. R1 and R2 each respond with success
4. L responds with success

Index failure on replica side
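[Diagram: Shard 1 with leader L, replicas R1 and R2, and ZK. One replica gives no response to the forwarded update while the other responds with success. The diagram also shows an incoming query.]
• Unavoidable
• Connection issues
• Replica's node met tragic events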

Agenda
• Basics of SolrCloud Indexing
• How Solr used to deal with index failures
• The new design
• Q&A

State of Replica
[State diagram. Labels recovered from the figure: RECOVERING, ACTIVE, DOWN, SHUTDOWN (twice), DO REC (twice), FIN REC, SKIP REC]

How Solr used to deal with index failures
Idea (SOLR-5495): LIR (leader-initiated recovery) process
1. Leader publishes the replica's state as DOWN
2. Leader requests the replica to do recovery
3. Replica does recovery
A. Replica publishes its state as RECOVERING
B. Replica syncs with the leader
C. Replica publishes its state as ACTIVE
[Diagram: L, R, and ZK, with arrows labeled 1, 2, 3a, 3c, and 3b]

When the replica's state is not enough
When a replica is out of sync
• It should not become leader
• It should not become ACTIVE without acknowledging the LIR process
• It should not skip recovery
Additional flag: LIR state {ACTIVE, RECOVERING, DOWN}
• Both the leader and the replica can change it
• A replica has 9 different states in total

A failure case of the old design
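Outcome: a replica stays in DOWN state forever
[Sequence diagram between L, R, and ZK. Labels recovered from the figure:]
• R: START, recovery on STARTUP
• L: update, failed to send an update, LIR, publish DOWN
• ZK: R's state = REC, R's state = DOWN
• R: wait for the leader to see the RECOVERING state, then Timeout
• L: wait to see that R's state is RECOVERING ("hmm, nope!")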

Cons of the old design
• Replica states are shared resources
• LIR states are shared resources
• Unable to prove its correctness
• Unable to handle all kinds of failures

The new design (SOLR-11702)
• Each replica has an associated term (a positive number)
• The name "term" is borrowed from the Raft paper (https://goo.gl/9UaURg)
• The terms of all replicas of a shard are stored in ZK
• Path : /collections/collection1/terms/shard1
• Val : {"core1" : 2, "core2" : 2, "core3" : 0}
• Only replicas with the highest term can become leader
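The leader-eligibility rule above can be sketched in a few lines. This is only an illustration, not Solr's implementation: the function name leader_candidates is hypothetical, and the JSON string simply mirrors the example value from the slide.

```python
import json

# Hypothetical snapshot of the terms node content, copied from the slide's example.
raw = '{"core1": 2, "core2": 2, "core3": 0}'


def leader_candidates(terms):
    """Return the cores whose term equals the highest term in the shard."""
    highest = max(terms.values())
    return sorted(core for core, term in terms.items() if term == highest)


terms = json.loads(raw)
print(leader_candidates(terms))  # ['core1', 'core2']
```

With this value, core3 is excluded because its term (0) is below the highest term (2).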

Operations for changing terms
• Op1 : A replica sets its term equal to its leader's term
• Op2 : A leader increases its own term and the terms of some other replicas by 1
• from : {"core1" : 2, "core2" : 2, "core3" : 2}
• to : {"core1" : 3, "core2" : 3, "core3" : 2}
• Terms can only increase monotonically
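A minimal sketch of the two operations on an in-memory dict, assuming core1 is the leader in the from/to example. The function names are hypothetical, and a real implementation would also have to write the new value back to ZK safely, which is not modeled here.

```python
def set_term_equal_to_leader(terms, replica, leader):
    """Op1: the replica sets its term equal to its leader's term."""
    updated = dict(terms)
    # max() is an assumption of this sketch; it keeps the term from ever decreasing.
    updated[replica] = max(terms[replica], terms[leader])
    return updated


def increase_terms(terms, leader, others):
    """Op2: the leader increases its own term and the terms of `others` by 1."""
    updated = dict(terms)
    for core in {leader, *others}:
        updated[core] = terms[core] + 1
    return updated


before = {"core1": 2, "core2": 2, "core3": 2}
after_op2 = increase_terms(before, "core1", ["core2"])
print(after_op2)  # {'core1': 3, 'core2': 3, 'core3': 2}

after_op1 = set_term_equal_to_leader(after_op2, "core3", "core1")
print(after_op1)  # {'core1': 3, 'core2': 3, 'core3': 3}
```

Both functions return a new dict and never lower any value, which matches the monotonic-increase rule.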

Rules
• The leader only forwards updates to replicas whose terms equal its own term
• {"core1" : 3, "core2" : 3, "core3" : 2}
• A replica watches the term values node and starts the recovery process whenever its term is less than its leader's term
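The two rules can be expressed as simple predicates over the terms map. This sketch assumes core1 is the leader in the example value (the slide does not say which core leads), and the function names are hypothetical.

```python
def forward_targets(terms, leader):
    """Replicas the leader forwards updates to: those whose term equals the leader's."""
    return sorted(
        core for core, term in terms.items()
        if core != leader and term == terms[leader]
    )


def needs_recovery(terms, replica, leader):
    """A replica must recover when its term is lower than its leader's term."""
    return terms[replica] < terms[leader]


terms = {"core1": 3, "core2": 3, "core3": 2}
print(forward_targets(terms, "core1"))          # ['core2']
print(needs_recovery(terms, "core3", "core1"))  # True
print(needs_recovery(terms, "core2", "core1"))  # False
```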

How to deal with index failures
1. Leader (L) increases by 1 its own term and the terms of the replicas that responded successfully to the update
• from : {"L":1, "R1":1, "R2":1}
• to : {"L":2, "R1":1, "R2":2}
2. Replica (R1) watches ZK and gets notified that it needs to do recovery
• The replica sets its term equal to the leader's term, then does recovery
• {"L":2, "R1":2, "R1_recovering":1, "R2":2}
[Diagram: L, R1, and ZK, with arrows labeled 1 and 2]
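The two steps above can be replayed on a plain dict to reproduce the values shown. This is a sketch under assumptions: the function names are hypothetical, and the "_recovering" entry is assumed to hold the replica's previous term, which is consistent with the slide's example but not stated there.

```python
def on_update_result(terms, leader, succeeded):
    """Step 1: bump the leader's term and the terms of replicas that acknowledged the update."""
    updated = dict(terms)
    for core in {leader, *succeeded}:
        updated[core] = terms[core] + 1
    return updated


def start_recovery(terms, replica, leader):
    """Step 2: the lagging replica catches up its term and marks itself as recovering."""
    updated = dict(terms)
    # Assumption: the marker stores the replica's term from before the catch-up.
    updated[replica + "_recovering"] = terms[replica]
    updated[replica] = terms[leader]
    return updated


terms = {"L": 1, "R1": 1, "R2": 1}
terms = on_update_result(terms, "L", ["R2"])  # R1 did not respond
print(terms)  # {'L': 2, 'R1': 1, 'R2': 2}

terms = start_recovery(terms, "R1", "L")
print(terms == {"L": 2, "R1": 2, "R1_recovering": 1, "R2": 2})  # True
```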

Consistency problem until 7.3
1. A shard with 3 replicas; R1 is the leader
2. R2 and R3 go DOWN
3. R1 receives updates u1, u2 (only R1 has them)
4. R1 goes DOWN
5. R2 and R3 come back
6. R2 or R3 becomes leader without having u1, u2

How the new design solved the consistency problem
1. A shard with 3 replicas; R1 is the leader
• {R1:1, R2:1, R3:1}
2. R2 and R3 go DOWN
• {R1:1, R2:1, R3:1}
3. R1 receives updates u1, u2
• {R1:2, R2:1, R3:1}
4. R1 goes DOWN
5. R2 and R3 come back
6. R2 and R3 can't become leader since their terms are not the highest
• {R1:2, R2:1, R3:1}
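The scenario can be replayed in a short simulation to see why no stale replica is elected. This is an illustrative sketch only: the `live` set and the function name can_become_leader are hypothetical stand-ins, not Solr APIs.

```python
def can_become_leader(terms, core):
    """Only a replica holding the highest term in the shard may become leader."""
    return terms[core] == max(terms.values())


terms = {"R1": 1, "R2": 1, "R3": 1}  # step 1: R1 is the leader
live = {"R1", "R2", "R3"}

live -= {"R2", "R3"}  # step 2: R2 and R3 go DOWN

# Step 3: R1 accepts u1, u2. R2 and R3 did not acknowledge them,
# so only the leader's term is increased.
terms["R1"] += 1

live = {"R2", "R3"}  # steps 4 and 5: R1 goes DOWN, R2 and R3 come back

eligible = sorted(core for core in live if can_become_leader(terms, core))
print(terms)     # {'R1': 2, 'R2': 1, 'R3': 1}
print(eligible)  # []
```

The empty result reflects step 6: neither R2 nor R3 holds the highest term, so neither can take over without u1 and u2.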

Pros of the new design
• Proof of correctness
• Replica terms are a great hint for leader election
• {"core1" : 1, "core2" : 2, "core3" : 0}
• No need for a direct connection between leader and replica
• Only the replica updates its own state
• Solved long-standing issues
• Replica stays in DOWN state forever
• Leaderless shard
• Design document : https://goo.gl/ueSLFT

Note for system administrators
• The new LIR design was introduced in Solr 7.3
• A leader in 7.3 will
• use the old LIR process for replicas running 7.2 or earlier versions
• use the new LIR process for replicas running 7.3 or later versions
• Backward-compatibility support will be removed in Solr 8.0
• A leader in 8.0 can only use the new LIR process
• Leverage the new design in leader election
