Solr consistency and recovery internals

© Cloudera, Inc. All rights reserved.
Mano Kovacs | July 13, 2017

Intro
• Mano Kovacs
• Cloudera Search engineer
• Working on “Why is my Solr cluster down?” mysteries.
• 15 years of development: high-performance web services, IoT platform
• Amateur slideshow enthusiast

Agenda
• Consistency basics (leader/follower)
• Leader election
• When to recover
• General recovery (PeerSync, replication)
• Recovery in detail
• Leader-Initiated Recovery
• Auto Add Replica

Basics
• Shards in collection
• One leader per shard
• Leader gets writes
• Replicates

Leader Election
• ZooKeeper leader election recipe
• Sequential, ephemeral nodes for each replica
• The order dictates the leader candidates
• First in order becomes leader candidate
• Replicas watch the previous candidate to get notified
• If leader fails, next in line will be the candidate
• Leader candidates follow leader preparation process
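The ordering rule above can be sketched with a toy in-memory model. This is only an illustration of the recipe's ordering and watch chain, not the ZooKeeper client API or Solr's election code; the class and method names (`ElectionQueue`, `join`, `leave`, `candidate`, `watched_by`) are hypothetical.

```python
# Toy in-memory model of the election order; not the ZooKeeper or Solr API.
class ElectionQueue:
    def __init__(self):
        self._next_seq = 0
        self._nodes = {}  # sequence number -> replica name

    def join(self, replica):
        """Register a replica, as creating a sequential ephemeral node would."""
        seq = self._next_seq
        self._next_seq += 1
        self._nodes[seq] = replica
        return seq

    def leave(self, seq):
        """Drop a node, as happens when the owning session ends."""
        self._nodes.pop(seq, None)

    def candidate(self):
        """The replica with the lowest sequence number is the leader candidate."""
        if not self._nodes:
            return None
        return self._nodes[min(self._nodes)]

    def watched_by(self, seq):
        """Each replica watches the node immediately before its own."""
        earlier = [s for s in self._nodes if s < seq]
        return self._nodes[max(earlier)] if earlier else None


queue = ElectionQueue()
seqs = {name: queue.join(name) for name in ("replica1", "replica2", "replica3")}
first = queue.candidate()
queue.leave(seqs["replica1"])  # the leader's session expires
second = queue.candidate()
```

Here `first` is `replica1` and, once its node disappears, `second` is `replica2`, the next in line. Watching only the predecessor means a single failure notifies a single replica.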

Leader Election - leader candidate
• On restart: waits for all replicas to participate (default 3 minutes)
• Syncs changes from other replicas
• If not at startup, verifies that its last state was ACTIVE
• If all replicas were DOWN, the shard hangs (SOLR-7065)
• Verifies that no error was reported (LIR… tbd)

What causes Recovery?
• Routine events
  • Add or move replica: the new replica does not have the data
  • Restart (upgrade/tuning): the replica might have missed updates
• Non-routine events
  • Server crash
    • Leader
    • Replica
  • Network failure (lost ZK connection)
  • Replica partitioned: can access ZK, but not the leader

Recovery (from 30,000 ft.)
• Replaying unfinished updates from tlog
• Check if we are synced
• If no, “How much am I behind?”
• If N (default 100) docs or fewer
• Retrieving delta
• Else
• Replication: pulling full index
• Go ACTIVE
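The decision in this overview can be written as a small function. It is a minimal sketch of the branching described on the slide, not Solr's recovery code; the function name and the returned labels are made up for illustration, and the threshold of 100 is the default quoted above.

```python
def choose_recovery(updates_behind, peersync_limit=100):
    """Pick a catch-up strategy from how far the replica is behind."""
    if updates_behind == 0:
        return "in-sync"        # nothing to fetch, go ACTIVE
    if updates_behind <= peersync_limit:
        return "peersync"       # retrieve only the missing delta
    return "replication"        # pull the full index
```

For example, a replica that is 100 updates behind would still use the delta path, while one that is 101 behind would fall back to full replication.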

Recovery (from 1,000 ft.)
• Buffering new updates
• So we won’t fall behind over and over again
• Waiting for the leader to notice us
• Otherwise we don’t get updates
• Replay buffered updates
• Hopefully replay catches up with incoming updates
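A rough model of why buffering matters: updates that arrive while the replica is catching up are set aside and replayed afterwards, so nothing that arrived mid-recovery is lost. The class below is a hypothetical toy (an in-memory dict stands in for the index, and all names are invented), not how Solr's update log is implemented.

```python
from collections import deque


class RecoveringReplica:
    """Toy replica: index maps doc id -> (version, payload)."""

    def __init__(self):
        self.index = {}
        self.buffer = deque()
        self.buffering = False

    def start_recovery(self):
        # From now on, incoming updates are buffered instead of applied.
        self.buffering = True

    def on_update(self, doc_id, version, payload):
        if self.buffering:
            self.buffer.append((doc_id, version, payload))
        else:
            self._apply(doc_id, version, payload)

    def install_snapshot(self, snapshot):
        # Stand-in for fetching the delta or the full index from the leader.
        self.index = dict(snapshot)

    def finish_recovery(self):
        # Replay buffered updates in arrival order, then resume normal updates.
        while self.buffer:
            self._apply(*self.buffer.popleft())
        self.buffering = False

    def _apply(self, doc_id, version, payload):
        # Keep only the newest version of each document.
        current = self.index.get(doc_id)
        if current is None or current[0] < version:
            self.index[doc_id] = (version, payload)


replica = RecoveringReplica()
replica.start_recovery()
replica.on_update("a", 5, "new")  # arrives while recovery is running
replica.install_snapshot({"a": (3, "old"), "b": (4, "kept")})
replica.finish_recovery()
```

After the replay, document `a` holds version 5 even though the snapshot only contained version 3.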

Recovery (from 100 ft.)
• Updates are versioned
• Timestamp+counter
• PeerSync: last N updates by version
• Index has fingerprint (hash of doc versions)
• If any other updates are missing, the fingerprint check fails
• Consistency safety net if others fail
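The interplay of versions, PeerSync, and the fingerprint can be illustrated as follows. This is a sketch under stated assumptions: the 20-bit counter width, the SHA-256-based hashing, and all function names are chosen for illustration and are not taken from Solr's implementation.

```python
import hashlib

COUNTER_BITS = 20  # assumption for illustration only


def make_version(timestamp_ms, counter):
    """Combine a timestamp and a counter so later updates sort higher."""
    return (timestamp_ms << COUNTER_BITS) | counter


def last_n(versions, n=100):
    """The N most recent updates by version, newest first."""
    return sorted(versions, reverse=True)[:n]


def fingerprint(versions):
    """Order-independent hash over all document versions in an index."""
    total = 0
    for v in versions:
        digest = hashlib.sha256(str(v).encode()).digest()
        total = (total + int.from_bytes(digest[:8], "big")) % (1 << 64)
    return total


leader = [make_version(1_500_000_000_000, c) for c in range(5)]
replica = leader[:3]  # the replica missed the last two updates

# PeerSync-style catch-up: fetch what is missing from the leader's recent updates.
missing = sorted(set(last_n(leader)) - set(replica))
replica_after_sync = replica + missing
```

Before the sync the two fingerprints differ; afterwards they match, which is the safety check the slide describes.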

Leader-Initiated Recovery
• Leader is partitioned from the replica, but not from ZK
• Leader will send recovery requests to the replica (with retries)
• If the replica went down, it will go through the normal recovery process anyway
• If the replica is partitioned and up, it will still serve stale reads :(
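The "with retries" part can be pictured as a simple retry loop on the leader side. This is a hypothetical sketch: `request_recovery`, `send`, and the retry count are invented for illustration and do not reflect Solr's actual LIR code or defaults.

```python
def request_recovery(send, max_retries=3):
    """Call send() until it succeeds; return the attempt number, or None."""
    for attempt in range(1, max_retries + 1):
        try:
            send()  # stand-in for the leader's recovery request to the replica
            return attempt
        except ConnectionError:
            continue  # replica unreachable, try again
    return None  # replica never answered


# A fake sender that fails twice before the replica becomes reachable.
calls = {"count": 0}


def flaky_send():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("replica unreachable")


succeeded_on = request_recovery(flaky_send)
```

If every attempt fails, the replica never hears the request, which is the partitioned-but-up case where stale reads remain possible.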

LIR problems - SOLR-9555
• Race condition between LIR and standard recovery
• Mike Drob’s patch is almost done
• Solves the problem with partitioned replicas too, using ZK watches

AutoAddReplica
• Using shared file system (e.g. HDFS)
• Provides durability
• Instances share index folders
• Move cores to live nodes on failure
• Use same index folder
• Pros
• Durability with replication factor 1
• Handles permanent node loss
• Cons
• Still no HA or read scalability if using a single replica
• Many recent fixes from Mark Miller
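As a sketch of how such a collection might be requested, the snippet below builds a Collections API `CREATE` URL with `autoAddReplicas=true` and a replication factor of 1. The host, port, and collection name are placeholders, and the shared file system (for example HDFS) is assumed to be configured separately on the Solr nodes.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards):
    """Build a Collections API CREATE request with autoAddReplicas enabled."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": 1,      # durability comes from the shared file system
        "autoAddReplicas": "true",
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Placeholder host and collection name, for illustration only.
url = create_collection_url("http://localhost:8983/solr", "logs", 2)
```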

Summary
• Details about SolrCloud cluster
• Help to improve!
• PlantUML is a cool tool for documentation

Thank you
E: manokovacs@cloudera.com
T: @manokovacs
