Cross Datacenter Replication in Apache Solr 6

Shalin Shekhar Mangar
Lucidworks Inc.
@shalinmangar

The standard for enterprise search.
90% of the Fortune 500 uses Solr.

Agenda
• Review a typical Solr deployment architecture
• Challenges of running a Solr deployment across data centers
• Cross Datacenter Replication (CDCR) in Solr
• Setup and configuration
• Limitations
• Alternative strategies
• Future work

[Diagram: clients, Solr and ZooKeeper within a single datacenter]

CDCR Anti-patterns - Remote Solr instances
[Diagram: clients (C), Solr and ZooKeeper spread across DC 1 and DC 2]

CDCR Anti-patterns - Remote ZK and Solr
[Diagram: clients (C), Solr and ZooKeeper spread across DC 1 and DC 2]

CDCR Anti-patterns - Remote ZK and Solr
[Diagram: clients (C), Solr and ZooKeeper spread across DC 1, DC 2 and DC 3]

Why not a single Solr Cloud?
• Same update is transferred to each replica
• Synchronous indexing means burst indexing is constrained by cross-DC bandwidth
• Increased latency for indexing operations
• Need a ZooKeeper node in a 3rd DC to break ties
• Search requests are not DC-aware, may choose a remote replica

Cross Datacenter Replication in Solr
• Let’s call it CDCR for short
• Accommodate two or more datacenters
• Active/passive setup for disaster recovery
• Support limited bandwidth links
• Eventually consistent passive cluster

Source: http://yonik.com/solr-cross-data-center-replication/

CDCR in Solr 6
• Scalable: no single point of failure or bottleneck
• Peer cluster can have a different replication factor
• Asynchronous updates; no penalty for indexing
• Push operations for low latency replication
• Low overhead — uses existing transaction logs and indexes
• Leader-to-leader communication ensures each update is sent only once to the peer cluster

[Configuration slide; callouts: Target Cluster, Tune replication, Synchronize logs, CdcrUpdateLog]

[Configuration slide; callouts: Enable APIs, Update chains, Update log]
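
The configuration callouts above can be sketched as generated XML. This is a minimal sketch, not the configuration from the slides: the element names (`replica`, `replicator`, `updateLogSynchronizer`, `solr.CdcrRequestHandler`, `solr.CdcrUpdateLog`) are recalled from the Solr reference guide's CDCR page and should be checked against it, while the ZooKeeper address, collection names and numeric values are placeholders chosen for illustration.

```python
import xml.etree.ElementTree as ET


def _lst(parent, name, entries):
    """Append <lst name=...> holding one <str name=key>value</str> per entry."""
    lst = ET.SubElement(parent, "lst", {"name": name})
    for key, value in entries.items():
        ET.SubElement(lst, "str", {"name": key}).text = str(value)
    return lst


def source_cdcr_handler(target_zk_host, source_collection, target_collection):
    """Build a /cdcr request handler element for a source cluster.

    Assumption: section and parameter names follow the Solr reference guide;
    the numeric values below are illustrative, not recommendations.
    """
    handler = ET.Element(
        "requestHandler", {"name": "/cdcr", "class": "solr.CdcrRequestHandler"}
    )
    # "Target Cluster": where the source leader pushes updates.
    _lst(handler, "replica", {
        "zkHost": target_zk_host,
        "source": source_collection,
        "target": target_collection,
    })
    # "Tune replication": thread count, polling interval (ms), batch size.
    _lst(handler, "replicator", {
        "threadPoolSize": 2, "schedule": 1000, "batchSize": 128,
    })
    # "Synchronize logs": how often replicas sync with the leader (ms).
    _lst(handler, "updateLogSynchronizer", {"schedule": 1000})
    return handler


def cdcr_update_handler():
    """Build an update handler element that uses CdcrUpdateLog."""
    handler = ET.Element("updateHandler", {"class": "solr.DirectUpdateHandler2"})
    ET.SubElement(handler, "updateLog", {"class": "solr.CdcrUpdateLog"})
    return handler


def render(element):
    """Serialize an element to an XML string."""
    return ET.tostring(element, encoding="unicode")
```

Calling `render(source_cdcr_handler("zk-dc2:2181", "products", "products"))` yields a fragment that could be pasted into solrconfig.xml on the source after review; the target side would need its own, different configuration.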

CDCR APIs
• http://host:port/solr/collection_name/cdcr?action=START
• Control APIs: START, STOP, ENABLEBUFFER, DISABLEBUFFER, STATUS
• Monitoring APIs: QUEUES, OPS, ERRORS
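
The URL pattern and action names above can be wrapped in a small helper. This is a sketch under assumptions: the helper names `cdcr_url` and `call_cdcr` are hypothetical, the base URL and collection are placeholders, and `wt=json` is the standard Solr response-writer parameter added here for convenience rather than something the slide specifies.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Action names as listed on the slide.
CONTROL_ACTIONS = ("START", "STOP", "ENABLEBUFFER", "DISABLEBUFFER", "STATUS")
MONITORING_ACTIONS = ("QUEUES", "OPS", "ERRORS")


def cdcr_url(base_url, collection, action):
    """Return the CDCR API URL for one action on one collection."""
    action = action.upper()
    if action not in CONTROL_ACTIONS + MONITORING_ACTIONS:
        raise ValueError(f"unknown CDCR action: {action}")
    query = urlencode({"action": action, "wt": "json"})
    return f"{base_url.rstrip('/')}/{quote(collection)}/cdcr?{query}"


def call_cdcr(base_url, collection, action, timeout=10):
    """Send the request and return the raw response body as text."""
    with urlopen(cdcr_url(base_url, collection, action), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `cdcr_url("http://localhost:8983/solr", "products", "start")` builds the START request for a hypothetical `products` collection; `call_cdcr` needs a running Solr node to return anything.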

How to failover?
• Change configuration on target to make it the source
• Point indexers to the new source
• Change configuration on source to make it the new target
• May require stopping indexing during the conversion process, especially if you want to revert the change
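
The failover steps above can be written down as an ordered runbook. This is only an illustrative sketch: the `Cluster` type and `failover_plan` function are hypothetical, nothing here talks to Solr, and the optional pause reflects the note that indexing may need to stop during the conversion.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    role: str  # "source" or "target"


def failover_plan(source, target, pause_indexing=True):
    """Return the manual failover steps, in order, as human-readable strings."""
    if source.role != "source" or target.role != "target":
        raise ValueError("expected a (source, target) pair")
    steps = []
    if pause_indexing:
        steps.append("stop indexing")
    steps.append(f"reconfigure {target.name} as the source")
    steps.append(f"point indexers to {target.name}")
    steps.append(f"reconfigure {source.name} as the new target")
    if pause_indexing:
        steps.append("resume indexing")
    return steps
```

Running `failover_plan(Cluster("dc1", "source"), Cluster("dc2", "target"))` lists the steps for promoting `dc2`; reverting the change is the same plan with the roles swapped.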

CDCR support in Solr 6+
• Active/passive setup, either for disaster recovery or for low-latency querying
• Solr clusters with existing data can be converted to a source cluster from Solr 6.2 onwards
• Low to medium indexing traffic

CDCR Limitations and gotchas
• CDCR is disabled by default; invoke START on both source and target to enable it
• Soft commits are not replicated to the target; autoSoftCommit must be scheduled explicitly on the target
• Source and target require different configurations
• Daisy-chaining is possible but not well tested; add all targets to the same source cluster instead

CDCR Limitations and gotchas
• Not suitable for applications requiring high-throughput indexing, although some knobs exist for tuning replication speed
• Update log buffers can grow indefinitely while target clusters are down; if there is only one target, a workaround is to disable buffering for the time being
• No automatic failover between source and target; explicit actions are required to modify configurations and point indexing pipelines to the new source
• No active/active setup

Alternative strategy
• Use a proper queue such as Apache Kafka to feed source and target DCs simultaneously
• Use external versions in conjunction with versions generated by Solr: DocBasedVersionConstraintsProcessorFactory
• Watch the video for “Solr Cross-Datacenter Replication and Consistency at Scale” by Oliver Bates, Apple Inc.: http://sched.co/8ArU
• Pros: supports high indexing throughput and active/active replication
• Cons: additional systems required; managing consistency is difficult and requires in-depth Solr expertise; all atomic updates must go to a single DC; cannot support delete-by-query
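
The external-version idea above can be illustrated with a toy model. This sketch does not use Solr: it assumes the rule is "apply an update only if its external version is greater than the stored one", which is the general behavior of DocBasedVersionConstraintsProcessorFactory as commonly described, and the `apply_update` helper and in-memory `dict` store are hypothetical stand-ins for a datacenter's index.

```python
def apply_update(store, doc_id, version, fields):
    """Apply an update only if its external version is newer.

    `store` maps doc_id -> (version, fields). Returns True if applied,
    False if the update was stale and ignored.
    """
    current = store.get(doc_id)
    if current is not None and version <= current[0]:
        return False  # stale or duplicate update: ignore it
    store[doc_id] = (version, fields)
    return True


def replay(updates):
    """Feed a sequence of (doc_id, version, fields) updates into a fresh store."""
    store = {}
    for doc_id, version, fields in updates:
        apply_update(store, doc_id, version, fields)
    return store


# The same updates, delivered in a different order to each datacenter.
updates_dc1 = [("a", 1, {"price": 10}), ("a", 2, {"price": 12}), ("b", 1, {"price": 5})]
updates_dc2 = [("b", 1, {"price": 5}), ("a", 2, {"price": 12}), ("a", 1, {"price": 10})]

dc1 = replay(updates_dc1)
dc2 = replay(updates_dc2)
```

Because stale versions are dropped, `dc1` and `dc2` end up identical even though the queue delivered the updates in different orders. The model covers whole-document updates only, which matches the listed cons: atomic updates and delete-by-query do not fit this scheme.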

Problems we solved
• Synchronous indexing to replicas: build a separate asynchronous indexing pipeline
• Limited size of the update log: use the update log as the queue
• Tracking replication progress to preserve consistency on target clusters if the source leader dies: checkpoints
• Bootstrapping the target cluster with indexes when update logs are incomplete
• New replicas on the source have no logs to replicate: replicate update logs during recovery
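
The checkpoint idea above can be illustrated with a toy model of an update log used as a queue. This is a sketch under stated assumptions, not Solr's implementation: the `Target` class, the `replicate` function and the `fail_after_batches` switch are hypothetical, and the checkpoint is simplified to "highest version the target has acknowledged".

```python
class Target:
    """Toy target cluster: stores operations keyed by version."""

    def __init__(self):
        self.applied = {}  # version -> operation

    def receive(self, batch):
        """Apply a batch and return the highest version seen (the checkpoint)."""
        for version, op in batch:
            self.applied[version] = op  # keyed by version, so replays are harmless
        return max(self.applied)


def replicate(update_log, target, checkpoint, batch_size=2, fail_after_batches=None):
    """Forward log entries newer than `checkpoint`; return the new checkpoint.

    `fail_after_batches` simulates the source leader dying after that many
    batches; the checkpoint reached so far is what a new leader would read.
    """
    pending = [entry for entry in update_log if entry[0] > checkpoint]
    sent = 0
    for start in range(0, len(pending), batch_size):
        if fail_after_batches is not None and sent == fail_after_batches:
            return checkpoint  # leader died; progress so far is preserved
        checkpoint = target.receive(pending[start:start + batch_size])
        sent += 1
    return checkpoint


log = [(1, "add a"), (2, "add b"), (3, "delete a"), (4, "add c"), (5, "add d")]
target = Target()

# The first leader dies after one batch; a new leader resumes from the checkpoint.
checkpoint_after_crash = replicate(log, target, checkpoint=0, fail_after_batches=1)
final_checkpoint = replicate(log, target, checkpoint=checkpoint_after_crash)
```

The second call only forwards entries newer than the saved checkpoint, so the new leader neither restarts from the beginning of the log nor skips updates.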

Future work
• Move configuration out of solrconfig.xml and into API calls
• Dynamically add/remove/change target cluster information
• Cap update log to a max size and fall back to index replication if necessary
• Refactor and combine CdcrUpdateLog
• Better monitoring: capture transfer rate and latency info
• Add support for rate limiting replication between source and target
• Active/active?

Resources
• CDCR page on ref guide: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=62687462
• http://yonik.com/solr-cross-data-center-replication/
• https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents
