Cross Datacenter Replication in Apache Solr 6
Shalin Shekhar Mangar
Lucidworks Inc.
@shalinmangar
The standard for enterprise search.
90% of Fortune 500 uses Solr.
Agenda
• Review a typical Solr deployment architecture
• Challenges of running a Solr deployment across data centers
• Cross Data Centre Replication (CDCR) in Solr
• Setup and configuration
• Limitations
• Alternative strategies
• Future work
[Diagram: three clients, Solr, and Zookeeper in a single datacenter]
CDCR Anti-patterns - Remote Solr instances
[Diagram: nodes labeled C, Solr, and Zookeeper across DC 1 and DC 2]
CDCR Anti-patterns - Remote ZK and Solr
[Diagram: nodes labeled C, Solr, and Zookeeper across DC 1 and DC 2]
CDCR Anti-patterns - Remote ZK and Solr
[Diagram: nodes labeled C, Solr, and Zookeeper across DC 1, DC 2, and DC 3]
Why not a single Solr Cloud?
• Same update is transferred to each replica
• Synchronous indexing means burst-indexing is constrained by
cross-DC bandwidth
• Increased latency for indexing operations
• Need a ZooKeeper node in a 3rd DC to break ties
• Search requests are not DC-aware, may choose a remote replica
Cross Datacenter Replication in Solr
• Let’s call it CDCR for short
• Accommodate two or more data centres
• Active/passive setup for disaster recovery
• Support limited bandwidth links
• Eventually consistent passive cluster
Source: http://yonik.com/solr-cross-data-center-replication/
CDCR in Solr 6
• Scalable: no SPoF and/or bottleneck
• Peer cluster can have a different replication factor
• Asynchronous updates; no penalty for indexing
• Push operations for low latency replication
• Low overhead — uses existing transaction logs and indexes
• Leader-to-leader communication ensures update is sent only once
to peer cluster
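
The bandwidth argument behind the last bullet can be made concrete with a little arithmetic. The sketch below is a simplified model, not a measurement: it assumes the shard leader sits in the local DC, that a single stretched SolrCloud forwards every update to each remote replica, and that CDCR forwards it once per shard to the peer leader. The function names and the sample numbers are hypothetical.

```python
def cross_dc_copies_single_cloud(remote_replicas_per_shard: int) -> int:
    """One SolrCloud stretched over two DCs: the leader forwards each
    update to every replica, so each remote replica costs one
    cross-DC transfer (assumes the leader is in the local DC)."""
    return remote_replicas_per_shard


def cross_dc_copies_cdcr() -> int:
    """CDCR: the source leader sends the update once to the target
    leader; fan-out to target replicas stays inside the target DC."""
    return 1


updates = 1_000_000      # hypothetical number of indexed documents
remote_replicas = 3      # hypothetical replicas per shard in the remote DC

print(updates * cross_dc_copies_single_cloud(remote_replicas))  # 3000000
print(updates * cross_dc_copies_cdcr())                         # 1000000
```

Under these assumptions the cross-DC traffic of the stretched cluster grows with the remote replication factor, while the CDCR traffic does not.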
[Configuration slides; annotation labels: Target Cluster, Tune replication, Synchronize logs, CdcrUpdateLog, Enable APIs, Update chains, Update log]
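
The configuration slides themselves did not survive as text. As a rough illustration of what the labels Target Cluster, Tune replication, Synchronize logs, and CdcrUpdateLog may refer to, the sketch below embeds a source-side solrconfig.xml fragment and reads it back with Python. The element names are recalled from the Solr 6 reference guide and should be checked against it; the host `target-zk:2181`, the collection name, the numeric values, and the wrapping `<config>` element are placeholders chosen for this example.

```python
import xml.etree.ElementTree as ET

# Assumed source-cluster fragment; all values are placeholders.
SOURCE_CONFIG = """
<config>
  <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
    <!-- Target cluster: its ZooKeeper address and the collection names -->
    <lst name="replica">
      <str name="zkHost">target-zk:2181</str>
      <str name="source">collection_name</str>
      <str name="target">collection_name</str>
    </lst>
    <!-- Tune replication -->
    <lst name="replicator">
      <str name="threadPoolSize">8</str>
      <str name="schedule">1000</str>
      <str name="batchSize">128</str>
    </lst>
    <!-- Synchronize logs -->
    <lst name="updateLogSynchronizer">
      <str name="schedule">1000</str>
    </lst>
  </requestHandler>
  <updateHandler class="solr.DirectUpdateHandler2">
    <!-- CdcrUpdateLog replaces the default update log -->
    <updateLog class="solr.CdcrUpdateLog">
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>
  </updateHandler>
</config>
"""


def cdcr_settings(xml_text):
    """Collect the /cdcr handler sections and the update log class."""
    root = ET.fromstring(xml_text)
    settings = {}
    for handler in root.iter("requestHandler"):
        if handler.get("name") != "/cdcr":
            continue
        for lst in handler.findall("lst"):
            settings[lst.get("name")] = {
                s.get("name"): s.text for s in lst.findall("str")
            }
    update_log = root.find("updateHandler/updateLog")
    settings["updateLog"] = update_log.get("class")
    return settings


print(cdcr_settings(SOURCE_CONFIG)["replica"]["zkHost"])  # target-zk:2181
```

A helper like this could serve as a sanity check before deploying a config set, but it says nothing about whether Solr will accept the file.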
CDCR APIs
• http://host:port/solr/collection_name/cdcr?action=START
• Control APIs: START, STOP, ENABLEBUFFER, DISABLEBUFFER, STATUS
• Monitoring APIs: QUEUES, OPS, ERRORS
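
A minimal sketch of how a script might drive these endpoints, assuming only the URL pattern and action names listed above. The helper names `cdcr_url` and `call_cdcr`, the base URL, and the timeout are hypothetical, and the response body is returned unparsed because its format is not shown on the slide.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

CONTROL_ACTIONS = {"START", "STOP", "ENABLEBUFFER", "DISABLEBUFFER", "STATUS"}
MONITORING_ACTIONS = {"QUEUES", "OPS", "ERRORS"}


def cdcr_url(base_url, collection, action):
    """Build http://host:port/solr/<collection>/cdcr?action=<ACTION>."""
    action = action.upper()
    if action not in CONTROL_ACTIONS | MONITORING_ACTIONS:
        raise ValueError(f"unknown CDCR action: {action}")
    query = urlencode({"action": action})
    return f"{base_url.rstrip('/')}/solr/{collection}/cdcr?{query}"


def call_cdcr(base_url, collection, action, timeout=10):
    """Send the request and return the raw response body as text."""
    url = cdcr_url(base_url, collection, action)
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building a URL needs no running Solr; call_cdcr does.
print(cdcr_url("http://localhost:8983", "collection_name", "start"))
# http://localhost:8983/solr/collection_name/cdcr?action=START
```

Validating the action name up front keeps a typo from turning into a confusing HTTP error from the server.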
How to failover?
• Change configuration on target to make it the source
• Point indexers to the new source
• Change configuration on source to make it the new target
• May require stopping indexing during the conversion process —
especially if you want to revert the change
CDCR support in Solr 6+
• Active/passive setup either for disaster recovery or for low latency
querying
• Solr clusters with existing data can be converted to a source cluster
from Solr 6.2 onwards
• Low to medium indexing traffic
CDCR Limitations and gotchas
• By default CDCR is disabled — invoke START to enable on both
source and target
• Soft commits are not replicated to target — must schedule
autoSoftCommit explicitly on target
• Different set of configurations required on source and target
• Daisy-chaining is possible but not well tested — add all targets to
the same source cluster
CDCR Limitations and gotchas
• Not suitable for applications requiring high throughput indexing —
some knobs exist for tuning replication speeds
• Update log buffers can grow indefinitely when target clusters are
down — can work around by disabling buffering for the time being
if there is only one target
• No automatic failover between source and target — explicit actions
required to modify configurations and point indexing pipelines to
the new source
• No Active/active setup
Alternative strategy
• Use a proper queue such as Apache Kafka to feed source and target DCs
simultaneously
• Use external versions in conjunction with versions generated by Solr —
DocBasedVersionConstraintsProcessorFactory
• Watch the video for “Solr Cross-Datacenter Replication and Consistency at Scale”
by Oliver Bates, Apple Inc. — http://sched.co/8ArU
• Pros: Supports high indexing throughputs and active/active replication
• Cons: Additional systems required, managing consistency is difficult and requires
in-depth Solr expertise, all atomic updates must go to a single DC, cannot support
delete-by-query
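
The role of external versions can be illustrated with a toy model. This is not Solr code and not the implementation of DocBasedVersionConstraintsProcessorFactory; it only assumes the general idea that an update is applied when its externally assigned version is higher than the stored one and is otherwise ignored. The in-memory dict "index", the field names, and the sample documents are hypothetical.

```python
import itertools


def apply_update(index, doc):
    """Apply doc only if its external version is newer than what is stored."""
    current = index.get(doc["id"])
    if current is None or doc["version"] > current["version"]:
        index[doc["id"]] = doc
        return True
    return False  # stale update: ignored in this model


def replay(ordered_updates):
    """Feed updates to an empty index in the given arrival order."""
    index = {}
    for doc in ordered_updates:
        apply_update(index, doc)
    return index


updates = [
    {"id": "a", "version": 1, "title": "first"},
    {"id": "a", "version": 2, "title": "second"},
    {"id": "a", "version": 3, "title": "third"},
]

# Each DC may consume the queue in a different order; the end state matches.
results = [replay(order) for order in itertools.permutations(updates)]
print(all(r == results[0] for r in results))  # True
print(results[0]["a"]["title"])               # third
```

Because the outcome depends only on the highest version seen, both DCs converge on the same document regardless of arrival order, which is what makes feeding them independently from a queue workable.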
Problems we solved
• Synchronous indexing to replicas — build separate asynchronous
indexing pipeline
• Limited size of the update log — use update log as the queue
• How to track replication progress to preserve consistency on target
clusters in case the source leader dies — checkpoints
• Bootstrapping target cluster with indexes when update logs are
incomplete
• New replicas on source have no logs to replicate — replicate
update logs during recovery
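
The checkpoint idea can be sketched as a toy model: the update log acts as the queue, the target remembers the highest version it has applied, and whichever node is the source leader resumes from that checkpoint. This is an illustration under those assumptions, not how Solr implements it; the class and function names are hypothetical.

```python
class UpdateLog:
    """Append-only list of (version, doc) pairs, ordered by version."""

    def __init__(self, entries=None):
        self.entries = list(entries or [])

    def append(self, version, doc):
        self.entries.append((version, doc))

    def read_after(self, checkpoint, limit):
        """Return up to `limit` entries newer than the checkpoint."""
        newer = [entry for entry in self.entries if entry[0] > checkpoint]
        return newer[:limit]


class TargetCluster:
    """Applies batches and remembers the highest version seen."""

    def __init__(self):
        self.docs = {}
        self.checkpoint = 0

    def receive(self, batch):
        for version, doc in batch:
            self.docs[doc["id"]] = doc
            self.checkpoint = max(self.checkpoint, version)


def replicate(log, target, batch_size=2, max_batches=None):
    """Forward log entries newer than the target's checkpoint."""
    sent = 0
    batches = 0
    while max_batches is None or batches < max_batches:
        batch = log.read_after(target.checkpoint, batch_size)
        if not batch:
            break
        target.receive(batch)
        sent += len(batch)
        batches += 1
    return sent


old_leader_log = UpdateLog()
for version in range(1, 6):
    old_leader_log.append(version, {"id": f"doc{version}"})

target = TargetCluster()
# The old leader forwards one batch, then dies.
sent_before_crash = replicate(old_leader_log, target, max_batches=1)

# A replica with a copy of the log takes over and resumes from the checkpoint.
new_leader_log = UpdateLog(old_leader_log.entries)
sent_after_failover = replicate(new_leader_log, target)

print(sent_before_crash, sent_after_failover, target.checkpoint)  # 2 3 5
```

The new leader neither resends the first two updates nor skips any, which only works because it has its own copy of the update log; that is why replicas need the logs replicated to them during recovery.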
Future work
• Move configuration out of solrconfig.xml and into API calls
• Dynamically add/remove/change target cluster information
• Cap update log to a max size and fall back to index replication if
necessary
• Refactor and combine CdcrUpdateLog
• Better monitoring: capture transfer rate and latency info
• Add support for rate limiting replication between source and target
• Active/active?
Resources
• CDCR page on ref guide: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=62687462
• http://yonik.com/solr-cross-data-center-replication/
• https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents
Thank you!
shalin@apache.org
