
Cross Datacenter Replication in Apache Solr 6


Cross Datacenter Replication (CDCR) has been a long-requested feature in Apache Solr. In this talk, we discuss CDCR as released in Apache Solr 6.0 and beyond to understand its use cases, limitations, setup and performance. We also take a quick look at future enhancements that can further simplify and scale this feature.

Published in: Software

  1. 1. Cross Datacenter Replication in Apache Solr 6 Shalin Shekhar Mangar Lucidworks Inc. @shalinmangar
  2. 2. The standard for enterprise search. 90% of Fortune 500 uses Solr.
  3. 3. Agenda • Review a typical Solr deployment architecture • Challenges of running a Solr deployment across data centers • Cross Data Centre Replication (CDCR) in Solr • Setup and configuration • Limitations • Alternative strategies • Future work
  4. 4. [Diagram: clients, Solr and ZooKeeper in a single datacenter]
  5. 5. CDCR Anti-patterns - Remote Solr instances [Diagram: clients (C), Solr and ZooKeeper across DC 1 and DC 2]
  6. 6. CDCR Anti-patterns - Remote ZK and Solr [Diagram: clients (C), Solr and ZooKeeper across DC 1 and DC 2]
  7. 7. CDCR Anti-patterns - Remote ZK and Solr [Diagram: clients (C), Solr and ZooKeeper across DC 1, DC 2 and DC 3]
  8. 8. Why not a single Solr Cloud? • Same update is transferred to each replica • Synchronous indexing means burst-indexing is constrained by cross DC bandwidth • Increased latency for indexing operations • Need a ZooKeeper node in a 3rd DC to break ties • Search requests are not DC-aware, may choose a remote replica
  9. 9. Cross Datacenter Replication in Solr • Let’s call it CDCR for short • Accommodate two or more data centres • Active/passive setup for disaster recovery • Support limited bandwidth links • Eventually consistent passive cluster
  10. 10. Source:
  11. 11. CDCR in Solr 6 • Scalable: no SPoF and/or bottleneck • Peer cluster can have a different replication factor • Asynchronous updates; no penalty for indexing • Push operations for low latency replication • Low overhead — uses existing transaction logs and indexes • Leader-to-leader communication ensures update is sent only once to peer cluster
  12. 12. [Slide graphic; labels: Target Cluster, Tune replication, Synchronize logs, CdcrUpdateLog]
  13. 13. [Slide graphic; labels: Enable APIs, Update chains, Update chains, Update log]
  14. 14. CDCR APIs • http://host:port/solr/collection_name/cdcr?action=START • Control APIs: START, STOP, ENABLEBUFFER, DISABLEBUFFER, STATUS • Monitoring APIs: QUEUES, OPS, ERRORS
  15. 15. How to fail over? • Change configuration on target to make it the source • Point indexers to the new source (the former target) • Change configuration on the old source to make it the new target • May require stopping indexing during the conversion process, especially if you want to revert the change
  16. 16. CDCR support in Solr 6+ • Active/passive setup either for disaster recovery or for low latency querying • Solr clusters with existing data can be converted to a source cluster from Solr 6.2 onwards • Low to medium indexing traffic
  17. 17. CDCR Limitations and gotchas • By default CDCR is disabled — invoke START to enable on both source and target • Soft commits are not replicated to target — must schedule autoSoftCommit explicitly on target • Different set of configurations required on source and target • Daisy-chaining is possible but not well tested — add all targets to the same source cluster
  18. 18. CDCR Limitations and gotchas • Not suitable for applications requiring high throughput indexing — some knobs exist for tuning replication speeds • Update log buffers can grow indefinitely when target clusters are down — can work around by disabling buffering for the time being if there is only one target • No automatic failover between source and target — explicit actions required to modify configurations and point indexing pipelines to the new source • No Active/active setup
  19. 19. Alternative strategy • Use a proper queue such as Apache Kafka to feed source and target DCs simultaneously • Use external versions in conjunction with versions generated by Solr (DocBasedVersionConstraintsProcessorFactory) • Watch the video for “Solr Cross-Datacenter Replication and Consistency at Scale” by Oliver Bates, Apple Inc. • Pros: Supports high indexing throughput and active/active replication • Cons: Additional systems required, managing consistency is difficult and requires in-depth Solr expertise, all atomic updates must go to a single DC, cannot support delete-by-query
  20. 20. Problems we solved • Synchronous indexing to replicas — build separate asynchronous indexing pipeline • Limited size of the update log — use update log as the queue • How to track replication progress to preserve consistency on target clusters in case the source leader dies — checkpoints • Bootstrapping target cluster with indexes when update logs are incomplete • New replicas on source have no logs to replicate — replicate update logs during recovery
  21. 21. Future work • Move configuration out of solrconfig.xml and into API calls • Dynamically add/remove/change target cluster information • Cap update log to a max size and fall back to index replication if necessary • Refactor and combine CdcrUpdateLog • Better monitoring: capture transfer rate and latency info • Add support for rate limiting replication between source and target • Active/active?
  22. 22. Resources • CDCR page on ref guide — pages/viewpage.action?pageId=62687462 • • Updating+Parts+of+Documents
  23. 23. Thank you!
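
The URL pattern and action names on the CDCR APIs slide (slide 14) can be turned into a small helper. What follows is a minimal sketch in Python, not part of the original talk: the function name `cdcr_url` and the host, port and collection values are hypothetical, and the code only builds URL strings without sending any request to Solr.

```python
from urllib.parse import urlencode

# Action names exactly as listed on the CDCR APIs slide.
CONTROL_ACTIONS = {"START", "STOP", "ENABLEBUFFER", "DISABLEBUFFER", "STATUS"}
MONITORING_ACTIONS = {"QUEUES", "OPS", "ERRORS"}


def cdcr_url(host, port, collection, action):
    """Build the URL for one CDCR action on a collection (hypothetical helper)."""
    action = action.upper()
    if action not in CONTROL_ACTIONS | MONITORING_ACTIONS:
        raise ValueError(f"unknown CDCR action: {action}")
    # Follows the slide's pattern: http://host:port/solr/collection_name/cdcr?action=...
    query = urlencode({"action": action})
    return f"http://{host}:{port}/solr/{collection}/cdcr?{query}"


if __name__ == "__main__":
    # Hypothetical host and collection names, for illustration only.
    for name in ("START", "STATUS", "QUEUES"):
        print(cdcr_url("localhost", 8983, "collection_name", name))
```

Checking the action name up front keeps a typo from reaching the cluster as a malformed request; the resulting URL can then be passed to whichever HTTP client the deployment already uses.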