Cross Datacenter Replication in Apache Solr 6
Shalin Shekhar Mangar
Lucidworks Inc.
@shalinmangar
The standard for enterprise search.
90% of the Fortune 500 use Solr.
Agenda
• Review a typical Solr deployment architecture
• Challenges of running a Solr deployment across data centers
• Cross Data Centre Replication (CDCR) in Solr
• Setup and configuration
• Limitations
• Alternative strategies
• Future work
[Diagram: a typical Solr deployment, with clients, Solr and ZooKeeper in one datacenter]
CDCR Anti-patterns - Remote Solr instances
[Diagram: clients (C), Solr and ZooKeeper spread across DC 1 and DC 2]
CDCR Anti-patterns - Remote ZK and Solr
[Diagram: clients (C), Solr and ZooKeeper spread across DC 1 and DC 2]
CDCR Anti-patterns - Remote ZK and Solr
[Diagram: clients (C), Solr and ZooKeeper spread across DC 1, DC 2 and DC 3]
Why not a single SolrCloud cluster?
• The same update is transferred separately to each replica, including replicas in remote DCs
• Synchronous indexing means burst indexing is constrained by cross-DC bandwidth
• Increased latency for indexing operations
• Need a ZooKeeper node in a third DC to break ties
• Search requests are not DC-aware and may choose a remote replica
Cross Datacenter Replication in Solr
• Let’s call it CDCR for short
• Accommodate two or more data centres
• Active/passive setup for disaster recovery
• Support limited bandwidth links
• Eventually consistent passive cluster
Source: http://yonik.com/solr-cross-data-center-replication/
CDCR in Solr 6
• Scalable: no single point of failure (SPoF) or bottleneck
• Peer cluster can have a different replication factor
• Asynchronous updates: no penalty for indexing
• Push operations for low-latency replication
• Low overhead: uses existing transaction logs and indexes
• Leader-to-leader communication ensures each update is sent only once to the peer cluster
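The contrast between the two slides above (each update sent to every replica, versus sent only once to the peer cluster) can be made concrete with a back-of-the-envelope sketch. It assumes every shard leader sits in the indexing DC, so a single SolrCloud stretched across DCs forwards each update once per remote replica, while CDCR's leader-to-leader push sends it once per peer cluster whatever that cluster's replication factor. The document count, document size and replica count are illustrative assumptions, not measurements.

```python
def stretched_cluster_bytes(num_docs: int, doc_bytes: int, remote_replicas_per_shard: int) -> int:
    """Cross-DC bytes for one SolrCloud cluster spanning two DCs.

    Assumption: every shard leader lives in the indexing DC, so the leader
    forwards each update once to every replica of its shard in the remote DC.
    """
    return num_docs * doc_bytes * remote_replicas_per_shard


def cdcr_bytes(num_docs: int, doc_bytes: int, peer_clusters: int = 1) -> int:
    """Cross-DC bytes with CDCR: the source shard leader pushes each update once
    per peer cluster; the peer leader then fans it out to its local replicas."""
    return num_docs * doc_bytes * peer_clusters


if __name__ == "__main__":
    docs, size = 1_000_000, 2_048  # illustrative: one million ~2 KiB documents
    stretched = stretched_cluster_bytes(docs, size, remote_replicas_per_shard=3)
    cdcr = cdcr_bytes(docs, size)
    print(f"stretched cluster: {stretched / 2**30:.2f} GiB across the DC link")
    print(f"CDCR:              {cdcr / 2**30:.2f} GiB across the DC link")
```

With three remote replicas per shard the stretched cluster ships three times as many bytes over the link. The model ignores compression, batching overhead and query traffic.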
[Diagram: CDCR configuration on the source and target clusters: enable APIs, update chains, update log (CdcrUpdateLog), tune replication, synchronize logs]
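The configuration steps in the diagram correspond to elements in solrconfig.xml. The sketch below generates them with Python's standard xml.etree.ElementTree, assuming the element and parameter names from the Solr 6.x reference guide's CDCR page (CdcrRequestHandler with replica, replicator and updateLogSynchronizer sections, CdcrUpdateLog, CdcrUpdateProcessorFactory). The ZooKeeper address, collection names and tuning values are placeholders, and the helper functions are hypothetical; check the names against the reference guide for your exact Solr version.

```python
import xml.etree.ElementTree as ET


def _str(parent: ET.Element, name: str, value: str) -> None:
    """Append <str name="...">value</str>, the form Solr uses inside <lst>."""
    ET.SubElement(parent, "str", name=name).text = value


def source_cdcr_handler(target_zk: str, source: str, target: str,
                        thread_pool_size: int = 8, schedule_ms: int = 1000,
                        batch_size: int = 128, sync_schedule_ms: int = 1000) -> ET.Element:
    """Source side: "Enable APIs", "Tune replication" and "Synchronize logs"."""
    handler = ET.Element("requestHandler", {"name": "/cdcr", "class": "solr.CdcrRequestHandler"})
    replica = ET.SubElement(handler, "lst", name="replica")  # which target to push to
    _str(replica, "zkHost", target_zk)
    _str(replica, "source", source)
    _str(replica, "target", target)
    replicator = ET.SubElement(handler, "lst", name="replicator")  # replication tuning knobs
    _str(replicator, "threadPoolSize", str(thread_pool_size))
    _str(replicator, "schedule", str(schedule_ms))
    _str(replicator, "batchSize", str(batch_size))
    sync = ET.SubElement(handler, "lst", name="updateLogSynchronizer")  # keep replica tlogs in step
    _str(sync, "schedule", str(sync_schedule_ms))
    return handler


def cdcr_update_log(ulog_dir: str = "${solr.ulog.dir:}") -> ET.Element:
    """Both sides: replace the plain update log with CdcrUpdateLog."""
    handler = ET.Element("updateHandler", {"class": "solr.DirectUpdateHandler2"})
    ulog = ET.SubElement(handler, "updateLog", {"class": "solr.CdcrUpdateLog"})
    _str(ulog, "dir", ulog_dir)
    return handler


def target_cdcr_handler() -> ET.Element:
    """Target side: /cdcr handler with buffering disabled by default."""
    handler = ET.Element("requestHandler", {"name": "/cdcr", "class": "solr.CdcrRequestHandler"})
    _str(ET.SubElement(handler, "lst", name="buffer"), "defaultState", "disabled")
    return handler


def cdcr_update_chain() -> ET.Element:
    """Target side: update chain that keeps the version numbers assigned by the source."""
    chain = ET.Element("updateRequestProcessorChain", {"name": "cdcr-processor-chain"})
    ET.SubElement(chain, "processor", {"class": "solr.CdcrUpdateProcessorFactory"})
    ET.SubElement(chain, "processor", {"class": "solr.RunUpdateProcessorFactory"})
    return chain


if __name__ == "__main__":
    # Placeholder ZooKeeper connect string and collection names.
    for element in (source_cdcr_handler("target-zk:2181/solr", "collection1", "collection1"),
                    cdcr_update_log(), target_cdcr_handler(), cdcr_update_chain()):
        print(ET.tostring(element, encoding="unicode"))
```

The printed fragments would go into the respective solrconfig.xml files. Per the reference guide, the target's /update handler must also default to cdcr-processor-chain (through update.chain), which this sketch does not generate.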
CDCR APIs
• http://host:port/solr/collection_name/cdcr?action=START
• Control APIs: START, STOP, ENABLEBUFFER, DISABLEBUFFER, STATUS
• Monitoring APIs: QUEUES, OPS, ERRORS
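A thin client for these endpoints needs nothing beyond the standard library. The sketch below builds the URL shown above and validates the action name against the control and monitoring lists on this slide. Host, port and collection names are placeholders and the helper names are hypothetical. It requests wt=json and returns the parsed body without assuming any particular response shape.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CONTROL_ACTIONS = {"START", "STOP", "ENABLEBUFFER", "DISABLEBUFFER", "STATUS"}
MONITORING_ACTIONS = {"QUEUES", "OPS", "ERRORS"}


def cdcr_url(base_url: str, collection: str, action: str) -> str:
    """Build http://host:port/solr/<collection>/cdcr?action=<ACTION>&wt=json."""
    action = action.upper()
    if action not in CONTROL_ACTIONS | MONITORING_ACTIONS:
        raise ValueError(f"unknown CDCR action: {action}")
    query = urlencode({"action": action, "wt": "json"})
    return f"{base_url.rstrip('/')}/{collection}/cdcr?{query}"


def call_cdcr(base_url: str, collection: str, action: str, timeout: float = 10.0) -> dict:
    """Invoke a CDCR action on a running cluster and return the parsed JSON."""
    with urlopen(cdcr_url(base_url, collection, action), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `call_cdcr("http://localhost:8983/solr", "collection1", "STATUS")` would query a source node; only `cdcr_url` runs without a live cluster.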
How to failover?
• Change configuration on target to make it the source
• Point indexers to the new source
• Change configuration on source to make it the new target
• May require stopping indexing during the conversion process, especially if you want to revert the change
CDCR support in Solr 6+
• Active/passive setup, either for disaster recovery or for low-latency querying
• Solr clusters with existing data can be converted to a source cluster from Solr 6.2 onwards
• Low to medium indexing traffic
CDCR Limitations and gotchas
• CDCR is disabled by default: invoke START to enable it on both source and target
• Soft commits are not replicated to the target: schedule autoSoftCommit explicitly on the target
• Source and target require different sets of configuration
• Daisy-chaining is possible but not well tested: add all targets to the same source cluster instead
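Because soft commits do not travel to the target, the target needs its own autoSoftCommit schedule. Besides editing the target's solrconfig.xml, one option is the Config API's set-property command. The sketch below assumes that API is available on the target (Solr 5 and later) and that updateHandler.autoSoftCommit.maxTime is one of its editable properties, as the reference guide lists it. The 5000 ms interval, host and collection names are arbitrary placeholders, and the helper is hypothetical.

```python
import json
from urllib.request import Request


def soft_commit_request(base_url: str, collection: str, max_time_ms: int) -> Request:
    """Build (but do not send) a Config API request that sets the target's
    autoSoftCommit interval, since soft commits are not replicated by CDCR."""
    body = {"set-property": {"updateHandler.autoSoftCommit.maxTime": max_time_ms}}
    return Request(
        f"{base_url.rstrip('/')}/{collection}/config",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = soft_commit_request("http://target-host:8983/solr", "collection1", 5000)
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
    # To apply it against a live target cluster, pass req to urllib.request.urlopen.
```

Keeping the request construction separate from sending it makes the payload easy to inspect or log before touching a production cluster.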
CDCR Limitations and gotchas
• Not suitable for applications requiring high-throughput indexing, although some knobs exist for tuning replication speed
• Update log buffers can grow indefinitely while target clusters are down; if there is only one target, a workaround is to disable buffering for the time being
• No automatic failover between source and target: explicit actions are required to modify configurations and point indexing pipelines to the new source
• No active/active setup
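The buffering workaround can be driven from the monitoring APIs. A sketch, assuming the QUEUES response has been fetched with wt=json and parsed into nested dicts shaped like the reference guide's example, with tlogTotalSize assumed to be in bytes. Depending on the Solr version and the json.nl parameter, some sections may be serialized as flat arrays instead, so check a real response first. The threshold and function names are hypothetical; acting on a True result would mean calling action=DISABLEBUFFER on the source.

```python
def queue_sizes(queues_response: dict) -> dict:
    """Map (target zkHost, target collection) -> queueSize from a QUEUES response.

    Assumed shape (as in the reference guide's example, already parsed from JSON):
    {"queues": {"<zkHost>": {"<collection>": {"queueSize": 104, ...}}},
     "tlogTotalSize": 3817, "tlogTotalCount": 1, ...}
    """
    return {
        (zk_host, collection): stats.get("queueSize", 0)
        for zk_host, collections in queues_response.get("queues", {}).items()
        for collection, stats in collections.items()
    }


def should_disable_buffer(queues_response: dict, max_tlog_bytes: int) -> bool:
    """Hypothetical policy for the workaround: only when there is exactly one
    target, and the source's update logs have grown past max_tlog_bytes."""
    single_target = len(queue_sizes(queues_response)) == 1
    return single_target and queues_response.get("tlogTotalSize", 0) > max_tlog_bytes
```

With several targets the policy always returns False, matching the caveat that disabling the buffer is only a workaround for a single target.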
Alternative strategy
• Use a proper queue such as Apache Kafka to feed source and target DCs simultaneously
• Use external versions in conjunction with versions generated by Solr (DocBasedVersionConstraintsProcessorFactory)
• Watch the video “Solr Cross-Datacenter Replication and Consistency at Scale” by Oliver Bates, Apple Inc.: http://sched.co/8ArU
• Pros: supports high indexing throughput and active/active replication
• Cons: additional systems required; managing consistency is difficult and requires in-depth Solr expertise; all atomic updates must go to a single DC; delete-by-query cannot be supported
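The external-versioning idea can be illustrated without Solr. DocBasedVersionConstraintsProcessorFactory compares a version field supplied by the client with the one already indexed and rejects (or, if configured, silently ignores) updates that are not newer. The toy model below mimics that rule in plain Python to show why two DCs fed from the same queue converge even when they see updates in different orders or more than once. It is a simplified illustration, not Solr's implementation, and the class is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class VersionedIndex:
    """Toy stand-in for one DC's index. Like an external-version constraint,
    it keeps an update only if its version is newer than the stored one."""
    docs: Dict[str, Tuple[int, dict]] = field(default_factory=dict)  # id -> (version, fields)

    def apply(self, doc_id: str, version: int, fields: dict) -> bool:
        current = self.docs.get(doc_id)
        if current is not None and version <= current[0]:
            return False  # stale or duplicate delivery: ignored
        self.docs[doc_id] = (version, fields)
        return True


if __name__ == "__main__":
    # The same stream (e.g. one Kafka topic) consumed by two DCs in different orders.
    updates = [("doc1", 1, {"title": "v1"}), ("doc1", 3, {"title": "v3"}), ("doc1", 2, {"title": "v2"})]
    dc1, dc2 = VersionedIndex(), VersionedIndex()
    for update in updates:
        dc1.apply(*update)
    for update in reversed(updates):
        dc2.apply(*update)
    print(dc1.docs == dc2.docs)  # True: both keep version 3
```

Convergence here depends on every update carrying a whole-document version; the model says nothing about partial updates or deletes.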
Problems we solved
• Synchronous indexing to replicas: build a separate asynchronous indexing pipeline
• Limited size of the update log: use the update log as the queue
• Tracking replication progress to preserve consistency on target clusters if the source leader dies: checkpoints
• Bootstrapping the target cluster with indexes when update logs are incomplete
• New replicas on the source have no logs to replicate: replicate update logs during recovery
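The checkpoint idea can be reduced to a few lines. In this simplified model (the names and log format are hypothetical, and real CDCR's bookkeeping is more involved), the target remembers the highest update version it has applied. When the source leader dies, the new leader, which holds a copy of the update log because logs are replicated during recovery, resumes by sending only the entries newer than that checkpoint.

```python
from typing import List, Tuple

LogEntry = Tuple[int, str]  # (version, operation); versions increase within the log


def pending_after(tlog: List[LogEntry], checkpoint: int) -> List[LogEntry]:
    """Entries a source leader still needs to push, given the target's checkpoint."""
    return [entry for entry in tlog if entry[0] > checkpoint]


class TargetCluster:
    def __init__(self) -> None:
        self.applied: List[LogEntry] = []
        self.checkpoint = 0  # highest version applied so far

    def apply_batch(self, batch: List[LogEntry]) -> None:
        for version, op in batch:
            self.applied.append((version, op))
            self.checkpoint = max(self.checkpoint, version)


if __name__ == "__main__":
    tlog = [(1, "add a"), (2, "add b"), (3, "delete a"), (4, "add c"), (5, "add d")]
    target = TargetCluster()
    target.apply_batch(tlog[:3])  # the old source leader pushes three entries, then dies
    # The new leader has the same log, asks for the target's checkpoint and
    # resumes from there: no gaps and no duplicates.
    target.apply_batch(pending_after(tlog, target.checkpoint))
    print(target.applied == tlog)  # True
```

Without the checkpoint, the new leader would have to either resend the whole log (duplicates) or guess where the old leader stopped (possible gaps).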
Future work
• Move configuration out of solrconfig.xml and into API calls
• Dynamically add/remove/change target cluster information
• Cap update log to a max size and fall back to index replication if necessary
• Refactor and combine CdcrUpdateLog
• Better monitoring: capture transfer rate and latency info
• Add support for rate limiting replication between source and target
• Active/active?
Resources
• CDCR page on ref guide: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=62687462
• http://yonik.com/solr-cross-data-center-replication/
• https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents
Thank you!
shalin@apache.org
