Scaling SolrCloud to a Large Number
of Collections
Anshum Gupta
Lucidworks
• Anshum Gupta, Apache Lucene/Solr PMC member
and committer, Lucidworks Employee.
• Interested in search and related stuff.
• Apache Lucene since 2006 and Solr since 2010.
• Organizations I am or have been a part of:
Who am I?
Apache Solr is the most widely used search
solution on the planet.
Solr has tens of thousands of
applications in production.
You use Solr every day.
8,000,000+
Total downloads
Solr is both established
and growing.
250,000+
Monthly downloads
2,500+
Open Solr jobs and the largest
community of developers.
Solr Scalability is unmatched
The traditional search use case
• One large index distributed across multiple nodes
• A large number of users searching on the same data
• Searches happen across the entire cluster
— Arthur C. Clarke
“The limits of the possible can only be defined by
going beyond them into the impossible.”
• Analyze and find missing features
• Set up a performance testing environment on AWS
• Devise tests for stability and performance
• Find bugs and bottlenecks, and fix them
Analyze, measure, and optimize
• The SolrCloud cluster state has information about all collections,
their shards and replicas
• All nodes and (Java) clients watch the cluster state
• All nodes are notified of every state change
• Limited to (slightly less than) 1 MB by default
• In a 100-node cluster, one node restart triggers a few hundred watcher
fires and pulls from ZK (three states: down, recovering and active)
Problem #1: Cluster state and updates
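The fan-out described above can be estimated with a back-of-the-envelope model. This sketch assumes, as the slide describes, that every node watches one shared cluster-state node and that a restart is published as three updates (down, recovering, active); the function names and the 512 KiB state size are hypothetical illustrations, not Solr code.

```python
# Simplified model (an assumption, not Solr code): every node watches ONE
# shared cluster-state node, and each state transition is one update to it.
STATE_TRANSITIONS = ("down", "recovering", "active")


def shared_state_watch_fires(num_nodes: int) -> int:
    """Watch fires caused by one node restart under a shared cluster state."""
    # Each transition rewrites the shared state once, and every node in the
    # cluster holds a watch on it, so every node is notified every time.
    return num_nodes * len(STATE_TRANSITIONS)


def shared_state_bytes_pulled(num_nodes: int, state_size_bytes: int) -> int:
    """Bytes re-read from ZooKeeper for one restart under the same model."""
    # ZooKeeper watches are one-shot: after a fire, each node re-reads the
    # whole state (and re-registers its watch), so traffic scales with size.
    return shared_state_watch_fires(num_nodes) * state_size_bytes


if __name__ == "__main__":
    print(shared_state_watch_fires(100))  # 300, i.e. "a few hundred"
    # With a hypothetical 512 KiB shared state (under the ~1 MB limit):
    print(shared_state_bytes_pulled(100, 512 * 1024) / 2**20, "MiB")  # 150.0 MiB
```

The point of the model is that both counts grow with the number of nodes and with the size of the shared state, and both of those grow as collections are added.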
• Each collection gets its own state node in ZK
• Nodes watch only the states of collections they
are a member of
• Clients cache state and use smart cache updates
instead of watching nodes
• http://issues.apache.org/jira/browse/SOLR-5473
Solution: Split cluster state and scale
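With the state split per collection, the same update only reaches the nodes that host the collection. This sketch assumes the per-collection layout from SOLR-5473 (each collection's state at `/collections/<name>/state.json`) and a hypothetical round-robin placement of 1,000 single-shard collections with replication factor 3 on 100 nodes; all helper names are illustrative.

```python
from itertools import cycle


def place_replicas(collections, nodes, replication_factor=3):
    """Hypothetical round-robin placement: collection -> set of hosting nodes."""
    node_ring = cycle(nodes)
    # Consecutive entries of the ring are distinct, so as long as
    # replication_factor <= len(nodes) every replica lands on a different node.
    return {c: {next(node_ring) for _ in range(replication_factor)} for c in collections}


def state_node_path(collection):
    """Per-collection state node, as laid out by SOLR-5473."""
    return f"/collections/{collection}/state.json"


def watch_fires_per_update(placement, collection):
    """Watch fires for one update to `collection`'s state under split state."""
    # Only nodes hosting a replica of the collection watch its state node;
    # clients cache the state instead of holding a watch.
    return len(placement[collection])


if __name__ == "__main__":
    nodes = [f"node{i}" for i in range(100)]
    tenants = [f"tenant{i}" for i in range(1000)]  # single-shard collections
    placement = place_replicas(tenants, nodes)
    print(state_node_path("tenant42"))  # /collections/tenant42/state.json
    print(watch_fires_per_update(placement, "tenant42"))  # 3 (vs. 100 with one shared state)
```

Each state read is also limited to one small per-collection node rather than the full cluster state.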
• Thousands of collections create a lot of state updates
• Overseer falls behind and replicas can’t recover or
can’t elect a leader
• Under high indexing/search load, GC pauses can
cause the Overseer queue to back up
Problem #2: Overseer Performance
• Optimize polling for new items in overseer queue -
Don’t wait to poll! (SOLR-5436)
• Dedicated Overseer nodes (SOLR-5476)
• New Overseer Status API (SOLR-5749)
• Asynchronous and multi-threaded execution of
collection commands (SOLR-5477, SOLR-5681)
Solution - Improve the Overseer
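The asynchronous collection commands and the Overseer Status API are both reached through the Collections API. The sketch below only builds request URLs (nothing is sent) and assumes a node at a hypothetical `localhost:8983` address. `action=CREATE` with `async`, `action=REQUESTSTATUS` with `requestid`, and `action=OVERSEERSTATUS` are Collections API parameters; the helper functions themselves are illustrative.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # hypothetical node address


def collections_api_url(action, params=None):
    """Build a Collections API URL (no request is sent)."""
    # Passing params as a dict sidesteps the fact that `async` is a Python keyword.
    query = urlencode({"action": action, **(params or {})})
    return f"{SOLR_BASE}/admin/collections?{query}"


def create_collection_async(name, num_shards, replication_factor, request_id):
    # With `async`, the call returns right away and the Overseer executes the
    # command in the background (SOLR-5477); track it with REQUESTSTATUS.
    return collections_api_url("CREATE", {
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "async": request_id,
    })


def request_status(request_id):
    return collections_api_url("REQUESTSTATUS", {"requestid": request_id})


def overseer_status():
    # OVERSEERSTATUS (SOLR-5749) reports Overseer queue sizes and per-operation stats.
    return collections_api_url("OVERSEERSTATUS")


if __name__ == "__main__":
    print(create_collection_async("tenant42", 1, 3, "req-1"))
    print(request_status("req-1"))
    print(overseer_status())
```

A client would submit the CREATE call, then poll REQUESTSTATUS with the same id until it reports completion or failure, instead of holding an HTTP connection open while the Overseer works through its queue.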
• Not all users are born equal: a tenant may have a few very
large users
• We wanted to be able to scale an individual user's data,
maybe even as its own collection
• SolrCloud could split shards with no downtime, but only
in half
• No way to 'extract' a user's data to another collection or shard
Problem #3: Moving data around
• Shard can be split on arbitrary hash ranges (SOLR-5300)
• Shard can be split by a given key (SOLR-5338, SOLR-5353)
• A new ‘migrate’ API to move a user’s data to another (new)
collection without downtime (SOLR-5308)
Solution: Improved data management
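The arbitrary-range split and the migrate API can be sketched as Collections API requests. This assumes a hypothetical node address, collection names, and a `user42!` route key (compositeId-style prefix); `SPLITSHARD` with `ranges` and `MIGRATE` with `target.collection` and `split.key` are Collections API parameters, while `split_range` is an illustrative helper that carves one shard's hash range into contiguous sub-ranges covering it exactly.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # hypothetical node address


def collections_api_url(action, params=None):
    return f"{SOLR_BASE}/admin/collections?" + urlencode({"action": action, **(params or {})})


def to_signed32(x):
    # Shard ranges are written as 32-bit hex, but hash values are signed ints.
    return x - (1 << 32) if x >= (1 << 31) else x


def to_hex32(v):
    return format(v & 0xFFFFFFFF, "x")


def split_range(range_str, weights):
    """Split a shard range such as '80000000-ffffffff' into contiguous
    sub-ranges whose sizes are proportional to `weights`."""
    lo_hex, hi_hex = range_str.split("-")
    lo, hi = to_signed32(int(lo_hex, 16)), to_signed32(int(hi_hex, 16))
    size, total = hi - lo + 1, sum(weights)
    bounds, acc = [lo], 0
    for w in weights:
        acc += w
        bounds.append(lo + size * acc // total)
    return [f"{to_hex32(a)}-{to_hex32(b - 1)}" for a, b in zip(bounds, bounds[1:])]


def split_shard_url(collection, shard, ranges):
    # `ranges`: comma-separated hex hash ranges (arbitrary split, SOLR-5300).
    return collections_api_url("SPLITSHARD", {
        "collection": collection, "shard": shard, "ranges": ",".join(ranges)})


def migrate_url(collection, target_collection, split_key):
    # MIGRATE moves the documents whose route key matches `split_key`
    # into another collection (SOLR-5308).
    return collections_api_url("MIGRATE", {
        "collection": collection,
        "target.collection": target_collection,
        "split.key": split_key,
    })


if __name__ == "__main__":
    print(split_range("80000000-ffffffff", [1, 1]))  # ['80000000-bfffffff', 'c0000000-ffffffff']
    print(split_range("0-7fffffff", [1, 3]))  # ['0-1fffffff', '20000000-7fffffff']
    print(migrate_url("tenants", "big_tenant", "user42!"))
```

Uneven weights are where arbitrary ranges pay off: a shard dominated by one large tenant can be split so that tenant's hash neighborhood lands in a smaller sub-range.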
• Lucene/Solr is designed for finding top-N search results
• Trying to export the full result set brings down the system due
to high memory requirements as you go deeper
Problem #4: Exporting data
Solution: Distributed deep paging
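With plain `start`/`rows` paging, each shard has to send its top `start + rows` entries to the coordinating node, so the cost of a page grows with its depth and with the number of shards. One way to export a full result set is Solr's cursor-based paging (the `cursorMark` parameter, used with a sort that includes the uniqueKey field), which resumes from the last sort position instead. The sketch shows only the client-side loop, run against a hypothetical in-memory `fake_search` stand-in rather than a real Solr connection.

```python
# Hypothetical in-memory "index": 2,500 docs whose ids sort lexicographically.
DOCS = [{"id": f"doc{i:05d}"} for i in range(2500)]


def fake_search(cursor_mark, rows):
    """Stand-in for a Solr request with sort=id asc, rows=<rows> and
    cursorMark=<cursor_mark>. Real cursor marks are opaque tokens; here the
    token is just the last id returned, which is enough to mimic the contract."""
    remaining = DOCS if cursor_mark == "*" else [d for d in DOCS if d["id"] > cursor_mark]
    page = remaining[:rows]
    # Like Solr, hand back the same mark once there is nothing left to return.
    next_mark = page[-1]["id"] if page else cursor_mark
    return {"docs": page, "nextCursorMark": next_mark}


def export_all(search, rows=500):
    """Yield every matching document, one fixed-size page at a time."""
    cursor = "*"  # "*" asks for the first page
    while True:
        resp = search(cursor, rows)
        yield from resp["docs"]
        if resp["nextCursorMark"] == cursor:  # unchanged mark: result set exhausted
            break
        cursor = resp["nextCursorMark"]


if __name__ == "__main__":
    exported = list(export_all(fake_search))
    print(len(exported))  # 2500
```

Because every request asks for at most `rows` documents after the cursor, memory per request stays flat no matter how deep the export goes.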
• Performance goals: 6 billion documents, 4000 queries/sec, 400
updates/sec, 2 seconds NRT sustained performance
• 5% large collections (50 shards), 15% medium (10 shards), 85%
small (1 shard) with replication factor of 3
• Target hardware: 24 CPUs, 126G RAM, 7 SSDs (460G) + 1 HDD
(200G)
• 80% traffic served by 20% of the tenants
Testing scale at scale
Test Infrastructure
Logging
• Tim Potter wrote the Solr Scale Toolkit
• Fabric-based tool to set up and manage SolrCloud
clusters in AWS, bundled with collectd and SiLK
• Backup/Restore from S3. Parallel clone commands.
• Open source!
• https://github.com/LucidWorks/solr-scale-tk
How to manage large clusters?
• Lucidworks SiLK (Solr + Logstash + Kibana)
• collectd daemons on each host
• RabbitMQ to queue messages before delivering them to Logstash
• Initially started with Kafka but discarded it as overkill
• Not happy with RabbitMQ: crashes and instability
• Might try Kafka again soon
• http://www.lucidworks.com/lucidworks-silk
Gathering metrics and analyzing logs
• Custom randomized data generator (reproducible
using a seed)
• JMeter for generating load
• Embedded CloudSolrServer using JMeter Java
Action Sampler
• JMeter distributed mode was itself a bottleneck!
• The Solr Scale Toolkit has some data generation code
Generating data and load
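A minimal sketch of a reproducible data and load generator in the spirit described above, assuming the 80/20 tenant skew from the test goals. It is not the Solr Scale Toolkit's code: the function names, field names (`tenant_s`, `body_t`), and word list are hypothetical. Document ids use a `tenant!` prefix so that, under compositeId routing, a tenant's documents share a shard.

```python
import random

WORDS = ["solr", "cloud", "shard", "replica", "leader", "overseer", "index", "query"]


def query_stream(tenants, count, seed, hot_share=0.2, hot_traffic=0.8):
    """Yield `count` tenant names to query, reproducibly for `seed`.
    The first `hot_share` of tenants receive about `hot_traffic` of the load."""
    rng = random.Random(seed)  # private generator: same seed -> same stream
    n_hot = max(1, int(len(tenants) * hot_share))
    hot, cold = tenants[:n_hot], tenants[n_hot:]
    for _ in range(count):
        pool = hot if (not cold or rng.random() < hot_traffic) else cold
        yield rng.choice(pool)


def random_doc(rng, tenant, doc_num):
    """Generate one document; the 'tenant!' id prefix keeps a tenant's
    documents together under compositeId routing."""
    n_words = rng.randint(5, 15)
    return {
        "id": f"{tenant}!{doc_num}",
        "tenant_s": tenant,
        "body_t": " ".join(rng.choice(WORDS) for _ in range(n_words)),
    }


if __name__ == "__main__":
    tenants = [f"tenant{i}" for i in range(1000)]
    print(list(query_stream(tenants, 5, seed=42)))
    print(random_doc(random.Random(42), "tenant7", 1))
```

Using a dedicated `random.Random(seed)` rather than the module-level generator means a failing run can be replayed exactly, even if other code also draws random numbers.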
• 30 hosts, 120 nodes, 1,000 collections, 6B+ docs,
15,000 queries/second, 2,000 writes/second, 2-second
NRT sustained over 24 hours
• More than 3x the numbers we needed
• Unfortunately, we had to stop testing at that point :(
• Our biggest cluster cost us just $120/hour :)
Numbers
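A worked check of the "more than 3x" figure against the performance goals listed earlier: it holds for throughput, while the document count (6B+ against a 6B goal) was met rather than tripled. The cost line is a hypothetical illustration that assumes the biggest cluster ran for the full 24 hours, which the slides do not state.

```python
TARGETS = {"queries_per_sec": 4000, "updates_per_sec": 400}
ACHIEVED = {"queries_per_sec": 15000, "updates_per_sec": 2000}


def headroom(targets, achieved):
    """Ratio of achieved throughput to the target, per metric."""
    return {metric: achieved[metric] / targets[metric] for metric in targets}


if __name__ == "__main__":
    print(headroom(TARGETS, ACHIEVED))  # {'queries_per_sec': 3.75, 'updates_per_sec': 5.0}
    # Hypothetical: the biggest cluster ($120/hour) running for the full 24 hours.
    print(120 * 24)  # 2880 (dollars)
```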
• Jepsen tests
• Improvement in test coverage
After those tests
• We continue to test performance
at scale
• Published indexing performance
benchmark, working on others
• 15 nodes, 30 shards, 1 replica:
157,195 docs/sec
• 15 nodes, 30 shards, 2
replicas: 61,062 docs/sec
And it still goes on…
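A quick worked reading of the two indexing runs, assuming "1 replica" and "2 replicas" mean a replication factor of 1 and 2 (every document written once per replica). Under that assumption the second run still performs about 122,000 replica writes per second cluster-wide, but client-visible throughput drops to roughly 39% of the first run, more than doubling the write work alone would explain.

```python
NODES = 15
# Replication factor -> indexing throughput in docs/sec (15 nodes, 30 shards).
RUNS = {1: 157_195, 2: 61_062}


def summarize(runs, nodes):
    """Per-node throughput and total replica writes per second for each run."""
    rows = []
    for rf, docs_per_sec in runs.items():
        # Assumption: each document is indexed once on every replica.
        rows.append((rf, round(docs_per_sec / nodes), docs_per_sec * rf))
    return rows


if __name__ == "__main__":
    for rf, per_node, writes in summarize(RUNS, NODES):
        print(f"rf={rf}: ~{per_node} docs/sec/node, {writes} replica writes/sec")
    print(round(RUNS[2] / RUNS[1], 3))  # 0.388
```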
• Setting up an internal
performance testing environment
• Jenkins CI
• Single node benchmarks
• Cloud tests
• Stay tuned!
Pushing the limits
• SolrCloud continues to be improved
• SOLR-6816 - Review SolrCloud Indexing Performance.
• SOLR-6220 - Replica placement strategy
• SOLR-6273 - Cross data center replication
• SOLR-5750 - Backup/Restore API for SolrCloud
• SOLR-7230 - An API to plug security into Solr
• Many, many more
Not over yet
Connect @
http://www.twitter.com/anshumgupta
http://www.linkedin.com/in/anshumgupta/
anshum@apache.org
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
 
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZ
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdf
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commerce
 

Scaling SolrCloud to a large number of Collections

  • 2. Scaling SolrCloud to a Large Number of Collections (Anshum Gupta, Lucidworks)
  • 3. Who am I?
    • Anshum Gupta, Apache Lucene/Solr PMC member and committer, Lucidworks employee.
    • Interested in search and related stuff.
    • Apache Lucene since 2006 and Solr since 2010.
    • Organizations I am or have been a part of:
  • 4. Apache Solr is the most widely used search solution on the planet.
    • Solr has tens of thousands of applications in production; you use it every day.
    • Solr is both established and growing: 8,000,000+ total downloads and 250,000+ monthly downloads.
    • 2,500+ open Solr jobs and the largest community of developers.
  • 6. The traditional search use case
    • One large index distributed across multiple nodes
    • A large number of users searching the same data
    • Searches happen across the entire cluster
  • 7. “The limits of the possible can only be defined by going beyond them into the impossible.” (Arthur C. Clarke)
  • 8. Analyze, measure, and optimize
    • Analyze and find missing features
    • Set up a performance testing environment on AWS
    • Devise tests for stability and performance
    • Find bugs and bottlenecks, and fix them
  • 9. Problem #1: Cluster state and updates
    • The SolrCloud cluster state holds information about all collections, their shards, and their replicas
    • All nodes and (Java) clients watch the cluster state
    • Every state change is notified to all nodes
    • The state is limited to (slightly less than) 1MB by default, ZooKeeper's default maximum znode size
    • In a 100-node cluster, restarting one node triggers a few hundred watcher fires and pulls from ZooKeeper (three states: down, recovering, and active)
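To put the restart cost in numbers, here is a back-of-the-envelope sketch. It assumes (a simplification, not stated on the slide) that each of the three state transitions is published as one update to the shared cluster state, that every node's watch fires once per update and triggers a full re-read, and a hypothetical 500 KB state size. The ~1MB ceiling is ZooKeeper's default `jute.maxbuffer` of 0xfffff bytes.

```python
# ZooKeeper's default jute.maxbuffer: 0xfffff bytes, "just under 1MB".
ZK_DEFAULT_MAX_ZNODE_BYTES = 0xFFFFF

# The states a restarted node's replicas pass through.
TRANSITIONS = ("down", "recovering", "active")


def shared_state_watch_fires(num_nodes: int, transitions: int = len(TRANSITIONS)) -> int:
    """Watch fires caused by one node restart when every node watches a
    single shared cluster state (assumes one state update per transition)."""
    return num_nodes * transitions


def bytes_pulled(num_nodes: int, state_bytes: int, transitions: int = len(TRANSITIONS)) -> int:
    """Bytes read from ZooKeeper if every fire triggers a full re-read."""
    if state_bytes > ZK_DEFAULT_MAX_ZNODE_BYTES:
        raise ValueError("cluster state exceeds the default znode size limit")
    return shared_state_watch_fires(num_nodes, transitions) * state_bytes


print(shared_state_watch_fires(100))        # 300
print(bytes_pulled(100, 500_000) // 10**6)  # 150 (MB, for a hypothetical 500 KB state)
```

Under these assumptions the read volume scales with nodes times state size, and the state size itself grows with the number of collections, so both factors grow together as the cluster grows.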
  • 10. Solution: Split cluster state and scale
    • Each collection gets its own state node in ZooKeeper
    • Nodes selectively watch only the states of collections they are a member of
    • Clients cache state and use smart cache updates instead of watching nodes
    • http://issues.apache.org/jira/browse/SOLR-5473
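A toy model of the effect, assuming a hypothetical round-robin placement of each collection's replicas on three of 100 nodes and one state change per collection. It counts node-side watch notifications only; client-side caching is not modeled. (In released Solr versions, the per-collection state lives at `/collections/<name>/state.json`.)

```python
def notifications(placement: dict[str, set[str]], all_nodes: set[str],
                  changed: list[str], split_state: bool) -> int:
    """Count node watch notifications for a sequence of collection state changes.

    placement maps a collection name to the set of nodes hosting its replicas.
    Shared state: every node hears about every change.
    Split state: only the nodes hosting the changed collection do.
    """
    total = 0
    for coll in changed:
        total += len(placement[coll]) if split_state else len(all_nodes)
    return total


nodes = [f"node{i}" for i in range(100)]
# Hypothetical placement: collection j on nodes 3j, 3j+1, 3j+2 (mod 100).
placement = {f"c{j}": {nodes[(3 * j + k) % 100] for k in range(3)} for j in range(1000)}
changes = list(placement)  # one state change per collection

shared = notifications(placement, set(nodes), changes, split_state=False)
split = notifications(placement, set(nodes), changes, split_state=True)
print(shared, split)  # 100000 3000
```

With shared state the cost grows with collections times nodes; with split state it grows with collections times replicas per collection, which stays small for the many single-shard tenants this talk targets.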
  • 11. Problem #2: Overseer performance
    • Thousands of collections create a lot of state updates
    • The Overseer falls behind, and replicas can't recover or can't elect a leader
    • Under high indexing/search load, GC pauses can cause the Overseer queue to back up
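A minimal queueing sketch of why a GC pause hurts, using a hypothetical discrete-time model: each second a fixed number of state-change messages arrive, and the Overseer drains up to a fixed number unless it is paused. The rates and pause window are illustrative, not measurements from the talk.

```python
from collections.abc import Container


def overseer_backlog(arrivals_per_sec: int, processed_per_sec: int, seconds: int,
                     gc_pauses: Container[int] = frozenset()) -> list[int]:
    """Overseer queue length at the end of each simulated second.

    Each second, `arrivals_per_sec` messages are enqueued and up to
    `processed_per_sec` are drained, except during seconds in `gc_pauses`.
    """
    backlog = 0
    history = []
    for t in range(seconds):
        backlog += arrivals_per_sec
        if t not in gc_pauses:  # a GC pause stops draining entirely
            backlog = max(0, backlog - processed_per_sec)
        history.append(backlog)
    return history


# 80% utilization plus a 5-second GC pause (seconds 10-14).
h = overseer_backlog(80, 100, 30, gc_pauses=frozenset(range(10, 15)))
print(max(h), h[-1])  # 400 100
```

In this model a 5-second pause leaves a backlog for roughly 20 seconds even with 20% headroom, because the queue drains only at the spare rate; if arrivals exceed the drain rate, the queue grows without bound.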
  • 12. Solution: Improve the Overseer
    • Optimize polling for new items in the Overseer queue: don't wait to poll! (SOLR-5436)
    • Dedicated Overseer nodes (SOLR-5476)
    • New Overseer Status API (SOLR-5749)
    • Asynchronous and multi-threaded execution of collection commands (SOLR-5477, SOLR-5681)
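These improvements surface as Collections API actions (`OVERSEERSTATUS`, `ADDROLE` with `role=overseer`, and the `async` parameter polled via `REQUESTSTATUS`). A minimal sketch that only builds request URLs and sends nothing; the base URL, node name, collection name, and request id are hypothetical.

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # hypothetical base URL


def collections_api(action: str, **params: str) -> str:
    """Build a Collections API URL (no request is sent)."""
    query = urlencode({"action": action, **params})
    return f"{SOLR}/admin/collections?{query}"


# SOLR-5749: Overseer queue sizes and per-operation statistics
status = collections_api("OVERSEERSTATUS", wt="json")
# SOLR-5476: prefer a (hypothetical) node for the Overseer role
role = collections_api("ADDROLE", role="overseer", node="10.0.0.5:8983_solr")
# SOLR-5477: submit CREATE asynchronously, then poll by request id.
# `async` is a Python keyword, so it is passed through a dict.
create = collections_api("CREATE", name="tenant42", numShards="1",
                         replicationFactor="3", **{"async": "req-1001"})
poll = collections_api("REQUESTSTATUS", requestid="req-1001")
print(status)
```

Async submission lets a client queue thousands of collection operations without holding a connection open per call, which matters when creating or moving many small tenant collections.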
  • 13. Problem #3: Moving data around
    • Not all users are born equal: a tenant may have a few very large users
    • We wanted to be able to scale an individual user's data, maybe even as its own collection
    • SolrCloud could split shards with no downtime, but only in half
    • No way to 'extract' a user's data to another collection or shard
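A sketch of the "only in half" limitation on SolrCloud's signed 32-bit hash space. The `fmt` helper renders ranges in unsigned hex, similar to how shard ranges are displayed; the helper names are illustrative.

```python
def split_in_half(lo: int, hi: int) -> tuple[tuple[int, int], tuple[int, int]]:
    """Split an inclusive 32-bit hash range into two equal halves."""
    mid = lo + (hi - lo) // 2
    return (lo, mid), (mid + 1, hi)


def fmt(r: tuple[int, int]) -> str:
    """Render a signed range as unsigned hex, e.g. '80000000-ffffffff'."""
    return f"{r[0] & 0xFFFFFFFF:x}-{r[1] & 0xFFFFFFFF:x}"


FULL = (-2**31, 2**31 - 1)  # the whole signed 32-bit hash space
left, right = split_in_half(*FULL)
print(fmt(left), fmt(right))  # 80000000-ffffffff 0-7fffffff
```

Repeated halving only produces power-of-two slices of a shard's range, so a single large user's documents generally still share a sub-shard with other users' documents.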
  • 14. Solution: Improved data management
    • A shard can be split on arbitrary hash ranges (SOLR-5300)
    • A shard can be split by a given key (SOLR-5338, SOLR-5353)
    • A new 'migrate' API moves a user's data to another (new) collection without downtime (SOLR-5308)
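Why a key-based split works: with the compositeId router, an id such as `tenantA!doc42` takes the top 16 bits of its hash from the shard-key prefix and the low 16 bits from the rest, so all of one tenant's documents fall into one contiguous hash range. That range is what the `split.key` parameter of SPLITSHARD and MIGRATE selects. This sketch substitutes CRC32 for Solr's MurmurHash3, so concrete hash values differ from Solr's; only the bit layout is the point.

```python
import zlib


def h32(s: str) -> int:
    """Stand-in 32-bit hash (CRC32). Solr itself uses MurmurHash3."""
    return zlib.crc32(s.encode("utf-8"))  # unsigned, 0 .. 2**32 - 1


def to_signed(x: int) -> int:
    """Reinterpret an unsigned 32-bit value as signed, like a Java int."""
    return x - (1 << 32) if x >= (1 << 31) else x


def composite_hash(doc_id: str) -> int:
    """Hash 'prefix!rest': top 16 bits from the prefix, low 16 from the rest."""
    prefix, sep, rest = doc_id.partition("!")
    if not sep:  # no shard key: hash the whole id
        return to_signed(h32(doc_id))
    return to_signed((h32(prefix) & 0xFFFF0000) | (h32(rest) & 0x0000FFFF))


def key_range(prefix: str) -> tuple[int, int]:
    """Inclusive hash range containing every id that starts with 'prefix!'."""
    top = h32(prefix) & 0xFFFF0000
    return to_signed(top), to_signed(top | 0xFFFF)


lo, hi = key_range("tenantA")
ids = [f"tenantA!doc{i}" for i in range(1000)]
assert all(lo <= composite_hash(doc_id) <= hi for doc_id in ids)
print(hi - lo)  # 65535: one tenant occupies a 1/65536 slice of the hash space
```

Because the tenant's slice is contiguous, it can be carved out exactly (split by key, or migrated to another collection) without touching other tenants' documents.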
  • 15. Problem #4: Exporting data
    • Lucene/Solr is designed for finding the top-N search results
    • Trying to export the full result set brings down the system, because memory requirements grow as you go deeper
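A minimal sketch of why memory grows with depth, assuming a plain `start`/`rows` request against a sharded collection: to return the page at offset `start`, each shard keeps its best `start + rows` hits in a bounded priority queue, and the coordinating node merges all of them. The shard count and offsets are hypothetical.

```python
import heapq
import random


def top_n_buffer_sizes(num_shards: int, start: int, rows: int) -> tuple[int, int]:
    """Entries held for one page: (per shard, at the coordinating node)."""
    per_shard = start + rows  # any of these could belong on the requested page
    return per_shard, num_shards * per_shard


def page_from_shard(scores: list[int], start: int, rows: int) -> list[int]:
    """One shard's view: a bounded min-heap of size start + rows."""
    k = start + rows
    heap: list[int] = []
    for s in scores:
        if len(heap) < k:
            heapq.heappush(heap, s)
        elif s > heap[0]:  # better than the worst kept hit
            heapq.heapreplace(heap, s)
    return sorted(heap, reverse=True)[start:start + rows]


scores = list(range(1000))
random.Random(7).shuffle(scores)
page = page_from_shard(scores, start=20, rows=10)  # hits ranked 21-30: 979 down to 970
print(top_n_buffer_sizes(50, 1_000_000, 10))       # (1000010, 50000500)
```

Exporting everything this way means `start` eventually approaches the full result count, so buffers on every shard and on the coordinator approach the size of the whole result set.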
  • 17. Testing scale at scale
    • Performance goals: 6 billion documents, 4,000 queries/sec, 400 updates/sec, 2-second NRT, sustained
    • 5% large collections (50 shards), 15% medium (10 shards), 85% small (1 shard), with a replication factor of 3
    • Target hardware: 24 CPUs, 126 GB RAM, 7 SSDs (460 GB) + 1 HDD (200 GB)
    • 80% of traffic served by 20% of the tenants
  • 21. How to manage large clusters?
    • Tim Potter wrote the Solr Scale Toolkit
    • Fabric-based tool to set up and manage SolrCloud clusters in AWS, bundled with collectd and SiLK
    • Backup/restore from S3; parallel clone commands
    • Open source!
    • https://github.com/LucidWorks/solr-scale-tk
  • 22. Gathering metrics and analyzing logs
    • Lucidworks SiLK (Solr + Logstash + Kibana)
    • collectd daemons on each host
    • RabbitMQ to queue messages before delivering them to Logstash
    • Initially started with Kafka but discarded it, thinking it was overkill
    • Not happy with RabbitMQ: crashes/unstable
    • Might try Kafka again soon
    • http://www.lucidworks.com/lucidworks-silk
  • 23. Generating data and load
    • Custom randomized data generator (reproducible using a seed)
    • JMeter for generating load
    • Embedded CloudSolrServer using the JMeter Java Action Sampler
    • JMeter distributed mode was itself a bottleneck!
    • The Solr Scale Toolkit has some data generation code
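A minimal sketch of a seed-reproducible generator with the 80/20 tenant skew from the test plan. This is not the talk's actual generator: the vocabulary, the field names `tenant_s` and `body_t`, and the skew parameters are hypothetical.

```python
import random

# Hypothetical vocabulary for synthetic text fields.
WORDS = ("solr", "lucene", "shard", "replica", "tenant", "query", "index", "cloud")


def make_docs(seed: int, n: int, num_tenants: int = 100,
              hot_fraction: float = 0.2, hot_share: float = 0.8) -> list[dict]:
    """Generate n synthetic documents; the same seed yields the same output.

    A `hot_fraction` of tenants receives `hot_share` of the documents
    (80/20 by default), echoing the traffic skew in the test plan.
    """
    rng = random.Random(seed)  # private RNG: unaffected by other random() calls
    hot = max(1, int(num_tenants * hot_fraction))
    docs = []
    for i in range(n):
        if rng.random() < hot_share:
            tenant = rng.randrange(hot)               # hot tenants 0 .. hot-1
        else:
            tenant = rng.randrange(hot, num_tenants)  # the long tail
        words = [rng.choice(WORDS) for _ in range(rng.randint(5, 20))]
        docs.append({
            "id": f"tenant{tenant}!doc{i}",  # compositeId-style shard key
            "tenant_s": f"tenant{tenant}",
            "body_t": " ".join(words),
        })
    return docs


batch = make_docs(seed=42, n=3)
assert batch == make_docs(seed=42, n=3)  # reproducible
```

Using a private `random.Random(seed)` instance rather than the module-level functions keeps a failing run replayable even if other code in the load driver also draws random numbers.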
  • 24. Numbers
    • 30 hosts, 120 nodes, 1,000 collections, 6B+ docs, 15,000 queries/second, 2,000 writes/second, 2-second NRT sustained over 24 hours
    • More than 3x the numbers we needed
    • Unfortunately, we had to stop testing at that point :(
    • Our biggest cluster cost us just $120/hour :)
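The headroom claim can be checked with quick arithmetic on the goals and results. This assumes load is spread evenly across hosts and Solr nodes, and that the biggest cluster ran for the full 24 hours at $120/hour; both are simplifications.

```python
# Figures from the "Testing scale at scale" and "Numbers" slides.
goals = {"queries_per_sec": 4000, "updates_per_sec": 400}
achieved = {"queries_per_sec": 15000, "updates_per_sec": 2000}

headroom = {k: achieved[k] / goals[k] for k in goals}
print(headroom)  # {'queries_per_sec': 3.75, 'updates_per_sec': 5.0}

hosts, solr_nodes, docs = 30, 120, 6_000_000_000
print(achieved["queries_per_sec"] / hosts)  # 500.0 queries/sec per host
print(docs // solr_nodes)                   # 50000000 docs per Solr node
print(120 * 24)                             # 2880 dollars for 24 hours at $120/hour
```

The "more than 3x" figure holds for query (3.75x) and update (5x) throughput; the document count met the 6-billion goal rather than exceeding it threefold.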
  • 25. After those tests
    • Jepsen tests
    • Improvement in test coverage
  • 26. And it still goes on…
    • We continue to test performance at scale
    • Published an indexing performance benchmark; working on others
      • 15 nodes, 30 shards, 1 replica: 157,195 docs/sec
      • 15 nodes, 30 shards, 2 replicas: 61,062 docs/sec
    • Setting up an internal performance testing environment
      • Jenkins CI
      • Single-node benchmarks
      • Cloud tests
    • Stay tuned!
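Comparing the two benchmark lines per node makes the replication cost concrete. This sketch assumes "1 replica" means one copy of each shard and "2 replicas" means every document is indexed on two cores; that reading is an assumption, not stated on the slide.

```python
# Published benchmark figures (15 nodes, 30 shards).
NODES = 15
rf1_docs_per_sec = 157_195  # 1 replica
rf2_docs_per_sec = 61_062   # 2 replicas

per_node_rf1 = rf1_docs_per_sec / NODES
per_node_rf2 = rf2_docs_per_sec / NODES
slowdown = rf1_docs_per_sec / rf2_docs_per_sec

# Assumption: with 2 replicas each client document is written to two cores,
# so the cluster performs twice as many core-level writes as client docs.
core_writes_rf2 = 2 * rf2_docs_per_sec
work_ratio = core_writes_rf2 / rf1_docs_per_sec

print(round(per_node_rf1), round(per_node_rf2))  # 10480 4071
print(round(slowdown, 2), round(work_ratio, 2))  # 2.57 0.78
```

Under that reading, clients see about 2.6x lower throughput with a second copy, while the cluster's total core-level write work falls only to about 78% of the single-copy rate.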
  • 28. Not over yet
    • SolrCloud continues to be improved
    • SOLR-6816: Review SolrCloud Indexing Performance
    • SOLR-6220: Replica placement strategy
    • SOLR-6273: Cross data center replication
    • SOLR-5750: Backup/Restore API for SolrCloud
    • SOLR-7230: An API to plugin security into Solr
    • Many, many more

Editor's Notes

  1. Other use cases, different from the general one: a large setup.
  2. Our plan
  3. Didn’t use Zabbix, as JMX wasn’t really useful for us. Used RabbitMQ instead of Kafka.
  4. collectd daemon on each of the hosts
  5. i2.4xlarge machines
  6. 10 x r3.2xlarge nodes, each running one Solr instance: Solr 4.8.1 vs. Solr 5, 35k vs. 75k docs/s (130 million docs)