Scaling SolrCloud to a Large Number of Collections
Shalin Shekhar Mangar 
Lucidworks Inc.
Apache Solr has tremendous momentum
Solr is both established & growing
8M+ total downloads
250,000+ monthly downloads
Largest community of developers
2,500+ open Solr jobs
The most widely used search solution on the planet
Lucene/Solr Revolution: the world’s largest open source user conference dedicated to Lucene/Solr
You use Solr every day. Solr has tens of thousands of applications in production.
Solr Scalability is unmatched
The traditional search use-case 
• One large index distributed across multiple nodes 
• A large number of users searching on the same data 
• Searches happen across the entire cluster 
Example: eCommerce Product Catalogue
“The limits of the possible can only be defined by 
going beyond them into the impossible.” 
—Arthur C. Clarke
Analyze, measure and optimize 
• Analyze and find missing features 
• Set up a performance testing environment on AWS 
• Devise tests for stability and performance 
• Find and fix bugs and bottlenecks
Problem #1: Cluster state and updates 
• The SolrCloud cluster state has information about all collections, their shards and replicas 
• All nodes and (Java) clients watch the cluster state 
• Every state change is broadcast to all nodes 
• Limited to (slightly less than) 1MB by default 
• In a 100-node cluster, a single node restart triggers a few hundred watcher fires and pulls from ZK (three state transitions: down, recovering, active)
Solution: Split cluster state and scale 
• Each collection gets its own state node in ZK 
• Nodes selectively watch only the states of collections they are a member of (see the sketch below) 
• Clients cache state and use smart cache updates instead of watching nodes 
• http://issues.apache.org/jira/browse/SOLR-5473
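A minimal sketch of the per-collection watch idea, using the raw ZooKeeper client. With split cluster state, each collection's state lives under `/collections/<name>/state.json`, so a node only registers watches for collections it hosts. The ZK host, collection name, and timeout here are hypothetical placeholders:

```java
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

// Sketch: watch one collection's state node instead of the shared
// /clusterstate.json, so this node is only notified about changes to
// collections it actually hosts (the idea behind SOLR-5473).
public class CollectionStateWatcher {
    public static void main(String[] args) throws Exception {
        final String collection = "tenant_42"; // hypothetical collection name
        final ZooKeeper zk = new ZooKeeper("zkhost1:2181", 15000, null);

        Watcher watcher = new Watcher() {
            @Override
            public void process(WatchedEvent event) {
                System.out.println("State changed for " + collection + ": " + event);
                // a real watcher would re-read the data and re-register here
            }
        };

        // Per-collection state node written for split-state collections
        byte[] state = zk.getData("/collections/" + collection + "/state.json",
                watcher, null);
        System.out.println(new String(state, "UTF-8"));
    }
}
```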
Problem #2: Overseer Performance 
• Thousands of collections create a lot of state updates 
• The Overseer falls behind, and replicas can’t recover or elect a leader 
• Under high indexing/search load, GC pauses can cause the Overseer queue to back up (a quick queue-depth check is sketched below)
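One quick way to see whether the Overseer is falling behind is to count the entries in its work queue, which lives under `/overseer/queue` in ZooKeeper. A minimal sketch (ZK host is a placeholder):

```java
import org.apache.zookeeper.ZooKeeper;

// Sketch: the Overseer's work queue is the children of /overseer/queue.
// A steadily growing count means the Overseer is not keeping up.
public class OverseerQueueCheck {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("zkhost1:2181", 15000, null);
        int depth = zk.getChildren("/overseer/queue", false).size();
        System.out.println("Overseer queue depth: " + depth);
        zk.close();
    }
}
```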
Solution: Improve the Overseer 
• Optimized polling for new items in the Overseer queue (SOLR-5436) 
• Dedicated Overseer nodes (SOLR-5476) 
• New Overseer Status API (SOLR-5749) 
• Asynchronous execution of collection commands (SOLR-5477, SOLR-5681); see the sketch below
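A minimal sketch of the async Collections API: submit CREATE with an `async` request id, then poll REQUESTSTATUS rather than holding a long-running Overseer call open. The ZK host, collection name, and request id are hypothetical:

```java
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.util.NamedList;

// Sketch: asynchronous collection creation (SOLR-5477/SOLR-5681).
public class AsyncCreateCollection {
    public static void main(String[] args) throws Exception {
        CloudSolrServer solr = new CloudSolrServer("zkhost1:2181");

        // Submit the CREATE command with an async request id we choose
        ModifiableSolrParams create = new ModifiableSolrParams();
        create.set("action", "CREATE");
        create.set("name", "tenant_42");          // hypothetical collection
        create.set("numShards", "1");
        create.set("replicationFactor", "3");
        create.set("async", "create-tenant-42");  // our request id
        QueryRequest createReq = new QueryRequest(create);
        createReq.setPath("/admin/collections");
        solr.request(createReq);

        // Poll REQUESTSTATUS until the Overseer reports completion
        ModifiableSolrParams status = new ModifiableSolrParams();
        status.set("action", "REQUESTSTATUS");
        status.set("requestid", "create-tenant-42");
        QueryRequest statusReq = new QueryRequest(status);
        statusReq.setPath("/admin/collections");
        NamedList<Object> rsp = solr.request(statusReq);
        System.out.println(rsp);
        solr.shutdown();
    }
}
```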
Problem #3: Moving data around 
• Not all users are created equal: a tenant may have a few very large users 
• We wanted to be able to scale an individual user’s data, maybe even as its own collection 
• SolrCloud can split shards with no downtime, but it only splits them in half 
• No way to ‘extract’ a user’s data to another collection or shard (a routing sketch follows below)
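What makes per-user extraction feasible is SolrCloud's compositeId router: a `tenant!docId` prefix hashes all of a tenant's documents into the same range, so they can later be split out or migrated as a unit. A minimal sketch (ZK host, collection, and field names are placeholders):

```java
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

// Sketch: with the compositeId router, the "user42!" prefix co-locates
// all of that user's documents in one hash range.
public class CompositeIdIndexing {
    public static void main(String[] args) throws Exception {
        CloudSolrServer solr = new CloudSolrServer("zkhost1:2181");
        solr.setDefaultCollection("tenants");     // hypothetical collection

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "user42!doc1");        // "user42" is the route key
        doc.addField("text", "hello world");
        solr.add(doc);
        solr.commit();
        solr.shutdown();
    }
}
```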
Solution: Improved data management 
• Shards can be split on arbitrary hash ranges (SOLR-5300) 
• Shards can be split by a given route key (SOLR-5338, SOLR-5353) 
• A new ‘migrate’ API moves a user’s data to another (new) collection without downtime (SOLR-5308); see the sketch below
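A minimal sketch of the MIGRATE command: move all documents with a given route key from one collection to another with no downtime. The source/target collection names and ZK host are hypothetical:

```java
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

// Sketch: migrate one route key's documents to a dedicated collection
// (SOLR-5308). Writes arriving during the move are forwarded.
public class MigrateTenant {
    public static void main(String[] args) throws Exception {
        CloudSolrServer solr = new CloudSolrServer("zkhost1:2181");

        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set("action", "MIGRATE");
        params.set("collection", "tenants");        // hypothetical source
        params.set("split.key", "user42!");         // route key to extract
        params.set("target.collection", "user42");  // hypothetical target
        params.set("forward.timeout", "60");        // seconds to keep forwarding
        QueryRequest req = new QueryRequest(params);
        req.setPath("/admin/collections");
        System.out.println(solr.request(req));
        solr.shutdown();
    }
}
```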
Problem #4: Exporting data 
• Lucene/Solr is designed for finding the top-N search results 
• Trying to export a full result set can bring down the system: memory requirements grow the deeper you page
Solution: Distributed deep paging
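A minimal sketch of cursor-based deep paging as it shipped in Solr 4.7 (cursorMark, SOLR-5463): instead of `start=N`, which forces every shard to collect N+rows documents, each page passes back the cursor returned by the previous one, so memory use stays flat. Collection and ZK host are placeholders:

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.CursorMarkParams;

// Sketch: export a full result set page by page using cursorMark.
public class ExportWithCursor {
    public static void main(String[] args) throws Exception {
        CloudSolrServer solr = new CloudSolrServer("zkhost1:2181");
        solr.setDefaultCollection("tenants");   // hypothetical collection

        SolrQuery q = new SolrQuery("*:*");
        q.setRows(1000);
        q.setSort("id", SolrQuery.ORDER.asc);   // cursor needs a total order on uniqueKey
        String cursorMark = CursorMarkParams.CURSOR_MARK_START;
        boolean done = false;
        while (!done) {
            q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
            QueryResponse rsp = solr.query(q);
            for (SolrDocument doc : rsp.getResults()) {
                // write doc to the export sink
            }
            String next = rsp.getNextCursorMark();
            done = cursorMark.equals(next);     // same mark twice => finished
            cursorMark = next;
        }
        solr.shutdown();
    }
}
```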
Testing scale at scale 
• Performance goals: 6 billion documents, 4,000 queries/sec, 400 updates/sec, 2-second NRT, sustained 
• 5% large collections (50 shards), 15% medium (10 shards), 85% small (1 shard), with a replication factor of 3 
• Target hardware: 24 CPUs, 126GB RAM, 7 SSDs (460GB) + 1 HDD (200GB) 
• 80% of traffic served by 20% of the tenants
Test Infrastructure
Logging
How to manage large clusters? 
• Tim Potter wrote the Solr Scale Toolkit 
• A Fabric-based tool to set up and manage SolrCloud clusters in AWS, complete with collectd and SiLK 
• Backup/restore from S3; parallel clone commands 
• Open source! 
• https://github.com/LucidWorks/solr-scale-tk
Gathering metrics and analyzing logs 
• Lucidworks SiLK (Solr + Logstash + Kibana) 
• collectd daemons on each host 
• RabbitMQ to queue messages before delivering them to Logstash 
• Initially started with Kafka but discarded it as overkill 
• Not happy with RabbitMQ: crashes and instability 
• Might try Kafka again soon 
• http://www.lucidworks.com/lucidworks-silk
Generating data and load 
• Custom randomized data generator (reproducible using a seed); a minimal sketch follows below 
• JMeter for generating load 
• Embedded CloudSolrServer (the Solr Java client) driven by the JMeter Java Action Sampler 
• JMeter’s distributed mode was itself a bottleneck! 
• Not open source (yet), but we’re working on it!
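Since the real generator is not yet open source, here is a minimal sketch of the reproducibility idea: seeding `java.util.Random` makes every run emit the same documents, so a test can be replayed exactly. Field names, collection, and ZK host are hypothetical:

```java
import java.util.Random;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

// Sketch: a seeded generator produces an identical corpus on every run.
public class SeededDocGenerator {
    public static void main(String[] args) throws Exception {
        long seed = 42L;                          // same seed => same corpus
        Random rnd = new Random(seed);
        CloudSolrServer solr = new CloudSolrServer("zkhost1:2181");
        solr.setDefaultCollection("tenants");     // hypothetical collection

        for (int i = 0; i < 10000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-" + i);
            doc.addField("user_s", "user" + rnd.nextInt(1000));
            doc.addField("amount_l", rnd.nextLong());
            solr.add(doc);
        }
        solr.commit();
        solr.shutdown();
    }
}
```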
Numbers 
• 30 hosts, 120 nodes, 1,000 collections, 6B+ docs, 15,000 queries/second, 2,000 writes/second, 2-second NRT, sustained over 24 hours 
• More than 3x the numbers we needed 
• Unfortunately, we had to stop testing at that point :( 
• Our biggest cluster cost us just $120/hour :)
Not over yet 
• We continue to test performance at scale 
• Published an indexing performance benchmark; working on others 
• 15 nodes, 30 shards, 1 replica: 157,195 docs/sec 
• 15 nodes, 30 shards, 2 replicas: 61,062 docs/sec 
• http://searchhub.org/introducing-the-solr-scale-toolkit/ 
• Setting up an internal performance testing environment 
• Jenkins CI 
• Jepsen tests 
• Single-node benchmarks 
• Cloud tests 
• Stay tuned!
Pushing the limits
Not over yet 
• SolrCloud continues to be improved 
• SOLR-6220 - Replica placement strategy 
• SOLR-6273 - Cross data center replication 
• SOLR-5656 - Auto-add replicas on HDFS 
• SOLR-5986 - Don’t allow runaway queries to harm the cluster 
• SOLR-5750 - Backup/Restore API for SolrCloud 
• Many, many more
Thank you! 
shalin@apache.org 
@shalinmangar
