Deploying and Managing Solr at Scale
Who am I?
• Anshum Gupta, Apache Lucene/Solr committer and Lucidworks employee.
• Interested in search and related stuff.
• Working with Apache Lucene since 2006 and Solr since 2010.
• Organizations I am or have been a part of:
Apache Solr has a huge install base and tremendous momentum; it is the most widely used search solution on the planet.
• 8M+ total downloads
• 250,000+ monthly downloads
• Solr is both established and growing
• Solr has tens of thousands of applications in production. You use Solr every day.
• 2,500+ open Solr jobs
Activity Summary (via https://www.openhub.net/p/solr)
• 30-day summary (Dec 06, 2014 to Jan 05, 2015): 135 commits, 17 contributors
• 12-month summary (Jan 05, 2014 to Jan 05, 2015): 1,363 commits, 30 contributors
Getting started with Solr
• Download
• Untar/Unzip
• bin/solr start -e cloud -noprompt
• open http://localhost:8983/solr
Recent usability improvements
• Start scripts
• Schema APIs (example below)
• Config API: register custom handlers via the API
• Status APIs, and more…
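For example, the Schema API lets you add a field over HTTP instead of hand-editing schema.xml. A minimal sketch in Python using the requests library; the collection name and field details are illustrative, and it assumes the collection uses the managed schema:

    import requests

    # Add a field to the managed schema over HTTP
    # (collection and field names are illustrative).
    resp = requests.post(
        "http://localhost:8983/solr/gettingstarted/schema",
        json={"add-field": {"name": "title_s", "type": "string", "stored": True}},
    )
    print(resp.json())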
SolrCloud Architecture
• Collections are split into shards; each shard has a leader and followers (Shard 1 and Shard 2 in the diagram)
• A ZooKeeper ensemble coordinates the cluster
• Multiple nodes = need for coordination
Production scale?
• ZooKeeper ensemble, NOT the embedded ZooKeeper
• Multiple nodes
• Manually run (or script) the four getting-started steps on each node?
Solr Scale Toolkit
• Open Source!
• Fabric (Python) toolset for deploying and managing SolrCloud clusters in the cloud (task sketch below)
• Code to support benchmark tests (Pig script for data generation /
indexing, JMeter samplers)
• EC2 for now, more cloud providers coming soon via Apache
libcloud
• No *need* to know Python!
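To give a flavor of Fabric, here is a minimal sketch of a Fabric 1.x task; the task body and user name are illustrative, not the toolkit's actual code:

    from fabric.api import env, run, task

    env.user = "ec2-user"  # illustrative: the login user on the cluster's AMI

    @task
    def check_hosts():
        # Fabric runs this command over SSH on every host passed via -H.
        run("uptime")

    # Usage: fab -H host1,host2 check_hosts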
The building blocks: a lot of Python!
• boto – Python API for AWS (EC2, S3, etc)
• Fabric – Python-based tool for automating system admin tasks over SSH
• pysolr – Python library for Solr (sending commits, queries, ...); see the example below
• kazoo – Python client tools for ZooKeeper
• Supporting Cast:
• JMeter – run tests, generate reports
• collectd – system monitoring
• Logstash4Solr – log aggregation
• JConsole/VisualVM – monitor JVM during indexing / queries
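As a quick taste of the building blocks, a minimal pysolr sketch for indexing and querying; the URL and documents are illustrative:

    import pysolr

    solr = pysolr.Solr("http://localhost:8983/solr/collection1", timeout=10)

    # Index two documents and make them searchable immediately.
    solr.add([
        {"id": "1", "title": "hello"},
        {"id": "2", "title": "world"},
    ])
    solr.commit()

    # Query them back.
    for doc in solr.search("title:hello"):
        print(doc["id"])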
Overview of features:
• Provisioning N machine instances in EC2
• Configuring / starting ZooKeeper (1 to n servers)
• Configuring / starting Solr instances in cloud mode (M per machine, M x N nodes total)
• Integrating with Logstash4Solr and other supporting services, e.g. collectd
• Day-to-day operations on an existing cluster
Architecture
[Diagram] The Solr-Scale-Toolkit drives: a ZooKeeper ensemble (ZK hosts 1 through N, each running a ZooKeeper instance); N x M SolrCloud nodes, where each of the N custom-AMI machines runs M Solr nodes (ports 8983 through 89xx, each node hosting multiple cores); and a meta node running SiLK, with system monitoring of the M machines via collectd and JMX.
Provisioning cluster nodes
• Custom-built AMI (one for PV instances, one for HVM instances), based on Amazon Linux
• Dedicated disk per Solr node
• Launch and then poll status until they are live
• Verify SSH connectivity
• Tag each instance with a cluster ID and username
fab new_ec2_instances:test1,n=3,instance_type=m3.xlarge
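Under the covers this task leans on boto. A rough sketch of the kind of EC2 call involved; the AMI ID, region, and key pair are placeholders, and this is not the toolkit's exact code:

    import boto.ec2

    # Launch three instances from a custom AMI and tag them with a cluster ID.
    conn = boto.ec2.connect_to_region("us-east-1")
    reservation = conn.run_instances(
        "ami-xxxxxxxx",           # placeholder: the toolkit's custom AMI
        min_count=3, max_count=3,
        instance_type="m3.xlarge",
        key_name="my-keypair",    # placeholder SSH key pair
    )
    for instance in reservation.instances:
        instance.add_tag("cluster", "test1")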
Deploy ZooKeeper ensemble
• Two options for the ensemble:
• Provision 1 to N ZooKeeper nodes when you launch the Solr cluster
• Use an existing named ensemble
• The Fabric command simply creates the myid files and the zoo.cfg for the ensemble, plus some cron scripts for managing snapshots
• Basic health check of ZooKeeper status (scriptable with kazoo; sketch below):
• echo srvr | nc localhost 2181
fab new_zk_ensemble:zk1,n=3
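The same health check can be scripted with kazoo, the ZooKeeper client library the toolkit uses. A minimal sketch; the host names are illustrative, and it assumes Solr registers at the ZooKeeper root (no chroot):

    from kazoo.client import KazooClient

    zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
    zk.start()

    # Same output as: echo srvr | nc localhost 2181
    print(zk.command(b"srvr"))

    # Solr nodes currently registered as live (once the cluster is up).
    print(zk.get_children("/live_nodes"))

    zk.stop()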
Deploy SolrCloud cluster
• Uses bin/solr in Solr 4.10 to control Solr nodes (start sketch below)
• Sets system props: jetty.port, host, zkHost, JVM opts
• One or more Solr nodes per machine
• JVM memory opts depend on instance type and the number of Solr nodes per instance
• Optionally configures log4j.properties to append messages to RabbitMQ for SiLK integration
fab new_solrcloud:test1,zk=zk1,nodesPerHost=2
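Conceptually, the toolkit runs something like the following on each machine via Fabric. The bin/solr flags are from Solr 4.10; the port, memory setting, and ZooKeeper string are illustrative:

    from fabric.api import run, task

    @task
    def start_solr_node(port="8983", zk="zk1:2181,zk2:2181,zk3:2181"):
        # Start one Solr node in cloud mode, pointed at the ZooKeeper ensemble.
        run("bin/solr start -cloud -p %s -z %s -m 4g" % (port, zk))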
Demo
• Launch ZooKeeper Ensemble
• 3 nodes to establish quorum
• Launch SolrCloud cluster
• Create a new collection and index some docs (API sketch below)
• Run a healthcheck on the collection
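Outside the toolkit, the collection-creation step of the demo maps onto the standard Collections API; a minimal sketch in Python, with the collection name, shard counts, and document all illustrative:

    import requests
    import pysolr

    # Create a collection via the Collections API.
    requests.get("http://localhost:8983/solr/admin/collections", params={
        "action": "CREATE", "name": "demo",
        "numShards": "2", "replicationFactor": "2",
    })

    # Index a document into the new collection.
    solr = pysolr.Solr("http://localhost:8983/solr/demo")
    solr.add([{"id": "1", "text": "hello solrcloud"}])
    solr.commit()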
Dashboards
Other useful stuff
• Patch the cluster from a local build
• fab mine: See clusters I'm running (or those of other users)
• fab kill_mine: Terminate all instances I'm running
• fab ssh_to: Quick way to SSH to one of the nodes in a cluster
• fab stop/recover/kill: Basic commands for controlling specific Solr nodes in the cluster
• fab jmeter: Execute a JMeter test plan against your cluster
• An example test plan and Java sampler are included with the source
Testing Methodology
• Transparent, repeatable results
• Ideally hoping for something owned by the community
• Synthetic docs ~ 1K each on disk, mix of field types
• Data set created using code borrowed from PigMix
• English text fields generated using a Zipfian distribution
• Java 1.7u67, Amazon Linux, r3.2xlarge nodes
• enhanced networking enabled, placement group, same AZ
• Stock Solr (cloud) 4.10
• Using custom GC tuning parameters and auto-commit settings
• Use Elastic MapReduce to generate indexing load
• As many nodes as I need to drive Solr!
Indexing performance
Cluster Size   # of Shards   # of Replicas   Reducers   Time (secs)   Docs / sec
10             10            1               48         1762          73,780
10             10            2               34         3727          34,881
10             20            1               48         1282          101,404
10             20            2               34         3207          40,536
10             30            1               72         1070          121,495
10             30            2               60         3159          41,152
15             15            1               60         1106          117,541
15             15            2               42         2465          52,738
15             30            1               60         827           157,195
15             30            2               42         2129          61,062
Indexing performance lessons
• Solr has no built-in throttling support: it will accept work until it falls over, so build throttling into your indexing application logic (sketch after this list)
• Oversharding helps parallelize indexing work and gives you an
easy way to add more hardware to your cluster
• GC tuning is critical
• Auto-hard commit to keep transaction logs manageable
• Auto soft-commit to see docs as they are indexed
• Replication is expensive! (Work in progress, SOLR-6816)
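A minimal sketch of client-side throttling with pysolr: batch the adds and back off when Solr starts failing requests. The batch size and delays are illustrative:

    import time
    import pysolr

    solr = pysolr.Solr("http://localhost:8983/solr/test1")

    def index_with_backoff(docs, batch_size=500):
        # Solr will not throttle incoming work, so the client has to:
        # send fixed-size batches and back off exponentially on errors.
        for i in range(0, len(docs), batch_size):
            batch = docs[i:i + batch_size]
            delay = 1
            while True:
                try:
                    solr.add(batch)
                    break
                except pysolr.SolrError:
                    time.sleep(delay)
                    delay = min(delay * 2, 60)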
Query Performance
• Still a work in progress!
• Measuring sustained QPS and 99th-percentile execution time
• Stable: ~5,000 QPS with a 300 ms 99th percentile while indexing ~10,000 docs / sec
• Using the TermsComponent to build queries based on the terms in each
field.
• Harder to accurately simulate user queries over synthetic data
• Need a mix of faceting, paging, sorting, grouping, boolean clauses, range queries, boosting, filters (some cached, some not), etc. (sample query below)
• Start with one server (1 shard) to determine baseline query performance.
• Look for inefficiencies in your schema and other config settings
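For reference, one "realistic mix" query might combine facets, cached and uncached filters, a range clause, sorting, and paging. A hedged sketch with illustrative collection and field names:

    import requests

    params = {
        "q": "text_en:search",                       # main query
        "fq": ["category:books",                     # filter, cached by default
               "{!cache=false}price:[10 TO 100]"],   # uncached range filter
        "facet": "true",
        "facet.field": "category",
        "sort": "score desc",
        "start": "0", "rows": "10",
        "wt": "json",
    }
    resp = requests.get("http://localhost:8983/solr/test1/select", params=params)
    print(resp.json()["response"]["numFound"])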
More on query performance…
• Higher risk of full GC pauses (facets, filters, sorting)
• Use optimized data structures (DocValues) for facet / sort fields, Trie-based numeric fields for range queries, and facet.method=enum for low-cardinality fields
• Add more replicas; load-balance
• -Dhttp.maxConnections=## (default = 5; increase to accommodate more threads sending queries)
• Avoid increasing the ZooKeeper client timeout; ~15000 ms (15 seconds) is about right
• Don't just keep throwing more memory at Java! (-Xmx128G)
Roadmap
• Not just AWS
• No need for a custom AMI: configurable download paths and versions
Questions?
References
• Solr Scale Toolkit
• Blog: http://lucidworks.com/blog/introducing-the-solr-scale-toolkit/
• Podcast: http://solrcluster.podbean.com/e/tim-potter-on-the-solr-scale-toolkit/
• GitHub: https://github.com/LucidWorks/solr-scale-tk
Connect @
http://www.twitter.com/anshumgupta
http://www.linkedin.com/in/anshumgupta/
anshum@apache.org
