SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
1. Search | Discover | Analyze
Confidential and Proprietary © Copyright 2013
Benchmarking Solr Performance
June 18, 2014
Timothy Potter
2.
My SolrCloud Experience
• At LucidWorks, mostly focused on hardening SolrCloud; Lucene/Solr committer
• Operated 36 node cluster in AWS for Dachis Group (1.5 years ago, 18 shards ~900M docs)
• Built a Fabric/boto framework for deploying and managing a cluster in EC2
• Co-author of Solr In Action
4.
Cluster sizing
How many servers do I need to index X docs?
... shards ... ?
... replicas ... ?
I need N queries per second over M docs, how many servers do I need?
It depends?!?
5.
Methodology
• Transparent repeatable results
– Ideally hoping for something owned by the community
• Synthetic docs ~ 1K each on disk, mix of field types
– Data set created using code borrowed from PigMix
– English text fields generated using a Zipfian distribution
• Java 1.7u55, Amazon Linux, r3.2xlarge nodes
– enhanced networking enabled, placement group, same AZ
• Stock Solr (cloud) 4.8.1
– Using Shawn Heisey’s GC tuning parameters
• Use Elastic MapReduce to generate load
– As many nodes as I need to drive Solr!
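The Zipfian text generation mentioned above can be sketched in a few lines of Python. This is only an illustration, not the PigMix-derived generator used for the benchmark: the placeholder vocabulary, the field names (`text_en`, `count_i`, `price_f`), the document length, and the exponent of 1.0 are all assumptions.

```python
import random


def zipf_weights(vocab_size, exponent=1.0):
    """Weight of the word at rank r is proportional to 1 / r**exponent."""
    return [1.0 / (rank ** exponent) for rank in range(1, vocab_size + 1)]


def zipfian_text(vocab, num_words, rng, exponent=1.0):
    """Draw num_words words from vocab, where vocab[0] is the most frequent."""
    weights = zipf_weights(len(vocab), exponent)
    return " ".join(rng.choices(vocab, weights=weights, k=num_words))


def synthetic_doc(doc_id, vocab, rng):
    """Build one synthetic document with a mix of field types.

    The field names are hypothetical; they are not the benchmark schema.
    """
    return {
        "id": doc_id,
        "text_en": zipfian_text(vocab, 100, rng),
        "count_i": rng.randint(0, 9999),
        "price_f": round(rng.uniform(0.0, 500.0), 2),
    }


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed keeps runs repeatable
    vocab = ["w%d" % i for i in range(1000)]  # placeholder vocabulary
    print(synthetic_doc("doc-1", vocab, rng))
```

Seeding the generator is one simple way to serve the "transparent repeatable results" goal: the same seed reproduces the same data set.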
6.
Indexing Results
Cluster Size  # of Shards  # of Replicas  Reducers  Time (secs)  Docs / sec
10            10           1              48        1762         73,780
10            10           2              34        3727         34,881
10            20           1              48        1282         101,404
10            20           2              34        3207         40,536
10            30           1              72        1070         121,495
10            30           2              60        3159         41,152
15            15           1              60        1106         117,541
15            15           2              42        2465         52,738
15            30           1              60        827          157,195
15            30           2              42        2129         61,062
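A quick arithmetic check on the table, sketched in Python. Multiplying time by docs/sec gives about 130 million docs for every run, and comparing rows with the same cluster size and shard count shows throughput with 1 replica running roughly 2x to 3x that with 2 replicas. Note that the reducer counts also differ between those paired runs, so the ratio is not a pure measure of replication cost.

```python
# Rows copied from the indexing results table:
# (cluster_size, shards, replicas, reducers, time_secs, docs_per_sec)
RESULTS = [
    (10, 10, 1, 48, 1762, 73780),
    (10, 10, 2, 34, 3727, 34881),
    (10, 20, 1, 48, 1282, 101404),
    (10, 20, 2, 34, 3207, 40536),
    (10, 30, 1, 72, 1070, 121495),
    (10, 30, 2, 60, 3159, 41152),
    (15, 15, 1, 60, 1106, 117541),
    (15, 15, 2, 42, 2465, 52738),
    (15, 30, 1, 60, 827, 157195),
    (15, 30, 2, 42, 2129, 61062),
]


def implied_total_docs(row):
    """Docs indexed in a run, recovered as time (secs) x docs/sec."""
    return row[4] * row[5]


def replication_slowdown(results):
    """Map (cluster_size, shards) -> rate with 1 replica / rate with 2 replicas."""
    rates = {}
    for nodes, shards, replicas, _reducers, _secs, rate in results:
        rates.setdefault((nodes, shards), {})[replicas] = rate
    return {
        key: by_replicas[1] / by_replicas[2]
        for key, by_replicas in rates.items()
        if 1 in by_replicas and 2 in by_replicas
    }


if __name__ == "__main__":
    for row in RESULTS:
        print(row[:3], implied_total_docs(row))
    for key, ratio in sorted(replication_slowdown(RESULTS).items()):
        print(key, round(ratio, 2))
```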
7.
Direct Updates
[Diagram: Indexing Client 1 uses CloudSolrServer (SolrJ), which watches /clusterstate.json in ZooKeeper, computes the shard assignment for each batch on the client, and sends <doc> updates directly to the leaders of Shard 1, Shard 2 and Shard 3.]
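The idea of computing shard assignment on the client can be sketched as follows. This is an illustration only and not SolrJ's implementation: Solr's own router hashes document IDs with MurmurHash3 and maps them onto per-shard hash ranges read from the cluster state, whereas this sketch substitutes `zlib.crc32` and a simple modulo. The function names are hypothetical.

```python
import zlib
from collections import defaultdict


def shard_for(doc_id, num_shards):
    """Pick a shard index for a document ID.

    Stand-in hash (crc32 modulo); Solr itself uses MurmurHash3 and hash ranges.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def batch_by_shard(docs, num_shards):
    """Split one client-side batch into per-shard batches."""
    batches = defaultdict(list)
    for doc in docs:
        batches[shard_for(doc["id"], num_shards)].append(doc)
    return dict(batches)


if __name__ == "__main__":
    docs = [{"id": "doc-%d" % i} for i in range(10)]
    for shard, batch in sorted(batch_by_shard(docs, 3).items()):
        # Each per-shard batch would be sent straight to that shard's leader.
        print(shard, [d["id"] for d in batch])
```

Because the client already knows which leader owns each document, no node has to forward the document to another shard after receiving it.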
8.
Replication
[Diagram: CloudSolrServer (SolrJ) watches /clusterstate.json in ZooKeeper and sends <doc> updates to the leaders of Shard 1, Shard 2 and Shard 3; each leader forwards the <doc> to its replica and blocks for a response from the replica(s).]
10.
Lessons Learned
• Know what throughput your client side is capable of generating
– If in MapReduce, index from reducers with speculative execution disabled
• Don’t change Solr config without good reasons for doing so
• Overshard (but not too much)
• Near-linear scalability as I added nodes!
11.
Query Performance Tests
• All nodes in SolrCloud perform indexing and execute queries
• Using the TermsComponent to build queries based on the terms in each field.
• Harder to accurately simulate user queries over synthetic data
– Need mix of faceting, paging, sorting, grouping, boolean clauses, range queries, boosting, filters (some cached, some not), etc ...
• Does the randomness in your test queries model (expected) user behavior?
• Start with one server (1 shard) to determine baseline query performance.
– Look for inefficiencies in your schema and other config settings
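One way to turn per-field term lists into test queries is sketched below. It assumes the terms arrive as a flat `[term, count, term, count, ...]` list per field, which is how the TermsComponent's JSON output is commonly shaped, and it weights term selection by document frequency. The field names, the weighting, and the AND-only clause mix are assumptions for illustration, not the queries used in the benchmark.

```python
import random


def pair_terms(flat):
    """Turn a flat [term, count, term, count, ...] list into (term, count) pairs."""
    return list(zip(flat[0::2], flat[1::2]))


def random_query(terms_by_field, rng, max_clauses=3):
    """Build a boolean query string from randomly chosen field:term clauses.

    Terms are picked with probability proportional to their count, so common
    terms show up more often than rare ones.
    """
    clauses = []
    for _ in range(rng.randint(1, max_clauses)):
        field = rng.choice(sorted(terms_by_field))
        terms, counts = zip(*terms_by_field[field])
        term = rng.choices(terms, weights=counts, k=1)[0]
        clauses.append("%s:%s" % (field, term))
    return " AND ".join(clauses)


if __name__ == "__main__":
    rng = random.Random(42)
    terms_by_field = {
        "text_en": pair_terms(["solr", 120, "cloud", 80, "shard", 15]),
        "tag_s": pair_terms(["aws", 40, "ec2", 10]),
    }
    for _ in range(5):
        print(random_query(terms_by_field, rng))
```

How the weights are chosen is the knob that controls how random the workload is, which in turn affects how much the query and filter caches help.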
12.
Solr Scale Toolkit
• Fabric / Python based toolset for deploying and managing SolrCloud clusters
• SolrJ-based client application useful for building tools that need access to cluster state information in ZooKeeper
• Code to support benchmarks for Solr
13.
Python-based Tools
boto – Python API for AWS (EC2, S3, etc)
Fabric – Python-based tool for automating system admin tasks over SSH
pysolr – Python library for Solr (sending commits, queries, ...)
kazoo – Python client tools for ZooKeeper
Supporting Cast:
JMeter – run tests, generate reports
collectd – system monitoring
Logstash4Solr – log aggregation
JConsole/VisualVM – monitor JVM during indexing / queries
14.
Solr Scale Toolkit: Demo
• Launch a meta node
– Log agg / basic monitoring using SiLK
• Launch ZooKeeper Ensemble
– 3 nodes to establish quorum
– Set up cron job to clean up snapshots
• Launch SolrCloud cluster
• Create new collection and index some docs
– Attach JConsole while indexing
• Run a healthcheck on the collection
• Check out Banana Dashboard
• Backup / Restore
– Requires patch for SOLR-5956
– Use fab patch_jars to update jars and do a rolling restart
15.
Provisioning machines
• Custom built AMI?
• Block device mapping
– dedicated disk per Solr node
• Launch and then poll status until they are live
– verify SSH connectivity
• Tag each instance with a cluster ID and username
fab new_ec2_instances:test1,n=3,instance_type=m3.xlarge
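The "launch and then poll status until they are live" step can be sketched as a generic polling loop. This is not the toolkit's code: `get_state` is a hypothetical caller-supplied function that would wrap a cloud API call, and the timeout and interval values are arbitrary. The sketch checks only the instance state; verifying SSH connectivity would be a separate step.

```python
import time


def wait_until_live(instance_ids, get_state, timeout_secs=300, poll_secs=5,
                    sleep=time.sleep):
    """Poll get_state(instance_id) until every instance reports "running".

    Returns the number of polling rounds used, or raises TimeoutError.
    """
    pending = set(instance_ids)
    waited = 0
    rounds = 0
    while True:
        rounds += 1
        pending = {i for i in pending if get_state(i) != "running"}
        if not pending:
            return rounds
        if waited >= timeout_secs:
            raise TimeoutError("still not running: %s" % sorted(pending))
        sleep(poll_secs)
        waited += poll_secs


if __name__ == "__main__":
    # Fake state source: each instance reports "pending" twice, then "running".
    calls = {}

    def fake_state(instance_id):
        calls[instance_id] = calls.get(instance_id, 0) + 1
        return "running" if calls[instance_id] > 2 else "pending"

    print(wait_until_live(["i-1", "i-2", "i-3"], fake_state, sleep=lambda s: None))
```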
16.
ZooKeeper
• Two options:
– provision 1 to N nodes when you launch Solr cluster
– use existing named ensemble
• Fabric command simply creates the myid files and zoo.cfg file for the ensemble
– and some cron scripts for managing snapshots
• Basic health checking of ZooKeeper status:
– echo srvr | nc localhost 2181
fab new_zk_ensemble:zk1,n=3
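To make the myid and zoo.cfg step concrete, here is a minimal sketch of what generating those files for an ensemble could look like. It is not the toolkit's Fabric task: the host names, the data directory, and the timing values (`tickTime`, `initLimit`, `syncLimit`) are assumptions, and 2888/3888 are the conventional quorum and election ports. The last helper reads the `Mode:` line from the text returned by the `srvr` command.

```python
def zoo_cfg(hosts, data_dir="/var/lib/zookeeper", client_port=2181):
    """Render a zoo.cfg for an ensemble; server IDs start at 1."""
    lines = [
        "tickTime=2000",
        "initLimit=10",
        "syncLimit=5",
        "dataDir=%s" % data_dir,
        "clientPort=%d" % client_port,
    ]
    for server_id, host in enumerate(hosts, start=1):
        lines.append("server.%d=%s:2888:3888" % (server_id, host))
    return "\n".join(lines) + "\n"


def myid_files(hosts):
    """Contents of each host's myid file, matching the server.N lines."""
    return {host: str(server_id) for server_id, host in enumerate(hosts, start=1)}


def zk_mode(srvr_output):
    """Extract the mode (e.g. leader, follower) from `srvr` output, if present."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


if __name__ == "__main__":
    hosts = ["zk1-a", "zk1-b", "zk1-c"]  # hypothetical host names
    print(zoo_cfg(hosts))
    print(myid_files(hosts))
```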
17.
SolrCloud
• Upload a BASH script that starts/stops Solr
• Set system props: jetty.port, host, zkHost, JVM opts
• One or more Solr nodes per machine
• JVM mem opts dependent on instance type and # of Solr nodes per instance
• Optionally configure log4j.properties to append messages to RabbitMQ for Logstash4Solr integration
fab new_solrcloud:test1,zk=zk1,nodesPerHost=2
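The system properties listed above map onto the Java command line used to start each Solr 4.x node. The sketch below builds one command per node on a host. It is an illustration, not the toolkit's start script: the base port of 8983, the one-port-per-node scheme, and passing the heap size in explicitly are assumptions.

```python
def solr_start_commands(host, zk_host, nodes_per_host, heap, base_port=8983):
    """Build one `java ... -jar start.jar` command per Solr node on a host.

    heap is a JVM size string such as "4g"; choosing it per instance type
    and node count is left to the caller.
    """
    commands = []
    for i in range(nodes_per_host):
        commands.append([
            "java",
            "-Xms%s" % heap,
            "-Xmx%s" % heap,
            "-Djetty.port=%d" % (base_port + i),
            "-Dhost=%s" % host,
            "-DzkHost=%s" % zk_host,
            "-jar",
            "start.jar",
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical host and ZooKeeper connection string.
    for cmd in solr_start_commands("10.0.0.5", "zk1-a:2181,zk1-b:2181", 2, "4g"):
        print(" ".join(cmd))
```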
18.
solr-ctl.sh
• BASH script that implements:
– start/stop Solr nodes on each EC2 instance
– set JVM memory options, system properties (jetty.port), enable remote JMX, etc
– back up log files before restarting nodes
– ensure JVM is killed correctly before restarting
• Environment variables in: solr-ctl-env.sh
19.
Miscellaneous Utility Tasks
• Deploy a configuration directory to ZooKeeper
• Create a new collection
• Attach a local JConsole/VisualVM to a remote JVM
• Rolling restart (with Overseer awareness)
• Build Solr locally and patch remote
– Use a relay server to scp the JARs to Amazon network once and then scp them to other nodes from within the network
• Put/get files
• Grep over all log files (across the cluster)
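The slides do not say how the rolling restart accounts for the Overseer. One plausible reading, sketched here purely as an assumption, is to restart the node hosting the Overseer last so that the Overseer role moves only once during the rollout. The `restart` and `is_healthy` callables are hypothetical stand-ins for whatever actually restarts and checks a node.

```python
def rolling_restart_order(nodes, overseer_node):
    """Return nodes in restart order, with the Overseer's node last."""
    others = [n for n in nodes if n != overseer_node]
    last = [overseer_node] if overseer_node in nodes else []
    return others + last


def rolling_restart(nodes, overseer_node, restart, is_healthy):
    """Restart one node at a time; stop the rollout if a node fails to recover."""
    restarted = []
    for node in rolling_restart_order(nodes, overseer_node):
        restart(node)
        if not is_healthy(node):
            raise RuntimeError("%s did not come back; stopping the rollout" % node)
        restarted.append(node)
    return restarted


if __name__ == "__main__":
    nodes = ["host1:8983", "host2:8983", "host3:8983"]  # hypothetical nodes
    print(rolling_restart(nodes, "host1:8983",
                          restart=lambda n: None,
                          is_healthy=lambda n: True))
```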
20.
Other useful stuff ...
• fab mine: See clusters I’m running (or for other users too)
• fab kill_mine: Terminate all instances I’m running
– Use termination protection in production
• fab ssh_to: Quick way to SSH to one of the nodes in a cluster
• fab stop/recover/kill: Basic commands for controlling specific Solr nodes in the cluster
• fab jmeter: Execute a JMeter test plan against your cluster
– Example test plan and Java sampler is included with the source
21.
SolrCloud Tools (SolrJ client app)
• Java-based command-line application that uses SolrJ’s CloudSolrServer to perform advanced cluster management operations:
– healthcheck: collect metadata and health information from all replicas for a collection from ZooKeeper
– backup: create a snapshot of each shard in a collection for backing up to remote storage (S3)
• Framework for building complex tools that benefit from having access to cluster state information in ZooKeeper
./tools.sh -tool healthcheck
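The kind of summary a healthcheck might derive from cluster state can be sketched as below. This is not the toolkit's Java tool: it assumes a collection entry shaped like Solr 4.x's /clusterstate.json (shards containing replicas, each with a `state` and an optional `leader` flag), and the sample data is made up.

```python
def healthcheck(collection_state):
    """Summarize replica health per shard from a clusterstate-style dict."""
    report = {}
    for shard_name, shard in collection_state.get("shards", {}).items():
        replicas = shard.get("replicas", {})
        active = [name for name, r in replicas.items() if r.get("state") == "active"]
        leaders = [
            name for name, r in replicas.items()
            if str(r.get("leader", "")).lower() == "true"
        ]
        report[shard_name] = {
            "replicas": len(replicas),
            "active": len(active),
            "has_active_leader": any(name in active for name in leaders),
        }
    return report


if __name__ == "__main__":
    # Made-up sample in the assumed layout.
    sample = {
        "shards": {
            "shard1": {"replicas": {
                "core_node1": {"state": "active", "leader": "true"},
                "core_node2": {"state": "recovering"},
            }},
            "shard2": {"replicas": {
                "core_node3": {"state": "down", "leader": "true"},
            }},
        }
    }
    for shard, summary in sorted(healthcheck(sample).items()):
        print(shard, summary)
```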
22.
SiLK Integration
• SiLK: Solr integrated with Logstash and Kibana
– Index time-series data, such as log data (collectd, Solr logs, ...)
– Build cool dashboards with Banana (fork of Kibana)
• Easily aggregate all WARN and more severe log messages from all Solr servers into logstash4solr
• Send collectd metrics to logstash4solr
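"WARN and more severe" is a threshold on the log4j level ordering. A minimal sketch of that filter follows; the event shape (dicts with `host`, `level`, `message`) is hypothetical, and in the actual setup the filtering would happen in the log4j appender configuration rather than in Python.

```python
# log4j levels from least to most severe
LEVELS = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"]


def at_least(events, threshold="WARN"):
    """Keep only events whose level is at or above the threshold."""
    cutoff = LEVELS.index(threshold)
    return [e for e in events if LEVELS.index(e["level"]) >= cutoff]


if __name__ == "__main__":
    events = [  # made-up sample events
        {"host": "solr1", "level": "INFO", "message": "commit"},
        {"host": "solr2", "level": "WARN", "message": "slow query"},
        {"host": "solr3", "level": "ERROR", "message": "replica down"},
    ]
    for event in at_least(events):
        print(event["host"], event["level"], event["message"])
```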
24.
What’s Next?
• Migrate to using Apache libcloud instead of using boto directly
• Benchmark mixed work-loads (queries and indexing)
• SiLK is improving rapidly!
• Chaos monkey tests
– integrate jepsen?
• Open source so please kick the tires!
25.
Wrap-up
• Solr Scale Toolkit: https://github.com/LucidWorks/solr-scale-tk
• LucidWorks: http://www.lucidworks.com
• SiLK: http://www.lucidworks.com/lucidworks-silk/
• Solr In Action: http://www.manning.com/grainger/
• Connect: @thelabdude / tim.potter@lucidworks.com
Questions?
Editor's Notes
Yes, it does, but we need to start somewhere
more shards == better indexing throughput (if your servers can handle it)
replication is as fast as your slowest replica
replicas have to re-analyze documents
future – would like to send pre-analyzed docs between leader and replica, esp if text analysis is complex
Not much capacity available for running queries on this node
First couple of passes, I either had really bad performance (too random) or really fast (not random enough)
Make it very easy to launch and manage SolrCloud clusters in Amazon of sizes 1 node to 100’s
User has basic control over instance type, # of instances, # of nodes per instance, ZooKeeper ensemble
Doesn’t have to care about how to start each Solr, how to connect it with ZooKeeper, host names / IPs, etc.
Custom AMI that we just need to “start” stuff on is preferred to configuring a barebones instance each time