This document discusses benchmarking the indexing performance of Apache Solr across SolrCloud clusters of varying sizes. The results show that indexing throughput scales nearly linearly as nodes are added. It also introduces the Solr Scale Toolkit, a set of tools for deploying, managing, and benchmarking SolrCloud clusters. Future work includes benchmarking mixed workloads and integrating Chaos Monkey-style fault-injection tests.
More shards == better indexing throughput (provided the underlying servers have the capacity to handle them)
Replication is only as fast as your slowest replica
Replicas have to re-analyze documents (text analysis runs on every replica, not just the leader)
Future: would like to send pre-analyzed docs from leader to replicas, especially when text analysis is expensive
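The "more shards == better throughput" point comes from Solr's hash-based document routing: each doc id is hashed onto a 32-bit ring, and each shard owns a contiguous range, so incoming docs fan out across more leaders as shards are added. A minimal sketch of that idea follows; it substitutes a truncated MD5 for Solr's actual MurmurHash3 (an assumption, for illustration only):

```python
import hashlib

def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document id onto one of num_shards contiguous hash ranges.

    Solr's compositeId router hashes ids onto a 32-bit ring with MurmurHash3;
    here we use the first 4 bytes of MD5 as a stand-in 32-bit hash.
    """
    h = int.from_bytes(hashlib.md5(doc_id.encode()).digest()[:4], "big")
    range_size = 2**32 // num_shards
    return min(h // range_size, num_shards - 1)

# With more shards, docs spread over more leaders, so indexing work
# proceeds in parallel -- the source of the throughput scaling.
docs = [f"doc-{i}" for i in range(10_000)]
counts = [0] * 4
for d in docs:
    counts[shard_for(d, 4)] += 1
```

Because the hash distributes ids roughly uniformly, each shard ends up with about a quarter of the documents, which is what lets indexing scale as shards are added.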
Not much capacity available for running queries on this node
In the first couple of passes, the synthetic test documents gave either really bad performance (too random to resemble real text) or really fast performance (not random enough to be realistic)
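One way to land between those two extremes is to draw terms from a fixed vocabulary with a skewed, Zipf-like distribution, so term frequencies resemble real text rather than pure noise. A sketch of that approach (the vocabulary size and skew are assumed knobs, not values from the talk):

```python
import random

def make_doc(vocab_size=1000, doc_len=50, seed=None):
    """Generate one synthetic document with Zipf-like term frequencies.

    Fully random terms blow up the term dictionary and index unrealistically
    slowly; a tiny vocabulary indexes unrealistically fast. Skewed sampling
    over a fixed vocabulary sits between the two extremes.
    """
    rng = random.Random(seed)
    # Weight term i proportionally to 1/(i+1): a simple Zipf approximation,
    # so a few terms are common and the long tail is rare.
    weights = [1.0 / (i + 1) for i in range(vocab_size)]
    vocab = [f"term{i}" for i in range(vocab_size)]
    return " ".join(rng.choices(vocab, weights=weights, k=doc_len))

doc = make_doc(seed=42)
```

Seeding the generator keeps benchmark runs repeatable while still producing realistic-looking term distributions.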
Make it very easy to launch and manage SolrCloud clusters in Amazon EC2, from a single node up to hundreds
User has basic control over instance type, # of instances, # of nodes per instance, ZooKeeper ensemble
The user doesn’t have to worry about how to start each Solr node, how to connect it to ZooKeeper, host names / IPs, etc.
A custom AMI with everything pre-installed, where we just need to “start” things, is preferable to configuring a barebones instance from scratch each time
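The bookkeeping the toolkit hides (which Solr node runs on which instance and port, and which ZooKeeper connection string every node shares) can be sketched as a small layout function. The names and port scheme below are illustrative assumptions, not the toolkit's actual conventions:

```python
def plan_cluster(instance_ips, nodes_per_instance, zk_hosts, base_port=8983):
    """Assign Solr nodes to instances and build each node's start config.

    Hypothetical layout: node k on an instance listens on base_port + k*1000
    (mirroring the common 8983/9983/... convention), and every node gets the
    same ZooKeeper connection string.
    """
    zk_connect = ",".join(zk_hosts)
    nodes = []
    for ip in instance_ips:
        for k in range(nodes_per_instance):
            nodes.append({
                "host": ip,
                "port": base_port + k * 1000,
                "zk": zk_connect,
            })
    return nodes

# Two instances, two Solr nodes each, a three-host ZooKeeper ensemble.
plan = plan_cluster(["10.0.0.1", "10.0.0.2"], 2,
                    ["zk1:2181", "zk2:2181", "zk3:2181"])
```

Generating this plan up front is what lets the tooling start every node with the right port and ZooKeeper string without the user tracking host names or IPs by hand.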