SolrCloud and Shard Splitting

Presented on 8th June 2013 at the first Bangalore Lucene/Solr Meetup.


  1. SolrCloud and Shard Splitting
     Shalin Shekhar Mangar
  2. Who am I?
     ● Apache Lucene/Solr Committer and PMC member
     ● Contributor since January 2008
     ● Currently: Engineer at LucidWorks
     ● Formerly with AOL
     ● Email:
     ● Twitter: shalinmangar
     ● Blog:
  3. SolrCloud: Overview
     ● Distributed searching/indexing
     ● No single points of failure
     ● Near Real Time friendly (push replication)
     ● Transaction logs for durability and recovery
     ● Real-time get
     ● Atomic updates
     ● Optimistic concurrency
     ● Request forwarding from any node in the cluster
     ● A strong contender for your NoSQL needs as well
  4. (diagram slide; no textual content)
  5. Document Routing
     (diagram: a collection with numShards=4 and router=compositeId; the four shards
     cover the hash ranges 00000000-3fffffff, 40000000-7fffffff, 80000000-bfffffff,
     and c0000000-ffffffff. A document with id "BigCo!doc5" hashes via MurmurHash3
     to 1f273c71, so the route key "BigCo" maps to the range 1f270000-1f27ffff,
     which falls inside shard1. A query can target the same key with
     q=my_query&shard.keys=BigCo!)
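The compositeId routing on this slide can be sketched in a few lines. This is an illustrative model, not Solr's code: Solr uses MurmurHash3, which is not in Python's standard library, so a stand-in 32-bit hash built on hashlib is used here; the bit-composition (top 16 bits from the route key, bottom 16 from the rest of the id) is the idea the slide is showing.

```python
# Sketch of compositeId routing. Assumption: hash32 below is a stand-in
# for the MurmurHash3 that Solr actually uses; only the bit-mixing
# scheme is the point of this example.
import hashlib

def hash32(s: str) -> int:
    """Stand-in 32-bit hash (Solr itself uses MurmurHash3)."""
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:4], "big")

def composite_hash(doc_id: str) -> int:
    """For an id like 'BigCo!doc5', take the top 16 bits from the
    route key and the bottom 16 bits from the rest of the id."""
    if "!" in doc_id:
        key, rest = doc_id.split("!", 1)
        return (hash32(key) & 0xFFFF0000) | (hash32(rest) & 0x0000FFFF)
    return hash32(doc_id)

def shard_for(h: int, num_shards: int = 4) -> int:
    """Map a 32-bit hash onto num_shards equal hash ranges."""
    return h // (2**32 // num_shards)

# All ids sharing the route key 'BigCo' share the top 16 hash bits,
# so they land in the same shard's hash range.
h1 = composite_hash("BigCo!doc5")
h2 = composite_hash("BigCo!doc6")
assert (h1 >> 16) == (h2 >> 16)
assert shard_for(h1) == shard_for(h2)
```

This is why shard.keys=BigCo! can restrict a query to a single shard: every document with that prefix is guaranteed to live in one 16-bit-wide slice of the hash ring.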
  6. SolrCloud Collections API
     ● /admin/collections?action=CREATE&name=mycollection
       – &numShards=3
       – &replicationFactor=4
       – &maxShardsPerNode=2
       – &createNodeSet=node1:8080,node2:8080,node3:8080,...
       – &collection.configName=myconfigset
     ● /admin/collections?action=DELETE&name=mycollection
     ● /admin/collections?action=RELOAD&name=mycollection
     ● /admin/collections?action=CREATEALIAS&name=south
       – &collections=KA,TN,AP,KL,...
     ● Coming soon: shard aliases
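Since these are plain HTTP calls, the CREATE request above can be assembled with nothing but the standard library. The host, port, and /solr context path below are placeholder assumptions; substitute your own node address.

```python
# Build the Collections API CREATE request from the slide.
# Assumption: localhost:8080 with the /solr context path is a placeholder
# for one of your own cluster nodes (any node can receive the request).
from urllib.parse import urlencode

params = {
    "action": "CREATE",
    "name": "mycollection",
    "numShards": 3,
    "replicationFactor": 4,
    "maxShardsPerNode": 2,
    "collection.configName": "myconfigset",
}
url = "http://localhost:8080/solr/admin/collections?" + urlencode(params)
# urllib.request.urlopen(url) would issue the call against a live cluster
print(url)
```

The same pattern covers DELETE, RELOAD, and CREATEALIAS; only the params dict changes.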
  7. Shard Splitting: Background
     ● Before Solr 4.3, the number of shards had to be fixed at the time of
       collection creation
     ● This forced people to start with a large number of shards
     ● If a shard ran too hot, the only fix was to re-index and therefore
       re-balance the collection
     ● Each shard is assigned a hash range
     ● Each shard also has a state, which defaults to ACTIVE
  8. Shard Splitting: Features
     ● Seamless on-the-fly splitting: no downtime required
     ● Retried on failures
     ● /admin/collections?action=SPLITSHARD&collection=mycollection
       – &shard=shardId
     ● A lower-level CoreAdmin API comes free!
       – /admin/cores?action=SPLIT&core=core0&targetCore=core1&targetCore=core2
       – /admin/cores?action=SPLIT&core=core0&path=/path/to/index/1&path=/path/to/index/2
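A SPLITSHARD request can be built the same way. The sketch below also reflects a gotcha from later in the deck: the HTTP response is synchronous while the split runs asynchronously on the Overseer, so a read timeout should not be treated as a failure. Host and port are again placeholder assumptions.

```python
# Build the SPLITSHARD request from the slide. Assumption: localhost:8080
# is a placeholder node address; "shard1" is an example shard id.
from urllib.parse import urlencode

params = {
    "action": "SPLITSHARD",
    "collection": "mycollection",
    "shard": "shard1",
}
url = "http://localhost:8080/solr/admin/collections?" + urlencode(params)

# If issuing this for real, a read timeout on the call does NOT mean the
# split failed; the operation continues on the Overseer. Check the cluster
# state and the parent leader's logs before re-issuing the command.
print(url)
```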
  9. Shard Splitting
     (diagram: a cluster with shard1, shard2, and shard3, each with a leader and
     a replica; shard2 is being split into sub-shards shard2_0 and shard2_1
     while updates continue to flow)
 10. Shard Splitting: Mechanism
     ● New sub-shards are created in the "construction" state
     ● The leader starts forwarding applicable updates, which are buffered
       by the sub-shards
     ● The leader's index is split and installed on the sub-shards
     ● The sub-shards apply the buffered updates
     ● Replicas are created for the sub-shards and brought up to speed
     ● The sub-shards become "active" and the old shard becomes "inactive"
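The buffering step is the key to splitting without downtime. A toy model of the state transitions above (this is not Solr's implementation, just the sequencing the slide describes):

```python
# Toy model of the sub-shard lifecycle during a split: updates that
# arrive while the sub-shard is in "construction" are buffered, then
# replayed after the parent index split is installed. Class and field
# names are illustrative, not Solr internals.
class SubShard:
    def __init__(self, name):
        self.name = name
        self.state = "construction"
        self.buffered = []   # updates forwarded by the parent leader
        self.docs = []       # installed index contents

    def receive_update(self, doc):
        if self.state == "construction":
            self.buffered.append(doc)    # hold until the split lands
        else:
            self.docs.append(doc)

    def install_split(self, docs):
        self.docs = list(docs)           # the split slice of the parent index

    def apply_buffered(self):
        self.docs.extend(self.buffered)  # replay updates buffered during the split
        self.buffered = []
        self.state = "active"

sub = SubShard("shard1_0")
sub.receive_update("doc-live")      # arrives mid-split: buffered
sub.install_split(["doc-old"])      # parent index split installed
sub.apply_buffered()                # buffered update replayed
assert sub.state == "active"
assert sub.docs == ["doc-old", "doc-live"]
```

Because the live update is buffered rather than dropped, no write is lost even though the sub-shard's index did not exist when the update arrived.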
 11. Shard Splitting: Tips and Gotchas
     ● Supported only for collections using a hash-based router, i.e. the
       "plain" or "compositeId" routers
     ● The operation is executed by the Overseer node, not by the node you
       sent the request to
     ● The HTTP request is synchronous but the operation is asynchronous;
       a read timeout does not mean failure!
     ● The operation is retried on failure. Check the parent leader's logs
       before you re-issue the command, or you may end up with more shards
       than you want
 12. Shard Splitting: Tips and Gotchas (contd.)
     ● The Solr Admin GUI is not aware of shard states yet, so the inactive
       parent shard is also shown in "green"
     ● The CoreAdmin split command can be used against non-cloud deployments;
       it will spread documents alternately among the sub-indexes
     ● Inactive shards have to be cleaned up manually; Solr 4.4 will have a
       delete-shard API
     ● Shard splitting in the 4.3 release is buggy; wait for 4.3.1
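Until a delete-shard API exists, finding the leftover inactive parents means inspecting the cluster state. A sketch of that check, using a hand-written sample dict: the real state lives in ZooKeeper as clusterstate.json, and the exact field layout here is a simplification for illustration.

```python
# Sketch of locating inactive shards left behind by splits.
# Assumption: cluster_state below is sample data shaped loosely like
# clusterstate.json; the real structure in ZooKeeper has more fields.
cluster_state = {
    "mycollection": {
        "shards": {
            "shard1":   {"state": "inactive"},  # parent left behind by a split
            "shard1_0": {"state": "active"},
            "shard1_1": {"state": "active"},
        }
    }
}

def inactive_shards(state, collection):
    """Return the names of shards marked inactive for a collection."""
    shards = state[collection]["shards"]
    return [name for name, s in shards.items() if s.get("state") == "inactive"]

print(inactive_shards(cluster_state, "mycollection"))  # ['shard1']
```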
 13. Shard Splitting: Looking Towards the Future
     ● GUI integration and better progress reporting/monitoring
     ● Better support for custom sharding use cases
     ● More flexibility over the number of sub-shards, hash ranges, number
       of replicas, etc.
     ● Store the replication factor per shard
     ● Suggest splits to admins based on cluster state and load
 14. About LucidWorks (© 2012 LucidWorks)
     • Intro to LucidWorks (formerly Lucid Imagination)
       – Follow: @lucidworks, @lucidimagineer
       – Learn:
         • Check out SearchHub:
         • Solr 4.1 Reference Guide:
           – Older versions:
     • Our Products
       – LucidWorks Search
       – LucidWorks Big Data
     • Lucene Revolution
 15. Thank you
     Shalin Shekhar Mangar
     LucidWorks