SolrCloud and Shard Splitting

Presented on 8th June 2013 at the first Bangalore Lucene/Solr Meetup.

Usage Rights

CC Attribution License

  • In slide 5 there is an error:
    the hash value 1f273c71, and therefore the two hash values associated with BigCo, are in the Shard3 range, not Shard1.

    Shard1 is from 80000000 to bfffffff hex in your image.
    Shard3 is from 00000000 to 3fffffff hex.
    1f27xxxx will definitely fit in Shard3 :)

    SolrCloud and Shard Splitting: Presentation Transcript

    • SolrCloud and Shard Splitting
      Shalin Shekhar Mangar
    • Bangalore Lucene/Solr Meetup, 8th June 2013
      Who am I?
      ● Apache Lucene/Solr Committer and PMC member
      ● Contributor since January 2008
      ● Currently: Engineer at LucidWorks
      ● Formerly with AOL
      ● Email: shalin@apache.org
      ● Twitter: shalinmangar
      ● Blog: http://shal.in
    • SolrCloud: Overview
      ● Distributed searching/indexing
      ● No single points of failure
      ● Near Real Time Friendly (push replication)
      ● Transaction logs for durability and recovery
      ● Real-time get
      ● Atomic Updates
      ● Optimistic Concurrency
      ● Request forwarding from any node in cluster
      ● A strong contender for your NoSQL needs as well
    • [Slide without transcribed text]
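    • Document Routing
      [Diagram: the 32-bit hash space divided among four shards labelled shard1 to shard4 (numShards=4, router=compositeId), with ranges 80000000-bfffffff, 00000000-3fffffff, 40000000-7fffffff and c0000000-ffffffff. The document id = BigCo!doc5 is hashed with MurmurHash3 to 1f273c71; the query q=my_query with shard.keys=BigCo! covers hashes 1f270000 to 1f27ffff and is drawn as routed to shard1.]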
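The range arithmetic in the routing diagram can be checked with a few lines of Python. This is a minimal sketch, not Solr's code: it takes the hash value 1f273c71 from the slide as given (it does not compute MurmurHash3), uses the four hash ranges shown in the diagram, and assumes that the `BigCo!` prefix fixes the upper 16 bits of the hash, which is what the 1f270000 to 1f27ffff range on the slide suggests. The function names `prefix_range` and `owning_range` are hypothetical.

```python
from typing import Tuple

# Hash ranges copied from the diagram (inclusive, unsigned 32-bit hex).
RANGES = [
    (0x00000000, 0x3FFFFFFF),
    (0x40000000, 0x7FFFFFFF),
    (0x80000000, 0xBFFFFFFF),
    (0xC0000000, 0xFFFFFFFF),
]


def prefix_range(prefix_hash: int) -> Tuple[int, int]:
    """Hashes that share the upper 16 bits of prefix_hash (assumed rule)."""
    lo = prefix_hash & 0xFFFF0000
    hi = prefix_hash | 0x0000FFFF
    return lo, hi


def owning_range(hash_value: int) -> Tuple[int, int]:
    """Return the range from RANGES that contains hash_value."""
    for lo, hi in RANGES:
        if lo <= hash_value <= hi:
            return lo, hi
    raise ValueError(f"hash {hash_value:#010x} is outside the 32-bit space")


if __name__ == "__main__":
    lo, hi = prefix_range(0x1F273C71)
    print(f"{lo:08x} to {hi:08x}")           # 1f270000 to 1f27ffff
    print("%08x-%08x" % owning_range(lo))    # 00000000-3fffffff
```

Both ends of the prefix range land in the same shard range, so a query restricted to that prefix only needs to visit one shard.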
    • SolrCloud Collections API
      ● /admin/collections?action=CREATE&name=mycollection
        – &numShards=3
        – &replicationFactor=4
        – &maxShardsPerNode=2
        – &createNodeSet=node1:8080,node2:8080,node3:8080,...
        – &collection.configName=myconfigset
      ● /admin/collections?action=DELETE&name=mycollection
      ● /admin/collections?action=RELOAD&name=mycollection
      ● /admin/collections?action=CREATEALIAS&name=south
        – &collections=KA,TN,AP,KL,...
      ● Coming soon: Shard aliases
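The Collections API calls above are plain HTTP GET requests, so they are easy to assemble programmatically. The sketch below only builds the URLs and sends nothing. The base URL `http://localhost:8983/solr` is an assumption about where Solr is running, and `collections_url` is a hypothetical helper, not part of any Solr client library.

```python
from urllib.parse import urlencode

# Assumption: Solr is reachable at this base URL; adjust for your cluster.
BASE_URL = "http://localhost:8983/solr"


def collections_url(action, **params):
    """Build a Collections API URL for the given action and parameters."""
    query = {"action": action, **params}
    # Keep ':' and ',' unescaped so node lists and collection lists stay readable.
    return f"{BASE_URL}/admin/collections?" + urlencode(query, safe=":,")


create = collections_url(
    "CREATE",
    name="mycollection",
    numShards=3,
    replicationFactor=4,
    maxShardsPerNode=2,
    # Dotted parameter names cannot be written as keyword arguments directly.
    **{"collection.configName": "myconfigset"},
)
alias = collections_url("CREATEALIAS", name="south", collections="KA,TN,AP,KL")

print(create)
print(alias)
```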
    • Shard Splitting: Background
      ● Before Solr 4.3, the number of shards had to be fixed at the time of collection creation
      ● This forced people to start with a large number of shards
      ● If a shard ran too hot, the only fix was to re-index and thereby re-balance the collection
      ● Each shard is assigned a hash range
      ● Each shard also has a state, which defaults to ACTIVE
    • Shard Splitting: Features
      ● Seamless on-the-fly splitting – no downtime required
      ● Retried on failures
      ● /admin/collections?action=SPLITSHARD&collection=mycollection
        – &shard=shardId
      ● A lower-level CoreAdmin API comes free!
        – /admin/cores?action=SPLIT&core=core0&targetCore=core1&targetCore=core2
        – /admin/cores?action=SPLIT&core=core0&path=/path/to/index/1&path=/path/to/index/2
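The two split endpoints differ mainly in that the CoreAdmin variant repeats a parameter (`targetCore` or `path`) once per output. A minimal sketch of building both kinds of URL follows. It sends nothing, the base URL is an assumption, `shard1` stands in for the slide's `shardId` placeholder, and both helper functions are hypothetical names.

```python
from urllib.parse import urlencode

# Assumption: Solr is reachable at this base URL; adjust for your cluster.
BASE_URL = "http://localhost:8983/solr"


def split_shard_url(collection, shard):
    """Collections API: split one shard of a SolrCloud collection."""
    query = [("action", "SPLITSHARD"), ("collection", collection), ("shard", shard)]
    return f"{BASE_URL}/admin/collections?" + urlencode(query)


def split_core_url(core, target_cores=(), paths=()):
    """CoreAdmin API: split a core into existing cores or into index paths."""
    query = [("action", "SPLIT"), ("core", core)]
    # A list of pairs lets the same parameter name appear several times.
    query += [("targetCore", name) for name in target_cores]
    query += [("path", path) for path in paths]
    return f"{BASE_URL}/admin/cores?" + urlencode(query, safe="/")


print(split_shard_url("mycollection", "shard1"))
print(split_core_url("core0", target_cores=["core1", "core2"]))
print(split_core_url("core0", paths=["/path/to/index/1", "/path/to/index/2"]))
```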
    • Shard Splitting
      [Diagram: Shard1, Shard2 and Shard3, each with a leader and a replica; sub-shards Shard2_0 and Shard2_1; label "update"]
    • Shard Splitting: Mechanism
      ● New sub-shards are created in “construction” state
      ● The leader starts forwarding applicable updates, which are buffered by the sub-shards
      ● The leader's index is split and installed on the sub-shards
      ● Sub-shards apply the buffered updates
      ● Replicas are created for the sub-shards and brought up to speed
      ● The sub-shards become “active” and the old shard becomes “inactive”
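The sequence of steps in the mechanism can be illustrated with a toy in-memory model. This is a sketch under stated assumptions, not Solr's implementation: an "index" is just a dict keyed by hash, the parent range is assumed to be cut into two equal halves, sub-shards are named `<parent>_0` and `<parent>_1` as in the diagram, replica creation is left out, and the sample hashes and documents are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    name: str
    lo: int                                        # inclusive start of hash range
    hi: int                                        # inclusive end of hash range
    state: str = "active"
    docs: dict = field(default_factory=dict)       # hash -> document
    buffered: list = field(default_factory=list)   # updates held during the split

    def owns(self, h):
        return self.lo <= h <= self.hi


def split(parent, updates_during_split):
    # 1. Create sub-shards in "construction" state (assumed: two equal halves).
    mid = (parent.lo + parent.hi) // 2
    subs = [
        Shard(f"{parent.name}_0", parent.lo, mid, state="construction"),
        Shard(f"{parent.name}_1", mid + 1, parent.hi, state="construction"),
    ]
    # 2. The leader applies each update and forwards it; the sub-shard buffers it.
    for h, doc in updates_during_split:
        parent.docs[h] = doc
        next(s for s in subs if s.owns(h)).buffered.append((h, doc))
    # 3. Split the leader's index and install each part on its sub-shard.
    for h, doc in parent.docs.items():
        next(s for s in subs if s.owns(h)).docs[h] = doc
    # 4. Sub-shards apply their buffered updates (re-applying is harmless here).
    for s in subs:
        for h, doc in s.buffered:
            s.docs[h] = doc
        s.buffered.clear()
    # 5. Replica creation is omitted in this sketch.
    # 6. Switch states: sub-shards become active, the parent becomes inactive.
    for s in subs:
        s.state = "active"
    parent.state = "inactive"
    return subs


parent = Shard("shard1", 0x80000000, 0xBFFFFFFF, docs={0x90000000: "doc-a"})
left, right = split(parent, [(0xB0000000, "doc-b")])
print(left.name, left.state, right.name, right.state, parent.name, parent.state)
```

Buffering in step 2 is what lets the split run without downtime: updates that arrive while the index is being divided are not lost, only delayed until step 4.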
    • Shard Splitting: Tips and Gotchas
      ● Supports collections with a hash-based router, i.e. the “plain” or “compositeId” routers
      ● The operation is executed by the Overseer node, not by the node that received the request
      ● The HTTP request is synchronous but the operation is asynchronous. A read timeout does not mean failure!
      ● The operation is retried on failure. Check the parent leader's logs before you re-issue the command, or you may end up with more shards than you want
    • Shard Splitting: Tips and Gotchas
      ● The Solr Admin GUI is not aware of shard states yet, so the inactive parent shard is also shown in “green”
      ● The CoreAdmin split command can be used against non-cloud deployments. It will spread docs alternately among the sub-indexes
      ● Inactive shards have to be cleaned up manually. Solr 4.4 will have a delete shard API
      ● Shard splitting in the 4.3 release is buggy. Wait for 4.3.1
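For the non-cloud CoreAdmin split, "spread docs alternately" can be pictured as dealing documents out in round-robin order. The sketch below is only an illustration of that reading. It is an assumption that the alternation is strict round-robin, and `split_alternately` is a hypothetical name, not a Solr function.

```python
def split_alternately(docs, num_parts):
    """Deal docs out to num_parts sub-indexes in round-robin order."""
    parts = [[] for _ in range(num_parts)]
    for i, doc in enumerate(docs):
        parts[i % num_parts].append(doc)
    return parts


print(split_alternately(["d1", "d2", "d3", "d4", "d5"], 2))
# [['d1', 'd3', 'd5'], ['d2', 'd4']]
```

Unlike the hash-range split used in SolrCloud, this kind of split says nothing about which sub-index a given id ends up in, so it does not by itself support routing by document id.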
    • Shard Splitting: Looking towards the future
      ● GUI integration and better progress reporting/monitoring
      ● Better support for custom sharding use-cases
      ● More flexibility towards number of sub-shards, hash ranges, number of replicas etc.
      ● Store replication factor per shard
      ● Suggest splits to admins based on cluster state and load
    • About LucidWorks
      ● Intro to LucidWorks (formerly Lucid Imagination)
        – Follow: @lucidworks, @lucidimagineer
        – Learn: http://www.lucidworks.com
      ● Check out SearchHub: http://www.searchhub.org
      ● Solr 4.1 Reference Guide: http://bit.ly/11KSiMN
        – Older versions: http://bit.ly/12t1Egq
      ● Our Products
        – LucidWorks Search
        – LucidWorks Big Data
      ● Lucene Revolution
        – http://www.lucenerevolution.com
      Confidential and Proprietary © 2012 LucidWorks
    • Thank you
      Shalin Shekhar Mangar
      LucidWorks