Bangalore Lucene/Solr Meetup, 8th June 2013
Who am I?
● Apache Lucene/Solr Committer and PMC member
● Contributor since January 2008
● Currently: Engineer at LucidWorks
● Formerly with AOL
● Email: shalin@apache.org
● Twitter: shalinmangar
● Blog: http://shal.in
SolrCloud: Overview
● Distributed searching/indexing
● No single points of failure
● Near Real Time Friendly (push replication)
● Transaction logs for durability and recovery
● Real-time get
● Atomic Updates
● Optimistic Concurrency
● Request forwarding from any node in cluster
● A strong contender for your NoSQL needs as well
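To make the real-time get, atomic update, and optimistic concurrency bullets concrete, here is a minimal Python sketch that only builds the request path and JSON body, without contacting a server. The `/solr` context path, the collection name `mycollection`, the document id, the `price` field, and both helper function names are assumptions chosen for illustration.

```python
import json
from urllib.parse import urlencode


def realtime_get_path(collection, doc_id):
    """Build the path for a real-time get of one document by id."""
    return "/solr/%s/get?%s" % (collection, urlencode({"id": doc_id}))


def atomic_update_body(doc_id, field, new_value, expected_version):
    """Build a JSON update body that changes a single field.

    The {"set": ...} modifier makes it an atomic update; passing the
    version previously read in _version_ asks Solr to apply the update
    only if the stored document still has that version.
    """
    doc = {
        "id": doc_id,
        field: {"set": new_value},
        "_version_": expected_version,
    }
    return json.dumps([doc])


if __name__ == "__main__":
    # Hypothetical values: read the current version first, then update.
    print(realtime_get_path("mycollection", "book1"))
    print(atomic_update_body("book1", "price", 100, 1436932187405221888))
```

A typical flow would be to fetch the document with real-time get, copy its `_version_` into the update, and retry from the read if Solr reports a version conflict.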
Shard Splitting: Background
● Before Solr 4.3, the number of shards had to be fixed at the time of collection creation
● Forced people to start with a large number of shards
● If a shard ran too hot, the only fix was to re-index and therefore re-balance the collection
● Each shard is assigned a hash range
● Each shard also has a state which defaults to 'ACTIVE'
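The idea of one hash range per shard can be sketched as follows. This is an illustration under assumptions, not Solr's actual code: it treats the hash space as signed 32-bit integers, divides it into equal contiguous ranges, and names the shards `shard1`, `shard2`, and so on. All function names are hypothetical.

```python
def partition_hash_range(num_shards):
    """Divide the signed 32-bit hash space into contiguous ranges."""
    lo, hi = -2**31, 2**31 - 1
    total = hi - lo + 1  # 2**32 possible hash values
    ranges = []
    start = lo
    for i in range(num_shards):
        # The last range always ends exactly at hi.
        end = lo + (total * (i + 1)) // num_shards - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def shard_for_hash(ranges, h):
    """Return the name of the shard whose range contains hash h."""
    for i, (start, end) in enumerate(ranges):
        if start <= h <= end:
            return "shard%d" % (i + 1)
    raise ValueError("hash is outside the 32-bit range")


def to_hex(value):
    """Render a signed 32-bit value as 8 hex digits."""
    return format(value & 0xFFFFFFFF, "08x")


if __name__ == "__main__":
    for name_index, (start, end) in enumerate(partition_hash_range(2), 1):
        print("shard%d: %s-%s" % (name_index, to_hex(start), to_hex(end)))
```

With a fixed number of ranges like this, a hot shard cannot be relieved without changing the ranges, which is the problem shard splitting addresses.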
Shard Splitting: Features
● Seamless on-the-fly splitting – no downtime required
● Retried on failures
● /admin/collections?action=SPLITSHARD&collection=mycollection&shard=shardId
● A lower-level CoreAdmin API comes free!
– /admin/cores?action=SPLIT&core=core0&targetCore=core1&targetCore=core2
– /admin/cores?action=SPLIT&core=core0&path=/path/to/index/1&path=/path/to/index/2
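The request paths above can be assembled programmatically. The sketch below builds them with the standard library only and sends nothing; the helper names `split_shard_path` and `core_split_path` are hypothetical, and the paths still need to be prefixed with your own Solr base URL.

```python
from urllib.parse import urlencode


def split_shard_path(collection, shard):
    """Collections API path for splitting one shard of a collection."""
    params = [
        ("action", "SPLITSHARD"),
        ("collection", collection),
        ("shard", shard),
    ]
    return "/admin/collections?" + urlencode(params)


def core_split_path(core, target_cores=(), paths=()):
    """CoreAdmin path for splitting a core into target cores or index paths.

    Repeated targetCore / path parameters are emitted once per value.
    """
    params = [("action", "SPLIT"), ("core", core)]
    params += [("targetCore", t) for t in target_cores]
    params += [("path", p) for p in paths]
    return "/admin/cores?" + urlencode(params)


if __name__ == "__main__":
    print(split_shard_path("mycollection", "shard1"))
    print(core_split_path("core0", target_cores=["core1", "core2"]))
    # Note: urlencode percent-encodes the slashes in filesystem paths.
    print(core_split_path("core0", paths=["/path/to/index/1", "/path/to/index/2"]))
```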
[Diagram: Shard Splitting. Shard1, Shard2 and Shard3, each with a leader and a replica; sub-shards Shard2_0 and Shard2_1; a label “update”.]
Shard Splitting: Mechanism
● New sub-shards are created in “construction” state
● Leader starts forwarding applicable updates, which are buffered by the sub-shards
● Leader index is split and installed on the sub-shards
● Sub-shards apply buffered updates
● Replicas are created for sub-shards and brought up to speed
● Sub-shards become “active” and the old shard becomes “inactive”
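The sequence of steps above can be walked through with a toy in-memory simulation. It is only a sketch of the ordering, not how Solr implements it: documents are modelled as a mapping from id to hash, the parent range is cut at its midpoint (an assumption), replica creation is skipped, and the names `SubShard` and `split_shard` are hypothetical.

```python
class SubShard:
    """A toy sub-shard that owns an inclusive hash range."""

    def __init__(self, name, lo, hi):
        self.name, self.lo, self.hi = name, lo, hi
        self.state = "construction"  # step 1: created in construction state
        self.docs = {}
        self.buffer = []

    def owns(self, h):
        return self.lo <= h <= self.hi


def split_shard(parent_docs, lo, hi, updates_during_split):
    """Simulate a split; returns (parent_state, [sub_shard_0, sub_shard_1])."""
    mid = lo + (hi - lo) // 2
    subs = [SubShard("shard1_0", lo, mid), SubShard("shard1_1", mid + 1, hi)]

    # Step 2: the leader forwards applicable updates, which are buffered.
    for doc_id, h in updates_during_split:
        for sub in subs:
            if sub.owns(h):
                sub.buffer.append((doc_id, h))

    # Step 3: the leader index is split and installed on the sub-shards.
    for doc_id, h in parent_docs.items():
        for sub in subs:
            if sub.owns(h):
                sub.docs[doc_id] = h

    # Step 4: sub-shards apply their buffered updates.
    for sub in subs:
        for doc_id, h in sub.buffer:
            sub.docs[doc_id] = h
        sub.buffer.clear()

    # Step 5 (replica creation) is omitted in this sketch.

    # Step 6: sub-shards become active, the parent becomes inactive.
    for sub in subs:
        sub.state = "active"
    return "inactive", subs


if __name__ == "__main__":
    state, subs = split_shard({"a": 10, "b": 90}, 0, 99, [("c", 60), ("d", 5)])
    print("parent:", state)
    for sub in subs:
        print(sub.name, sub.state, sorted(sub.docs))
```

Buffering before the index is installed is what lets updates that arrive during the split survive it.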
Shard Splitting: Tips and Gotchas
● Supports collections with a hash-based router, i.e. the “plain” or “compositeId” routers
● Operation is executed by the Overseer node, not by the node you requested
● HTTP request is synchronous but the operation is async. A read timeout does not mean failure!
● Operation is retried on failure. Check the parent leader's logs before you re-issue the command or you may end up with more shards than you want
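Because a read timeout does not mean the split failed, it helps to look at the shard states before re-issuing the command. The sketch below classifies a split from a plain mapping of shard name to state. The mapping shape, the function name `split_status`, and the assumption that sub-shards are named `<parent>_<number>` (as in the Shard2_0 / Shard2_1 diagram) are all illustrative. It complements, rather than replaces, checking the parent leader's logs.

```python
def split_status(shards, parent):
    """Classify the split of `parent` as 'not started', 'in progress' or 'done'.

    shards: dict mapping shard name -> state string, for example
            {"shard1": "active", "shard1_0": "construction"}.
    """
    prefix = parent + "_"
    # Direct sub-shards only: the parent name, an underscore, then digits.
    subs = {
        name: state
        for name, state in shards.items()
        if name.startswith(prefix) and name[len(prefix):].isdigit()
    }
    if not subs:
        return "not started"
    all_active = all(state == "active" for state in subs.values())
    if all_active and shards.get(parent) == "inactive":
        return "done"
    return "in progress"


if __name__ == "__main__":
    after_timeout = {
        "shard1": "active",
        "shard1_0": "construction",
        "shard1_1": "construction",
    }
    # Anything other than "not started" suggests waiting, not re-issuing.
    print(split_status(after_timeout, "shard1"))
```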
Shard Splitting: Tips and Gotchas (continued)
● Solr Admin GUI is not aware of shard states yet so the inactive parent shard is also shown in “green”
● The CoreAdmin split command can be used against non-cloud deployments. It will spread docs alternately among the sub-indexes
● Inactive shards have to be cleaned up manually. Solr 4.4 will have a delete shard API
● Shard splitting in 4.3 release is buggy. Wait for 4.3.1
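The alternating distribution mentioned for the CoreAdmin split in non-cloud deployments amounts to a round-robin assignment. The sketch below shows only that idea on a list of document ids; the real command operates on index files, and the function name `split_alternately` is hypothetical.

```python
def split_alternately(doc_ids, num_targets):
    """Assign documents to targets in round-robin order."""
    targets = [[] for _ in range(num_targets)]
    for position, doc_id in enumerate(doc_ids):
        targets[position % num_targets].append(doc_id)
    return targets


if __name__ == "__main__":
    for index, docs in enumerate(split_alternately(["d1", "d2", "d3", "d4", "d5"], 2)):
        print("sub-index %d: %s" % (index, docs))
```

Unlike a hash-range split, this placement says nothing about which sub-index a given id will land in, so it suits non-cloud setups where no router depends on the location.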
Shard Splitting: Looking towards the future
● GUI integration and better progress reporting/monitoring
● Better support for custom sharding use-cases
● More flexibility towards number of sub-shards, hash ranges, number of replicas, etc.
● Store replication factor per shard
● Suggest splits to admins based on cluster state and load