Scaling Solr with SolrCloud

Rafał Kuć – Sematext Group, Inc.
@kucrafal @sematext sematext.com

"Scaling Solr with SolrCloud" is an intermediate-level talk given at Lucene Revolution 2013 in Dublin. Video of the talk is available at: http://www.youtube.com/watch?v=4VaRSDBhQ9s

0 Comments
6 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
2,405
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
46
Comments
0
Likes
6
Embeds 0
No embeds

No notes for slide

Transcript of "Scaling Solr with SolrCloud"

  1. Scaling Solr with SolrCloud. Rafał Kuć, Sematext Group, Inc. @kucrafal @sematext sematext.com
  2. About me: Sematext consultant & engineer; Solr.pl co-founder; father and husband
  3. Solr History (newest first): Solr 4.1 and counting; Solr 4.0 released; Lucene / Solr merge; Solr 1.4 released; Solr 1.3 released; Incubator graduation; Solr donated to ASF; Y. Seeley creates Solr
  4. The Past
  5. Master – Slave Deployment (diagram: Application, Solr Master, four Solr Slaves)
  6. Master as SPOF (diagram: Application, Solr Master, four Solr Slaves)
  7. Replication Time (diagram: Indexing App, Solr Master, three Solr Slaves, Querying App)
  8. Too Much for a Single Shard (diagram: Application, Solr Master, two Solr Slaves)
  9. Too Much for a Single Shard (diagram: Application, three Solr Masters, six Solr Slaves)
  10. Querying in Multi Master Deployment (diagram: Application; Solr Slaves for Shard 1, Shard 2, and Shard 3; requests labeled "Shard1, shard2, shard3"; replies labeled "Doc" and "Response")
  11. SolrCloud Comes Into Play
  12. Basic Glossary: Cluster; Node; Collection; Shard; Leader & Replica; Overseer. https://cwiki.apache.org/confluence/display/solr/SolrCloud+Glossary
  13. Apache ZooKeeper: quorum is required. Sample configuration (diagram: three ZooKeeper instances):
      clientPort=2181
      dataDir=/usr/share/zookeeper/data
      tickTime=2000
      initLimit=10
      syncLimit=5
      server.1=192.168.1.1:2888:3888
      server.2=192.168.1.2:2888:3888
      server.3=192.168.1.3:2888:3888
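The quorum requirement can be made concrete with a little arithmetic: an ensemble keeps working only while a strict majority of its servers is up. The following is a minimal Python sketch, assuming the three-server ensemble from the sample configuration; the helper names are illustrative and not part of ZooKeeper.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


# The three servers listed as server.1 .. server.3 in the sample configuration.
servers = ["192.168.1.1", "192.168.1.2", "192.168.1.3"]
print(quorum_size(len(servers)), tolerated_failures(len(servers)))
```

Under this model a fourth server does not buy extra fault tolerance (it still tolerates one failure), which is one reason odd ensemble sizes are the usual choice.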
  14. Solr Instances (diagram: four Solr Servers and three ZooKeeper instances; each Solr Server is started with a zkHost list):
      -DzkHost=192.168.1.1:2181,192.168.1.2:2181,192.168.1.3:2181
      -DzkHost=192.168.1.2:2181,192.168.1.1:2181,192.168.1.3:2181
      -DzkHost=192.168.1.3:2181,192.168.1.1:2181,192.168.1.2:2181
      -DzkHost=192.168.1.3:2181,192.168.1.1:2181,192.168.1.2:2181
  15. Collection Creation (diagram: four Solr Servers and three ZooKeeper instances). Upload the configuration to ZooKeeper, then create the collection:
      $ cloud-scripts/zkcli.sh -cmd upconfig -zkhost 192.168.1.2:2181 -confdir /usr/share/config/revolution/conf -confname revolution
      $ curl 'http://solr1:8983/solr/admin/collections?action=CREATE&name=revolution&numShards=2&replicationFactor=1'
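The CREATE call is an ordinary HTTP GET with query parameters, so it is easy to assemble programmatically. A minimal Python sketch that only builds the URL and sends nothing; the helper name is hypothetical, while the host, collection name, and parameters are the ones from the slide.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor):
    """Build a Collections API CREATE URL (the request itself is not sent)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


url = create_collection_url("http://solr1:8983/solr", "revolution", 2, 1)
print(url)
```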
  16. Single Collection Deployment (diagram: Application; four Solr Servers; Shard1 and Shard2)
  17. Collection with Replica (diagram: four Solr Servers and three ZooKeeper instances):
      $ curl 'http://solr1:8983/solr/admin/collections?action=CREATE&name=revolution&numShards=2&replicationFactor=2'
  18. Collection with Replicas (diagram: Application; four Solr Servers hosting Shard1, Shard2, a Shard1 replica, and a Shard2 replica)
  19. Querying (diagram: Application; QUERY; three Solr Servers; Shard1 and Shard2 each return id and score)
  20. Querying (diagram: Application; Results; three Solr Servers; Shard1 and Shard2 each return documents)
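The two Querying slides describe a two-phase flow: shards first return only ids and scores, and the full documents are fetched afterwards for the merged winners. The Python sketch below models that idea with in-memory dictionaries; the function names and sample data are made up for illustration and do not reflect Solr's internal implementation.

```python
import heapq


def merge_top_ids(shard_hits, rows):
    """Phase 1: merge per-shard (doc_id, score) lists into a global top `rows`."""
    candidates = [
        (score, shard, doc_id)
        for shard, hits in shard_hits.items()
        for doc_id, score in hits
    ]
    top = heapq.nlargest(rows, candidates, key=lambda c: c[0])
    return [(shard, doc_id, score) for score, shard, doc_id in top]


def fetch_documents(winners, stored_docs):
    """Phase 2: fetch full documents only for the ids that made the cut."""
    return [stored_docs[shard][doc_id] for shard, doc_id, _ in winners]


# Hypothetical per-shard results and stored documents.
shard_hits = {
    "shard1": [("a", 2.5), ("b", 1.0)],
    "shard2": [("c", 3.1), ("d", 0.4)],
}
stored_docs = {
    "shard1": {"a": {"id": "a"}, "b": {"id": "b"}},
    "shard2": {"c": {"id": "c"}, "d": {"id": "d"}},
}

winners = merge_top_ids(shard_hits, rows=2)
results = fetch_documents(winners, stored_docs)
print(results)
```

The point of the split is that only small (id, score) pairs cross the network for every candidate, while full documents travel only for the final page.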
  21. Shard and Replica Number: how your data looks; expected data growth; target performance; target node number. Max number of nodes = number of shards * (number of replicas + 1)
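A worked instance of the node formula, as a small Python sketch. It assumes "number of replicas" counts copies in addition to the leader, so the earlier example with numShards=2 and replicationFactor=2 (two copies per shard in total) corresponds to one replica per shard.

```python
def max_nodes(num_shards: int, replicas_per_shard: int) -> int:
    """Max number of nodes = number of shards * (number of replicas + 1)."""
    return num_shards * (replicas_per_shard + 1)


# Two shards with one extra replica each: four cores, one per node at most.
print(max_nodes(2, 1))
# Eight shards with two extra replicas each.
print(max_nodes(8, 2))
```

Adding nodes beyond this number leaves some of them without a core of the collection to host.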
  22. What should I go for? More data? Shards. More queries? Replicas.
  23. Custom Routing: Default (numShards present, pre 4.5); Implicit (numShards not present, pre 4.5)
  24. Custom Routing Example (diagram: two Solr Servers hosting Shard1 and Shard2): id=userB!3; id=userA!1; id=userA!2
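The ids in the example carry a routing prefix before the "!" separator, and documents sharing a prefix end up on the same shard. The Python sketch below illustrates only that property. It is a deliberate simplification: CRC32 modulo the shard count is a stand-in chosen for brevity, whereas Solr assigns hash ranges to shards and uses a different hash function, so the shard numbers here will not match a real cluster.

```python
import zlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Pick a shard index from the routing prefix (the part before '!')."""
    prefix = doc_id.split("!", 1)[0]  # ids without '!' are hashed whole
    # Stand-in hash for illustration only; not the hash Solr uses.
    return zlib.crc32(prefix.encode("utf-8")) % num_shards


ids = ["userB!3", "userA!1", "userA!2"]
placement = {doc_id: shard_for(doc_id, 2) for doc_id in ids}
print(placement)
```

Whatever the hash, the useful consequence is the same: all of userA's documents are co-located, so a query for that user can be limited to one shard.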
  25. Querying Solr – Default Routing (diagram: Application; Solr Collection with Shard 1 through Shard 8)
  26. Querying Solr – Custom Routing (diagram: Application; Solr Collection with Shard 1 through Shard 8): q=revolution&_route_=userA!
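With custom routing, the query carries the routing key in the _route_ parameter. A minimal Python sketch that builds such a query URL without sending it; the helper name is hypothetical, and the host and collection name (solr1, collection1) are borrowed from other slides as assumptions.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def routed_query_url(base_url, collection, query, route=None):
    """Build a select URL, adding _route_ only when a routing key is given."""
    params = {"q": query}
    if route is not None:
        params["_route_"] = route
    return f"{base_url}/{collection}/select?{urlencode(params)}"


url = routed_query_url("http://solr1:8983/solr", "collection1", "revolution", route="userA!")
print(url)
# Decoding the query string gives back the parameters from the slide.
print(parse_qs(urlsplit(url).query))
```

Note that urlencode percent-encodes the "!" character; the server decodes it back to the same routing key.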
  27. Collection Manipulation Commands: Create; Delete; Reload; Split; Create Alias; Delete Alias; Shard Creation/Deletion. http://wiki.apache.org/solr/SolrCloud
  28. Collection Creation parameters: name; numShards; replicationFactor; maxShardsPerNode; createNodeSet; collection.configName
  29. Collection Split Example (create the collection):
      $ curl 'http://solr1:8983/solr/admin/collections?action=CREATE&name=collection1&numShards=2&replicationFactor=1'
  30. Collection Split Example (split shard1):
      $ curl 'http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1'
  31. Collection Aliasing:
      $ curl 'http://solr1:8983/solr/admin/collections?action=CREATEALIAS&name=weekly&collections=20131107,20131108,20131109,20131110,20131111,20131112,20131113'
      $ curl 'http://solr1:8983/solr/weekly/select?q=revolution'
      $ curl 'http://solr1:8983/solr/admin/collections?action=DELETEALIAS&name=weekly'
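The weekly alias in the example spans seven daily collections named by date. A small Python sketch that generates those names and the CREATEALIAS URL; the helper names are illustrative, and nothing is sent to Solr.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def daily_collections(first_day: date, days: int):
    """Collection names in YYYYMMDD form, one per day."""
    return [(first_day + timedelta(days=i)).strftime("%Y%m%d") for i in range(days)]


def create_alias_url(base_url, alias, collections):
    """Build a CREATEALIAS URL; commas are left unescaped for readability."""
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),
    }
    return f"{base_url}/admin/collections?{urlencode(params, safe=',')}"


names = daily_collections(date(2013, 11, 7), 7)
alias_url = create_alias_url("http://solr1:8983/solr", "weekly", names)
print(alias_url)
```

Rolling the window forward is then a matter of issuing CREATEALIAS again with a new list of daily collections, while clients keep querying the same alias name.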
  32. Caches: refreshed with IndexSearcher; configurable; different purposes; different implementations (diagram: Solr Cache)
  33. Filter Cache:
      <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128" />
      q=lucene+revolution+city:Dublin
      q=lucene+revolution&fq=city:Dublin
      q=*:*&fq={!cache=false}city:Dublin
      q=*:*&fq={!frange l=0 u=10 cache=false cost=200}sum(price,pro)
  34. Document Cache:
      <documentCache class="solr.LRUCache" size="512" initialSize="512" />
  35. Query Result Cache:
      <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="128"/>
      q=lucene+revolution+city:Dublin&sort=date+desc&start=0&rows=10
      q=lucene+revolution&fq=city:Dublin&sort=date+desc&start=0&rows=10
      <queryResultWindowSize>20</queryResultWindowSize>
      <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
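The two window settings interact: a request for a page of results is rounded up to a multiple of queryResultWindowSize before caching, and result lists larger than queryResultMaxDocsCached are not cached. The Python sketch below is a simplified model of that behavior as I understand it, using the values from the slide as defaults; it is not Solr's actual code and the function name is made up.

```python
def cached_window(start, rows, window_size=20, max_docs_cached=200):
    """Return (documents collected for the cache entry, whether it is cached)."""
    requested_end = start + rows
    if requested_end < window_size:
        superset_end = window_size
    else:
        # Round up to the next multiple of the window size.
        superset_end = ((requested_end - 1) // window_size + 1) * window_size
    return superset_end, superset_end <= max_docs_cached


print(cached_window(0, 10))    # first page
print(cached_window(10, 10))   # second page falls inside the same window
print(cached_window(190, 20))  # deep paging past the cached limit
```

In this model the first two pages of ten results share one cache entry of twenty documents, which is why paging from page one to page two can be served without re-running the query.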
  36. Warming:
      <listener event="newSearcher" class="solr.QuerySenderListener">
        <arr name="queries">
          <lst><str name="q">*:*</str><str name="sort">date desc</str></lst>
          <lst><str name="q">keywords:* OR tags:*</str></lst>
          <lst><str name="q">*:*</str><str name="fq">active:*</str></lst>
        </arr>
      </listener>
      <listener event="firstSearcher" class="solr.QuerySenderListener">
        <arr name="queries">
          <lst><str name="q">*:*</str><str name="sort">date desc</str></lst>
          <lst><str name="q">keywords:* OR tags:*</str></lst>
          <lst><str name="q">*:*</str><str name="fq">active:*</str></lst>
        </arr>
      </listener>
      <useColdSearcher>false</useColdSearcher>
  37. The Right Directory: StandardDirectory; SimpleFSDirectory; NIOFSDirectory; MMapDirectory; NRTCachingDirectory; RAMDirectory (diagram: index files _0.fdt, _0.fdx, _0.fnm, _0.nvd, _1.fdt, _1.fdx, _1.fnm, _1.nvd)
      <directoryFactory name="DirectoryFactory" class="solr.NRTCachingDirectoryFactory" />
  38. Column oriented fields - DocValues: NRT compatible; better compression than field cache; can store data outside of JVM heap; can improve things for dynamic indices
      <field name="categories" type="string" indexed="false" stored="false" multiValued="true" docValues="true"/>
      <field name="categories" type="string" indexed="false" stored="false" multiValued="true" docValues="true" docValuesFormat="Disk"/>
  39. Segment Merge (diagram: Level 0 segments a, b, f; Level 1 segments c, c, d, e, g)
  40. Segment Merge Under Control: merge policy; merge scheduler; merge factor; merge policy configuration
  41. Configuring Segment Merge:
      <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
        <int name="maxMergeAtOnce">10</int>
        <int name="segmentsPerTier">10</int>
      </mergePolicy>
      <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
      <mergedSegmentWarmer class="org.apache.lucene.index.SimpleMergedSegmentWarmer"/>
      <mergeFactor>10</mergeFactor>
  42. Indexing Throughput Tuning: maximum indexing threads; RAM buffer size; maximum buffered documents; bulk, bulks and bulks; CloudSolrServer; autocommit; cutting off unnecessary stuff
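The "bulk, bulks and bulks" advice amounts to sending documents in batches rather than one request per document. A minimal Python sketch of the batching step only; the batch size of 1000 is an arbitrary assumption, and actually sending each batch to Solr is left out.

```python
from itertools import islice


def batches(docs, batch_size):
    """Yield lists of at most `batch_size` documents from any iterable."""
    iterator = iter(docs)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# 2500 hypothetical documents, produced lazily so they never sit in memory at once.
docs = ({"id": str(i)} for i in range(2500))
sizes = [len(batch) for batch in batches(docs, 1000)]
print(sizes)
```

Each yielded list would become one update request, so 2500 documents cost three round trips instead of 2500.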
  43. TransactionLog: updates durability; recovering peer replay; performant Realtime Get
      <updateLog>
        <str name="dir">${solr.ulog.dir:}</str>
      </updateLog>
      <requestHandler name="/get" class="solr.RealTimeGetHandler">
      </requestHandler>
  44. Autocommit or Not? Automatic data flush; automatic index view refresh
      <autoCommit>
        <maxTime>15000</maxTime>
        <maxDocs>1000</maxDocs>
        <openSearcher>false</openSearcher>
      </autoCommit>
      <autoSoftCommit>
        <maxTime>1000</maxTime>
      </autoSoftCommit>
  45. Autocommit & openSearcher=true:
      <autoCommit>
        <maxDocs>10</maxDocs>
        <openSearcher>true</openSearcher>
      </autoCommit>
  46. AutoSoftCommit & openSearcher=false:
      <autoCommit>
        <maxDocs>1000</maxDocs>
        <openSearcher>false</openSearcher>
      </autoCommit>
      <autoSoftCommit>
        <maxDocs>10</maxDocs>
      </autoSoftCommit>
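The autocommit settings above combine two triggers: a document-count threshold (maxDocs) and an age threshold (maxTime, in milliseconds), with a commit due when either is reached. The Python sketch below is a simplified model of that rule, not Solr's implementation; the class name and the explicit clock argument are assumptions made to keep the example deterministic.

```python
class CommitTracker:
    """Tracks uncommitted documents and reports when a commit is due."""

    def __init__(self, max_docs=None, max_time_ms=None):
        self.max_docs = max_docs
        self.max_time_ms = max_time_ms
        self.pending_docs = 0
        self.first_pending_at_ms = None

    def add(self, now_ms):
        """Register one added document; return True if a commit is now due."""
        if self.pending_docs == 0:
            self.first_pending_at_ms = now_ms
        self.pending_docs += 1
        return self.should_commit(now_ms)

    def should_commit(self, now_ms):
        if self.pending_docs == 0:
            return False
        if self.max_docs is not None and self.pending_docs >= self.max_docs:
            return True
        if self.max_time_ms is not None:
            return now_ms - self.first_pending_at_ms >= self.max_time_ms
        return False

    def commit(self):
        self.pending_docs = 0
        self.first_pending_at_ms = None


# Mirrors slide 44: hard commit after 15 s or 1000 docs, soft commit after 1 s.
hard = CommitTracker(max_docs=1000, max_time_ms=15000)
soft = CommitTracker(max_time_ms=1000)
hard.add(now_ms=0)
soft.add(now_ms=0)
print(soft.should_commit(1000), hard.should_commit(1000))
```

With these values a new document becomes searchable after about a second through the soft commit, while the more expensive hard commit, which does not open a searcher here, happens far less often.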
  47. Postings Formats to the Rescue: Lucene >= 4.0 brings flexible indexing; postings == docs, positions, payloads; different postings formats available: Bloom, Pulsing, Simple text, Direct, Memory
      <codecFactory class="solr.SchemaCodecFactory" />
      <field name="id" type="string_pulsing" indexed="true" stored="true" />
      <fieldType name="string_pulsing" class="solr.StrField" postingsFormat="Pulsing41" />
  48. Monitoring: cluster state; nodes utilization; memory usage; cache utilization; query response time; warmup times; garbage collector work
  49. JMX and Solr
  50. JMX and Solr
  51. Administration Panel
  52. Administration Panel
  53. Monitoring with SPM
  54. Monitoring with SPM
  55. Other Monitoring Tools: Ganglia http://ganglia.sourceforge.net/; New Relic http://www.newrelic.com/; Opsview http://www.opsview.com
  56. We Are Hiring! Dig Search? Dig Analytics? Dig Big Data? Dig Performance? Dig working with and in open-source? We’re hiring world-wide! http://sematext.com/about/jobs.html
  57. Thank You! Rafał Kuć, @kucrafal, rafal.kuc@sematext.com. Sematext, @sematext, http://sematext.com, http://blog.sematext.com. SPM discount code: LR2013SPM20 @ Sematext booth ;)