Scaling Solr with SolrCloud

How to make the most of your SolrCloud clusters.

  1. Scaling Solr with SolrCloud. Rafał Kuć, Sematext Group, Inc. @kucrafal @sematext sematext.com
  2. I am… ("Ta me…"): Sematext consultant & engineer; Solr.pl co-founder; father and husband
  3. Solr History (newest first): Solr 4.1 and counting; Solr 4.0 released; Lucene / Solr merge; Solr 1.4 released; Solr 1.3 released; Incubator graduation; Solr donated to ASF; Y. Seeley creates Solr
  4. The Past
  5. Master-Slave Deployment [diagram: Application, one Solr Master, four Solr Slaves]
  6. Master as SPOF [diagram: Application, one Solr Master, four Solr Slaves]
  7. Replication Time [diagram: Indexing App, Solr Master, three Solr Slaves, Querying App]
  8. Too Much for a Single Shard [diagram: Application, one Solr Master, two Solr Slaves]
  9. Too Much for a Single Shard [diagram: Application, three Solr Masters, six Solr Slaves]
  10. Querying in Multi Master Deployment [diagram: Application and Solr Slaves for Shard 1, Shard 2, and Shard 3; labels: "shard1, shard2, shard3", "Doc", "Response"]
  11. SolrCloud Comes Into Play
  12. Basic Glossary: Cluster; Node; Collection; Shard; Leader & Replica; Overseer. https://cwiki.apache.org/confluence/display/solr/SolrCloud+Glossary
  13. Apache ZooKeeper: quorum is required. Sample configuration for a three-server ensemble:
      clientPort=2181
      dataDir=/usr/share/zookeeper/data
      tickTime=2000
      initLimit=10
      syncLimit=5
      server.1=192.168.1.1:2888:3888
      server.2=192.168.1.2:2888:3888
      server.3=192.168.1.3:2888:3888
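
The quorum requirement on the ZooKeeper slide can be illustrated with a small calculation. This is a minimal sketch, not part of the original deck: the function names are hypothetical, and it only assumes that a quorum is a strict majority of the ensemble.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Servers that can be lost while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


# Compare a few ensemble sizes, including the three-server sample above.
for n in (3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

A three-server ensemble, as in the sample configuration, needs two servers up and survives the loss of one. Going to four servers does not raise the number of failures tolerated, which is why odd ensemble sizes are the usual choice.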
  14. Solr Instances [diagram: four Solr Servers, three ZooKeeper servers]. Each Solr Server is started with a zkHost list:
      -DzkHost=192.168.1.1:2181,192.168.1.2:2181,192.168.1.3:2181
      -DzkHost=192.168.1.2:2181,192.168.1.1:2181,192.168.1.3:2181
      -DzkHost=192.168.1.3:2181,192.168.1.1:2181,192.168.1.2:2181
      -DzkHost=192.168.1.3:2181,192.168.1.1:2181,192.168.1.2:2181
  15. Collection Creation [diagram: four Solr Servers, three ZooKeeper servers]. Upload the configuration to ZooKeeper, then create the collection:
      $ cloud-scripts/zkcli.sh -cmd upconfig -zkhost 192.168.1.2:2181 -confdir /usr/share/config/revolution/conf -confname revolution
      $ curl 'http://solr1:8983/solr/admin/collections?action=CREATE&name=revolution&numShards=2&replicationFactor=1'
  16. Single Collection Deployment [diagram: Application, four Solr Servers, Shard1 and Shard2]
  17. Collection with Replica [diagram: four Solr Servers, three ZooKeeper servers]
      $ curl 'http://solr1:8983/solr/admin/collections?action=CREATE&name=revolution&numShards=2&replicationFactor=2'
  18. Collection with Replicas [diagram: Application, four Solr Servers hosting Shard1, Shard2, Shard1 Replica, and Shard2 Replica]
  19. Querying [diagram: Application sends QUERY to a Solr Server; Shard1 and Shard2 each return id, score]
  20. Querying [diagram: Shard1 and Shard2 each return doc; a Solr Server returns Results to the Application]
  21. Shard and Replica Number: how your data looks; expected data growth; target performance; target node number. Max number of nodes = number of shards * (number of replicas + 1)
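
The node-count formula from the slide above can be checked with a few lines of Python. This is only an illustrative sketch; the function and parameter names are assumptions, and "replicas" is taken to mean the extra copies of each shard beyond the leader, as the "+ 1" in the slide's formula suggests.

```python
def max_nodes(num_shards: int, replicas_per_shard: int) -> int:
    """Upper bound on useful nodes: one node per shard copy (leader + replicas)."""
    return num_shards * (replicas_per_shard + 1)


# Layout from the "Collection with Replicas" slide: 2 shards, 1 replica each.
print(max_nodes(2, 1))

# A hypothetical larger layout: 8 shards, 2 replicas each.
print(max_nodes(8, 2))
```

Under this reading, a collection created with numShards=2 and replicationFactor=2 has one leader and one replica per shard, so it can spread over at most four nodes.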
  22. What should I go for? More data? Add shards. More queries? Add replicas.
  23. Custom Routing: Default (numShards present, pre 4.5); Implicit (numShards not present, pre 4.5)
  24. Custom Routing Example [diagram: Shard1 and Shard2 on two Solr Servers; documents id=userA!1, id=userA!2, id=userB!3]
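
The idea behind the custom routing example, that documents sharing a prefix before the "!" end up on the same shard, can be sketched as follows. This is a simplified illustration and not Solr's implementation: it swaps in CRC32 from the Python standard library for Solr's hash function, and the function names and the equal-range shard mapping are assumptions made for the example.

```python
import zlib


def composite_hash(doc_id: str) -> int:
    """32-bit hash: top 16 bits from the route prefix, low 16 bits from the rest."""
    prefix, sep, rest = doc_id.partition("!")
    if not sep:
        # No route prefix: hash the whole id.
        return zlib.crc32(doc_id.encode("utf-8"))
    high = zlib.crc32(prefix.encode("utf-8")) & 0xFFFF0000
    low = zlib.crc32(rest.encode("utf-8")) & 0x0000FFFF
    return high | low


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map the hash onto num_shards equal, contiguous ranges of the 32-bit space."""
    return (composite_hash(doc_id) * num_shards) >> 32


# Ids from the slide, routed across two shards (zero-based shard index).
for doc_id in ("userA!1", "userA!2", "userB!3"):
    print(doc_id, shard_for(doc_id, 2))
```

With two shards only the top bit of the hash decides the shard, and that bit comes from the prefix, so userA!1 and userA!2 always land together. Whether userB!3 shares their shard depends on the hash of its prefix.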
  25. Querying Solr: Default Routing [diagram: Application, Solr Collection with Shard 1 through Shard 8]
  26. Querying Solr: Custom Routing [diagram: Application, Solr Collection with Shard 1 through Shard 8]. Query: q=revolution&_route_=userA!
  27. Collection Manipulation Commands: Create; Delete; Reload; Split; Create Alias; Delete Alias; Shard Creation/Deletion. http://wiki.apache.org/solr/SolrCloud
  28. Collection Creation parameters: name; numShards; replicationFactor; maxShardsPerNode; createNodeSet; collection.configName
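
The creation parameters listed above are passed as query-string arguments to the Collections API, as in the curl examples on the earlier slides. A minimal sketch of assembling such a URL follows. The helper name is hypothetical, the host and collection name are the ones used in the deck, and the maxShardsPerNode value is an arbitrary illustration.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor, extra=None):
    """Build a Collections API CREATE URL from the parameters on the slide."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    # Optional parameters such as maxShardsPerNode, createNodeSet,
    # or collection.configName go in as-is.
    params.update(extra or {})
    return f"{base_url}/admin/collections?{urlencode(params)}"


url = create_collection_url(
    "http://solr1:8983/solr", "revolution", 2, 2, {"maxShardsPerNode": 1}
)
print(url)
```

Building the query string with urlencode rather than by hand avoids the stray spaces and unescaped characters that break long curl commands.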
  29. Collection Split Example: create the collection
      $ curl 'http://solr1:8983/solr/admin/collections?action=CREATE&name=collection1&numShards=2&replicationFactor=1'
  30. Collection Split Example: split shard1
      $ curl 'http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1'
  31. Collection Aliasing: create the alias, query through it, delete it
      $ curl 'http://solr1:8983/solr/admin/collections?action=CREATEALIAS&name=weekly&collections=20131107,20131108,20131109,20131110,20131111,20131112,20131113'
      $ curl 'http://solr1:8983/solr/weekly/select?q=revolution'
      $ curl 'http://solr1:8983/solr/admin/collections?action=DELETEALIAS&name=weekly'
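
The weekly alias above covers seven collections named by day. Generating that list instead of typing it could look like the sketch below. The helper name is hypothetical, and the sketch assumes one collection per day named in YYYYMMDD form, as in the slide.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def daily_collections(first_day: date, days: int) -> list:
    """Names of consecutive daily collections, formatted as YYYYMMDD."""
    return [(first_day + timedelta(days=i)).strftime("%Y%m%d") for i in range(days)]


names = daily_collections(date(2013, 11, 7), 7)

# Keep commas unescaped so the URL reads like the one on the slide.
query = urlencode(
    {"action": "CREATEALIAS", "name": "weekly", "collections": ",".join(names)},
    safe=",",
)
print(f"http://solr1:8983/solr/admin/collections?{query}")
```

Re-running the same request with a shifted start date would be one way to slide the alias forward each day while queries keep targeting /solr/weekly/select.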
  32. Caches: refreshed with IndexSearcher; configurable; different purposes; different implementations
  33. Filter Cache
      <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128" />
      Example queries:
      q=lucene+revolution+city:Dublin
      q=lucene+revolution&fq=city:Dublin
      q=*:*&fq={!cache=false}city:Dublin
      q=*:*&fq={!frange l=0 u=10 cache=false cost=200}sum(price,pro)
  34. Document Cache
      <documentCache class="solr.LRUCache" size="512" initialSize="512" />
  35. Query Result Cache
      <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="128"/>
      Example queries:
      q=lucene+revolution+city:Dublin&sort=date+desc&start=0&rows=10
      q=lucene+revolution&fq=city:Dublin&sort=date+desc&start=0&rows=10
      Related settings:
      <queryResultWindowSize>20</queryResultWindowSize>
      <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
  36. Warming
      <listener event="newSearcher" class="solr.QuerySenderListener">
        <arr name="queries">
          <lst><str name="q">*:*</str><str name="sort">date desc</str></lst>
          <lst><str name="q">keywords:* OR tags:*</str></lst>
          <lst><str name="q">*:*</str><str name="fq">active:*</str></lst>
        </arr>
      </listener>
      <listener event="firstSearcher" class="solr.QuerySenderListener">
        <arr name="queries">
          <lst><str name="q">*:*</str><str name="sort">date desc</str></lst>
          <lst><str name="q">keywords:* OR tags:*</str></lst>
          <lst><str name="q">*:*</str><str name="fq">active:*</str></lst>
        </arr>
      </listener>
      <useColdSearcher>false</useColdSearcher>
  37. The Right Directory: StandardDirectory; SimpleFSDirectory; NIOFSDirectory; MMapDirectory; NRTCachingDirectory; RAMDirectory [diagram: index files _0.fdt, _0.fdx, _0.fnm, _0.nvd, _1.fdt, _1.fdx, _1.fnm, _1.nvd]
      <directoryFactory name="DirectoryFactory" class="solr.NRTCachingDirectoryFactory" />
  38. Column oriented fields - DocValues: NRT compatible; better compression than field cache; can store data outside of JVM heap; can improve things for dynamic indices
      <field name="categories" type="string" indexed="false" stored="false" multiValued="true" docValues="true"/>
      <field name="categories" type="string" indexed="false" stored="false" multiValued="true" docValues="true" docValuesFormat="Disk"/>
  39. Segment Merge [diagram: segments labeled a through g at Level 0 and Level 1]
  40. Segment Merge Under Control: merge policy; merge scheduler; merge factor; merge policy configuration
  41. Configuring Segment Merge
      <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
        <int name="maxMergeAtOnce">10</int>
        <int name="segmentsPerTier">10</int>
      </mergePolicy>
      <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
      <mergedSegmentWarmer class="org.apache.lucene.index.SimpleMergedSegmentWarmer"/>
      <mergeFactor>10</mergeFactor>
  42. Indexing Throughput Tuning: maximum indexing threads; RAM buffer size; maximum buffered documents; bulk, bulks and bulks; CloudSolrServer; autocommit; cutting off unnecessary stuff
  43. TransactionLog: updates durability; recovering peer replay; performant Realtime Get
      <updateLog>
        <str name="dir">${solr.ulog.dir:}</str>
      </updateLog>
      <requestHandler name="/get" class="solr.RealTimeGetHandler">
      </requestHandler>
  44. Autocommit or Not? Automatic data flush; automatic index view refresh
      <autoCommit>
        <maxTime>15000</maxTime>
        <maxDocs>1000</maxDocs>
        <openSearcher>false</openSearcher>
      </autoCommit>
      <autoSoftCommit>
        <maxTime>1000</maxTime>
      </autoSoftCommit>
  45. Autocommit & openSearcher=true
      <autoCommit>
        <maxDocs>10</maxDocs>
        <openSearcher>true</openSearcher>
      </autoCommit>
  46. AutoSoftCommit & openSearcher=false
      <autoCommit>
        <maxDocs>1000</maxDocs>
        <openSearcher>false</openSearcher>
      </autoCommit>
      <autoSoftCommit>
        <maxDocs>10</maxDocs>
      </autoSoftCommit>
  47. Postings Formats to the Rescue: Lucene >= 4.0 brings flexible indexing; postings == docs, positions, payloads; different postings formats available: Bloom, Pulsing, Simple text, Direct, Memory
      <codecFactory class="solr.SchemaCodecFactory" />
      <field name="id" type="string_pulsing" indexed="true" stored="true" />
      <fieldType name="string_pulsing" class="solr.StrField" postingsFormat="Pulsing41" />
  48. Monitoring: cluster state; node utilization; memory usage; cache utilization; query response time; warmup times; garbage collector work
  49. JMX and Solr
  50. JMX and Solr
  51. Administration Panel
  52. Administration Panel
  53. Monitoring with SPM
  54. Monitoring with SPM
  55. Other Monitoring Tools: Ganglia (http://ganglia.sourceforge.net/), New Relic (http://www.newrelic.com/), Opsview (http://www.opsview.com)
  56. We Are Hiring! Dig Search? Dig Analytics? Dig Big Data? Dig Performance? Dig working with and in open source? We're hiring worldwide! http://sematext.com/about/jobs.html
  57. Thank You! Rafał Kuć: @kucrafal, rafal.kuc@sematext.com. Sematext: @sematext, http://sematext.com, http://blog.sematext.com. SPM discount code: LR2013SPM20 @ Sematext booth ;)
