Scaling Solr with Solr Cloud
 


How to make the most of your Solr Cloud clusters.

    Presentation Transcript

    • Scaling Solr with SolrCloud. Rafał Kuć, Sematext Group, Inc. (@kucrafal, @sematext, sematext.com)
    • About me: Sematext consultant and engineer; Solr.pl co-founder; father and husband.
    • Solr History (oldest first): Yonik Seeley creates Solr; Solr donated to the ASF; incubator graduation; Solr 1.3 released; Solr 1.4 released; Lucene/Solr merge; Solr 4.0 released; Solr 4.1 and counting.
    • The Past
    • Master-Slave Deployment [diagram: an application in front of one Solr master and four Solr slaves]
    • Master as a SPOF (single point of failure) [diagram: the same layout, one Solr master and four Solr slaves]
    • Replication Time [diagram: an indexing application writes to the Solr master, the slaves replicate from it, and a querying application reads from the slaves]
    • Too Much for a Single Shard [diagram: one Solr master with two slaves grows into three masters, each with its own slaves, all used by the same application]
    • Querying in a Multi-Master Deployment [diagram: the application sends a query naming shard1, shard2 and shard3 to one slave, which queries a slave of each shard and merges their responses]
    • SolrCloud Comes Into Play
    • Basic Glossary: cluster, node, collection, shard, leader and replica, Overseer (https://cwiki.apache.org/confluence/display/solr/SolrCloud+Glossary)
    • Apache ZooKeeper: a quorum is required. Sample configuration for a three-server ensemble (each server also needs a myid file in dataDir holding its server number):
        clientPort=2181
        dataDir=/usr/share/zookeeper/data
        tickTime=2000
        initLimit=10
        syncLimit=5
        server.1=192.168.1.1:2888:3888
        server.2=192.168.1.2:2888:3888
        server.3=192.168.1.3:2888:3888
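ZooKeeper keeps serving only while a strict majority of the ensemble is up, which is why the sample uses an odd number of servers. A minimal Python sketch of that arithmetic; `quorum_size` and `tolerated_failures` are hypothetical helpers, not ZooKeeper APIs:

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of a ZooKeeper ensemble of the given size."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers may fail while a majority is still running."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (1, 3, 4, 5):
    print(f"{n} servers: quorum={quorum_size(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```

Going from three to four servers raises the quorum to three without tolerating any additional failure, so odd ensemble sizes are the usual choice.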
    • Solr Instances: each Solr server points at the whole ZooKeeper ensemble with -DzkHost (comma-separated, no spaces):
        -DzkHost=192.168.1.1:2181,192.168.1.2:2181,192.168.1.3:2181
        -DzkHost=192.168.1.2:2181,192.168.1.1:2181,192.168.1.3:2181
        -DzkHost=192.168.1.3:2181,192.168.1.1:2181,192.168.1.2:2181
        -DzkHost=192.168.1.3:2181,192.168.1.1:2181,192.168.1.2:2181
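Every Solr server lists the same three ZooKeeper servers, only in a different order. A small sketch that builds such a value with a chosen server first; `zk_host` is a hypothetical helper, and putting the node's "own" ZooKeeper first is an assumption about the intent of the slide, since the ZooKeeper client may connect to any listed server. The value must not contain spaces, or the shell would split the -D argument.

```python
def zk_host(first: str, ensemble: list) -> str:
    """Build a -DzkHost value listing `first` before the rest of the ensemble."""
    if first not in ensemble:
        raise ValueError(f"{first} is not part of the ensemble")
    ordered = [first] + [h for h in ensemble if h != first]
    # Comma-separated with no spaces, so it stays one shell argument.
    return ",".join(ordered)


ENSEMBLE = ["192.168.1.1:2181", "192.168.1.2:2181", "192.168.1.3:2181"]
print("-DzkHost=" + zk_host("192.168.1.2:2181", ENSEMBLE))
```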
    • Collection Creation: upload the configuration to ZooKeeper, then create the collection:
        $ cloud-scripts/zkcli.sh -cmd upconfig -zkhost 192.168.1.2:2181 -confdir /usr/share/config/revolution/conf -confname revolution
        $ curl 'http://solr1:8983/solr/admin/collections?action=CREATE&name=revolution&numShards=2&replicationFactor=1'
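Collections API calls are plain HTTP GET requests, so the CREATE call can be assembled programmatically. A sketch using only the Python standard library; `collections_api_url` is a hypothetical helper, and actually sending the request (for example with `urllib.request.urlopen`) is left out so the snippet runs without a cluster:

```python
from urllib.parse import urlencode


def collections_api_url(base: str, action: str, **params) -> str:
    """Build a Collections API URL such as the CREATE call on the slide."""
    query = urlencode({"action": action, **params})  # keeps parameter order
    return f"{base.rstrip('/')}/admin/collections?{query}"


url = collections_api_url(
    "http://solr1:8983/solr", "CREATE",
    name="revolution", numShards=2, replicationFactor=1,
)
print(url)
```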
    • Single Collection Deployment [diagram: Shard1 and Shard2 of the collection spread across four Solr servers, queried by the application]
    • Collection with Replica: the same CREATE call with replicationFactor=2 keeps two copies of each shard:
        $ curl 'http://solr1:8983/solr/admin/collections?action=CREATE&name=revolution&numShards=2&replicationFactor=2'
    • Collection with Replicas [diagram: Shard1, Shard2 and a replica of each, placed on four different Solr servers, queried by the application]
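    • Querying, step 1 [diagram: the application sends a query to one Solr server, which asks Shard1 and Shard2 for the ids and scores of matching documents]
    • Querying, step 2 [diagram: that server fetches the top documents from Shard1 and Shard2 and returns the merged results to the application]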
    • Shard and Replica Number: consider how your data looks, the expected data growth, the target performance and the target number of nodes.
      Max number of nodes = number of shards * (number of replicas + 1)
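The formula is easy to get off by one, because the Collections API's `replicationFactor` already counts the leader, while the slide's "number of replicas" counts only the extra copies. A tiny sketch; `max_useful_nodes` is a hypothetical helper:

```python
def max_useful_nodes(num_shards: int, extra_replicas: int) -> int:
    """Upper bound on nodes that can each host one core of the collection.

    extra_replicas counts copies in addition to the leader, as in the slide's
    shards * (replicas + 1). In Collections API terms this equals
    numShards * replicationFactor, since replicationFactor includes the leader.
    """
    return num_shards * (extra_replicas + 1)


# The "revolution" collection created with numShards=2 and replicationFactor=2
# has one extra replica per shard.
print(max_useful_nodes(2, 1))
```

Beyond that number, extra nodes would sit idle for this collection unless more shards or replicas are added.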
    • What should I go for? More data: add shards. More queries: add replicas.
    • Custom Routing (before Solr 4.5 the router is chosen at collection creation):
      default (compositeId) routing when numShards is present
      implicit routing when numShards is not present
    • Custom Routing Example [diagram: documents id=userA!1 and id=userA!2 end up on the same shard, id=userB!3 on the other one; the prefix before ! decides the shard]
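The prefix trick works because the default compositeId router hashes the part before `!` and the part after it separately (32-bit MurmurHash3) and combines the upper 16 bits of the prefix hash with the lower 16 bits of the rest, so every id starting with `userA!` lands in the same narrow slice of the hash ring. The Python sketch below illustrates that idea under stated simplifications: `composite_hash` and `shard_for` are hypothetical names, the shard lookup splits the unsigned hash space into equal slices instead of reading the signed ranges Solr keeps in ZooKeeper (so the shard number may differ from a real cluster), and the three-part and `/bits` id variants are ignored.

```python
def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86 32-bit, returned as an unsigned int."""
    c1, c2, mask = 0xCC9E2D51, 0x1B873593, 0xFFFFFFFF
    h = seed & mask
    nblocks = len(data) // 4

    def mix_k(k: int) -> int:
        k = (k * c1) & mask
        k = ((k << 15) | (k >> 17)) & mask
        return (k * c2) & mask

    for i in range(nblocks):  # body: 4-byte little-endian blocks
        h ^= mix_k(int.from_bytes(data[4 * i:4 * i + 4], "little"))
        h = ((h << 13) | (h >> 19)) & mask
        h = (h * 5 + 0xE6546B64) & mask

    tail = data[4 * nblocks:]
    if tail:  # remaining 1 to 3 bytes, little-endian
        h ^= mix_k(int.from_bytes(tail, "little"))

    h ^= len(data)  # finalization mix
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h


def composite_hash(doc_id: str) -> int:
    """Upper 16 bits from the prefix hash, lower 16 bits from the rest."""
    if "!" not in doc_id:
        return murmur3_32(doc_id.encode("utf-8"))
    prefix, rest = doc_id.split("!", 1)
    upper = murmur3_32(prefix.encode("utf-8")) & 0xFFFF0000
    lower = murmur3_32(rest.encode("utf-8")) & 0x0000FFFF
    return upper | lower


def shard_for(doc_id: str, num_shards: int) -> int:
    """Simplified lookup: equal slices of the hash space, chosen by the top 16 bits."""
    return ((composite_hash(doc_id) >> 16) * num_shards) >> 16


for doc_id in ("userA!1", "userA!2", "userB!3"):
    print(doc_id, "-> shard index", shard_for(doc_id, 2))
```

Because only the prefix feeds the upper bits, all of a user's documents share one shard regardless of how many documents that user has.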
    • Querying Solr, Default Routing [diagram: the application's query goes to all eight shards of the collection]
    • Querying Solr, Custom Routing [diagram: with q=revolution&_route_=userA! the query goes only to the shard holding userA's documents]
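A sketch of building the routed query shown on the slide; `select_url` is a hypothetical helper, and using the `revolution` collection on `solr1` is an assumption carried over from the earlier examples. `urlencode` escapes `!` as `%21`, which Solr decodes back before routing.

```python
from typing import Optional
from urllib.parse import urlencode


def select_url(base: str, collection: str, q: str, route: Optional[str] = None) -> str:
    """Build a /select URL; a _route_ value limits the query to the matching shard(s)."""
    params = {"q": q}
    if route is not None:
        params["_route_"] = route  # e.g. "userA!" selects the shard holding userA's docs
    return f"{base.rstrip('/')}/{collection}/select?{urlencode(params)}"


print(select_url("http://solr1:8983/solr", "revolution", "revolution"))
print(select_url("http://solr1:8983/solr", "revolution", "revolution", route="userA!"))
```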
    • Collection Manipulation Commands: create, delete, reload, split, create alias, delete alias, shard creation/deletion (http://wiki.apache.org/solr/SolrCloud)
    • Collection Creation parameters: name, numShards, replicationFactor, maxShardsPerNode, createNodeSet, collection.configName
    • Collection Split Example: start with a two-shard collection:
        $ curl 'http://solr1:8983/solr/admin/collections?action=CREATE&name=collection1&numShards=2&replicationFactor=1'
    • Collection Split Example: split shard1 into two sub-shards, shard1_0 and shard1_1:
        $ curl 'http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1'
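By default SPLITSHARD divides the parent shard's hash range into two halves, one per sub-shard. A sketch of that arithmetic; `split_range` and `hex32` are hypothetical helpers, and the starting range assumes the usual layout of a two-shard collection, where shard1 covers the negative half of the signed 32-bit hash space (shown as 80000000-ffffffff in the cluster state).

```python
def split_range(start: int, end: int) -> tuple:
    """Split an inclusive hash range into two contiguous halves."""
    if start >= end:
        raise ValueError("range too small to split")
    mid = start + (end - start) // 2
    return (start, mid), (mid + 1, end)


def hex32(value: int) -> str:
    """Render a signed 32-bit bound as unsigned hex, like cluster state ranges."""
    return format(value & 0xFFFFFFFF, "x")


shard1 = (-2**31, -1)  # assumed range of shard1 in a two-shard collection
for name, (lo, hi) in zip(("shard1_0", "shard1_1"), split_range(*shard1)):
    print(f"{name}: {hex32(lo)}-{hex32(hi)}")  # 80000000-bfffffff, c0000000-ffffffff
```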
    • Collection Aliasing: create an alias spanning seven daily collections, query it, then delete it:
        $ curl 'http://solr1:8983/solr/admin/collections?action=CREATEALIAS&name=weekly&collections=20131107,20131108,20131109,20131110,20131111,20131112,20131113'
        $ curl 'http://solr1:8983/solr/weekly/select?q=revolution'
        $ curl 'http://solr1:8983/solr/admin/collections?action=DELETEALIAS&name=weekly'
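The alias example follows a one-collection-per-day pattern, with `weekly` covering the last seven days. A sketch that derives those names and the CREATEALIAS request from a date; `daily_collections` and `create_alias_url` are hypothetical helpers, and the idea of re-issuing CREATEALIAS daily to move the window assumes, as we understand it, that CREATEALIAS on an existing name repoints that alias.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def daily_collections(last_day: date, days: int = 7) -> list:
    """Names (YYYYMMDD) of the daily collections covering `days` days up to last_day."""
    first = last_day - timedelta(days=days - 1)
    return [(first + timedelta(days=i)).strftime("%Y%m%d") for i in range(days)]


def create_alias_url(base: str, alias: str, collections: list) -> str:
    """CREATEALIAS request pointing `alias` at the given collections."""
    query = urlencode({"action": "CREATEALIAS", "name": alias,
                       "collections": ",".join(collections)})
    return f"{base.rstrip('/')}/admin/collections?{query}"


names = daily_collections(date(2013, 11, 13))
print(names)
print(create_alias_url("http://solr1:8983/solr", "weekly", names))
```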
    • Caches: refreshed with each new IndexSearcher, configurable, serving different purposes, available in different implementations [diagram: Solr cache]
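    • Filter Cache
        <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128" />
      Compare:
        q=lucene+revolution+city:Dublin    (filter inside the main query, not in the filter cache)
        q=lucene+revolution&fq=city:Dublin    (filter in fq, cached in the filter cache)
        q=*:*&fq={!cache=false}city:Dublin    (filter not cached)
        q=*:*&fq={!frange l=0 u=10 cache=false cost=200}sum(price,pro)    (not cached; with cost >= 100 it runs as a post filter)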
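The `size` and `autowarmCount` attributes combine LRU eviction with warming: when a new searcher opens, the most recently used entries are recomputed against the new index before it serves traffic. A toy Python model of that behavior; `AutowarmingLRUCache` is a hypothetical class, not Solr's code, and it ignores the internal differences between `FastLRUCache` and `LRUCache`.

```python
from collections import OrderedDict


class AutowarmingLRUCache:
    """Toy LRU cache with Solr-style autowarming (illustration only)."""

    def __init__(self, size, autowarm_count):
        self.size = size
        self.autowarm_count = autowarm_count
        self._entries = OrderedDict()  # least recently used first

    def get(self, key, compute):
        """Return the cached value, computing and caching it on a miss."""
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        value = compute(key)
        self._entries[key] = value
        if len(self._entries) > self.size:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return value

    def keys(self):
        return list(self._entries)

    def warm(self, compute):
        """Cache for a new searcher, re-running the hottest keys against it."""
        fresh = AutowarmingLRUCache(self.size, self.autowarm_count)
        hottest = self.keys()[-self.autowarm_count:] if self.autowarm_count > 0 else []
        for key in hottest:
            fresh.get(key, compute)
        return fresh


# Filters map to sets of matching document ids in the old and the new index.
old_index = {"city:Dublin": {1, 2}, "city:Cork": {3}, "city:Galway": {4}}
new_index = {"city:Cork": {3, 6}, "city:Galway": {4, 5}}

cache = AutowarmingLRUCache(size=2, autowarm_count=1)
for fq in ("city:Dublin", "city:Cork", "city:Galway"):
    cache.get(fq, old_index.__getitem__)
print(cache.keys())  # city:Dublin was evicted
warmed = cache.warm(new_index.__getitem__)
print(warmed.keys())
```

A larger autowarmCount makes the first queries after a commit faster but makes opening each new searcher slower.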
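    • Document Cache (no autowarmCount: internal document ids change between searchers, so this cache is not autowarmed)
        <documentCache class="solr.LRUCache" size="512" initialSize="512" />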
    • Query Result Cache
        <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="128"/>
      Example requests:
        q=lucene+revolution+city:Dublin&sort=date+desc&start=0&rows=10
        q=lucene+revolution&fq=city:Dublin&sort=date+desc&start=0&rows=10
      Related settings:
        <queryResultWindowSize>20</queryResultWindowSize>
        <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
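`queryResultWindowSize` widens each request so the next page can come from the cache, and `queryResultMaxDocsCached` caps what is stored. A rough model of that interplay; `cached_window` is a hypothetical helper, loosely modeled on Solr's behavior (round the requested depth up to a multiple of the window, skip caching when it exceeds the cap), and it ignores the case where fewer documents match than requested.

```python
def cached_window(start, rows, window_size=20, max_docs_cached=200):
    """How many top documents would be collected for the result cache.

    Returns None when the widened window exceeds max_docs_cached,
    meaning the result would not be cached in this model.
    """
    wanted = start + rows
    superset = max(window_size, -(-wanted // window_size) * window_size)  # round up
    return superset if superset <= max_docs_cached else None


for start in (0, 10, 20, 200):
    print(start, cached_window(start, rows=10))
```

With the slide's settings, the first request (start=0, rows=10) caches 20 documents, so the second page (start=10) can be served without re-running the query.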
    • Warming
        <listener event="newSearcher" class="solr.QuerySenderListener">
          <arr name="queries">
            <lst><str name="q">*:*</str><str name="sort">date desc</str></lst>
            <lst><str name="q">keywords:* OR tags:*</str></lst>
            <lst><str name="q">*:*</str><str name="fq">active:*</str></lst>
          </arr>
        </listener>
        <listener event="firstSearcher" class="solr.QuerySenderListener">
          <arr name="queries">
            <lst><str name="q">*:*</str><str name="sort">date desc</str></lst>
            <lst><str name="q">keywords:* OR tags:*</str></lst>
            <lst><str name="q">*:*</str><str name="fq">active:*</str></lst>
          </arr>
        </listener>
        <useColdSearcher>false</useColdSearcher>
    • The Right Directory: StandardDirectory, SimpleFSDirectory, NIOFSDirectory, MMapDirectory, NRTCachingDirectory, RAMDirectory [diagram: index files such as _0.fdt, _0.fdx, _0.fnm, _0.nvd and _1.fdt, _1.fdx, _1.fnm, _1.nvd]
        <directoryFactory name="DirectoryFactory" class="solr.NRTCachingDirectoryFactory" />
    • Column-oriented fields: DocValues. NRT compatible; better compression than the field cache; can store data outside of the JVM heap; can improve things for dynamic indices.
        <field name="categories" type="string" indexed="false" stored="false" multiValued="true" docValues="true"/>
        <field name="categories" type="string" indexed="false" stored="false" multiValued="true" docValues="true" docValuesFormat="Disk"/>
    • Segment Merge [diagram: level 0 segments a, b, f; level 1 segments c, d, e, g]
    • Segment Merge Under Control: merge policy, merge scheduler, merge factor, merge policy configuration
    • Configuring Segment Merge
        <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
          <int name="maxMergeAtOnce">10</int>
          <int name="segmentsPerTier">10</int>
        </mergePolicy>
        <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
        <mergedSegmentWarmer class="org.apache.lucene.index.SimpleMergedSegmentWarmer"/>
        <mergeFactor>10</mergeFactor>
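The level picture from the Segment Merge slide matches the classic merge-factor behavior of log-style merge policies: once a level collects mergeFactor segments, they are merged into one segment on the next level. A simulation of that simplified model; `simulate_log_merges` is a hypothetical helper, and TieredMergePolicy, configured above, chooses merges by segment size within tiers rather than by strict levels.

```python
def simulate_log_merges(flushes, merge_factor=10):
    """Segments per level after `flushes` flushes in a simple log-style model.

    Index i of the result is the number of segments at level i.
    """
    levels = []
    for _ in range(flushes):
        level = 0
        while True:
            if level == len(levels):
                levels.append(0)
            levels[level] += 1  # a new segment arrives at this level
            if levels[level] < merge_factor:
                break
            levels[level] = 0  # merge_factor segments are merged into one...
            level += 1         # ...which lands on the next level up
    return levels


for n in (9, 25, 100):
    print(n, simulate_log_merges(n))
```

A higher merge factor means fewer merges while indexing but more segments to search, which is the trade-off the merge settings control.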
    • Indexing Throughput Tuning: maximum indexing threads, RAM buffer size, maximum buffered documents, bulks (send documents in bulks, and lots of them), CloudSolrServer, autocommit, cutting off unnecessary stuff
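Sending documents in bulks means grouping them client-side so that each update request carries many documents. A sketch of the grouping step; `bulks` is a hypothetical helper, the bulk size of 1000 is an arbitrary assumption to tune per setup, and the actual sending (a JSON POST to a collection's /update handler, or a SolrJ CloudSolrServer `add` call with a collection of documents) is left out.

```python
import json
from itertools import islice


def bulks(docs, size):
    """Yield lists of at most `size` documents, one list per update request."""
    if size < 1:
        raise ValueError("bulk size must be positive")
    it = iter(docs)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# Hypothetical documents; each batch would become one update request.
docs = ({"id": str(i), "title": f"doc {i}"} for i in range(2500))
for batch in bulks(docs, 1000):
    payload = json.dumps(batch)  # a JSON array of documents
    print(len(batch), "docs,", len(payload), "bytes")
```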
    • Transaction Log: update durability, replay for recovering peers, performant real-time get
        <updateLog>
          <str name="dir">${solr.ulog.dir:}</str>
        </updateLog>
        <requestHandler name="/get" class="solr.RealTimeGetHandler">
        </requestHandler>
    • Autocommit or Not? Automatic data flush (hard commit) and automatic index view refresh (soft commit):
        <autoCommit>
          <maxTime>15000</maxTime>
          <maxDocs>1000</maxDocs>
          <openSearcher>false</openSearcher>
        </autoCommit>
        <autoSoftCommit>
          <maxTime>1000</maxTime>
        </autoSoftCommit>
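With `openSearcher=false` the hard commit only flushes data and makes it durable, while new documents become visible through the one-second soft commit. A simplified model of when the hard commit in this configuration fires, assuming it triggers once `maxDocs` uncommitted documents have accumulated or `maxTime` milliseconds after the first uncommitted update, whichever comes first. `hard_commit_times` is a hypothetical helper that only approximates Solr's commit tracking; exact boundary behavior may differ.

```python
def hard_commit_times(arrivals_ms, max_docs=1000, max_time_ms=15000):
    """Times (ms) at which hard commits fire for sorted document arrival times."""
    commits = []
    pending = 0
    first_pending = None
    for t in arrivals_ms:
        # A time-based commit may have fired before this document arrived.
        if first_pending is not None and t >= first_pending + max_time_ms:
            commits.append(first_pending + max_time_ms)
            pending, first_pending = 0, None
        if first_pending is None:
            first_pending = t  # first uncommitted update starts the timer
        pending += 1
        if pending >= max_docs:  # document-count trigger
            commits.append(t)
            pending, first_pending = 0, None
    if first_pending is not None:  # leftover updates are committed by the timer
        commits.append(first_pending + max_time_ms)
    return commits


print(hard_commit_times([0, 100, 200], max_docs=2, max_time_ms=1000))
print(hard_commit_times([0, 20000]))
```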
    • Autocommit & openSearcher=true
        <autoCommit>
          <maxDocs>10</maxDocs>
          <openSearcher>true</openSearcher>
        </autoCommit>
    • AutoSoftCommit & openSearcher=false
        <autoCommit>
          <maxDocs>1000</maxDocs>
          <openSearcher>false</openSearcher>
        </autoCommit>
        <autoSoftCommit>
          <maxDocs>10</maxDocs>
        </autoSoftCommit>
    • Postings Formats to the Rescue: Lucene >= 4.0 brings flexible indexing. Postings == docs, positions, payloads. Different postings formats are available: Bloom, Pulsing, SimpleText, Direct, Memory.
        <codecFactory class="solr.SchemaCodecFactory" />
        <field name="id" type="string_pulsing" indexed="true" stored="true" />
        <fieldType name="string_pulsing" class="solr.StrField" postingsFormat="Pulsing41" />
    • Monitoring: cluster state, node utilization, memory usage, cache utilization, query response time, warmup times, garbage collector work
    • JMX and Solr
    • Administration Panel
    • Monitoring with SPM
    • Other Monitoring Tools: Ganglia (http://ganglia.sourceforge.net/), New Relic (http://www.newrelic.com/), Opsview (http://www.opsview.com)
    • We Are Hiring! Dig search? Dig analytics? Dig big data? Dig performance? Dig working with and in open source? We're hiring worldwide! http://sematext.com/about/jobs.html
    • Thank You! Rafał Kuć, @kucrafal, rafal.kuc@sematext.com. Sematext, @sematext, http://sematext.com, http://blog.sematext.com. SPM discount code: LR2013SPM20 at the Sematext booth ;)