Scaling Solr with SolrCloud
 

Configure your Solr cluster to handle hundreds of millions of documents without even noticing, answer queries in milliseconds, and use Near Real Time indexing and searching with document versioning. Scale your cluster both horizontally and vertically by using shards and replicas. In this session you'll learn how to make your indexing process blazing fast and keep your queries efficient even with large amounts of data in your collections. You'll also see how to optimize your queries to leverage caches as much as your deployment allows and how to observe your cluster with the Solr administration panel, JMX, and third-party tools. Finally, learn how to make changes to already deployed collections: split their shards and alter their schema by using the Solr API.

    Scaling Solr with SolrCloud: Presentation Transcript

    • Scaling Solr with SolrCloud. Rafał Kuć – Sematext Group, Inc. @kucrafal @sematext sematext.com
    • About me: Sematext consultant & engineer; Solr.pl co-founder; father and husband
    • Solr History (most recent first): Solr 4.1 and counting; Solr 4.0 released; Lucene / Solr merge; Solr 1.4 released; Solr 1.3 released; Incubator graduation; Solr donated to ASF; Y. Seeley creates Solr
    • The Past
    • Master – Slave Deployment (diagram: an application, one Solr master, and four Solr slaves)
    • Master as SPOF (diagram: the same deployment, with the single Solr master as the single point of failure)
    • Replication Time (diagram: an indexing application writes to the Solr master, the Solr slaves replicate from it, and a querying application reads from the slaves)
    • Too Much for a Single Shard (diagram: an application, one Solr master, and two Solr slaves)
    • Too Much for a Single Shard (diagram: an application in front of three Solr masters, each with two Solr slaves)
    • Querying in Multi Master Deployment (diagram: the application queries Solr slaves for Shard 1, Shard 2, and Shard 3; each request lists shard1, shard2, shard3, and the slaves return responses and documents)
    • SolrCloud Comes Into Play
    • Basic Glossary: Cluster; Node; Collection; Shard; Leader & Replica; Overseer. https://cwiki.apache.org/confluence/display/solr/SolrCloud+Glossary
    • Apache ZooKeeper: a quorum is required (diagram: three ZooKeeper nodes). Sample configuration:
        clientPort=2181
        dataDir=/usr/share/zookeeper/data
        tickTime=2000
        initLimit=10
        syncLimit=5
        server.1=192.168.1.1:2888:3888
        server.2=192.168.1.2:2888:3888
        server.3=192.168.1.3:2888:3888
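The "quorum is required" point comes down to simple arithmetic. The following is a minimal sketch, assuming the usual majority rule for a ZooKeeper ensemble; the function names are illustrative and not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble must contain at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Servers that can be lost while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


# The three-server ensemble from the sample configuration:
# a majority is 2 servers, so 1 server may fail.
print(quorum_size(3), tolerated_failures(3))
```

Under this rule a fourth server does not increase the number of tolerated failures, which is why odd ensemble sizes are the common choice.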
    • Solr Instances (diagram: four Solr servers connected to the three ZooKeeper nodes, each started with the full ensemble list):
        -DzkHost=192.168.1.1:2181,192.168.1.2:2181,192.168.1.3:2181
        -DzkHost=192.168.1.2:2181,192.168.1.1:2181,192.168.1.3:2181
        -DzkHost=192.168.1.3:2181,192.168.1.1:2181,192.168.1.2:2181
        -DzkHost=192.168.1.3:2181,192.168.1.1:2181,192.168.1.2:2181
    • Collection Creation (diagram: four Solr servers and three ZooKeeper nodes). Upload the configuration to ZooKeeper, then create the collection:
        $ cloud-scripts/zkcli.sh -cmd upconfig -zkhost 192.168.1.2:2181 -confdir /usr/share/config/revolution/conf -confname revolution
        $ curl 'http://solr1:8983/solr/admin/collections?action=CREATE&name=revolution&numShards=2&replicationFactor=1'
    • Single Collection Deployment (diagram: an application and four Solr servers hosting Shard1 and Shard2)
    • Collection with Replica (diagram: four Solr servers and three ZooKeeper nodes):
        $ curl 'http://solr1:8983/solr/admin/collections?action=CREATE&name=revolution&numShards=2&replicationFactor=2'
    • Collection with Replicas (diagram: an application and four Solr servers hosting Shard1, Shard2, a Shard1 replica, and a Shard2 replica)
    • Querying (diagram, first phase: the application sends a QUERY to one Solr server, and the servers hosting Shard1 and Shard2 return id,score pairs)
    • Querying (diagram, second phase: the servers hosting Shard1 and Shard2 return the matching docs, and the results go back to the application)
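The two querying slides describe a two-phase distributed search: first collect id,score pairs from every shard, then fetch full documents only for the winners. A minimal sketch of that idea follows; the in-memory dictionaries stand in for shards and are purely hypothetical, and this is not Solr's actual implementation.

```python
import heapq


def merge_top_ids(shard_hits, rows):
    """Phase 1: merge per-shard (doc_id, score) lists into the global top `rows`."""
    all_hits = (hit for hits in shard_hits.values() for hit in hits)
    return heapq.nlargest(rows, all_hits, key=lambda hit: hit[1])


def fetch_docs(top_hits, shard_docs):
    """Phase 2: fetch stored documents only for the ids that made the cut."""
    docs = []
    for doc_id, _score in top_hits:
        for stored in shard_docs.values():
            if doc_id in stored:
                docs.append(stored[doc_id])
                break
    return docs


# Hypothetical per-shard results and stored documents.
hits = {"shard1": [("a", 0.9), ("b", 0.4)], "shard2": [("c", 0.7), ("d", 0.1)]}
stored = {
    "shard1": {"a": {"id": "a"}, "b": {"id": "b"}},
    "shard2": {"c": {"id": "c"}, "d": {"id": "d"}},
}
top = merge_top_ids(hits, rows=2)
print(fetch_docs(top, stored))
```

Keeping the first phase limited to ids and scores means only the final page of documents travels over the network in full.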
    • Shard and Replica Number: how your data looks; expected data growth; target performance; target node number. Max number of nodes = number of shards * (number of replicas + 1)
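The node-count formula from this slide can be checked with a short worked example. This is a minimal sketch; the function name is illustrative, and "replicas" here means the additional copies of each shard beyond the leader, as in the slide's formula.

```python
def max_nodes(num_shards: int, replicas_per_shard: int) -> int:
    """Max number of nodes = number of shards * (number of replicas + 1)."""
    if num_shards < 1 or replicas_per_shard < 0:
        raise ValueError("need at least one shard and a non-negative replica count")
    return num_shards * (replicas_per_shard + 1)


# Two shards with one replica each, as in the "Collection with Replicas"
# slide (created with numShards=2 and replicationFactor=2).
print(max_nodes(2, 1))
```

With more nodes than this number, the extra nodes would host no core of the collection, assuming one core per node.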
    • What should I go for? More data? Shards. More queries? Replicas.
    • Custom Routing: Default (numShards present, pre 4.5); Implicit (numShards not present, pre 4.5)
    • Custom Routing Example (diagram: two Solr servers hosting Shard1 and Shard2, indexing documents id=userA!1, id=userA!2, and id=userB!3)
    • Querying Solr – Default Routing (diagram: an application querying a Solr collection of eight shards, Shard 1 to Shard 8)
    • Querying Solr – Custom Routing (diagram: the same eight-shard collection, queried with q=revolution&_route_=userA!)
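The routing slides rely on the prefix before "!" in the document id to keep one user's documents together, and on the _route_ parameter to query only the matching shard. The sketch below illustrates the idea only: Solr's own router hashes the prefix differently and maps it onto shard hash ranges, so the CRC32-modulo step here is a loudly labeled stand-in, and all function names are hypothetical.

```python
import zlib
from urllib.parse import urlencode


def route_key(doc_id: str) -> str:
    """Return the part of the id before '!', or the whole id if there is none."""
    prefix, separator, _rest = doc_id.partition("!")
    return prefix if separator else doc_id


def pick_shard(doc_id: str, num_shards: int) -> int:
    """Stand-in shard choice (NOT Solr's hash): CRC32 of the route key, modulo shards."""
    return zlib.crc32(route_key(doc_id).encode("utf-8")) % num_shards


def routed_query(query: str, route: str) -> str:
    """Build the query string for a search restricted by _route_."""
    return urlencode({"q": query, "_route_": route})


# Documents that share a prefix land on the same shard in this model.
print(pick_shard("userA!1", 2) == pick_shard("userA!2", 2))
print(routed_query("revolution", "userA!"))
```

Because the shard is derived from the prefix alone, a query that carries the same prefix can skip every other shard.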
    • Collection Manipulation Commands: Create; Delete; Reload; Split; Create Alias; Delete Alias; Shard Creation/Deletion. http://wiki.apache.org/solr/SolrCloud
    • Collection Creation parameters: name; numShards; replicationFactor; maxShardsPerNode; createNodeSet; collection.configName
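The creation parameters listed here map directly onto the query string of the Collections API call used in the earlier curl examples. A minimal sketch of assembling that URL follows; the helper name is hypothetical, and nothing is sent over the network.

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def create_collection_url(
    base_url: str,
    name: str,
    num_shards: int,
    replication_factor: int,
    optional: Optional[Dict[str, object]] = None,
) -> str:
    """Build a Collections API CREATE URL from the parameters on the slide."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    # Optional parameters such as maxShardsPerNode, createNodeSet,
    # or collection.configName are appended only when supplied.
    params.update(optional or {})
    return f"{base_url}/admin/collections?{urlencode(params)}"


print(create_collection_url("http://solr1:8983/solr", "revolution", 2, 1))
print(
    create_collection_url(
        "http://solr1:8983/solr", "revolution", 2, 2,
        {"maxShardsPerNode": 2, "collection.configName": "revolution"},
    )
)
```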
    • Collection Split Example, step 1 (create the collection):
        $ curl 'http://solr1:8983/solr/admin/collections?action=CREATE&name=collection1&numShards=2&replicationFactor=1'
    • Collection Split Example, step 2 (split shard1):
        $ curl 'http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1'
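Splitting a shard divides its hash range into two sub-ranges. The sketch below shows how a 32-bit range can be halved; it assumes, for illustration, that shard1 of a two-shard collection covers 80000000-ffffffff, and the helper names are hypothetical rather than taken from Solr.

```python
def to_signed(hex_value: str) -> int:
    """Interpret an 8-digit hex string as a signed 32-bit integer."""
    value = int(hex_value, 16)
    return value - (1 << 32) if value >= (1 << 31) else value


def to_hex(value: int) -> str:
    """Format a signed 32-bit integer as 8 lowercase hex digits."""
    return format(value & 0xFFFFFFFF, "08x")


def split_range(low_hex: str, high_hex: str):
    """Halve an inclusive hash range into two adjacent inclusive sub-ranges."""
    low, high = to_signed(low_hex), to_signed(high_hex)
    middle = low + (high - low) // 2
    return (to_hex(low), to_hex(middle)), (to_hex(middle + 1), to_hex(high))


# Assumed range of shard1 in a two-shard collection.
print(split_range("80000000", "ffffffff"))
```

The two halves cover the original range without gaps or overlap, so every existing document belongs to exactly one sub-shard.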
    • Collection Aliasing:
        $ curl 'http://solr1:8983/solr/admin/collections?action=CREATEALIAS&name=weekly&collections=20131107,20131108,20131109,20131110,20131111,20131112,20131113'
        $ curl 'http://solr1:8983/solr/weekly/select?q=revolution'
        $ curl 'http://solr1:8983/solr/admin/collections?action=DELETEALIAS&name=weekly'
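The weekly alias in this example points at seven daily collections named by date. A small sketch of generating those names and the CREATEALIAS URL follows, assuming one collection per day in YYYYMMDD form as on the slide; the helper names are hypothetical, and no request is sent.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def daily_collections(first_day: date, days: int):
    """Names of consecutive daily collections, formatted as YYYYMMDD."""
    return [(first_day + timedelta(days=i)).strftime("%Y%m%d") for i in range(days)]


def create_alias_url(base_url: str, alias: str, collections) -> str:
    """Build a CREATEALIAS URL; commas are kept literal for readability."""
    params = {"action": "CREATEALIAS", "name": alias, "collections": ",".join(collections)}
    return f"{base_url}/admin/collections?{urlencode(params, safe=',')}"


week = daily_collections(date(2013, 11, 7), 7)
print(create_alias_url("http://solr1:8983/solr", "weekly", week))
```

Re-running the same call with a shifted start date is one way to slide the alias forward as new daily collections appear, assuming the alias is simply redefined.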
    • Caches (diagram: Solr Cache): refreshed with IndexSearcher; configurable; different purposes; different implementations
    • Filter Cache:
        <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128" />
        q=lucene+revolution+city:Dublin
        q=lucene+revolution&fq=city:Dublin
        q=*:*&fq={!cache=false}city:Dublin
        q=*:*&fq={!frange l=0 u=10 cache=false cost=200}sum(price,pro)
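The difference between the first two queries is that a clause moved into fq can be cached and reused across different q values. The toy model below illustrates that reuse with a small LRU map keyed by the filter string; it is a conceptual sketch only, not how solr.FastLRUCache is implemented, and the class and callback names are hypothetical.

```python
from collections import OrderedDict


class FilterCacheSketch:
    """Toy LRU cache keyed by filter query, storing the matching doc ids."""

    def __init__(self, size: int):
        self.size = size
        self.entries = OrderedDict()
        self.lookups = 0
        self.hits = 0

    def get(self, fq: str, compute):
        """Return the cached doc-id set for `fq`, computing it on a miss."""
        self.lookups += 1
        if fq in self.entries:
            self.hits += 1
            self.entries.move_to_end(fq)  # mark as most recently used
            return self.entries[fq]
        result = compute(fq)
        self.entries[fq] = result
        if len(self.entries) > self.size:
            self.entries.popitem(last=False)  # evict the least recently used
        return result


def expensive_filter(fq: str):
    """Hypothetical stand-in for evaluating a filter against the index."""
    return frozenset({1, 2, 3})


cache = FilterCacheSketch(size=512)
cache.get("city:Dublin", expensive_filter)  # q=lucene+revolution, computed
cache.get("city:Dublin", expensive_filter)  # q=solr, same filter, reused
print(cache.hits, cache.lookups)
```

A filter marked cache=false would bypass such a cache entirely, which suits filters that are unlikely to repeat.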
    • Document Cache:
        <documentCache class="solr.LRUCache" size="512" initialSize="512" />
    • Query Result Cache:
        <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="128"/>
        q=lucene+revolution+city:Dublin&sort=date+desc&start=0&rows=10
        q=lucene+revolution&fq=city:Dublin&sort=date+desc&start=0&rows=10
        <queryResultWindowSize>20</queryResultWindowSize>
        <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
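The queryResultWindowSize setting means a request for one page can cache a larger window of ordered document ids, so the next page may be answered from the cache. The sketch below models the rounding as "round the requested end up to a multiple of the window size", which is a simplified assumption about Solr's behavior; the function names are illustrative.

```python
import math


def cached_window_end(start: int, rows: int, window_size: int) -> int:
    """Number of leading results cached for a request, rounded up to the window."""
    requested_end = start + rows
    return max(1, math.ceil(requested_end / window_size)) * window_size


def served_from_window(start: int, rows: int, cached_end: int) -> bool:
    """True if the requested page lies entirely inside an already cached window."""
    return start + rows <= cached_end


# With queryResultWindowSize=20, page 1 (start=0, rows=10) caches 20 ids,
# so page 2 (start=10, rows=10) fits in the same window.
window = cached_window_end(0, 10, 20)
print(window, served_from_window(10, 10, window))
```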
    • Warming:
        <listener event="newSearcher" class="solr.QuerySenderListener">
          <arr name="queries">
            <lst><str name="q">*:*</str><str name="sort">date desc</str></lst>
            <lst><str name="q">keywords:* OR tags:*</str></lst>
            <lst><str name="q">*:*</str><str name="fq">active:*</str></lst>
          </arr>
        </listener>
        <listener event="firstSearcher" class="solr.QuerySenderListener">
          <arr name="queries">
            <lst><str name="q">*:*</str><str name="sort">date desc</str></lst>
            <lst><str name="q">keywords:* OR tags:*</str></lst>
            <lst><str name="q">*:*</str><str name="fq">active:*</str></lst>
          </arr>
        </listener>
        <useColdSearcher>false</useColdSearcher>
    • The Right Directory: StandardDirectory; SimpleFSDirectory; NIOFSDirectory; MMapDirectory; NRTCachingDirectory; RAMDirectory (diagram: index files _0.fdt, _0.fdx, _0.fnm, _0.nvd, _1.fdt, _1.fdx, _1.fnm, _1.nvd)
        <directoryFactory name="DirectoryFactory" class="solr.NRTCachingDirectoryFactory" />
    • Column oriented fields - DocValues: NRT compatible; better compression than field cache; can store data outside of JVM heap; can improve things for dynamic indices
        <field name="categories" type="string" indexed="false" stored="false" multiValued="true" docValues="true"/>
        <field name="categories" type="string" indexed="false" stored="false" multiValued="true" docValues="true" docValuesFormat="Disk"/>
    • Segment Merge (diagram: segments labeled a, b, f at Level 0 and c, c, d, e, g at Level 1)
    • Segment Merge Under Control: merge policy; merge scheduler; merge factor; merge policy configuration
    • Configuring Segment Merge:
        <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
          <int name="maxMergeAtOnce">10</int>
          <int name="segmentsPerTier">10</int>
        </mergePolicy>
        <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
        <mergedSegmentWarmer class="org.apache.lucene.index.SimpleMergedSegmentWarmer"/>
        <mergeFactor>10</mergeFactor>
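The intuition behind a merge factor of 10 can be shown with a toy model in which ten segments on one level are merged into a single segment on the next level. This is a deliberately simplified sketch: TieredMergePolicy, which the configuration uses, selects merges by segment size and is more involved, and the function name is hypothetical.

```python
def segments_per_level(flushes: int, merge_factor: int = 10):
    """Count segments on each level after `flushes` new segments are written.

    Simplified model: whenever a level holds `merge_factor` segments,
    they are merged into one segment on the next level.
    """
    levels = [0]
    for _ in range(flushes):
        levels[0] += 1
        level = 0
        while levels[level] == merge_factor:
            levels[level] = 0
            if level + 1 == len(levels):
                levels.append(0)
            levels[level + 1] += 1
            level += 1
    return levels


# 25 flushed segments leave 5 small segments and 2 merged ones in this model.
print(segments_per_level(25))
```

In this model a lower merge factor means fewer segments to search but more merge work during indexing, which is the trade-off the merge settings control.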
    • Indexing Throughput Tuning: maximum indexing threads; RAM buffer size; maximum buffered documents; bulks, bulks, and bulks; CloudSolrServer; autocommit; cutting off unnecessary stuff
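The "bulks" advice amounts to sending many documents per update request instead of one request per document. The following minimal sketch groups documents into batches and serializes each batch as a JSON array; the batch size, field names, and helper names are assumptions for illustration, and no request is actually sent.

```python
import json
from itertools import islice


def batches(docs, batch_size: int):
    """Yield lists of at most `batch_size` documents from any iterable."""
    iterator = iter(docs)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        yield chunk


def update_payloads(docs, batch_size: int):
    """Serialize each batch as a JSON array, one payload per update request."""
    return [json.dumps(chunk) for chunk in batches(docs, batch_size)]


# Hypothetical documents: 2500 docs become three requests instead of 2500.
documents = ({"id": str(i), "title": f"doc {i}"} for i in range(2500))
payloads = update_payloads(documents, batch_size=1000)
print(len(payloads))
```

The right batch size depends on document size and available memory, so it is worth measuring rather than guessing.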
    • Transaction Log: update durability; recovering peer replay; performant Realtime Get
        <updateLog>
          <str name="dir">${solr.ulog.dir:}</str>
        </updateLog>
        <requestHandler name="/get" class="solr.RealTimeGetHandler">
        </requestHandler>
    • Autocommit or Not? Automatic data flush; automatic index view refresh
        <autoCommit>
          <maxTime>15000</maxTime>
          <maxDocs>1000</maxDocs>
          <openSearcher>false</openSearcher>
        </autoCommit>
        <autoSoftCommit>
          <maxTime>1000</maxTime>
        </autoSoftCommit>
    • Autocommit & openSearcher=true:
        <autoCommit>
          <maxDocs>10</maxDocs>
          <openSearcher>true</openSearcher>
        </autoCommit>
    • AutoSoftCommit & openSearcher=false:
        <autoCommit>
          <maxDocs>1000</maxDocs>
          <openSearcher>false</openSearcher>
        </autoCommit>
        <autoSoftCommit>
          <maxDocs>10</maxDocs>
        </autoSoftCommit>
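The last two configurations make documents visible equally often but differ in how often data is flushed to disk. The rough model below counts both events; it ignores maxTime and any interaction between the hard and soft commit counters, so treat it as a back-of-the-envelope sketch with hypothetical names rather than a description of Solr's internals.

```python
from typing import Optional


def commit_activity(
    docs_indexed: int,
    auto_commit_max_docs: int,
    open_searcher: bool,
    auto_soft_commit_max_docs: Optional[int] = None,
):
    """Estimate flushes (hard commits) and searcher reopens for a maxDocs-only setup."""
    hard_commits = docs_indexed // auto_commit_max_docs
    soft_commits = (
        docs_indexed // auto_soft_commit_max_docs if auto_soft_commit_max_docs else 0
    )
    # A hard commit refreshes the index view only when openSearcher is true;
    # a soft commit always refreshes it.
    reopens = soft_commits + (hard_commits if open_searcher else 0)
    return {"flushes": hard_commits, "searcher_reopens": reopens}


# 1000 indexed documents under the two slide configurations.
print(commit_activity(1000, auto_commit_max_docs=10, open_searcher=True))
print(commit_activity(1000, 1000, False, auto_soft_commit_max_docs=10))
```

In this model both setups refresh the view 100 times, but the soft-commit variant flushes once instead of 100 times.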
    • Postings Formats to the Rescue: Lucene >= 4.0 brings flexible indexing; postings == docs, positions, payloads; different postings formats available: Bloom; Pulsing; Simple text; Direct; Memory
        <codecFactory class="solr.SchemaCodecFactory" />
        <field name="id" type="string_pulsing" indexed="true" stored="true" />
        <fieldType name="string_pulsing" class="solr.StrField" postingsFormat="Pulsing41" />
    • Monitoring: cluster state; node utilization; memory usage; cache utilization; query response time; warmup times; garbage collector work
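Cache utilization from this list is usually judged by the hit ratio, that is, hits divided by lookups. A minimal sketch of that calculation follows, assuming cache statistics are available as a dictionary with "lookups" and "hits" counters; the sample numbers and the 0.5 threshold are made up for illustration.

```python
def hit_ratio(stats: dict) -> float:
    """Fraction of cache lookups answered from the cache (0.0 when unused)."""
    lookups = stats.get("lookups", 0)
    return stats.get("hits", 0) / lookups if lookups else 0.0


def underused_caches(all_stats: dict, threshold: float = 0.5):
    """Names of caches whose hit ratio falls below the (assumed) threshold."""
    return sorted(name for name, stats in all_stats.items() if hit_ratio(stats) < threshold)


# Hypothetical counters for the three caches discussed in the talk.
sample = {
    "filterCache": {"lookups": 1000, "hits": 900},
    "queryResultCache": {"lookups": 1000, "hits": 100},
    "documentCache": {"lookups": 0, "hits": 0},
}
print(underused_caches(sample))
```

A persistently low ratio suggests the cache is too small, warmed poorly, or fed queries that rarely repeat.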
    • JMX and Solr
    • Administration Panel
    • Monitoring with SPM
    • Other Monitoring Tools: Ganglia (http://ganglia.sourceforge.net/); New Relic (http://www.newrelic.com/); Opsview (http://www.opsview.com)
    • We Are Hiring! Dig Search? Dig Analytics? Dig Big Data? Dig Performance? Dig working with and in open source? We're hiring worldwide! http://sematext.com/about/jobs.html
    • Thank You! Rafał Kuć: @kucrafal, rafal.kuc@sematext.com. Sematext: @sematext, http://sematext.com, http://blog.sematext.com. SPM discount code: LR2013SPM20 @ Sematext booth ;)