Administering and Monitoring SolrCloud

Rafał Kuć, Sematext Group, Inc.
@kucrafal @sematext sematext.com
Ta  me…
Sematext consultant & engineer
Solr.pl co-founder
Father and husband 
SolrCloud Concepts
[Diagram: an Application in front of four Solr Server nodes holding Shard1, Shard2, and a Replica of each]
Local SolrCloud Cluster
java -Dbootstrap_confdir=./solr/revolution/conf \
  -Dcollection.configName=revolution -DzkRun -DnumShards=1 -jar start.jar

Runs embedded ZooKeeper
Bootstraps collection with 1 shard
Starts Solr
Starting Solr Cluster
[Diagram: four Solr Server nodes, each with No Collection yet, all pointing at a three-node ZooKeeper ensemble]

Each Solr Server is started with the full ZooKeeper host list (the slide lists the hosts in a different order on each node):

-DzkHost=192.168.1.1:2181,192.168.1.2:2181,192.168.1.3:2181
Uploading Collection Configuration
./zkcli.sh -cmd upconfig -zkhost 192.168.1.1:2181 \
  -confdir ./conf/ -confname revolution

[Diagram: Collection configuration uploaded to a three-node ZooKeeper ensemble used by Solr]
Collections API
Create
Delete

Reload
Split

Create Alias
Delete Alias
Shard Creation/Deletion

http://wiki.apache.org/solr/SolrCloud
Collection Creation
curl 'http://solrhost:8983/solr/admin/collections?action=CREATE&name=revolution&numShards=3&replicationFactor=4'

name
numShards
replicationFactor
maxShardsPerNode
createNodeSet

collection.configName
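
The parameters above can be assembled programmatically. The following is a minimal sketch, not part of the original slides: the helper names `create_collection_url` and `cores_needed` are hypothetical, and the core count assumes replicationFactor means the total number of copies of each shard.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor,
                          max_shards_per_node=None, config_name=None,
                          create_node_set=None):
    """Build a Collections API CREATE URL (hypothetical helper)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    # Optional parameters are only sent when explicitly set.
    if max_shards_per_node is not None:
        params["maxShardsPerNode"] = max_shards_per_node
    if create_node_set:
        params["createNodeSet"] = ",".join(create_node_set)
    if config_name:
        params["collection.configName"] = config_name
    return base_url.rstrip("/") + "/admin/collections?" + urlencode(params)


def cores_needed(num_shards, replication_factor):
    """Cores the collection occupies, assuming replicationFactor counts all copies."""
    return num_shards * replication_factor


if __name__ == "__main__":
    print(create_collection_url("http://solrhost:8983/solr", "revolution", 3, 4))
    print(cores_needed(3, 4))
```

With numShards=3 and replicationFactor=4 the cluster has to host 12 cores, which is why maxShardsPerNode matters on small clusters.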
Collection Split Example

$ curl 'http://solr1:8983/solr/admin/collections?action=CREATE&name=collection1&numShards=2&replicationFactor=1'
Collection Split Example

$ curl 'http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1'
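
What a split does to a shard can be illustrated with plain arithmetic. This is a sketch only, not Solr's code: it assumes a shard owns a contiguous 32-bit hash range shown as unsigned hex, that shard1 of a two-shard collection covers 80000000-ffffffff, and that the sub-shards follow the shard1_0 / shard1_1 naming pattern. All function names are hypothetical.

```python
def split_range(lo, hi):
    """Split the inclusive hash range [lo, hi] into two contiguous halves."""
    if lo >= hi:
        raise ValueError("range too small to split")
    mid = lo + (hi - lo) // 2
    return (lo, mid), (mid + 1, hi)


def sub_shard_names(shard):
    """Names of the two sub-shards, assuming the <shard>_0 / <shard>_1 pattern."""
    return shard + "_0", shard + "_1"


def fmt(hash_range):
    """Render a range as unsigned 32-bit hex."""
    return "%x-%x" % hash_range


if __name__ == "__main__":
    # Assumed range of shard1 in a two-shard collection.
    halves = split_range(0x80000000, 0xFFFFFFFF)
    for name, half in zip(sub_shard_names("shard1"), halves):
        print(name, fmt(half))
```

The two halves together cover exactly the parent's range, so every document of the parent shard belongs to one sub-shard.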
Getting Deeper – CoreAdmin API
curl 'http://solrhost:8983/solr/admin/cores?action=CREATE&name=newcore&collection=revolution&shard=shard2'

collection
shard

numShards
collection.configName
Schema – the API
Reading (Solr 4.2)
Fields
Dynamic fields
Types
Copy fields
Name (4.3)
Version (4.3)
Unique Key (4.3)
Similarity (4.3)

Writing (Solr 4.4)
Adding new fields
Adding copy fields
Reading Your Schema
curl -XGET 'http://solrhost:8983/solr/rev/schema/fields/name'
{
"responseHeader" : {
"status" : 0,
"QTime" : 5 },
"field" : {
"name" : "name",
"type" : "text_general",
"indexed" : true,
"stored" : true }
}

Full reference: http://wiki.apache.org/solr/SchemaRESTAPI
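
A response like the one above is easy to consume from a script. The sketch below parses a copy of that sample response rather than calling a live server; `describe_field` is a hypothetical helper name.

```python
import json

# Copy of the sample response shown on the slide.
SAMPLE_RESPONSE = """
{
  "responseHeader": {"status": 0, "QTime": 5},
  "field": {"name": "name", "type": "text_general",
            "indexed": true, "stored": true}
}
"""


def describe_field(response_text):
    """Return (name, type, enabled flags) from a schema field response."""
    body = json.loads(response_text)
    if body["responseHeader"]["status"] != 0:
        raise RuntimeError("Solr reported a non-zero status")
    field = body["field"]
    # Collect boolean properties that are switched on, e.g. indexed, stored.
    flags = sorted(key for key, value in field.items() if value is True)
    return field["name"], field["type"], flags


if __name__ == "__main__":
    print(describe_field(SAMPLE_RESPONSE))
```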
Dynamic Schema Modifications
<schemaFactory class="ManagedIndexSchemaFactory">
<bool name="mutable">true</bool>
<str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>
curl -XPUT 'http://solrhost:8983/solr/rev/schema/fields/content' -d
'{
"type" : "text",
"stored" : "false",
"copyFields" : ["catchAll"]
}'

curl -XPOST 'http://solrhost:8983/solr/rev/schema/copyFields' -d
'[
{
"source" : "name",
"dest" : [ "text", "personal" ]
}
]'
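
The same field addition can be prepared from Python. This is a sketch under assumptions: `add_field_request` is a hypothetical helper, the JSON Content-Type header is an addition not shown on the slide, and the request is only built here, not sent.

```python
import json
import urllib.request


def add_field_request(core_url, field_name, field_type, stored=True,
                      copy_fields=None):
    """Build (but do not send) a PUT request that adds one schema field."""
    body = {"type": field_type, "stored": stored}
    if copy_fields:
        body["copyFields"] = list(copy_fields)
    return urllib.request.Request(
        url="%s/schema/fields/%s" % (core_url.rstrip("/"), field_name),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="PUT",
    )


if __name__ == "__main__":
    req = add_field_request("http://solrhost:8983/solr/rev", "content",
                            "text", stored=False, copy_fields=["catchAll"])
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # urllib.request.urlopen(req) would send it to a running Solr.
```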
The Right Directory
StandardDirectory
SimpleFSDirectory
NIOFSDirectory
MMapDirectory

[Diagram: index segment files _0.fdt, _0.fdx, _0.fnm, _0.nvd and _1.fdt, _1.fdx, _1.fnm, _1.nvd]

NRTCachingDirectory

RAMDirectory

<directoryFactory name="DirectoryFactory"
class="solr.NRTCachingDirectoryFactory" />
Segment Merging
[Diagram: segments grouped by merge level. Level 0: a, b, f. Level 1: c, c, d, e, g]
Segment Merge Under Control
Merge policy
Merge scheduler
Merge factor

Merge policy configuration

https://cwiki.apache.org/confluence/display/solr/IndexConfig+in+SolrConfig
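
To get a feel for what the merge factor controls, here is a toy model of level-based (log-style) merging. It is an illustration only: real merge policies, such as the size-based TieredMergePolicy, choose segments differently, and the function names are hypothetical.

```python
def add_segment(levels, merge_factor):
    """Flush one new level-0 segment, then cascade merges upward.

    levels[i] is the number of segments currently at level i.
    Returns the number of merges this flush triggered.
    """
    if merge_factor < 2:
        raise ValueError("merge_factor must be at least 2")
    if not levels:
        levels.append(0)
    levels[0] += 1
    merges = 0
    level = 0
    # Whenever a level holds merge_factor segments, merge them into one
    # segment on the next level.
    while levels[level] >= merge_factor:
        levels[level] -= merge_factor
        if level + 1 == len(levels):
            levels.append(0)
        levels[level + 1] += 1
        merges += 1
        level += 1
    return merges


def simulate(flushes, merge_factor):
    """Return (segments per level, total merges) after a number of flushes."""
    levels = []
    total_merges = 0
    for _ in range(flushes):
        total_merges += add_segment(levels, merge_factor)
    return levels, total_merges


if __name__ == "__main__":
    print(simulate(25, 10))
    print(simulate(25, 2))
```

In this model a low merge factor keeps fewer segments around at the cost of more merges, and a high one does the opposite.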
Autocommit or Not?
Automatic data flush (hard commit)
Automatic index view refresh

<autoCommit>
<maxTime>15000</maxTime>
<maxDocs>1000</maxDocs>
<openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
<maxTime>1000</maxTime>
</autoSoftCommit>
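
How the two limits interact can be sketched with a small model. It assumes a commit becomes due when either maxDocs uncommitted documents have accumulated or maxTime milliseconds have passed since the first uncommitted document, whichever comes first. `CommitWindow` is a hypothetical class, not Solr's implementation.

```python
class CommitWindow:
    """Tracks when a commit is due under maxTime / maxDocs limits."""

    def __init__(self, max_time_ms=None, max_docs=None):
        self.max_time_ms = max_time_ms
        self.max_docs = max_docs
        self.pending = 0
        self.first_pending_at = None

    def add(self, now_ms):
        """Record one added document at time now_ms."""
        if self.pending == 0:
            self.first_pending_at = now_ms
        self.pending += 1

    def due(self, now_ms):
        """True if either configured limit has been reached."""
        if self.pending == 0:
            return False
        if self.max_docs is not None and self.pending >= self.max_docs:
            return True
        if self.max_time_ms is not None:
            return now_ms - self.first_pending_at >= self.max_time_ms
        return False

    def commit(self):
        """Reset the window after a commit."""
        self.pending = 0
        self.first_pending_at = None


if __name__ == "__main__":
    hard = CommitWindow(max_time_ms=15000, max_docs=1000)  # <autoCommit>
    soft = CommitWindow(max_time_ms=1000)                  # <autoSoftCommit>
    hard.add(0)
    soft.add(0)
    print(soft.due(1000), hard.due(1000))
    print(hard.due(15000))
```

With the values from the slide, a lone document becomes visible after about one second (soft commit) and is flushed to disk after about fifteen (hard commit).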
Caches
Refreshed with IndexSearcher
Configurable
Different purposes

Different implementations

Solr Cache
Monitoring Importance
What to Pay Attention to?
Cluster State
Health

Shards and replica status
Shard placement

Failing nodes
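
A check for the items above can be sketched as a walk over the cluster state. The JSON layout below is assumed from the Solr 4.x clusterstate.json format, and the sample data and node names are invented for illustration; `unhealthy_replicas` is a hypothetical helper.

```python
import json

# Invented sample, shaped like the assumed clusterstate.json layout.
SAMPLE_CLUSTER_STATE = """
{
  "revolution": {
    "shards": {
      "shard1": {"replicas": {
        "core_node1": {"state": "active", "node_name": "192.168.1.1:8983_solr"},
        "core_node2": {"state": "recovering", "node_name": "192.168.1.2:8983_solr"}
      }},
      "shard2": {"replicas": {
        "core_node3": {"state": "active", "node_name": "192.168.1.3:8983_solr"}
      }}
    }
  }
}
"""


def unhealthy_replicas(cluster_state, live_nodes):
    """List replicas that are not active or sit on a node that is not live."""
    problems = []
    for coll_name, coll in sorted(cluster_state.items()):
        for shard_name, shard in sorted(coll.get("shards", {}).items()):
            replicas = shard.get("replicas", {})
            for replica_name, replica in sorted(replicas.items()):
                state = replica.get("state")
                node = replica.get("node_name")
                if state != "active" or node not in live_nodes:
                    problems.append(
                        (coll_name, shard_name, replica_name, state, node))
    return problems


if __name__ == "__main__":
    state = json.loads(SAMPLE_CLUSTER_STATE)
    live = {"192.168.1.1:8983_solr", "192.168.1.2:8983_solr"}
    for problem in unhealthy_replicas(state, live):
        print(problem)
```

A replica can still be marked active in the stored state while its node is gone, so the live-node list is checked as well.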
Indexing Related Metrics
Index throughput
Document distribution
I/O subsystem metrics
Merging
Search-Related Metrics
Count
Latency
Distribution among nodes
Anomalies and spikes
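
Count and latency are usually exposed as cumulative counters, so per-interval values come from the difference between two samples. The sketch below assumes counters named `requests` and `total_time_ms`; both names and the sample numbers are made up for illustration.

```python
def interval_stats(prev, curr, interval_s):
    """Derive request rate and average latency from two counter samples.

    prev and curr are dicts with cumulative 'requests' and 'total_time_ms'.
    Returns None if the counters went backwards, e.g. after a restart.
    """
    d_req = curr["requests"] - prev["requests"]
    d_time = curr["total_time_ms"] - prev["total_time_ms"]
    if d_req < 0 or d_time < 0:
        return None
    rate = d_req / interval_s
    avg = d_time / d_req if d_req else 0.0
    return {"requests_per_s": rate, "avg_latency_ms": avg}


if __name__ == "__main__":
    earlier = {"requests": 1000, "total_time_ms": 50000}
    later = {"requests": 1600, "total_time_ms": 86000}
    print(interval_stats(earlier, later, 60))
```

Averaging over the interval rather than since startup is what makes spikes visible.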
Monitoring Memory and GC
Heap details

Pool size
Pool utilization

Garbage collection count
Garbage collection time
Monitoring OS Related Metrics
CPU details
Load
I/O activity
Network usage
Solr Administration Panel
Solr & JMX
<jmx />
java -Dcom.sun.management.jmxremote -jar start.jar
Solr & JMX
SPM
Index statistics

Request # and latency
Caches and warmup

CPU
JVM Memory and OS Memory
Garbage collector
OS related statistics
SPM Dashboard
Other Monitoring Tools
Ganglia
http://ganglia.sourceforge.net/

New Relic
http://www.newrelic.com/

Opsview
http://www.opsview.com
Too much is too much
Too hot
Caches
We Are Hiring!
Dig Search?
Dig Analytics?
Dig Big Data?
Dig Performance?
Dig working with and in open source?
We’re hiring worldwide!
http://sematext.com/about/jobs.html
Thank You!
Rafał Kuć
@kucrafal
rafal.kuc@sematext.com

Sematext
@sematext
http://sematext.com
http://blog.sematext.com
SPM discount code:

LR2013SPM20

@ Sematext booth ;)
