Solr Metrics - Andrzej Białecki, Lucidworks

Solr metrics
Andrzej Białecki
Lucidworks

whoami
•  Lucene / Solr user, contributor, committer
•  Author of Luke – The Index Toolbox
•  Lucidworks Fusion developer

Agenda
•  Motivation – why Solr needs metrics
•  Design – how Solr collects metrics and what data is being collected
•  Implementation – key components
•  Configuration – what to collect and how to report it
•  Examples of metrics and integrations with external systems
•  Future development

Why metrics?
•  DevOps need tools for monitoring the system behavior
•  Production troubleshooting, e.g. FD leaks, outlier requests
-  Profiling is not an option in production deployments
•  Optimization of the system on various levels
-  OS, collection, node, shard, doc routing, …
•  Especially useful in locked-down deployments

JIRA
•  Long-standing request – first created in 2013!
•  Many contributors (and watchers!)
•  “metrics” JIRA component
-  Now containing approx. 60 issues
•  Key JIRAs:
-  SOLR-4735 initial framework
-  SOLR-9812 /admin/metrics handler
•  First released in Solr 6.4.0
-  With important bug fixes released in 6.4.2

Dropwizard Metrics
•  High-performance lightweight metrics framework
•  Metric types
-  counter: monotonically increasing counter (e.g. number of processed docs)
-  meter: counter + moving average (rate), 1-, 5- and 15-minute (e.g. system load average, rate of requests)
-  histogram: histogram of values, exponentially decaying by default (e.g. result sizes, IO read sizes)
-  timer: meter and histogram of event durations (e.g. commit times, query times, request times)
-  gauge: instantaneous reading of a value (e.g. current heap size, number of cores, TLOG buffer size)
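
To make the distinction between the metric types more concrete, here is a minimal conceptual sketch in Python. It is not the Dropwizard API (which is a Java library); the class and method names are hypothetical, and the moving-average rates and exponential decay are deliberately left out.

```python
class Counter:
    """Running count of events, e.g. number of processed docs."""

    def __init__(self):
        self.count = 0

    def inc(self, n=1):
        self.count += n


class Gauge:
    """Instantaneous reading: the value is computed when it is asked for."""

    def __init__(self, supplier):
        self._supplier = supplier  # zero-argument callable returning the value

    def value(self):
        return self._supplier()


class Timer:
    """Event durations; a real timer would also track rates like a meter."""

    def __init__(self):
        self.durations_ms = []

    def update(self, duration_ms):
        self.durations_ms.append(duration_ms)

    def snapshot(self):
        d = self.durations_ms
        if not d:
            return {"count": 0}
        return {
            "count": len(d),
            "min_ms": min(d),
            "max_ms": max(d),
            "mean_ms": sum(d) / len(d),
        }
```

Note that this toy timer keeps every sample, which a production implementation cannot afford to do; that is why the real histograms and timers sample into a bounded reservoir.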

Where is the data collected from?
•  JVM metrics
-  GC, heap, threads, class loading, OS load / mem / FDs, etc.
•  Jetty / HTTP metrics
-  connections, thread pools, …
•  Container metrics
-  number of cores, data paths, admin handler metrics
•  Per-SolrCore metrics
-  All RequestHandlers: request counters and timers
-  Searcher and cache stats
-  Replication
-  Index-level timers and histograms
-  Other components
•  SolrCloud metrics (optional)
-  Aggregated from SolrCloud nodes
[Diagram: a Solr instance and its metric sources: JVM, Jetty / HTTP, CoreContainer, SolrCore …, Components]

Registries
•  Metric groups for each major aspect of a Solr instance:
-  jvm, jetty, node (CoreContainer), core (SolrCore)
-  see SolrInfoBean.Group
•  One registry per group, and one for each SolrCore
-  Easier to manage core metrics throughout the core life-cycle
•  No persistence across node restarts
-  SolrCore metrics persist across core reloads
[Diagram: example registries: solr.jvm, solr.node, solr.core.collection1]

Registry names
•  Hierarchical, dot-separated
•  Always prefixed with solr.
•  Overridable using System properties:
-Dsolr.core.collection1=solr.myCollection
-  This is useful e.g. to collapse per-replica registries into one registry with aggregated metrics
•  SolrCloud “core” registry name example:

SolrCore name: collection1_shard1_replica_n3
Registry name: solr.core.collection1.shard1.replica_n3
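
The mapping from a SolrCore name to its registry name, including the system-property override, can be sketched as follows. This is an illustration only: the regular expression and the `overrides` dictionary (standing in for `-D` system properties) are assumptions, and Solr's own parsing of core names may differ.

```python
import re

# Assumed shape of a SolrCloud core name: <collection>_<shardN>_<replica...N>
_CLOUD_CORE = re.compile(
    r"^(?P<collection>.+)_(?P<shard>shard\d+)_(?P<replica>replica(?:_[a-z])?\d+)$"
)


def core_registry_name(core_name, overrides=None):
    """Build a hierarchical registry name for a core, then apply overrides."""
    m = _CLOUD_CORE.match(core_name)
    if m:
        parts = [m["collection"], m["shard"], m["replica"]]
    else:
        parts = [core_name]  # non-cloud core: use the name as-is
    name = "solr.core." + ".".join(parts)
    # Hypothetical stand-in for -D<registry>=<replacement> system properties.
    return (overrides or {}).get(name, name)
```

For example, `core_registry_name("collection1_shard1_replica_n3")` yields `solr.core.collection1.shard1.replica_n3`, matching the example above.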


Metric names
•  Hierarchical, dot-separated
•  By convention names start with the component category
-  e.g. CONTAINER, CORE, QUERY …
-  see SolrInfoBean.Category
•  Request handler metrics follow this naming:
<category> . <handler name or scope> . <metric name>
•  Examples:
QUERY./select.requestTimes
UPDATE.updateShardHandler.threadPool.recoveryExecutor.completed
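
A small sketch of how a metric name following this convention can be split into its parts. The function name is hypothetical, and treating everything between the first and last dot as the scope is an assumption made for illustration.

```python
def split_metric_name(name):
    """Split '<category>.<scope>.<metric name>' into a 3-tuple.

    The category is the text before the first dot, the metric name is the
    text after the last dot, and whatever lies between is the scope.
    """
    category, _, rest = name.partition(".")
    scope, _, metric = rest.rpartition(".")
    return category, scope, metric
```

With this, `QUERY./select.requestTimes` splits into `("QUERY", "/select", "requestTimes")`, while a two-part name such as `SEARCHER.new` gets an empty scope.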
[Diagram: registries (solr.jvm, solr.node, …) and example metric names: CORE.fs.totalSpace, SEARCHER.new, QUERY./get.requests, CACHE.core.fieldCache]

Metric properties
•  Simple numeric / string value, or nested JSON maps
-  Numeric counter: QUERY./select.requests
-  Timer: QUERY./select.requestTimes
-  Properties: count, meanRate, 1minRate, 5minRate, 15minRate, min_ms, max_ms, mean_ms, p75_ms, …
-  A data structure: CACHE.core.fieldCache
-  Properties: total_size, entries_count, entry#0, entry#1, …
-  Arbitrary map: system.properties
•  It’s possible to retrieve only selected properties via the /admin/metrics handler
-  E.g. key=solr.jvm:system.properties:user.name
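
The registry / metric / property nesting and the colon-separated key can be modelled with plain dictionaries. The sketch below is only a local model: all values are made up, and `lookup` is a hypothetical helper, not part of Solr.

```python
# Hypothetical sample data: registry name -> metric name -> value or property map.
registries = {
    "solr.jvm": {
        "system.properties": {"user.name": "solr"},  # arbitrary map
    },
    "solr.core.collection1": {
        "QUERY./select.requests": 42,  # simple numeric value
        "QUERY./select.requestTimes": {"count": 42, "mean_ms": 3.5, "max_ms": 18.0},
    },
}


def lookup(registries, key):
    """Resolve 'registry:metric' or 'registry:metric:property'."""
    registry_name, metric_name, *rest = key.split(":")
    value = registries[registry_name][metric_name]
    if rest:
        value = value[rest[0]]  # pick a single property from a nested map
    return value
```

Here `lookup(registries, "solr.jvm:system.properties:user.name")` returns the sample value `"solr"`.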
[Diagram: registries (solr.jvm, solr.node, …), example metrics (CORE.fs.totalSpace, SEARCHER.new, QUERY./get.requests, CACHE.core.fieldCache) and example properties of a metric (total_size, entries_count, entry#0)]

Components
•  SolrMetricManager
-  One central component to manage registries and reporters
•  MetricRegistry (Dropwizard API)
-  Type-safe Map keeping related metric instances and their names
•  SolrMetricProducer (interface)
-  Creates and registers metric instances
-  Many existing Solr components now implement this interface
•  SolrMetricReporter (abstract class)
-  Reports collected metrics to external agents and/or files
-  Several implementations available out of the box
•  MetricsHandler (at /admin/metrics)
-  Provides access to all local metric registries

SolrMetricManager

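[Diagram: SolrMetricManager inside a Solr instance (CoreContainer; SolrCore1, SolrCore2, SolrCore3) with registries such as solr.jvm, solr.jetty and solr.node; metrics are exposed via /admin/metrics (UI and other reporting tools) and via SolrMetricReporter (Ganglia, Graphite, SLF4J, JMX)]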

/admin/metrics handler
•  Shows metrics from all or selected registries
•  Flexible selection criteria:
-  select registries by group (e.g. jetty, node), or by registry name (e.g. solr.core.collection1)
-  filter metrics by a list of prefixes (or regexes)
-  retrieve only some properties using the property parameter
-  retrieve single metrics / properties using a fully-qualified key (7.1)

Example: /admin/metrics
http://localhost:8983/solr/admin/metrics?group=core&prefix=SEARCHER

Example: /admin/metrics
http://localhost:8983/solr/admin/metrics?group=core&regex=QUERY./select.*Times&property=max_ms

Example: /admin/metrics
http://localhost:8983/solr/admin/metrics?group=node&prefix=CONTAINER
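
When such requests are generated from a script, it is safer to let a library build the query string, because values like the regex contain characters that need URL-encoding. A minimal sketch using only the Python standard library; the base URL is the one from the examples, the parameter names are the ones shown on the slides, and `metrics_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/admin/metrics"


def metrics_url(params, base=BASE):
    """Return the handler URL with URL-encoded query parameters."""
    return f"{base}?{urlencode(params)}"


# The three example requests from the slides.
searcher_url = metrics_url({"group": "core", "prefix": "SEARCHER"})
select_times_url = metrics_url(
    {"group": "core", "regex": "QUERY./select.*Times", "property": "max_ms"}
)
container_url = metrics_url({"group": "node", "prefix": "CONTAINER"})

# A single property addressed by fully-qualified key (7.1).
user_name_url = metrics_url({"key": "solr.jvm:system.properties:user.name"})
```

The encoded form of the regex and key values differs from the hand-written URLs (for example `/` and `:` are percent-encoded), but it decodes to the same parameter values on the server side.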

Metrics vs. Solr 6.x MBeans
•  Naming of categories and groups has slightly changed
-  More fitting categories for some components
•  Solr 6.x still uses independent implementations for Metrics and MBeans
-  Some statistics are unavailable in one of the two APIs, or are reported differently
•  Solr 7.x uses only the Metrics API to report MBean stats
-  This also includes nested and non-numeric values
-  <jmx> element in solrconfig.xml is no longer supported – instead use SolrJmxReporter in solr.xml
-  Automatically added if missing and when an MBeanServer is detected

Metrics collection
•  Already happening :)
-  Minimal overhead, on the order of μs/req and < 0.5 MB / core
•  New section in solr.xml: <solr><metrics>
-  Reporter configuration
-  Custom metric implementations
-  Some debug configuration
-  Detailed histograms of index and TLOG processing times, per core

Reporters
•  Extend SolrMetricReporter
•  Configured in solr.xml <solr><metrics><reporter>
•  Several implementations provided:
-  JMX: fully hierarchical view in e.g. JConsole
-  Ganglia, Graphite, SLF4J: send periodic reports of selected metrics
-  Easy API – create new ones!
-  https://github.com/vthacker/solr-metrics-influxdb
•  Created for each selected registry, using group and/or registry list attributes
-  If neither is present the reporter is created for all registries

Reporter configuration details
•  Required attributes: name (unique per registry), class (FQCN)
•  Optional attributes:
-  group – comma-separated list of registry groups, e.g. core,jvm
-  registry – comma-separated list of registry prefixes, e.g. solr.node,solr.core.coll
•  Optional initArgs:
-  filter - report only metrics with that prefix, e.g. QUERY./select
-  period - how often metrics will be reported, in seconds
-  ... other, depending on implementation, e.g. logger name for SLF4J
* NOTE: for a given configuration, separate reporter instances are created for each matching registry

solr.xml
<solr>
  ...
  <metrics>
    <!-- no group / registry attributes: created for all registries -->
    <reporter name="global"
              class="org.apache.solr.metrics.reporters.SolrJmxReporter"/>
    <!-- one instance per core registry, reporting every 60 seconds -->
    <reporter name="perCore" group="core"
              class="org.apache.solr.metrics.reporters.SolrSlf4jReporter">
      <int name="period">60</int>
      <str name="logger">metricsLogger</str>
    </reporter>
  </metrics>
</solr>
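
The rule that a reporter configuration is instantiated once per matching registry can be sketched as a simple selection function. This is a model of the behavior described on the slides, not Solr code: `matching_registries` is a hypothetical name, and combining group and registry matches as a union is an assumption.

```python
def matching_registries(registries, groups=(), registry_prefixes=()):
    """Return the registries for which a reporter instance would be created."""
    if not groups and not registry_prefixes:
        return list(registries)  # neither attribute present: all registries
    selected = []
    for name in registries:
        # A group "core" covers "solr.core" and everything below "solr.core."
        in_group = any(
            name == f"solr.{g}" or name.startswith(f"solr.{g}.") for g in groups
        )
        has_prefix = any(name.startswith(p) for p in registry_prefixes)
        if in_group or has_prefix:
            selected.append(name)
    return selected
```

With registries `solr.jvm`, `solr.node` and two core registries, a reporter declared with `group="core"` would get two instances, one per core registry, while a reporter without either attribute would get four.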

Advanced configuration
•  Limitations of default histogram / timer in Dropwizard Metrics
-  Uses ExponentiallyDecayingReservoir (EDR) sampling BUT assumes normal distribution
-  If the distribution is skewed then rare outliers may never be captured or retained long enough
-  EDR is tuned to prefer the last 5 minutes of data – but keeps only 1028 random samples
-  May LOSE critical min / max / percentile data under a higher rate of updates
-  May report obsolete values to snapshot because retained data is replaced randomly
-  Internal values are “decayed” only during updates – no updates means values are stuck!
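
The effect of keeping only 1028 samples can be illustrated with a small simulation. Note the simplification: the sketch uses plain uniform reservoir sampling (Algorithm R) rather than the exponentially decaying algorithm, so it only demonstrates the general point that a single rare outlier is unlikely to survive in a fixed-size random sample under a high update rate. All function names and parameters are hypothetical.

```python
import random


def reservoir_sample(values, size=1028, rng=None):
    """Uniform reservoir sampling: keep `size` random items from a stream."""
    rng = rng or random.Random()
    reservoir = []
    for i, v in enumerate(values):
        if i < size:
            reservoir.append(v)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < size:
                reservoir[j] = v  # overwrite a randomly chosen slot
    return reservoir


def outlier_retention_rate(total=10_000, trials=100, seed=42):
    """Fraction of trials in which one outlier among `total` updates is kept."""
    rng = random.Random(seed)
    kept = 0
    for _ in range(trials):
        values = [1.0] * total
        values[rng.randrange(total)] = 1000.0  # a single slow request
        if 1000.0 in reservoir_sample(values, rng=rng):
            kept += 1
    return kept / trials
```

For uniform sampling the chance of retaining one specific value is about size / total, i.e. roughly 10% for 10,000 updates, so in most runs the reported max would miss the outlier entirely.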

Advanced configuration
•  Custom parameters and implementations for metrics
-  <solr><metrics><suppliers> section in solr.xml
-  Users can provide their own implementations of counters, meters, timers and histograms
•  Solving the issue with EDR
-  Use a different reservoir size, or a different reservoir implementation
-  Several other implementations available, with tradeoffs, e.g. SlidingTimeWindowReservoir
-  Use your own histogram implementations
-  http://github.com/vladimir-bukhtoyarov/rolling-metrics

* NOTE: metric reporters retrieve metric snapshots concurrently and at arbitrary times, DO NOT use implementations that reset to 0 after each snapshot!

Example advanced configuration (solr.xml)
•  Different reservoir implementation:
<solr>
  <metrics>
    <suppliers>
      <histogram>
        <!-- window size for the reservoir (300, i.e. 5 minutes if in seconds) -->
        <int name="window">300</int>
        <str name="reservoir">com.codahale.metrics.SlidingTimeWindowReservoir</str>
      </histogram>
    </suppliers>
  </metrics>
</solr>
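
The idea behind a sliding time window reservoir can be sketched in a few lines. This is a conceptual Python model, not the Dropwizard implementation: the class name is hypothetical, timestamps are passed in explicitly (and assumed to be non-decreasing) to keep the example deterministic, and memory use is unbounded under a high update rate, which is one of the tradeoffs mentioned above.

```python
from collections import deque


class SlidingTimeWindow:
    """Keeps every value recorded during the last `window_seconds`."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self._samples = deque()  # (timestamp, value), oldest first

    def _trim(self, now):
        cutoff = now - self.window
        while self._samples and self._samples[0][0] < cutoff:
            self._samples.popleft()

    def update(self, value, now):
        self._samples.append((now, value))
        self._trim(now)

    def snapshot(self, now):
        # Expiry also happens on read, so old values do not get "stuck"
        # when there are no further updates.
        self._trim(now)
        return [v for _, v in self._samples]
```

Unlike a random sample, every outlier is retained for the whole window, and it disappears once the window has passed even if no new values arrive.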

SolrCloud metrics (7.x)
•  Shard metrics
-  Reported from replicas to shard leaders
•  Node metrics
-  Reported from multiple registries on each node to Overseer
•  Partially aggregated (simple sum, avg, mean, stddev, string lists)
-  Some aggregations wouldn’t make sense, e.g. histograms
•  Automatically collected by /metrics/collector handler
•  Configured in solr.xml using special shard and cluster groups
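
A rough sketch of what such partial aggregation could look like for the values of one metric reported by several replicas. The function and field names are hypothetical, and the choice of population standard deviation is an assumption; the slides do not say which variant Solr uses.

```python
import statistics


def aggregate(values):
    """Aggregate one metric's values as reported by several replicas."""
    numeric = [
        v for v in values
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    if numeric and len(numeric) == len(values):
        return {
            "count": len(numeric),
            "sum": sum(numeric),
            "mean": statistics.fmean(numeric),
            "stddev": statistics.pstdev(numeric),  # population stddev (assumed)
        }
    # Non-numeric values: keep a list of distinct strings instead.
    return {"values": sorted({str(v) for v in values})}
```

Values that are themselves distributions, such as histograms, are intentionally not handled here, in line with the remark that aggregating them would not make sense.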

Example JConsole view in 6.x

Metrics in 7.x
•  Adding more configurability
•  Better defaults for reservoirs
•  Autoscaling framework
-  Autoscaling actions are largely based on metrics, e.g.
-  freedisk, sysLoadAvg, cores, heapUsage, system properties
-  May use any metric value, e.g. metrics:solr.node:CONTAINER.fs.usableSpace
•  Using metrics for feedback control in Solr clusters
-  Support for modeling and simulation of dynamic behavior (SOLR-11285)

Summary
•  Metrics are a lightweight mechanism for collecting detailed insights into Solr operation
-  Provided now by most Solr components
-  Easy to add new metrics
•  Metrics can be reported to external systems in multiple formats and protocols
-  Several popular systems already supported
-  Easy to add new reporters
•  Metrics provide key data for SolrCloud autoscaling
•  How do you want to use metrics?
