1. Leveraging pull replicas in Solr 7
+ Metric reporters in Solr 7
Samuel Tatipamula, Search team at
2. Our setup and use cases
SolrCloud model with nodes distributed over multiple clusters.
Indexing-intensive - a typical e-commerce use case with real-time update events.
The ability to process this high indexing throughput is critical.
Recently upgraded to Solr 7.
3. Previously in Solr..
In Solr 6 and before, all the nodes in SolrCloud are nrt (near-real-time).
Every shard replica gets /update requests and maintains a tlog for failover and recovery.
Every node is eligible to become the leader.
5. Limitations we have faced
High latency for /update requests.
High network bandwidth consumption on the leader - it frequently hit the 1 Gbps limit under heavy updates.
High CPU utilisation on replicas under heavy query load.
6. Limitations we have faced
One of the workarounds for the network limitation was to distribute nodes into multiple small clusters.
This meant a lot of manual maintenance to keep the schemas and configs in sync across clusters.
Also, the indexing pipeline had to send /update requests to multiple clusters and make sure the indices stayed in sync.
7. New in Solr 7
Along with nrt, two new replica types are introduced for SolrCloud - tlog and pull.
A tlog replica gets /update requests from the leader, maintains a tlog for failover and recovery, and can become the leader.
A pull replica doesn't get /update requests, doesn't maintain a tlog, and can't become the leader.
Both tlog and pull replicas fetch compressed index segments from the leader.
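Replica types can also be chosen when adding replicas to an existing collection - the Collections API ADDREPLICA action takes a type parameter (nrt, tlog or pull). A rough sketch, with placeholder host, collection and shard names:

http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard1&type=pull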
8. Updates in Solr 7 with tlog and pull replicas
[Diagram: the indexing pipeline sends synchronous /update requests to the leader; the update then flows from the leader to the tlog and pull replicas.]
9. New in Solr 7
tlog and pull replicas fetch new index segments (if any) from the leader by polling periodically.
By default, this polling interval is half of the hard commit time.
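The hard commit time itself is set via autoCommit in solrconfig.xml, so that setting indirectly controls how fresh tlog and pull replicas are. A minimal sketch, with an illustrative 60-second value:

<autoCommit>
  <!-- hard commit at most every 60 seconds; given the default above,
       replicas would then poll roughly every 30 seconds -->
  <maxTime>60000</maxTime>
  <!-- keep hard commits cheap; don't open a new searcher -->
  <openSearcher>false</openSearcher>
</autoCommit>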
10. SolrCloud setup with tlog and pull
The leader is always nrt.
A couple of tlog replicas for failover and consistency.
All remaining nodes can be pull replicas, as they are ideal for query performance.
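Such a layout can be requested up front: the Collections API CREATE action accepts per-type replica counts via the nrtReplicas, tlogReplicas and pullReplicas parameters. A sketch with illustrative names and counts:

http://localhost:8983/solr/admin/collections?action=CREATE&name=products&numShards=1&nrtReplicas=1&tlogReplicas=2&pullReplicas=4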
11. Observations
Since replicas pull compressed index segments, and at different times, network bandwidth utilisation is reduced by ~30%.
/update response times decreased significantly - more updates can be processed in a given time.
CPU utilisation on pull replicas is lower than in the previous nrt setup.
Recovery times for pull replicas are remarkably shorter.
12. Observations
Since index updates to pull and tlog replicas are asynchronous, and happen only after a hard commit, visibility of updates is delayed.
If replication to a pull replica fails, it keeps serving reads with stale data.
Also, as of now, ZooKeeper doesn't mark such failing pull replicas as down.
14. Previously..
For monitoring and performance tuning, we had to hit the metrics API.
JMX reporting could be used, and 3rd-party monitoring tools, like Sematext, worked on top of JMX.
There was no seamless way to get metrics into existing Graphite or StatsD servers.
15. New metric reporters in Solr 7
Metrics are maintained internally in the following registries..
- core - core-specific info; replication, query, etc.
- node - admin request handlers, number of cores, etc.
- jetty - threads, pools, HTTP requests, etc.
- jvm - system memory, heap usage, CPU, GC, etc.
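Each registry can also be inspected directly through the metrics API, filtered by group - for example (host is a placeholder):

http://localhost:8983/solr/admin/metrics?group=jvm,jetty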
16. New metric reporters in Solr 7
Uses the Dropwizard Metrics API.
Readily available reporters
- JMX
- Graphite
- SLF4J
- Ganglia
We use StatsD, so we created a StatsD reporter
(based on ReadyTalk/metrics-statsd).
17. Configuring metric reporters
In your solr.xml, add the reporter and the registries needed..
<metrics>
  <reporter name="graphite" group="node, jvm"
            class="org.apache.solr.metrics.reporters.SolrGraphiteReporter">
    <!-- Graphite server to report to -->
    <str name="host">graphite-server</str>
    <int name="port">9999</int>
    <!-- reporting period, in seconds -->
    <int name="period">60</int>
    <str name="prefix">solr-</str>
  </reporter>
</metrics>