Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
In this talk from Lucene/Solr Revolution 2015, Solr and centralized logging experts Radu Gheorghe and Rafal Kuć cover topics like: flow in Logstash, flow in rsyslog, parsing JSON, log shipping, Solr tuning, time-based collections and tiered clusters.
32.
Normalizing “Should Scale”*
* Performance depends mostly on log length, not on the number of rules:
http://blog.gerhards.net/2013/01/performance-of-liblognormrsyslog-parse.html
41.
Shipper conclusions
[Diagram: three shipper setups compared; rsyslog is the only label that survives]
Pros: easy setup; flexible. Cons: heavy
Pros: light; fast. Cons: less flexible and easy
Pros: offloads buffers and Logstash processing; flexible and efficient. Cons: setup and maintenance overhead
42.
Solr Tuning Agenda
Schema and config adjustments
Time-based collections
Tiered cluster (e.g. hot vs cold nodes)
43.
Schema: Two Kinds of Fields
message:failed
"docValues": true
"omitNorms": true,
"omitTermFreqAndPositions": true
+20 to 100% capacity*; 10% faster indexing*
* http://blog.sematext.com/2014/11/17/solr-presentations-lucene-solr-revolution/
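As a rough sketch of the two kinds of fields, the Python snippet below builds a request body for Solr's Schema API `add-field` command. The field names (`message`, `severity`), the field types (`text_general`, `string`) and the assignment of properties to each field are assumptions for illustration; the slide only lists the properties themselves.

```python
import json


def add_field_commands():
    """Return Schema API 'add-field' commands for two hypothetical log fields."""
    # Assumed: a text field that is only searched (e.g. message:failed),
    # so norms, term frequencies and positions are dropped to save space.
    # Note that dropping positions also disables phrase queries on the field.
    searched = {
        "name": "message",
        "type": "text_general",
        "indexed": True,
        "stored": True,
        "omitNorms": True,
        "omitTermFreqAndPositions": True,
    }
    # Assumed: a field used for faceting and sorting, backed by docValues.
    aggregated = {
        "name": "severity",
        "type": "string",
        "indexed": True,
        "stored": False,
        "docValues": True,
    }
    return {"add-field": [searched, aggregated]}


body = json.dumps(add_field_commands(), indent=2)
print(body)
```

The printed JSON could then be POSTed to the `/schema` endpoint of a collection.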
45.
Commits
"updateHandler.autoSoftCommit.maxTime": 5000
"updateHandler.autoCommit.maxTime": 60000
<ramBufferSizeMB>200</ramBufferSizeMB>
5s feels near-realtime while searching
Flush to disk every minute or 200MB
+10% capacity; 10% faster indexing*
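The two commit intervals are written as Config API property names, so one way to apply them might look like the sketch below. It only builds the request and does not send it. The base URL and the collection name `logs` are hypothetical, and it assumes both properties can go in a single `set-property` command. `ramBufferSizeMB` is left out because it is set in `solrconfig.xml`, as the XML line suggests.

```python
import json
import urllib.request


def commit_settings_request(base_url, collection):
    """Build (but do not send) a Config API request with the commit intervals."""
    payload = {
        "set-property": {
            # Soft commit every 5s: new documents become searchable.
            "updateHandler.autoSoftCommit.maxTime": 5000,
            # Hard commit every 60s: index changes are flushed to disk.
            "updateHandler.autoCommit.maxTime": 60000,
        }
    }
    return urllib.request.Request(
        url=f"{base_url}/{collection}/config",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host and collection name.
req = commit_settings_request("http://localhost:8983/solr", "logs")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it to a running Solr node.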
47.
Time-Based Collections
Newest collection: indexing, merges, most searches
Older collections: don't change => cache friendly, can be optimized
Oldest collection: delete without triggering merges
20-30x capacity; less indexing degradation*
* http://www.slideshare.net/sematext/side-by-side-with-elasticsearch-solr-part-2
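A minimal sketch of the bookkeeping behind time-based collections, assuming one collection per day. The `logs_YYYY-MM-DD` naming scheme, the function names and the retention value are hypothetical and do not come from the talk.

```python
from datetime import date, timedelta


def collection_name(day, prefix="logs"):
    """Name of the daily collection for a given date (assumed naming scheme)."""
    return f"{prefix}_{day:%Y-%m-%d}"


def plan(today, retention_days, prefix="logs"):
    """Split daily collections into: written to, static, and ready to drop."""
    # Today's collection takes all indexing, merges and most searches.
    write_to = collection_name(today, prefix)
    # Older collections no longer change, so caches stay valid and they
    # can be optimized.
    static = [
        collection_name(today - timedelta(days=n), prefix)
        for n in range(1, retention_days)
    ]
    # The collection that just fell out of retention is dropped whole,
    # which avoids deleting documents and the merges that would follow.
    expired = collection_name(today - timedelta(days=retention_days), prefix)
    return write_to, static, expired


write_to, static, expired = plan(date(2015, 10, 16), retention_days=3)
print(write_to, static, expired)
```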
57.
Tiered Cluster
Hot nodes (hot1, hot2): quick recent searches and indexing
Cold nodes (cold1, cold2, cold3, cold4): rare lengthy requests
Buffer for indexing spikes
Fewer shards per collection, and the cluster is still balanced
CPU++
RAM++
IO++
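One possible way to pin a new collection to the hot tier is the `createNodeSet` parameter of the Collections API `CREATE` action. The sketch below only builds the URL. The node names, port, base URL, collection name and shard count are all assumptions for illustration.

```python
from urllib.parse import urlencode

# Hypothetical node names in Solr's host:port_context form.
HOT_NODES = ["hot1:8983_solr", "hot2:8983_solr"]


def create_on_hot_tier_url(base_url, collection, num_shards=2):
    """Build a Collections API CREATE URL restricted to the hot nodes."""
    params = {
        "action": "CREATE",
        "name": collection,
        "numShards": num_shards,
        # Only these nodes receive the new collection's shards.
        "createNodeSet": ",".join(HOT_NODES),
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


url = create_on_hot_tier_url("http://localhost:8983/solr", "logs_2015-10-16")
print(url)
```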