Tuning Elasticsearch Indexing Pipeline for Logs
Radu Gheorghe, Rafał Kuć
How to tune Logstash and rsyslog, as well as Elasticsearch itself for indexing logs.

Published in: Software
  1. Tuning Elasticsearch Indexing Pipeline for Logs. Radu Gheorghe, Rafał Kuć
  2. Who are we? Radu, Rafał, Logsene
  3. The next hour: logs, logs, and more logs
  4. The tools: Logstash 1.5 RC2, rsyslog 8.9.0, Elasticsearch 2.0 SNAPSHOT, and Logsene
  5. Let the games begin
  6. Logstash: multiple inputs, lots of filters, several outputs, lots of plugins
  7. How Logstash works: input threads (one per input: file, tcp, redis, ...) feed filter workers (grok, geoip, ...), which feed output workers (elasticsearch, solr, ...)
  8. Scaling Logstash
  9. Logstash basic:
     input {
       syslog {
         port => 13514
       }
     }
     output {
       elasticsearch {
         protocol => "http"
         manage_template => false
         index => "test-index"
         index_type => "test-type"
       }
     }
  10. Logstash basic: 4K events per second, ~130% CPU utilization, 299MB RAM used
  11. Logstash basic
  12. Logstash with mutate (started with -w 3: 3 filter threads!):
     filter {
       mutate {
         remove_field => [ "severity", "facility", "priority", "@version", "timestamp", "host" ]
       }
     }
     output {
       elasticsearch {
         protocol => "http"
         manage_template => false
         index => "test-index"
         index_type => "test-type"
         flush_size => 1000
         workers => 5
       }
     }
  13. Logstash with mutate: 5K events per second, ~250% CPU utilization, 289MB RAM used
  14. Logstash with mutate
  15. Logstash with grok and tcp:
     input {
       tcp {
         port => 13514
       }
     }
     filter {
       grok {
         match => [ "message", "<%{NUMBER:priority}>%{SYSLOGTIMESTAMP:date} %{DATA:hostname} %{DATA:tag} %{DATA:what}:%{DATA:number}:" ]
       }
       mutate {
         remove_field => [ "message", "@version", "@timestamp", "host" ]
       }
     }
  16. Logstash with grok and tcp: 8K events per second, ~310% CPU utilization, 327MB RAM used
  17. Logstash with grok and tcp
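As a rough illustration of what the grok pattern above extracts, here is a Python sketch with an approximate regex equivalent. This is an approximation only: grok's NUMBER, SYSLOGTIMESTAMP, and DATA patterns are more permissive than these groups.

```python
import re

# Approximate Python equivalent of the grok pattern on the slide (a sketch).
SYSLOG_RE = re.compile(
    r"<(?P<priority>\d+)>"                      # <%{NUMBER:priority}>
    r"(?P<date>\w{3} +\d+ \d{2}:\d{2}:\d{2}) "  # %{SYSLOGTIMESTAMP:date}
    r"(?P<hostname>\S+) "                       # %{DATA:hostname}
    r"(?P<tag>\S+) "                            # %{DATA:tag}
    r"(?P<what>[^:]*):(?P<number>[^:]*):"       # %{DATA:what}:%{DATA:number}:
)

def parse_syslog(line):
    """Return the named fields as a dict, or None if the line doesn't match."""
    match = SYSLOG_RE.match(line)
    return match.groupdict() if match else None
```

For a line like `<13>May  4 10:00:00 web01 sshd foo:42:` this yields priority, date, hostname, tag, what, and number as separate fields, which is the kind of structure the grok filter produces before indexing.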
  18. Logstash with JSON lines:
     input {
       tcp {
         port => 13514
         codec => "json_lines"
       }
     }
  19. Logstash with JSON lines: 8K events per second, ~260% CPU utilization, 322MB RAM used
  20. Logstash with JSON lines
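Feeding that input is straightforward from any language; a minimal Python sketch (the host, port, and field names are assumptions matching the config above):

```python
import json
import socket

def to_json_line(event):
    """Serialize one event as a single line of JSON, newline-terminated,
    which is what the json_lines codec expects."""
    return json.dumps(event) + "\n"

def send_events(events, host="localhost", port=13514):
    """Stream newline-delimited JSON events over one TCP connection."""
    with socket.create_connection((host, port)) as sock:
        for event in events:
            sock.sendall(to_json_line(event).encode("utf-8"))
```

Because the events arrive already structured, Logstash skips the parsing work that grok does, which is where the CPU savings on the next slide come from.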
  21. Rsyslog: very fast, very light
  22. How rsyslog works: input modules (im*: imfile, imtcp, imjournal, ...), message modification modules (mm*: mmnormalize, mmjsonparse, ...), and output modules (om*: omelasticsearch, omredis, ...)
  23. Using rsyslog
  24. Rsyslog basic:
     module(load="impstats" interval="10" resetCounters="on" log.file="/tmp/stats")
     module(load="imtcp")
     module(load="omelasticsearch")
     input(type="imtcp" port="13514")
     template(name="plain-syslog" type="list") {
       constant(value="{")
       constant(value="\"@timestamp\":\"")
       property(name="timereported" dateFormat="rfc3339")
       constant(value="\",\"host\":\"")
       property(name="hostname")
       constant(value="\",\"severity\":\"")
       property(name="syslogseverity-text")
       constant(value="\",\"facility\":\"")
       property(name="syslogfacility-text")
       constant(value="\",\"syslogtag\":\"")
       property(name="syslogtag" format="json")
       constant(value="\",\"message\":\"")
       property(name="msg" format="json")
       constant(value="\"}")
     }
     action(type="omelasticsearch"
            template="plain-syslog"
            searchIndex="test-index"
            searchType="test-type"
            bulkmode="on"
            action.resumeretrycount="-1")
     *http://blog.sematext.com/2015/04/13/monitoring-rsyslogs-performance-with-imstats-and-elasticsearch
  25. Rsyslog basic: 6K events per second, ~20% CPU utilization, 50MB RAM used
  26. Rsyslog basic
  27. Rsyslog queue and workers:
     main_queue(
       queue.size="100000"            # capacity of the main queue
       queue.dequeuebatchsize="5000"  # process messages in batches of 5K
       queue.workerthreads="4"        # 4 threads for the main queue
     )
     action(name="send-to-es"
            type="omelasticsearch"
            template="plain-syslog"        # use the template defined earlier
            searchIndex="test-index"
            searchType="test-type"
            bulkmode="on"                  # use bulk API
            action.resumeretrycount="-1"   # retry indefinitely if ES is unreachable
     )
  28. Rsyslog queue and workers: 25K events per second, ~100% CPU utilization (1 core), 75MB RAM used (queue dependent)
  29. Rsyslog queue and workers
  30. Rsyslog + mmnormalize:
     module(load="mmnormalize")
     action(type="mmnormalize"
            ruleBase="/opt/rsyslog_rulebase.rb"
            useRawMsg="on"
     )
     template(name="lumberjack" type="list") {
       property(name="$!all-json")
     }
     $ cat /opt/rsyslog_rulebase.rb
     rule=:<%priority:number%>%date:date-rfc3164% %host:word% %syslogtag:word% %what:char-to:x3a%:%number:char-to:x3a%:
  31. Rsyslog + mmnormalize: 16K events per second, ~200% CPU utilization, 100MB RAM used (queue dependent)
  32. Rsyslog + mmnormalize
  33. Rsyslog with JSON parsing:
     module(load="mmjsonparse")
     action(type="mmjsonparse")
  34. Rsyslog with JSON parsing: 20K events per second, ~130% CPU utilization, 70MB RAM used (queue dependent)
  35. Rsyslog with JSON parsing
  36. Disk-assisted queues:
     main_queue(
       queue.filename="main_queue"   # write to disk if needed
       queue.maxdiskspace="5g"       # when to stop writing to disk
       queue.highwatermark="200000"  # start spilling to disk at this size
       queue.lowwatermark="100000"   # stop spilling when it gets back to this size
       queue.saveonshutdown="on"     # write queue contents to disk on shutdown
       queue.dequeueBatchSize="5000"
       queue.workerthreads="4"
       queue.size="10000000"         # absolute max queue size
     )
  37. Elasticsearch
  38. How Elasticsearch works: a JSON document arrives (via the bulk or single-doc API) at the primary shard, where it is written to the transaction log and analyzed into the inverted index; Elasticsearch then replicates it to the replica shard, which does the same
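The bulk API mentioned above takes a newline-delimited body: for each document, an action line followed by the document source. A minimal sketch of building such a body (the index and type names are placeholders matching the configs earlier in the deck):

```python
import json

def bulk_body(index, doc_type, docs):
    """Build an Elasticsearch bulk-API body: for each document, one action
    line ({"index": ...}) followed by the document's JSON source.
    The whole body must end with a newline."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"
```

A body like this is what omelasticsearch (with bulkmode="on") and the Logstash elasticsearch output POST to the `_bulk` endpoint, amortizing the per-request overhead across many documents.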
  39. ES horizontal scaling: one node, one shard
  40. ES horizontal scaling: two nodes, one shard each
  41. ES horizontal scaling: three nodes, one shard each
  42. ES horizontal scaling: three nodes, four shards each
  43. ES horizontal scaling: three nodes, four shards plus four replicas each
  44. Elasticsearch for the tools tests: nothing is indexed, nothing is stored, no JVM tuning, _source disabled, _all disabled, refresh: -1, sync: 30m, translog size: 2g, translog interval: 30m
  45. Tuning Elasticsearch:
     refresh_interval: 5s*
     doc_values: true
     store.throttle.max_bytes_per_sec: 200mb
     *http://blog.sematext.com/2013/07/08/elasticsearch-refresh-interval-vs-indexing-performance/
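The index-level knobs above can be applied to a live index through the settings API; a sketch, assuming Elasticsearch answers on localhost:9200 and ES 1.x-era setting names (doc_values is set per-field in the mapping, so it does not appear here):

```python
import json
import urllib.request

# Index-level settings from the slide (ES 1.x-era names; an assumption).
TUNED_SETTINGS = {
    "index": {
        "refresh_interval": "5s",
        "store.throttle.max_bytes_per_sec": "200mb",
    }
}

def put_index_settings(index, settings, host="http://localhost:9200"):
    """PUT a settings body to the index's _settings endpoint."""
    req = urllib.request.Request(
        "%s/%s/_settings" % (host, index),
        data=json.dumps(settings).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    return urllib.request.urlopen(req)
```

Baking the same settings into an index template means new time-based indices pick them up automatically at creation.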
  46. Tests: hardware and data. 2 x EC2 c3.large instances (2 vCPU, 3.5GB RAM, 2x16GB SSD in RAID0) vs a flood of Apache logs
  47. Test requests.
     Filters: filter by client IP, filter by word in user agent, wildcard filter on domain.
     Aggregations: date histogram, top 10 response codes, # of unique IPs, top IPs per response per time
  48. Test runs: 1. write throughput; 2. capacity of a single index; 3. capacity with time-based indices on a hot/cold setup
  49. Write throughput (one index)
  50. Capacity of one index (3200 EPS): 20 seconds @ 40-50M docs
  51. Capacity of one index (400 EPS): 15 seconds @ 40-50M docs
  52. Time-based indices: ideal shard size.
     Smaller indices: lighter indexing, easier to isolate hot data from cold data, easier to relocate.
     Bigger indices: less RAM, less management overhead, smaller cluster state.
     Without indexing, equal latency when dividing 32M data into 1/2/4/8/16/32M indices
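Time-based indices are usually named by date, so old data can be dropped or relocated wholesale instead of deleted document by document. A small sketch (the logstash-style `prefix-YYYY.MM.DD` pattern is a common convention, not something the deck prescribes):

```python
from datetime import date, timedelta

def daily_index(prefix, day):
    """Index name for one day, logstash-style: prefix-YYYY.MM.DD."""
    return "%s-%s" % (prefix, day.strftime("%Y.%m.%d"))

def last_n_days(prefix, n, today):
    """Index names to query for the last n days, newest first."""
    return [daily_index(prefix, today - timedelta(days=i)) for i in range(n)]
```

Queries for recent data then hit only the newest, smallest indices, which is what makes the hot/cold split on the next slides possible.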
  53. Time-based, 2 hot and 2 cold nodes. Indexing before: 3200 EPS; after: 4800 EPS
  54. Time-based, 2 hot and 2 cold nodes. Query latency before: 15s; after: 5s
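One way to implement such a hot/cold split is Elasticsearch's shard allocation filtering: tag nodes with a custom attribute (e.g. start hot nodes with `node.tag=hot`; the attribute name `tag` is an assumption here) and require it in the index settings. Today's index lives on the hot tier; once it ages out, updating the setting moves its shards to the cold tier:

```python
from datetime import date

def allocation_settings(tier):
    """Index settings that pin an index's shards to nodes whose custom
    'tag' attribute equals the given tier ('hot' or 'cold')."""
    return {"index.routing.allocation.require.tag": tier}

def tier_for(day, today, hot_days=2):
    """Which tier the index for `day` belongs to: recent indices stay hot."""
    return "hot" if (today - day).days < hot_days else "cold"
```

A nightly job that PUTs `allocation_settings(tier_for(day, today))` on each daily index keeps indexing on the fast hot nodes while searches over old data run on the cold ones.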
  55. That's all folks!
  56. What to remember? Log in JSON, parallelize when possible, use time-based indices, use a hot/cold nodes policy
  57. We are hiring. Dig Search? Dig Analytics? Dig Big Data? Dig Performance? Dig Logging? Dig working with and in open source? We're hiring worldwide! http://sematext.com/about/jobs.html
  58. Thank you! Radu Gheorghe (@radu0gheorghe, radu.gheorghe@sematext.com), Rafał Kuć (@kucrafal, rafal.kuc@sematext.com), Sematext (@sematext, http://sematext.com)
