SlideShare a Scribd company logo
Tuning Elasticsearch
Indexing Pipeline
for Logs
Radu Gheorghe
Rafał Kuć
Who are we?
Radu Rafał
Logsene
The next hour
Logs
Logs
Logs
Logs
Logs
Logs
Logs
Logs
Logs
Logs
Logs
Logs
Logs
Logs
Logs
Logs
Logs
Logs Logs
Logs
The tools
Logsene
2.0 SNAPSHOT8.9.01.5 RC2
Let the games begin
Logstash
Multiple inputs
Lots of filters
Several outputs
Lots of plugins
How Logstash works
input
(thread per input)
file
tcp
redis
...
filter
(multiple workers)
grok
geoip
...
elasticsearch
solr
...
output
(multiple workers)
Scaling Logstash
Logstash basic
input {
syslog {
port => 13514
}
}
output {
elasticsearch {
protocol => "http”
manage_template => false
index => "test-index”
index_type => "test-type"
}
}
Logstash basic
4K events per second
~130% CPU
utilization
299MB RAM used
Logstash basic
Logstash with mutate
output {
elasticsearch {
protocol => "http”
manage_template => false
index => "test-index”
index_type => "test-type”
flush_size => 1000
workers => 5
}
}
filter {
mutate {
remove_field => [ "severity", "facility", "priority", "@version", "timestamp", "host" ]
}
}
3 filter threads!
-w 3
Logstash with mutate
5K events per second
~250% CPU
utilization
289MB RAM used
Logstash with mutate
Logstash with grok and tcp
filter {
grok {
match => [ "message", "<%{NUMBER:priority}>%{SYSLOGTIMESTAMP:date}
%{DATA:hostname} %{DATA:tag} %{DATA:what}:%{DATA:number}:" ]
}
mutate {
remove_field => [ "message", "@version", "@timestamp", "host" ]
}
}
input {
tcp {
port => 13514
}
}
Logstash with grok and tcp
8K events per second
~310% CPU
utilization
327MB RAM used
Logstash with grok and tcp
Logstash with JSON lines
input {
tcp {
port => 13514
codec => "json_lines"
}
}
Logstash with JSON lines
8K events per second
~260% CPU
utilization
322MB RAM used
Logstash with JSON lines
Rsyslog
Very fast
Very light
How rsyslog works
im*
imfile
imtcp
imjournal
...
mm* om*
mmnormalize
mmjsonparse
...
omelasticsearch
omredis
...
Using rsyslog
Rsyslog basic
module(load="impstats"
interval="10"
resetCounters="on"
log.file="/tmp/stats")
module(load="imtcp")
module(load="omelasticsearch")
input(type="imtcp" port="13514")
action(type="omelasticsearch"
template="plain-syslog"
searchIndex="test-index"
searchType="test-type"
bulkmode="on"
action.resumeretrycount="-1"
)
template(name="plain-syslog"
type="list") {
constant(value="{")
constant(value=""@timestamp":"") property(name="timereported" dateFormat="rfc3339")
constant(value="","host":"") property(name="hostname")
constant(value="","severity":"") property(name="syslogseverity-text")
constant(value="","facility":"") property(name="syslogfacility-text")
constant(value="","syslogtag":"") property(name="syslogtag" format="json")
constant(value="","message":"") property(name="msg" format="json")
constant(value=""}")
}
*http://blog.sematext.com/2015/04/13/monitoring-rsyslogs-performance-with-imstats-and-elasticsearch
Rsyslog basic
6K events per second
~20% CPU utilization
50MB RAM used
Rsyslog basic
Rsyslog queue and workers
main_queue(
queue.size="100000" # capacity of the main queue
queue.dequeuebatchsize="5000" # process messages in batches of 5K
queue.workerthreads="4" # 4 threads for the main queue
)
action(name="send-to-es"
type="omelasticsearch"
template="plain-syslog" # use the template defined earlier
searchIndex="test-index"
searchType="test-type"
bulkmode="on" # use bulk API
action.resumeretrycount="-1" # retry indefinitely if ES is unreachable
)
Rsyslog queue and workers
25K events per
second
~100% CPU
utilization (1 core)
75MB RAM used
(queue dependent)
Rsyslog queue and workers
Rsyslog + mmnormalize
module(load="mmnormalize")
action(type="mmnormalize"
ruleBase="/opt/rsyslog_rulebase.rb"
useRawMsg="on"
)
template(name="lumberjack" type="list") {
property(name="$!all-json")
}
$ cat /opt/rsyslog_rulebase.rb
rule=:<%priority:number%>%date:date-rfc3164% %host:word% %syslogtag:word% %what:char-
to:x3a%:%number:char-to:x3a%:
Rsyslog + mmnormalize
16K events per second
~200% CPU utilization
100MB RAM used
(queue dependent)
Rsyslog + mmnormalize
Rsyslog with JSON parsing
module(load="mmjsonparse")
action(type="mmjsonparse")
Rsyslog with JSON parsing
20K events per
second
~130% CPU utilization
70MB RAM used
(queue dependent)
Rsyslog with JSON parsing
Disk-assisted queues
main_queue(
queue.filename="main_queue" # write to disk if needed
queue.maxdiskspace="5g" # when to stop writing to disk
queue.highwatermark="200000" # start spilling to disk at this size
queue.lowwatermark="100000" # stop spilling when it gets back to this size
queue.saveonshutdown="on" # write queue contents to disk on shutdown
queue.dequeueBatchSize="5000"
queue.workerthreads="4"
queue.size="10000000" # absolute max queue size
)
Elasticsearch
How Elasticsearch works
JSON bulk, single doc
transaction log
inverted index
analysis
primary
transaction log
inverted index
analysis
replica
Elasticsearch
replicate
ES horizontal scaling
Node
shard
ES horizontal scaling
Node
shard
Node
shard
ES horizontal scaling
Node
shard
Node
shard
Node
shard
ES horizontal scaling
Node
shard shard
shard shard
Node
shard shard
shard shard
Node
shard shard
shard shard
ES horizontal scaling
Node
shard shard
shard shard
replica
replica
replica
replica
Node
shard shard
shard shard
replica
replica
replica
replica
Node
shard shard
shard shard
replica
replica
replica
replica
Elasticsearch for tools tests
Nothing is
indexed
No JVM
tuning
Nothing is
stored
_source
disabled
_all
disabled
-1 refresh
30m sync
translog
size: 2g
interval: 30m
Tuning Elasticsearch
refresh_interval: 5s*
doc_values: true
store.throttle.max_bytes_per_sec: 200mb
*http://blog.sematext.com/2013/07/08/elasticsearch-refresh-interval-vs-indexing-performance/
Tests: hardware and data
2 x EC2 c3.large instances
(2vCPU, 3.5GB RAM,
2x16GB SSD in RAID0)
vs
Logs
Logs
Logs
Logs
Logs
Logs
Logs
Logs
Logs
Logs
Logs
Logs
Logs
Logs
Logs
Logs
Logs
Logs Logs
Logs
Apache logs
Test requests
Filters Aggregations
filter by client IP date histogram
filter by word in user agent top 10 response codes
wildcard filter on domain # of unique IPs
top IPs per response per time
Test runs
1. Write throughput
2. Capacity of a single index
3. Capacity with time-based indices on
hot/cold setup
Write throughput (one index)
Capacity of one index (3200 EPS)
20 seconds @ 40 - 50M
Capacity of one index (400 EPS)
15 seconds @ 40 - 50M
Time-based indices: ideal shard size
smaller indices
lighter indexing
easier to isolate hot data from cold data
easier to relocate
bigger indices
less RAM
less management overhead
smaller cluster state
without indexing, equal latency when dividing
32M data into 1/2/4/8/16/32M indices
Time-based. 2 hot and 2 cold nodes
Before: 3200 After: 4800
Time-based. 2 hot and 2 cold nodes
Before:
15s
After:
5s
That's all folks!
What to remember?
log in
JSON
parallelize
when
possible
use time
based indices
use hot / cold
nodes policy
We are hiring
Dig Search?
Dig Analytics?
Dig Big Data?
Dig Performance?
Dig Logging?
Dig working with and in open – source?
We’re hiring world – wide!
http://sematext.com/about/jobs.html
Thank you!
Radu Gheorghe
@radu0gheorghe
radu.gheorghe@sematext.com
Rafał Kuć
@kucrafal
rafal.kuc@sematext.com
Sematext
@sematext
http://sematext.com

More Related Content

What's hot

Logstash: Get to know your logs
Logstash: Get to know your logsLogstash: Get to know your logs
Logstash: Get to know your logs
SmartLogic
 
LogStash in action
LogStash in actionLogStash in action
LogStash in action
Manuj Aggarwal
 
'Scalable Logging and Analytics with LogStash'
'Scalable Logging and Analytics with LogStash''Scalable Logging and Analytics with LogStash'
'Scalable Logging and Analytics with LogStash'
Cloud Elements
 
Elasticsearch for Logs & Metrics - a deep dive
Elasticsearch for Logs & Metrics - a deep diveElasticsearch for Logs & Metrics - a deep dive
Elasticsearch for Logs & Metrics - a deep dive
Sematext Group, Inc.
 
ELK stack at weibo.com
ELK stack at weibo.comELK stack at weibo.com
ELK stack at weibo.com
琛琳 饶
 
Logstash family introduction
Logstash family introductionLogstash family introduction
Logstash family introduction
Owen Wu
 
Advanced troubleshooting linux performance
Advanced troubleshooting linux performanceAdvanced troubleshooting linux performance
Advanced troubleshooting linux performance
Forthscale
 
From zero to hero - Easy log centralization with Logstash and Elasticsearch
From zero to hero - Easy log centralization with Logstash and ElasticsearchFrom zero to hero - Easy log centralization with Logstash and Elasticsearch
From zero to hero - Easy log centralization with Logstash and Elasticsearch
Rafał Kuć
 
Logstash-Elasticsearch-Kibana
Logstash-Elasticsearch-KibanaLogstash-Elasticsearch-Kibana
Logstash-Elasticsearch-Kibana
dknx01
 
Using Logstash, elasticsearch & kibana
Using Logstash, elasticsearch & kibanaUsing Logstash, elasticsearch & kibana
Using Logstash, elasticsearch & kibana
Alejandro E Brito Monedero
 
Monitoring with Graylog - a modern approach to monitoring?
Monitoring with Graylog - a modern approach to monitoring?Monitoring with Graylog - a modern approach to monitoring?
Monitoring with Graylog - a modern approach to monitoring?
inovex GmbH
 
Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...
Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...
Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...
ForgeRock
 
Large Scale Log collection using LogStash & mongoDB
Large Scale Log collection using LogStash & mongoDB Large Scale Log collection using LogStash & mongoDB
Large Scale Log collection using LogStash & mongoDB
Gaurav Bhardwaj
 
Journée DevOps : Des dashboards pour tous avec ElasticSearch, Logstash et Kibana
Journée DevOps : Des dashboards pour tous avec ElasticSearch, Logstash et KibanaJournée DevOps : Des dashboards pour tous avec ElasticSearch, Logstash et Kibana
Journée DevOps : Des dashboards pour tous avec ElasticSearch, Logstash et Kibana
Publicis Sapient Engineering
 
MySQL Slow Query log Monitoring using Beats & ELK
MySQL Slow Query log Monitoring using Beats & ELKMySQL Slow Query log Monitoring using Beats & ELK
MySQL Slow Query log Monitoring using Beats & ELK
YoungHeon (Roy) Kim
 
{{more}} Kibana4
{{more}} Kibana4{{more}} Kibana4
{{more}} Kibana4
琛琳 饶
 
Elk stack
Elk stackElk stack
Elk stack
Jilles van Gurp
 
ELK Stack
ELK StackELK Stack
ELK Stack
Phuc Nguyen
 
Machine Learning in a Twitter ETL using ELK
Machine Learning in a Twitter ETL using ELK Machine Learning in a Twitter ETL using ELK
Machine Learning in a Twitter ETL using ELK
hypto
 
The basics of fluentd
The basics of fluentdThe basics of fluentd
The basics of fluentd
Treasure Data, Inc.
 

What's hot (20)

Logstash: Get to know your logs
Logstash: Get to know your logsLogstash: Get to know your logs
Logstash: Get to know your logs
 
LogStash in action
LogStash in actionLogStash in action
LogStash in action
 
'Scalable Logging and Analytics with LogStash'
'Scalable Logging and Analytics with LogStash''Scalable Logging and Analytics with LogStash'
'Scalable Logging and Analytics with LogStash'
 
Elasticsearch for Logs & Metrics - a deep dive
Elasticsearch for Logs & Metrics - a deep diveElasticsearch for Logs & Metrics - a deep dive
Elasticsearch for Logs & Metrics - a deep dive
 
ELK stack at weibo.com
ELK stack at weibo.comELK stack at weibo.com
ELK stack at weibo.com
 
Logstash family introduction
Logstash family introductionLogstash family introduction
Logstash family introduction
 
Advanced troubleshooting linux performance
Advanced troubleshooting linux performanceAdvanced troubleshooting linux performance
Advanced troubleshooting linux performance
 
From zero to hero - Easy log centralization with Logstash and Elasticsearch
From zero to hero - Easy log centralization with Logstash and ElasticsearchFrom zero to hero - Easy log centralization with Logstash and Elasticsearch
From zero to hero - Easy log centralization with Logstash and Elasticsearch
 
Logstash-Elasticsearch-Kibana
Logstash-Elasticsearch-KibanaLogstash-Elasticsearch-Kibana
Logstash-Elasticsearch-Kibana
 
Using Logstash, elasticsearch & kibana
Using Logstash, elasticsearch & kibanaUsing Logstash, elasticsearch & kibana
Using Logstash, elasticsearch & kibana
 
Monitoring with Graylog - a modern approach to monitoring?
Monitoring with Graylog - a modern approach to monitoring?Monitoring with Graylog - a modern approach to monitoring?
Monitoring with Graylog - a modern approach to monitoring?
 
Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...
Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...
Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...
 
Large Scale Log collection using LogStash & mongoDB
Large Scale Log collection using LogStash & mongoDB Large Scale Log collection using LogStash & mongoDB
Large Scale Log collection using LogStash & mongoDB
 
Journée DevOps : Des dashboards pour tous avec ElasticSearch, Logstash et Kibana
Journée DevOps : Des dashboards pour tous avec ElasticSearch, Logstash et KibanaJournée DevOps : Des dashboards pour tous avec ElasticSearch, Logstash et Kibana
Journée DevOps : Des dashboards pour tous avec ElasticSearch, Logstash et Kibana
 
MySQL Slow Query log Monitoring using Beats & ELK
MySQL Slow Query log Monitoring using Beats & ELKMySQL Slow Query log Monitoring using Beats & ELK
MySQL Slow Query log Monitoring using Beats & ELK
 
{{more}} Kibana4
{{more}} Kibana4{{more}} Kibana4
{{more}} Kibana4
 
Elk stack
Elk stackElk stack
Elk stack
 
ELK Stack
ELK StackELK Stack
ELK Stack
 
Machine Learning in a Twitter ETL using ELK
Machine Learning in a Twitter ETL using ELK Machine Learning in a Twitter ETL using ELK
Machine Learning in a Twitter ETL using ELK
 
The basics of fluentd
The basics of fluentdThe basics of fluentd
The basics of fluentd
 

Similar to Tuning Elasticsearch Indexing Pipeline for Logs

(Fios#02) 2. elk 포렌식 분석
(Fios#02) 2. elk 포렌식 분석(Fios#02) 2. elk 포렌식 분석
(Fios#02) 2. elk 포렌식 분석
INSIGHT FORENSIC
 
Scaling an ELK stack at bol.com
Scaling an ELK stack at bol.comScaling an ELK stack at bol.com
Scaling an ELK stack at bol.com
Renzo Tomà
 
Application Logging in the 21st century - 2014.key
Application Logging in the 21st century - 2014.keyApplication Logging in the 21st century - 2014.key
Application Logging in the 21st century - 2014.key
Tim Bunce
 
Hadoop cluster performance profiler
Hadoop cluster performance profilerHadoop cluster performance profiler
Hadoop cluster performance profiler
Ihor Bobak
 
Drizzle—Low Latency Execution for Apache Spark: Spark Summit East talk by Shi...
Drizzle—Low Latency Execution for Apache Spark: Spark Summit East talk by Shi...Drizzle—Low Latency Execution for Apache Spark: Spark Summit East talk by Shi...
Drizzle—Low Latency Execution for Apache Spark: Spark Summit East talk by Shi...
Spark Summit
 
Proving out flash storage array performance using swingbench and slob
Proving out flash storage array performance using swingbench and slobProving out flash storage array performance using swingbench and slob
Proving out flash storage array performance using swingbench and slob
Kapil Goyal
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
DataWorks Summit/Hadoop Summit
 
Attack monitoring using ElasticSearch Logstash and Kibana
Attack monitoring using ElasticSearch Logstash and KibanaAttack monitoring using ElasticSearch Logstash and Kibana
Attack monitoring using ElasticSearch Logstash and Kibana
Prajal Kulkarni
 
Tuning Solr for Logs: Presented by Radu Gheorghe, Sematext
Tuning Solr for Logs: Presented by Radu Gheorghe, SematextTuning Solr for Logs: Presented by Radu Gheorghe, Sematext
Tuning Solr for Logs: Presented by Radu Gheorghe, Sematext
Lucidworks
 
Strata Singapore: Gearpump Real time DAG-Processing with Akka at Scale
Strata Singapore: GearpumpReal time DAG-Processing with Akka at ScaleStrata Singapore: GearpumpReal time DAG-Processing with Akka at Scale
Strata Singapore: Gearpump Real time DAG-Processing with Akka at Scale
Sean Zhong
 
Rails Performance
Rails PerformanceRails Performance
Rails Performance
Wen-Tien Chang
 
Scaling Massive Elasticsearch Clusters
Scaling Massive Elasticsearch ClustersScaling Massive Elasticsearch Clusters
Scaling Massive Elasticsearch Clusters
Sematext Group, Inc.
 
D Trace Support In My Sql Guide To Solving Reallife Performance Problems
D Trace Support In My Sql Guide To Solving Reallife Performance ProblemsD Trace Support In My Sql Guide To Solving Reallife Performance Problems
D Trace Support In My Sql Guide To Solving Reallife Performance Problems
MySQLConference
 
Tweaking performance on high-load projects
Tweaking performance on high-load projectsTweaking performance on high-load projects
Tweaking performance on high-load projects
Dmitriy Dumanskiy
 
Deep learning with kafka
Deep learning with kafkaDeep learning with kafka
Deep learning with kafka
Nitin Kumar
 
Using apache spark for processing trillions of records each day at Datadog
Using apache spark for processing trillions of records each day at DatadogUsing apache spark for processing trillions of records each day at Datadog
Using apache spark for processing trillions of records each day at Datadog
Vadim Semenov
 
"How about no grep and zabbix?". ELK based alerts and metrics.
"How about no grep and zabbix?". ELK based alerts and metrics."How about no grep and zabbix?". ELK based alerts and metrics.
"How about no grep and zabbix?". ELK based alerts and metrics.
Vladimir Pavkin
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetup
Rafal Kwasny
 
Null Bachaav - May 07 Attack Monitoring workshop.
Null Bachaav - May 07 Attack Monitoring workshop.Null Bachaav - May 07 Attack Monitoring workshop.
Null Bachaav - May 07 Attack Monitoring workshop.
Prajal Kulkarni
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
javier ramirez
 

Similar to Tuning Elasticsearch Indexing Pipeline for Logs (20)

(Fios#02) 2. elk 포렌식 분석
(Fios#02) 2. elk 포렌식 분석(Fios#02) 2. elk 포렌식 분석
(Fios#02) 2. elk 포렌식 분석
 
Scaling an ELK stack at bol.com
Scaling an ELK stack at bol.comScaling an ELK stack at bol.com
Scaling an ELK stack at bol.com
 
Application Logging in the 21st century - 2014.key
Application Logging in the 21st century - 2014.keyApplication Logging in the 21st century - 2014.key
Application Logging in the 21st century - 2014.key
 
Hadoop cluster performance profiler
Hadoop cluster performance profilerHadoop cluster performance profiler
Hadoop cluster performance profiler
 
Drizzle—Low Latency Execution for Apache Spark: Spark Summit East talk by Shi...
Drizzle—Low Latency Execution for Apache Spark: Spark Summit East talk by Shi...Drizzle—Low Latency Execution for Apache Spark: Spark Summit East talk by Shi...
Drizzle—Low Latency Execution for Apache Spark: Spark Summit East talk by Shi...
 
Proving out flash storage array performance using swingbench and slob
Proving out flash storage array performance using swingbench and slobProving out flash storage array performance using swingbench and slob
Proving out flash storage array performance using swingbench and slob
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
 
Attack monitoring using ElasticSearch Logstash and Kibana
Attack monitoring using ElasticSearch Logstash and KibanaAttack monitoring using ElasticSearch Logstash and Kibana
Attack monitoring using ElasticSearch Logstash and Kibana
 
Tuning Solr for Logs: Presented by Radu Gheorghe, Sematext
Tuning Solr for Logs: Presented by Radu Gheorghe, SematextTuning Solr for Logs: Presented by Radu Gheorghe, Sematext
Tuning Solr for Logs: Presented by Radu Gheorghe, Sematext
 
Strata Singapore: Gearpump Real time DAG-Processing with Akka at Scale
Strata Singapore: GearpumpReal time DAG-Processing with Akka at ScaleStrata Singapore: GearpumpReal time DAG-Processing with Akka at Scale
Strata Singapore: Gearpump Real time DAG-Processing with Akka at Scale
 
Rails Performance
Rails PerformanceRails Performance
Rails Performance
 
Scaling Massive Elasticsearch Clusters
Scaling Massive Elasticsearch ClustersScaling Massive Elasticsearch Clusters
Scaling Massive Elasticsearch Clusters
 
D Trace Support In My Sql Guide To Solving Reallife Performance Problems
D Trace Support In My Sql Guide To Solving Reallife Performance ProblemsD Trace Support In My Sql Guide To Solving Reallife Performance Problems
D Trace Support In My Sql Guide To Solving Reallife Performance Problems
 
Tweaking performance on high-load projects
Tweaking performance on high-load projectsTweaking performance on high-load projects
Tweaking performance on high-load projects
 
Deep learning with kafka
Deep learning with kafkaDeep learning with kafka
Deep learning with kafka
 
Using apache spark for processing trillions of records each day at Datadog
Using apache spark for processing trillions of records each day at DatadogUsing apache spark for processing trillions of records each day at Datadog
Using apache spark for processing trillions of records each day at Datadog
 
"How about no grep and zabbix?". ELK based alerts and metrics.
"How about no grep and zabbix?". ELK based alerts and metrics."How about no grep and zabbix?". ELK based alerts and metrics.
"How about no grep and zabbix?". ELK based alerts and metrics.
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetup
 
Null Bachaav - May 07 Attack Monitoring workshop.
Null Bachaav - May 07 Attack Monitoring workshop.Null Bachaav - May 07 Attack Monitoring workshop.
Null Bachaav - May 07 Attack Monitoring workshop.
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
 

More from Sematext Group, Inc.

Tweaking the Base Score: Lucene/Solr Similarities Explained
Tweaking the Base Score: Lucene/Solr Similarities ExplainedTweaking the Base Score: Lucene/Solr Similarities Explained
Tweaking the Base Score: Lucene/Solr Similarities Explained
Sematext Group, Inc.
 
OOPs, OOMs, oh my! Containerizing JVM apps
OOPs, OOMs, oh my! Containerizing JVM appsOOPs, OOMs, oh my! Containerizing JVM apps
OOPs, OOMs, oh my! Containerizing JVM apps
Sematext Group, Inc.
 
Is observability good for your brain?
Is observability good for your brain?Is observability good for your brain?
Is observability good for your brain?
Sematext Group, Inc.
 
Introducing log analysis to your organization
Introducing log analysis to your organization Introducing log analysis to your organization
Introducing log analysis to your organization
Sematext Group, Inc.
 
Solr Search Engine: Optimize Is (Not) Bad for You
Solr Search Engine: Optimize Is (Not) Bad for YouSolr Search Engine: Optimize Is (Not) Bad for You
Solr Search Engine: Optimize Is (Not) Bad for You
Sematext Group, Inc.
 
Solr on Docker - the Good, the Bad and the Ugly
Solr on Docker - the Good, the Bad and the UglySolr on Docker - the Good, the Bad and the Ugly
Solr on Docker - the Good, the Bad and the Ugly
Sematext Group, Inc.
 
Monitoring and Log Management for
Monitoring and Log Management forMonitoring and Log Management for
Monitoring and Log Management for
Sematext Group, Inc.
 
Introduction to solr
Introduction to solrIntroduction to solr
Introduction to solr
Sematext Group, Inc.
 
Building Resilient Log Aggregation Pipeline with Elasticsearch & Kafka
Building Resilient Log Aggregation Pipeline with Elasticsearch & KafkaBuilding Resilient Log Aggregation Pipeline with Elasticsearch & Kafka
Building Resilient Log Aggregation Pipeline with Elasticsearch & Kafka
Sematext Group, Inc.
 
How to Run Solr on Docker and Why
How to Run Solr on Docker and WhyHow to Run Solr on Docker and Why
How to Run Solr on Docker and Why
Sematext Group, Inc.
 
Running High Performance & Fault-tolerant Elasticsearch Clusters on Docker
Running High Performance & Fault-tolerant Elasticsearch Clusters on DockerRunning High Performance & Fault-tolerant Elasticsearch Clusters on Docker
Running High Performance & Fault-tolerant Elasticsearch Clusters on Docker
Sematext Group, Inc.
 
Top Node.js Metrics to Watch
Top Node.js Metrics to WatchTop Node.js Metrics to Watch
Top Node.js Metrics to Watch
Sematext Group, Inc.
 
Running High Performance and Fault Tolerant Elasticsearch Clusters on Docker
Running High Performance and Fault Tolerant Elasticsearch Clusters on DockerRunning High Performance and Fault Tolerant Elasticsearch Clusters on Docker
Running High Performance and Fault Tolerant Elasticsearch Clusters on Docker
Sematext Group, Inc.
 
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
Sematext Group, Inc.
 
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
Sematext Group, Inc.
 
Metrics, Logs, Transaction Traces, Anomaly Detection at Scale
Metrics, Logs, Transaction Traces, Anomaly Detection at ScaleMetrics, Logs, Transaction Traces, Anomaly Detection at Scale
Metrics, Logs, Transaction Traces, Anomaly Detection at Scale
Sematext Group, Inc.
 
Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2
Sematext Group, Inc.
 
Solr Anti Patterns
Solr Anti PatternsSolr Anti Patterns
Solr Anti Patterns
Sematext Group, Inc.
 
From Zero to Hero - Centralized Logging with Logstash & Elasticsearch
From Zero to Hero - Centralized Logging with Logstash & ElasticsearchFrom Zero to Hero - Centralized Logging with Logstash & Elasticsearch
From Zero to Hero - Centralized Logging with Logstash & Elasticsearch
Sematext Group, Inc.
 
(Elastic)search in big data
(Elastic)search in big data(Elastic)search in big data
(Elastic)search in big data
Sematext Group, Inc.
 

More from Sematext Group, Inc. (20)

Tweaking the Base Score: Lucene/Solr Similarities Explained
Tweaking the Base Score: Lucene/Solr Similarities ExplainedTweaking the Base Score: Lucene/Solr Similarities Explained
Tweaking the Base Score: Lucene/Solr Similarities Explained
 
OOPs, OOMs, oh my! Containerizing JVM apps
OOPs, OOMs, oh my! Containerizing JVM appsOOPs, OOMs, oh my! Containerizing JVM apps
OOPs, OOMs, oh my! Containerizing JVM apps
 
Is observability good for your brain?
Is observability good for your brain?Is observability good for your brain?
Is observability good for your brain?
 
Introducing log analysis to your organization
Introducing log analysis to your organization Introducing log analysis to your organization
Introducing log analysis to your organization
 
Solr Search Engine: Optimize Is (Not) Bad for You
Solr Search Engine: Optimize Is (Not) Bad for YouSolr Search Engine: Optimize Is (Not) Bad for You
Solr Search Engine: Optimize Is (Not) Bad for You
 
Solr on Docker - the Good, the Bad and the Ugly
Solr on Docker - the Good, the Bad and the UglySolr on Docker - the Good, the Bad and the Ugly
Solr on Docker - the Good, the Bad and the Ugly
 
Monitoring and Log Management for
Monitoring and Log Management forMonitoring and Log Management for
Monitoring and Log Management for
 
Introduction to solr
Introduction to solrIntroduction to solr
Introduction to solr
 
Building Resilient Log Aggregation Pipeline with Elasticsearch & Kafka
Building Resilient Log Aggregation Pipeline with Elasticsearch & KafkaBuilding Resilient Log Aggregation Pipeline with Elasticsearch & Kafka
Building Resilient Log Aggregation Pipeline with Elasticsearch & Kafka
 
How to Run Solr on Docker and Why
How to Run Solr on Docker and WhyHow to Run Solr on Docker and Why
How to Run Solr on Docker and Why
 
Running High Performance & Fault-tolerant Elasticsearch Clusters on Docker
Running High Performance & Fault-tolerant Elasticsearch Clusters on DockerRunning High Performance & Fault-tolerant Elasticsearch Clusters on Docker
Running High Performance & Fault-tolerant Elasticsearch Clusters on Docker
 
Top Node.js Metrics to Watch
Top Node.js Metrics to WatchTop Node.js Metrics to Watch
Top Node.js Metrics to Watch
 
Running High Performance and Fault Tolerant Elasticsearch Clusters on Docker
Running High Performance and Fault Tolerant Elasticsearch Clusters on DockerRunning High Performance and Fault Tolerant Elasticsearch Clusters on Docker
Running High Performance and Fault Tolerant Elasticsearch Clusters on Docker
 
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
 
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
 
Metrics, Logs, Transaction Traces, Anomaly Detection at Scale
Metrics, Logs, Transaction Traces, Anomaly Detection at ScaleMetrics, Logs, Transaction Traces, Anomaly Detection at Scale
Metrics, Logs, Transaction Traces, Anomaly Detection at Scale
 
Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2
 
Solr Anti Patterns
Solr Anti PatternsSolr Anti Patterns
Solr Anti Patterns
 
From Zero to Hero - Centralized Logging with Logstash & Elasticsearch
From Zero to Hero - Centralized Logging with Logstash & ElasticsearchFrom Zero to Hero - Centralized Logging with Logstash & Elasticsearch
From Zero to Hero - Centralized Logging with Logstash & Elasticsearch
 
(Elastic)search in big data
(Elastic)search in big data(Elastic)search in big data
(Elastic)search in big data
 

Recently uploaded

如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
gapen1
 
All you need to know about Spring Boot and GraalVM
All you need to know about Spring Boot and GraalVMAll you need to know about Spring Boot and GraalVM
All you need to know about Spring Boot and GraalVM
Alina Yurenko
 
一比一原版(UMN毕业证)明尼苏达大学毕业证如何办理
一比一原版(UMN毕业证)明尼苏达大学毕业证如何办理一比一原版(UMN毕业证)明尼苏达大学毕业证如何办理
一比一原版(UMN毕业证)明尼苏达大学毕业证如何办理
dakas1
 
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsUI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
Peter Muessig
 
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
safelyiotech
 
Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)
Julian Hyde
 
E-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet DynamicsE-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet Dynamics
Hornet Dynamics
 
What’s New in Odoo 17 – A Complete Roadmap
What’s New in Odoo 17 – A Complete RoadmapWhat’s New in Odoo 17 – A Complete Roadmap
What’s New in Odoo 17 – A Complete Roadmap
Envertis Software Solutions
 
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemUI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
Peter Muessig
 
The Rising Future of CPaaS in the Middle East 2024
The Rising Future of CPaaS in the Middle East 2024The Rising Future of CPaaS in the Middle East 2024
The Rising Future of CPaaS in the Middle East 2024
Yara Milbes
 
Preparing Non - Technical Founders for Engaging a Tech Agency
Preparing Non - Technical Founders for Engaging  a  Tech AgencyPreparing Non - Technical Founders for Engaging  a  Tech Agency
Preparing Non - Technical Founders for Engaging a Tech Agency
ISH Technologies
 
Oracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptxOracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptx
Remote DBA Services
 
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
XfilesPro
 
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
kalichargn70th171
 
Modelling Up - DDDEurope 2024 - Amsterdam
Modelling Up - DDDEurope 2024 - AmsterdamModelling Up - DDDEurope 2024 - Amsterdam
Modelling Up - DDDEurope 2024 - Amsterdam
Alberto Brandolini
 
Baha Majid WCA4Z IBM Z Customer Council Boston June 2024.pdf
Baha Majid WCA4Z IBM Z Customer Council Boston June 2024.pdfBaha Majid WCA4Z IBM Z Customer Council Boston June 2024.pdf
Baha Majid WCA4Z IBM Z Customer Council Boston June 2024.pdf
Baha Majid
 
Energy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina JonuziEnergy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina Jonuzi
Green Software Development
 
INTRODUCTION TO AI CLASSICAL THEORY TARGETED EXAMPLES
INTRODUCTION TO AI CLASSICAL THEORY TARGETED EXAMPLESINTRODUCTION TO AI CLASSICAL THEORY TARGETED EXAMPLES
INTRODUCTION TO AI CLASSICAL THEORY TARGETED EXAMPLES
anfaltahir1010
 
How Can Hiring A Mobile App Development Company Help Your Business Grow?
How Can Hiring A Mobile App Development Company Help Your Business Grow?How Can Hiring A Mobile App Development Company Help Your Business Grow?
How Can Hiring A Mobile App Development Company Help Your Business Grow?
ToXSL Technologies
 
Webinar On-Demand: Using Flutter for Embedded
Webinar On-Demand: Using Flutter for EmbeddedWebinar On-Demand: Using Flutter for Embedded
Webinar On-Demand: Using Flutter for Embedded
ICS
 

Recently uploaded (20)

如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
 
All you need to know about Spring Boot and GraalVM
All you need to know about Spring Boot and GraalVMAll you need to know about Spring Boot and GraalVM
All you need to know about Spring Boot and GraalVM
 
一比一原版(UMN毕业证)明尼苏达大学毕业证如何办理
一比一原版(UMN毕业证)明尼苏达大学毕业证如何办理一比一原版(UMN毕业证)明尼苏达大学毕业证如何办理
一比一原版(UMN毕业证)明尼苏达大学毕业证如何办理
 
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsUI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
 
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
 
Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)
 
E-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet DynamicsE-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet Dynamics
 
What’s New in Odoo 17 – A Complete Roadmap
What’s New in Odoo 17 – A Complete RoadmapWhat’s New in Odoo 17 – A Complete Roadmap
What’s New in Odoo 17 – A Complete Roadmap
 
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemUI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
 
The Rising Future of CPaaS in the Middle East 2024
The Rising Future of CPaaS in the Middle East 2024The Rising Future of CPaaS in the Middle East 2024
The Rising Future of CPaaS in the Middle East 2024
 
Preparing Non - Technical Founders for Engaging a Tech Agency
Preparing Non - Technical Founders for Engaging  a  Tech AgencyPreparing Non - Technical Founders for Engaging  a  Tech Agency
Preparing Non - Technical Founders for Engaging a Tech Agency
 
Oracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptxOracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptx
 
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
 
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
 
Modelling Up - DDDEurope 2024 - Amsterdam
Modelling Up - DDDEurope 2024 - AmsterdamModelling Up - DDDEurope 2024 - Amsterdam
Modelling Up - DDDEurope 2024 - Amsterdam
 
Baha Majid WCA4Z IBM Z Customer Council Boston June 2024.pdf
Baha Majid WCA4Z IBM Z Customer Council Boston June 2024.pdfBaha Majid WCA4Z IBM Z Customer Council Boston June 2024.pdf
Baha Majid WCA4Z IBM Z Customer Council Boston June 2024.pdf
 
Energy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina JonuziEnergy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina Jonuzi
 
INTRODUCTION TO AI CLASSICAL THEORY TARGETED EXAMPLES
INTRODUCTION TO AI CLASSICAL THEORY TARGETED EXAMPLESINTRODUCTION TO AI CLASSICAL THEORY TARGETED EXAMPLES
INTRODUCTION TO AI CLASSICAL THEORY TARGETED EXAMPLES
 
How Can Hiring A Mobile App Development Company Help Your Business Grow?
How Can Hiring A Mobile App Development Company Help Your Business Grow?How Can Hiring A Mobile App Development Company Help Your Business Grow?
How Can Hiring A Mobile App Development Company Help Your Business Grow?
 
Webinar On-Demand: Using Flutter for Embedded
Webinar On-Demand: Using Flutter for EmbeddedWebinar On-Demand: Using Flutter for Embedded
Webinar On-Demand: Using Flutter for Embedded
 

Tuning Elasticsearch Indexing Pipeline for Logs

Editor's Notes

  1. Rafał starts and passes mic to Radu
  2. Rafal slide – describe the talk brefily !!! Ask people how many of the audience used the tools
  3. Radu slide we did some tests, we’ll share configs and benchmarks – here are the versions Logstash 1.5 – the final version will be up soon Rsyslog 8.9 – the current stable (note: most distros come with 5.x or 7.x) ES is a search engine based on Apache Lucene Current version is 1.5, next major is 2.0 with lots of changes. Many related to Lucene 5.0 Not the only tools for logging, there are many other tools, both open source and commercial, that can receive logs, parse them, buffer them and index them
  4. Rafal slide
  5. Rafal slide * Ask how many people know about Logstash
  6. Rafal slide
  7. Rafal slide
  8. Radu Assume we want to centralize syslog Forward syslog via TCP/UDP on a port to Logstash On the Logstash side, you can use the TCP input to listen to that port and parse syslog messages You’d use the ES output to forward to ES you can use a Java binary, but HTTP is better Logstash comes with a template for ES index, but for perf tests we’ll use our own Specify where (index,type – like a DB and a table)
  9. Radu - 1.3 CPUs
  10. Radu – segue to tuning, pass the mic
  11. Rafal Flush size – 1000 lowered from default 5000
  12. Rafal
  13. Rafal
  14. Rafal Syslog is just TCP + Grok We changed that and we are not parsing the syslog format exactly – we wanted to parse additional things and wanted to show how to parse unstructured data
  15. The bound was: - hardware (high CPU usage) - JSON lines codec is not parallelized, while GROK is - But if you want to do your homework you can do another run with JSON filter instead of codec and that will give the possibility of parallelization
  16. Radu Many people hate it, maybe because of docs I like it because it’s light and fast and has surprisingly rich functionality
  17. Like Logstash, it’s modular, you can use inputs to get data in, message modifiers to parse data and outputs to pass it on The flow of data is a bit different Inputs may have multiple threads, and they write to a main queue On the main queue, worker threads can do filtering, format messages using templates (will talk later) and run actions (parsing/output) You can have action queues as well, with their own threads => async You can have rulesets, which let you separate flows of input – parse – output (e.g. One ruleset for local logs, one for remote logs)
  18. Typical setup is to have it on each server, push to ES directly, buffer if necessary
  19. Load modules Impstats is for monitoring, then tcp and ES Start the tcp listener Template – how the JSON that we send to ES will look like Action – send to ES, using the template, specify index/type, use bulks, retry on failure
  20. Bigger memory buffer Increase bulk size Moar worker threads
  21. Not using more because ES is using the rest – Rafal will talk about that in a bit RAM has increased because of the queue size
  22. Clear win
  23. But not really apples for apples, because rsyslog has dedicated syslog parsers Still, not only for syslog, can parse unstructured data via mmnormalize Refer to a rulebase, which looks much like grok patterns, with two differences: Normally, patterns like number or date aren’t regexes but specific parsers. Faster but less flexible. The one above is equivalent to the Logstash grok seen earlier Builds a parse tree on startup, helps with speed if you have many rules
  24. Radu
  25. More throughput with less CPU usage
  26. Before moving on, one more thing: in production you probably want to use disk assisted queues instead of in-memory queues like the ones we had here. DA queue is in-memory queue that can spill to disk. Specify that via file name and give it a threshold Spilling is smart: Normally in memory When it reaches high watermark it starts writing to disk, but it does so in batches, so resumes to memory when lowwatermark Side-benefit: can save and reload memory queue contents when restarting rsyslog
  27. Rafal
  28. Rafał Index a document It goes to ES first to transaction log next to inverted index It is replicated on transaction log level
  29. Rafał
  30. Rafał
  31. Rafał
  32. Rafał
  33. Rafał
  34. Rafał
  35. Rafał Throttling – the default is 20, we are using 200, so we are actually going for 10 times more (we are usind SSD drives here)
  36. Rafał
  37. Rafał Cheaper filters and aggregations are on top The more expensive are at the bottom
  38. Radu Index as fast as we can How much data we can put in a single index at a decent indexing rate before searches took too long a good practice is to have time-based indices (e.g. Keep logs for a week, have one per day). We want to benchmark that + separating indexing load from search load by putting today’s index on different nodes than the „old” ones
  39. Rafal Rate slowly goes down, because merges happen and because the index is slowly getting bigger
  40. Rafał 40-50 m @ 20 seconds Most expensive query takes 20 sec on average Filters (quick ones) takes subseconds Some aggs takes up to 5 seconds on average
  41. Rafał Spikes because of merges, the big spike is because the merge happen and after the merge the queries are actually faster Most expensive queries take 15 seconds
  42. Radu Want to benchmark TB indices. Because: Indexing is better because of merging Searching recent data is better because idx is smaller Deleting entired indices is better But what granularity? Use-cases for small (high indexing, small retention, CPU contraint) vs big (low idx, high retention, mem constraint) granularity doesn’t affect cold search perf
  43. Rafal Tell about hot and cold setup The drop is because cold nodes were full
  44. Rafal
  45. Radu