Tuning Solr for Logs
Radu Gheorghe
@radu0gheorghe @sematext
/me does...
search consulting + Logsene = logging consulting
.com/logsene
Tuning. Is it worth it?
                baseline  last run
# of logs       10M       310M
EC2 bill/month  700       450
What to optimize for?
capacity: how many logs the same hardware can keep while still providing decent performance
What's decent performance? “It depends”
Assumptions
indexing: enough to keep up with generated logs*
search concurrency
search latency: 2s for debug queries, 5s for charts
*account for spikes!
Enough theory, let's start testing!
Solr instance
m3.2xlarge (8CPU, 30GB RAM, 2x80GB SSD)
Solr 4.10.1
Feeder instance
c3.2xlarge (8CPU, 15GB RAM, 2x80GB SSD)
apache access logs
python script to parse and feed them
Baseline test
15GB heap
debug query
status:404 in the last hour
charts query
all time status counters
all time top IPs
user agent word cloud
http://blog.sematext.com/2013/12/19/getting-started-with-logstash/
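The debug and charts queries listed above could be expressed as Solr request parameters roughly as follows. This is only a sketch: the endpoint and the field names (`status`, `timestamp`, `clientip`, `useragent`) are assumptions, since the slides do not show the schema.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and field names; the slides do not name them.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def debug_query_params():
    """Debug query: status:404 in the last hour, newest first."""
    return {
        "q": "status:404",
        "fq": "timestamp:[NOW-1HOUR TO NOW]",
        "sort": "timestamp desc",
        "rows": 10,
        "wt": "json",
    }


def charts_query_params():
    """Charts query: all-time counters via field faceting, no documents returned."""
    return [
        ("q", "*:*"),
        ("rows", 0),
        ("facet", "true"),
        ("facet.field", "status"),     # status counters
        ("facet.field", "clientip"),   # top IPs
        ("facet.field", "useragent"),  # user agent word cloud
        ("facet.limit", 10),
        ("wt", "json"),
    ]


def build_url(params):
    """Append URL-encoded parameters to the select handler URL."""
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_url(debug_query_params()))
    print(build_url(charts_query_params()))
```

The charts query facets over the whole index, which is why it is the more expensive of the two.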
Baseline result
[Chart: debug, charts and EPS series; x-axis 100K to 10M logs, y-axis 0 to 12000]
capacity: 10M logs
bottleneck: facets eat CPU
on average, CPU is OK
indexing limited because the Python script eats feeder CPU
Indexing throughput: is it enough?
“it depends”
how long do you keep your logs?
1M logs/day * 10 days <> 0.3M logs/day * 30 days. Both need 10M capacity
1M logs/day * 30 days? Needs 3 servers, each getting 0.3M logs/day
Baseline run: 10M index fills up in <1/2h at 7K EPS
how big are your spikes? (assumption: 10x regular load)
7K EPS is enough for 10M capacity if you keep logs >5h
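The arithmetic behind these statements can be sketched as below. The function names are made up for illustration; the inputs (10M capacity, 7K EPS, 10x spikes, 1M logs/day for 30 days) come from the slides.

```python
import math


def fill_time_minutes(capacity_logs, eps):
    """Minutes needed to fill the index at a constant indexing rate."""
    return capacity_logs / eps / 60


def servers_needed(logs_per_day, retention_days, capacity_per_server):
    """Servers required to hold retention_days worth of logs."""
    return math.ceil(logs_per_day * retention_days / capacity_per_server)


def min_retention_hours(capacity_logs, max_eps, spike_factor):
    """Shortest retention at which max_eps still absorbs a spike.

    The regular load is assumed to be max_eps / spike_factor, and the
    index holds capacity_logs at that regular rate.
    """
    regular_eps = max_eps / spike_factor
    return capacity_logs / regular_eps / 3600


if __name__ == "__main__":
    print(fill_time_minutes(10_000_000, 7_000))          # under half an hour
    print(servers_needed(1_000_000, 30, 10_000_000))     # 3
    print(min_retention_hours(10_000_000, 7_000, 10))    # roughly 4 hours
```

With these inputs the minimum retention comes out at about 4 hours, which sits under the >5h figure quoted above.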
Rare commits
[Chart: charts, EPS and debug series; x-axis 1.5M to 11M logs, y-axis 0 to 8000]
10% above baseline
auto soft commits every 5 seconds
auto hard commits every 30 minutes
RAMBufferSize=200MB; maxBufferedDocs=10M
Same results with
even rarer commits (auto-soft every 30s, 500MB buffer)
omitNorms + omitTermFreqAndPositions
larger caches
cache autowarming
THP disabled
mergeFactor 5
mergeFactor 20
but indexing was cheaper
manually ran queries, too
DocValues on IP and status code
[Chart: charts, EPS and debug series; x-axis 1.5M to 12M logs, y-axis 0 to 8000]
20% above baseline
Detour: what if user agent was string?
[Chart: charts, EPS and debug series; x-axis 3M to 36M logs, y-axis 0 to 8000]
3.6x baseline
… and if user agent used DocValues?
[Chart: charts, EPS and debug series; x-axis 8M to 70.5M logs, y-axis 0 to 8000]
6.7x baseline
reducing indexing adds 5% capacity
Time based collections (1 minute)
[Chart: charts, EPS and debug series; x-axis 3M to 28M logs, y-axis 0 to 35000]
2.7x baseline
OOM (150 collections)
Time based collections (10 minutes)
[Chart: charts, EPS and debug series; x-axis 10M to 213M logs, y-axis 0 to 8000]
21x baseline
still OOM (~100 collections)
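One way to route logs into time-based collections is to derive the collection name from each event's timestamp. The sketch below assumes a `logs_YYYYMMDD_HHMM` naming scheme and a bucket size that divides 60; neither is taken from the slides.

```python
from datetime import datetime, timedelta, timezone


def bucket_start(ts, bucket_minutes=10):
    """Round a timestamp down to the start of its bucket."""
    minute = ts.minute - ts.minute % bucket_minutes
    return ts.replace(minute=minute, second=0, microsecond=0)


def collection_for(ts, bucket_minutes=10, prefix="logs"):
    """Hypothetical collection name for the bucket containing ts."""
    return f"{prefix}_{bucket_start(ts, bucket_minutes):%Y%m%d_%H%M}"


def collections_between(start, end, bucket_minutes=10, prefix="logs"):
    """Names of every collection that overlaps the [start, end] range."""
    names = []
    current = bucket_start(start, bucket_minutes)
    while current <= end:
        names.append(collection_for(current, bucket_minutes, prefix))
        current += timedelta(minutes=bucket_minutes)
    return names


if __name__ == "__main__":
    ts = datetime(2014, 11, 7, 12, 37, 45, tzinfo=timezone.utc)
    print(collection_for(ts))
    print(collections_between(ts, ts + timedelta(minutes=28)))
```

Larger buckets mean fewer collections to keep open, which is the trade-off the 1 minute vs. 10 minute runs explore.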
10min collections: 20GB heap; optimize old
[Chart: charts, EPS and debug series; x-axis 50M to 340M logs, y-axis 0 to 8000]
31x baseline, 5 days projected retention with 10x spikes
no more OOM, just slower queries
34x baseline, 10 days projected retention (10x)
Software optimizations recap
Definitely worth it: time-based collections; DocValues; rare soft commits
Nice to have: noop I/O scheduler; omit norms, term frequencies and positions; optimize “old” collections
I wouldn't bother: merge policy tuning; autowarm; super-rare soft commits; disable THP
r3.2xlarge: +30GB RAM, +$0.14/h, 1x160GB SSD
[Chart: charts, EPS and debug series; x-axis 20M to 372M logs, y-axis 0 to 7000]
37x baseline, 9 days projected retention with 10x spikes
less indexing throughput than m3.2xlarge
c3.2xlarge: -15GB RAM, -$0.14/h
[Chart: charts, EPS and debug series; x-axis 20M to 177M logs, y-axis 0 to 9000]
17x baseline, 5 days projected retention with 10x spikes
Monthly EC2 cost per 1M logs*
m3.2xlarge: $1.3
r3.2xlarge: $1.33
c3.2xlarge: $1.78
TODO (a.k.a. truth always messes with simplicity):
more/expensive facets => more CPU => c3 looks better
less/cheap facets => not enough instance storage
=> EBS (magnetic/SSD/provisioned IOPS)?
=> storage-optimized i2?
=> old-gen instances with magnetic instance storage?
use different instance types for “hot” and “cold” collections?
*on-demand pricing at 2014-11-07
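A cost-per-1M-logs figure of this kind can be derived from an hourly instance price and the capacity reached in a test run. In the sketch below the hourly price ($0.56) and the 30-day month are assumptions that do not appear in the slides; the 310M capacity is the "last run" figure quoted at the start.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day month


def monthly_cost_per_million_logs(hourly_price_usd, capacity_logs):
    """Monthly instance cost divided by capacity expressed in millions of logs."""
    monthly_cost = hourly_price_usd * HOURS_PER_MONTH
    return monthly_cost / (capacity_logs / 1_000_000)


if __name__ == "__main__":
    # Assumed hourly price; check current pricing before relying on it.
    print(round(monthly_cost_per_million_logs(0.56, 310_000_000), 2))
```

Plugging in other instance types only requires their hourly price and the capacity measured for them.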
How NOT to build an indexing pipeline
custom script:
reads apache logs from files
parses them using regex
takes 100% CPU and 100% RAM from a c3.2xlarge instance
maxes out at 7K EPS
Enter Apache Flume*
*Or Logstash. Or rsyslog. Or syslog-ng. Or any other specialized event processing tool
# source
agent.sources = spoolSrc
agent.sources.spoolSrc.type = spooldir
agent.sources.spoolSrc.spoolDir = /var/log
agent.sources.spoolSrc.channels = solrChannel
# channel
agent.channels = solrChannel
agent.channels.solrChannel.type = file
# sink (put Solr and Morphline jars in lib/)
agent.sinks = solrSink
agent.sinks.solrSink.channel = solrChannel
agent.sinks.solrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
agent.sinks.solrSink.morphlineFile = conf/morphline.conf
agent.sinks.solrSink.morphlineId = 1
morphline.conf (think Unix pipes)
morphlines : [
  {
    # same ID as in the flume.conf sink definition
    id : 1
    commands : [
      # process one line at a time (there's also readMultiLine)
      { readLine { charset : UTF-8 } }
      # parses each property (e.g. IP, status code) into its own field
      {
        grok {
          # https://github.com/cloudera/search/blob/master/samples/solr-nrt/grok-dictionaries/grok-patterns
          dictionaryFiles : [conf/grok-patterns]
          expressions : {
            message : """%{COMBINEDAPACHELOG}"""
          }
        }
      }
      # Solr can do it, too*
      { generateUUID { field : id } }
      {
        loadSolr {
          solrLocator : {
            collection : collection1
            # use zkHost for SolrCloud
            solrUrl : "http://10.233.54.118:8983/solr/"
          }
        }
      }
    ]
  }
]
*http://solr.pl/en/2013/07/08/automatically-generate-document-identifiers-solr-4-x/
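The "think Unix pipes" idea can be illustrated with chained Python generators, one per morphline command. This is an analogy only, not how Morphlines is implemented: the regex is a simplified stand-in for the grok `COMBINEDAPACHELOG` pattern, the field names are assumptions, and the final `loadSolr` step is left out.

```python
import re
import uuid

# Simplified pattern for the Apache combined log format (an assumption;
# the real grok pattern is more permissive).
COMBINED = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def read_lines(lines):
    """readLine: wrap each raw line in a record with a 'message' field."""
    for line in lines:
        yield {"message": line.rstrip("\n")}


def grok(records):
    """grok: split the message into named fields; non-matching lines are skipped here."""
    for record in records:
        match = COMBINED.match(record["message"])
        if match:
            record.update(match.groupdict())
            yield record


def generate_uuid(records, field="id"):
    """generateUUID: attach a random unique ID to each record."""
    for record in records:
        record[field] = str(uuid.uuid4())
        yield record


def pipeline(lines):
    """Chain the stages, each consuming the previous stage's output."""
    return generate_uuid(grok(read_lines(lines)))


if __name__ == "__main__":
    sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
              '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
              '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"')
    for doc in pipeline([sample, "not an access log line"]):
        print(doc["clientip"], doc["response"], doc["id"])
```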
Result: 2.4K EPS, feeder machine almost idle
2.4K EPS is typically enough for this:
each application server runs its own Flume agent
scales nicely with # of servers, but all buffering and processing is done on the application servers
but not for this:
application servers with Flume agents forward to central Flume agents
centralized buffering and processing
or this:
application servers with Flume agents forward to separate tiers of Flume agents
buffer, then process (separately)
Increase throughput: batch sizes; memory channel
# flume.conf
agent.sources = spoolSrc
agent.sources.spoolSrc.type = spooldir
agent.sources.spoolSrc.spoolDir = /var/log
agent.sources.spoolSrc.batchSize = 5000
agent.sources.spoolSrc.channels = solrChannel
agent.channels = solrChannel
# memory instead of file; make sure you have enough heap
agent.channels.solrChannel.type = memory
agent.channels.solrChannel.capacity = 1000000
agent.channels.solrChannel.transactionCapacity = 5000
agent.sinks = solrSink
agent.sinks.solrSink.channel = solrChannel
agent.sinks.solrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
agent.sinks.solrSink.morphlineFile = conf/morphline.conf
agent.sinks.solrSink.morphlineId = 1
agent.sinks.solrSink.batchSize = 5000
# morphline.conf, inside loadSolr
solrLocator : {
  collection : collection1
  solrUrl : "http://10.233.54.118:8983/solr/"
  batchSize : 5000
}
Result: 10K EPS, 6% CPU usage (2x baseline)
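The effect of a batch size is easiest to see in code: events are grouped so that each hand-off (a channel transaction or a Solr update request) carries many events instead of one. The helper below is a generic sketch, not Flume code; the 5000 default simply mirrors the settings above.

```python
from itertools import islice


def batches(events, batch_size=5000):
    """Yield lists of up to batch_size events from any iterable."""
    iterator = iter(events)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    events = (f"event {i}" for i in range(12_000))
    for batch in batches(events):
        # A real feeder would send the whole batch in one request here.
        print(len(batch))
```

Because a batch is held in memory until it is handed off, larger batches need more heap, which is the caveat noted for the memory channel.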
More throughput? Parallelize
Depends* on the bottleneck
*last time I use this word, I promise
source: more threads (if applicable), or more sources (Source1 + Source2 -> C1)
channel: multiplexing channel selector (Source1 -> C1 + C2)
sink: load balancing sink processor (C1 -> Sink1 + Sink2), or more threads (if applicable)
Result: default Solr install maxed out at 24K EPS
TODO: log in JSON where you can
Then, in morphline.conf, replace the grok command with the much lighter:
readJson {}
Easy with apache logs, maybe not for other apps:
LogFormat "{ \
  \"@timestamp\": \"%{%Y-%m-%dT%H:%M:%S%z}t\", \
  \"message\": \"%h %l %u %t \\\"%r\\\" %>s %b\", \
  ...
  \"method\": \"%m\", \
  \"referer\": \"%{Referer}i\", \
  \"useragent\": \"%{User-agent}i\" \
}" ls_apache_json
CustomLog /var/log/apache2/logstash_test.ls_json ls_apache_json
More details at:
http://untergeek.com/2013/09/11/getting-apache-to-output-json-for-logstash-1-2-x/
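To show why JSON logs are cheaper to process, here is a sketch of the consumer side in Python: one `json.loads` call replaces the regex, and fields arrive already named. The sample line is made up, using field names from the LogFormat above.

```python
import json


def parse_json_log(line):
    """Return the event as a dict, or None if the line is not a JSON object."""
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        return None
    return event if isinstance(event, dict) else None


if __name__ == "__main__":
    # Hypothetical log line in the shape produced by a JSON LogFormat.
    line = ('{"@timestamp": "2014-11-07T12:00:00+0000", "method": "GET", '
            '"referer": "-", "useragent": "curl/7.38.0"}')
    event = parse_json_log(line)
    print(event["method"], event["useragent"])
```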
Conclusions
Use time-based collections and DocValues
Rare soft&hard commits are good
Pushing them too far is probably not worth it
Hardware: test and see what works for you
A balanced, SSD-backed machine (like m3) is a good start
Use specialized event processing tools
Apache Flume is a fine example
Processing and buffering on the application server side scales better
Buffer before [heavy] processing
Mind your batch sizes, buffer types and parallelization
Log in JSON where you can
Thank you!
Feel free to poke me @radu0gheorghe
Check us out at the booth, sematext.com and @sematext
We're hiring, too!

More Related Content

What's hot

ELK stack at weibo.com
ELK stack at weibo.comELK stack at weibo.com
ELK stack at weibo.com琛琳 饶
 
Big Data Day LA 2016/ Big Data Track - Fluentd and Embulk: Collect More Data,...
Big Data Day LA 2016/ Big Data Track - Fluentd and Embulk: Collect More Data,...Big Data Day LA 2016/ Big Data Track - Fluentd and Embulk: Collect More Data,...
Big Data Day LA 2016/ Big Data Track - Fluentd and Embulk: Collect More Data,...Data Con LA
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersSematext Group, Inc.
 
Application Logging in the 21st century - 2014.key
Application Logging in the 21st century - 2014.keyApplication Logging in the 21st century - 2014.key
Application Logging in the 21st century - 2014.keyTim Bunce
 
Advanced troubleshooting linux performance
Advanced troubleshooting linux performanceAdvanced troubleshooting linux performance
Advanced troubleshooting linux performanceForthscale
 
Perl Memory Use 201209
Perl Memory Use 201209Perl Memory Use 201209
Perl Memory Use 201209Tim Bunce
 
{{more}} Kibana4
{{more}} Kibana4{{more}} Kibana4
{{more}} Kibana4琛琳 饶
 
Mасштабирование микросервисов на Go, Matt Heath (Hailo)
Mасштабирование микросервисов на Go, Matt Heath (Hailo)Mасштабирование микросервисов на Go, Matt Heath (Hailo)
Mасштабирование микросервисов на Go, Matt Heath (Hailo)Ontico
 
Roll Your Own API Management Platform with nginx and Lua
Roll Your Own API Management Platform with nginx and LuaRoll Your Own API Management Platform with nginx and Lua
Roll Your Own API Management Platform with nginx and LuaJon Moore
 
Devinsampa nginx-scripting
Devinsampa nginx-scriptingDevinsampa nginx-scripting
Devinsampa nginx-scriptingTony Fabeen
 
Using ngx_lua in UPYUN
Using ngx_lua in UPYUNUsing ngx_lua in UPYUN
Using ngx_lua in UPYUNCong Zhang
 
Perl Memory Use - LPW2013
Perl Memory Use - LPW2013Perl Memory Use - LPW2013
Perl Memory Use - LPW2013Tim Bunce
 
PL/Perl - New Features in PostgreSQL 9.0 201012
PL/Perl - New Features in PostgreSQL 9.0 201012PL/Perl - New Features in PostgreSQL 9.0 201012
PL/Perl - New Features in PostgreSQL 9.0 201012Tim Bunce
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageSATOSHI TAGOMORI
 
Node.js streaming csv downloads proxy
Node.js streaming csv downloads proxyNode.js streaming csv downloads proxy
Node.js streaming csv downloads proxyIsmael Celis
 
(APP310) Scheduling Using Apache Mesos in the Cloud | AWS re:Invent 2014
(APP310) Scheduling Using Apache Mesos in the Cloud | AWS re:Invent 2014(APP310) Scheduling Using Apache Mesos in the Cloud | AWS re:Invent 2014
(APP310) Scheduling Using Apache Mesos in the Cloud | AWS re:Invent 2014Amazon Web Services
 
DBD::Gofer 200809
DBD::Gofer 200809DBD::Gofer 200809
DBD::Gofer 200809Tim Bunce
 
Lua tech talk
Lua tech talkLua tech talk
Lua tech talkLocaweb
 

What's hot (19)

ELK stack at weibo.com
ELK stack at weibo.comELK stack at weibo.com
ELK stack at weibo.com
 
Big Data Day LA 2016/ Big Data Track - Fluentd and Embulk: Collect More Data,...
Big Data Day LA 2016/ Big Data Track - Fluentd and Embulk: Collect More Data,...Big Data Day LA 2016/ Big Data Track - Fluentd and Embulk: Collect More Data,...
Big Data Day LA 2016/ Big Data Track - Fluentd and Embulk: Collect More Data,...
 
Using Logstash, elasticsearch & kibana
Using Logstash, elasticsearch & kibanaUsing Logstash, elasticsearch & kibana
Using Logstash, elasticsearch & kibana
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
Application Logging in the 21st century - 2014.key
Application Logging in the 21st century - 2014.keyApplication Logging in the 21st century - 2014.key
Application Logging in the 21st century - 2014.key
 
Advanced troubleshooting linux performance
Advanced troubleshooting linux performanceAdvanced troubleshooting linux performance
Advanced troubleshooting linux performance
 
Perl Memory Use 201209
Perl Memory Use 201209Perl Memory Use 201209
Perl Memory Use 201209
 
{{more}} Kibana4
{{more}} Kibana4{{more}} Kibana4
{{more}} Kibana4
 
Mасштабирование микросервисов на Go, Matt Heath (Hailo)
Mасштабирование микросервисов на Go, Matt Heath (Hailo)Mасштабирование микросервисов на Go, Matt Heath (Hailo)
Mасштабирование микросервисов на Go, Matt Heath (Hailo)
 
Roll Your Own API Management Platform with nginx and Lua
Roll Your Own API Management Platform with nginx and LuaRoll Your Own API Management Platform with nginx and Lua
Roll Your Own API Management Platform with nginx and Lua
 
Devinsampa nginx-scripting
Devinsampa nginx-scriptingDevinsampa nginx-scripting
Devinsampa nginx-scripting
 
Using ngx_lua in UPYUN
Using ngx_lua in UPYUNUsing ngx_lua in UPYUN
Using ngx_lua in UPYUN
 
Perl Memory Use - LPW2013
Perl Memory Use - LPW2013Perl Memory Use - LPW2013
Perl Memory Use - LPW2013
 
PL/Perl - New Features in PostgreSQL 9.0 201012
PL/Perl - New Features in PostgreSQL 9.0 201012PL/Perl - New Features in PostgreSQL 9.0 201012
PL/Perl - New Features in PostgreSQL 9.0 201012
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby Usage
 
Node.js streaming csv downloads proxy
Node.js streaming csv downloads proxyNode.js streaming csv downloads proxy
Node.js streaming csv downloads proxy
 
(APP310) Scheduling Using Apache Mesos in the Cloud | AWS re:Invent 2014
(APP310) Scheduling Using Apache Mesos in the Cloud | AWS re:Invent 2014(APP310) Scheduling Using Apache Mesos in the Cloud | AWS re:Invent 2014
(APP310) Scheduling Using Apache Mesos in the Cloud | AWS re:Invent 2014
 
DBD::Gofer 200809
DBD::Gofer 200809DBD::Gofer 200809
DBD::Gofer 200809
 
Lua tech talk
Lua tech talkLua tech talk
Lua tech talk
 

Viewers also liked

Tuning Elasticsearch Indexing Pipeline for Logs
Tuning Elasticsearch Indexing Pipeline for LogsTuning Elasticsearch Indexing Pipeline for Logs
Tuning Elasticsearch Indexing Pipeline for LogsSematext Group, Inc.
 
Running High Performance & Fault-tolerant Elasticsearch Clusters on Docker
Running High Performance & Fault-tolerant Elasticsearch Clusters on DockerRunning High Performance & Fault-tolerant Elasticsearch Clusters on Docker
Running High Performance & Fault-tolerant Elasticsearch Clusters on DockerSematext Group, Inc.
 
Musings on Secondary Indexing in HBase
Musings on Secondary Indexing in HBaseMusings on Secondary Indexing in HBase
Musings on Secondary Indexing in HBaseJesse Yates
 
MongoDB and Apache HBase: Benchmarking
MongoDB and Apache HBase: BenchmarkingMongoDB and Apache HBase: Benchmarking
MongoDB and Apache HBase: BenchmarkingOlga Lavrentieva
 
Ease of use in Apache Solr
Ease of use in Apache SolrEase of use in Apache Solr
Ease of use in Apache SolrAnshum Gupta
 
Search Analytics with Flume and HBase
Search Analytics with Flume and HBaseSearch Analytics with Flume and HBase
Search Analytics with Flume and HBaseSematext Group, Inc.
 
Journey of Implementing Solr at Target: Presented by Raja Ramachandran, Target
Journey of Implementing Solr at Target: Presented by Raja Ramachandran, TargetJourney of Implementing Solr at Target: Presented by Raja Ramachandran, Target
Journey of Implementing Solr at Target: Presented by Raja Ramachandran, TargetLucidworks
 
Apache HBase Application Archetypes
Apache HBase Application ArchetypesApache HBase Application Archetypes
Apache HBase Application ArchetypesCloudera, Inc.
 
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...Sematext Group, Inc.
 
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)Sematext Group, Inc.
 
Improvements to Flink & it's Applications in Alibaba Search
Improvements to Flink & it's Applications in Alibaba SearchImprovements to Flink & it's Applications in Alibaba Search
Improvements to Flink & it's Applications in Alibaba SearchDataWorks Summit/Hadoop Summit
 
From Zero to Hero - Centralized Logging with Logstash & Elasticsearch
From Zero to Hero - Centralized Logging with Logstash & ElasticsearchFrom Zero to Hero - Centralized Logging with Logstash & Elasticsearch
From Zero to Hero - Centralized Logging with Logstash & ElasticsearchSematext Group, Inc.
 
Tuning Solr for Logs: Presented by Radu Gheorghe, Sematext
Tuning Solr for Logs: Presented by Radu Gheorghe, SematextTuning Solr for Logs: Presented by Radu Gheorghe, Sematext
Tuning Solr for Logs: Presented by Radu Gheorghe, SematextLucidworks
 
Numeric Range Queries in Lucene and Solr
Numeric Range Queries in Lucene and SolrNumeric Range Queries in Lucene and Solr
Numeric Range Queries in Lucene and SolrVadim Kirilchuk
 
Using Morphlines for On-the-Fly ETL
Using Morphlines for On-the-Fly ETLUsing Morphlines for On-the-Fly ETL
Using Morphlines for On-the-Fly ETLCloudera, Inc.
 

Viewers also liked (20)

Solr Anti Patterns
Solr Anti PatternsSolr Anti Patterns
Solr Anti Patterns
 
Tuning Elasticsearch Indexing Pipeline for Logs
Tuning Elasticsearch Indexing Pipeline for LogsTuning Elasticsearch Indexing Pipeline for Logs
Tuning Elasticsearch Indexing Pipeline for Logs
 
Monitoring and Log Management for
Monitoring and Log Management forMonitoring and Log Management for
Monitoring and Log Management for
 
Running High Performance & Fault-tolerant Elasticsearch Clusters on Docker
Running High Performance & Fault-tolerant Elasticsearch Clusters on DockerRunning High Performance & Fault-tolerant Elasticsearch Clusters on Docker
Running High Performance & Fault-tolerant Elasticsearch Clusters on Docker
 
Docker Logging Webinar
Docker Logging  WebinarDocker Logging  Webinar
Docker Logging Webinar
 
Top Node.js Metrics to Watch
Top Node.js Metrics to WatchTop Node.js Metrics to Watch
Top Node.js Metrics to Watch
 
Musings on Secondary Indexing in HBase
Musings on Secondary Indexing in HBaseMusings on Secondary Indexing in HBase
Musings on Secondary Indexing in HBase
 
MongoDB and Apache HBase: Benchmarking
MongoDB and Apache HBase: BenchmarkingMongoDB and Apache HBase: Benchmarking
MongoDB and Apache HBase: Benchmarking
 
Ease of use in Apache Solr
Ease of use in Apache SolrEase of use in Apache Solr
Ease of use in Apache Solr
 
Search Analytics with Flume and HBase
Search Analytics with Flume and HBaseSearch Analytics with Flume and HBase
Search Analytics with Flume and HBase
 
Journey of Implementing Solr at Target: Presented by Raja Ramachandran, Target
Journey of Implementing Solr at Target: Presented by Raja Ramachandran, TargetJourney of Implementing Solr at Target: Presented by Raja Ramachandran, Target
Journey of Implementing Solr at Target: Presented by Raja Ramachandran, Target
 
Apache HBase Application Archetypes
Apache HBase Application ArchetypesApache HBase Application Archetypes
Apache HBase Application Archetypes
 
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
 
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
 
Improvements to Flink & it's Applications in Alibaba Search
Improvements to Flink & it's Applications in Alibaba SearchImprovements to Flink & it's Applications in Alibaba Search
Improvements to Flink & it's Applications in Alibaba Search
 
Introduction to solr
Introduction to solrIntroduction to solr
Introduction to solr
 
From Zero to Hero - Centralized Logging with Logstash & Elasticsearch
From Zero to Hero - Centralized Logging with Logstash & ElasticsearchFrom Zero to Hero - Centralized Logging with Logstash & Elasticsearch
From Zero to Hero - Centralized Logging with Logstash & Elasticsearch
 
Tuning Solr for Logs: Presented by Radu Gheorghe, Sematext
Tuning Solr for Logs: Presented by Radu Gheorghe, SematextTuning Solr for Logs: Presented by Radu Gheorghe, Sematext
Tuning Solr for Logs: Presented by Radu Gheorghe, Sematext
 
Numeric Range Queries in Lucene and Solr
Numeric Range Queries in Lucene and SolrNumeric Range Queries in Lucene and Solr
Numeric Range Queries in Lucene and Solr
 
Using Morphlines for On-the-Fly ETL
Using Morphlines for On-the-Fly ETLUsing Morphlines for On-the-Fly ETL
Using Morphlines for On-the-Fly ETL
 

Similar to Tuning Solr for Logs

Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...
Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...
Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...Flink Forward
 
Guide to alfresco monitoring
Guide to alfresco monitoringGuide to alfresco monitoring
Guide to alfresco monitoringMiguel Rodriguez
 
Scaling an ELK stack at bol.com
Scaling an ELK stack at bol.comScaling an ELK stack at bol.com
Scaling an ELK stack at bol.comRenzo Tomà
 
User-space Network Processing
User-space Network ProcessingUser-space Network Processing
User-space Network ProcessingRyousei Takano
 
Drizzle—Low Latency Execution for Apache Spark: Spark Summit East talk by Shi...
Drizzle—Low Latency Execution for Apache Spark: Spark Summit East talk by Shi...Drizzle—Low Latency Execution for Apache Spark: Spark Summit East talk by Shi...
Drizzle—Low Latency Execution for Apache Spark: Spark Summit East talk by Shi...Spark Summit
 
Golang Performance : microbenchmarks, profilers, and a war story
Golang Performance : microbenchmarks, profilers, and a war storyGolang Performance : microbenchmarks, profilers, and a war story
Golang Performance : microbenchmarks, profilers, and a war storyAerospike
 
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New FeaturesAmazon Web Services
 
Java Performance and Profiling
Java Performance and ProfilingJava Performance and Profiling
Java Performance and ProfilingWSO2
 
PuppetConf 2016: Multi-Tenant Puppet at Scale – John Jawed, eBay, Inc.
PuppetConf 2016: Multi-Tenant Puppet at Scale – John Jawed, eBay, Inc.PuppetConf 2016: Multi-Tenant Puppet at Scale – John Jawed, eBay, Inc.
PuppetConf 2016: Multi-Tenant Puppet at Scale – John Jawed, eBay, Inc.Puppet
 
Capacity Management from Flickr
Capacity Management from FlickrCapacity Management from Flickr
Capacity Management from Flickrxlight
 
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...Виталий Стародубцев
 
Strata Singapore: Gearpump Real time DAG-Processing with Akka at Scale
Strata Singapore: GearpumpReal time DAG-Processing with Akka at ScaleStrata Singapore: GearpumpReal time DAG-Processing with Akka at Scale
Strata Singapore: Gearpump Real time DAG-Processing with Akka at ScaleSean Zhong
 
Super scaling singleton inserts
Super scaling singleton insertsSuper scaling singleton inserts
Super scaling singleton insertsChris Adkin
 
Migrating the elastic stack to the cloud, or application logging @ travix
 Migrating the elastic stack to the cloud, or application logging @ travix Migrating the elastic stack to the cloud, or application logging @ travix
Migrating the elastic stack to the cloud, or application logging @ travixRuslan Lutsenko
 
php & performance
 php & performance php & performance
php & performancesimon8410
 
Ceph Day Shanghai - Ceph Performance Tools
Ceph Day Shanghai - Ceph Performance Tools Ceph Day Shanghai - Ceph Performance Tools
Ceph Day Shanghai - Ceph Performance Tools Ceph Community
 
3rd meetup - Intro to Amazon EMR
3rd meetup - Intro to Amazon EMR3rd meetup - Intro to Amazon EMR
3rd meetup - Intro to Amazon EMRFaizan Javed
 

Similar to Tuning Solr for Logs (20)

Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...
Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...
Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...
 
Guide to alfresco monitoring
Guide to alfresco monitoringGuide to alfresco monitoring
Guide to alfresco monitoring
 
Tuning Java Servers
Tuning Java Servers Tuning Java Servers
Tuning Java Servers
 
Scaling an ELK stack at bol.com
Scaling an ELK stack at bol.comScaling an ELK stack at bol.com
Scaling an ELK stack at bol.com
 
User-space Network Processing
User-space Network ProcessingUser-space Network Processing
User-space Network Processing
 
Drizzle—Low Latency Execution for Apache Spark: Spark Summit East talk by Shi...
Drizzle—Low Latency Execution for Apache Spark: Spark Summit East talk by Shi...Drizzle—Low Latency Execution for Apache Spark: Spark Summit East talk by Shi...
Drizzle—Low Latency Execution for Apache Spark: Spark Summit East talk by Shi...
 
Golang Performance : microbenchmarks, profilers, and a war story
Golang Performance : microbenchmarks, profilers, and a war storyGolang Performance : microbenchmarks, profilers, and a war story
Golang Performance : microbenchmarks, profilers, and a war story
 
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
 
Java Performance and Profiling
Java Performance and ProfilingJava Performance and Profiling
Java Performance and Profiling
 
Deep Dive on Amazon EC2
Deep Dive on Amazon EC2Deep Dive on Amazon EC2
Deep Dive on Amazon EC2
 
PuppetConf 2016: Multi-Tenant Puppet at Scale – John Jawed, eBay, Inc.
PuppetConf 2016: Multi-Tenant Puppet at Scale – John Jawed, eBay, Inc.PuppetConf 2016: Multi-Tenant Puppet at Scale – John Jawed, eBay, Inc.
PuppetConf 2016: Multi-Tenant Puppet at Scale – John Jawed, eBay, Inc.
 
Capacity Management from Flickr
Capacity Management from FlickrCapacity Management from Flickr
Capacity Management from Flickr
 
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
 
Strata Singapore: Gearpump Real time DAG-Processing with Akka at Scale
Strata Singapore: GearpumpReal time DAG-Processing with Akka at ScaleStrata Singapore: GearpumpReal time DAG-Processing with Akka at Scale
Strata Singapore: Gearpump Real time DAG-Processing with Akka at Scale
 
Super scaling singleton inserts
Super scaling singleton insertsSuper scaling singleton inserts
Super scaling singleton inserts
 
Migrating the elastic stack to the cloud, or application logging @ travix
 Migrating the elastic stack to the cloud, or application logging @ travix Migrating the elastic stack to the cloud, or application logging @ travix
Migrating the elastic stack to the cloud, or application logging @ travix
 
php & performance
 php & performance php & performance
php & performance
 
Ceph Day Shanghai - Ceph Performance Tools
Ceph Day Shanghai - Ceph Performance Tools Ceph Day Shanghai - Ceph Performance Tools
Ceph Day Shanghai - Ceph Performance Tools
 
3rd meetup - Intro to Amazon EMR
3rd meetup - Intro to Amazon EMR3rd meetup - Intro to Amazon EMR
3rd meetup - Intro to Amazon EMR
 
Velocity 2010 - ATS
Velocity 2010 - ATSVelocity 2010 - ATS
Velocity 2010 - ATS
 

Tuning Solr for Logs

  • 1.
  • 2. Tuning Solr for Logs Radu Gheorghe @radu0gheorghe @sematext
  • 3. /me does... search consulting + Logsene = logging consulting; .com/logsene
  • 4. Tuning. Is it worth it? # of logs: 10M (baseline), 310M (last run). EC2 bill/month: 700 (baseline), 450 (last run).
  • 5. What to optimize for? http://www.seasonslogs.co.uk/images/products/SL_001.png https://openclipart.org/image/300px/svg_to_png/169833/Server_1U.png capacity: how many logs the same hardware can keep while still providing decent performance
  • 6. What's decent performance? “It depends”. Assumptions: indexing: enough to keep up with generated logs*; search concurrency; search latency: 2s for debug queries, 5s for charts. *account for spikes!
  • 7. Enough theory, let's start testing! Solr instance: m3.2xlarge (8CPU, 30GB RAM, 2x80GB SSD), Solr 4.10.1. Feeder instance: c3.2xlarge (8CPU, 15GB RAM, 2x80GB SSD), apache access logs, python script to parse and feed them.
  • 8. Baseline test: 15GB heap. Debug query: status:404 in the last hour. Charts query: all time status counters, all time top IPs, user agent word cloud. http://blog.sematext.com/2013/12/19/getting-started-with-logstash/
  • 9. Baseline result [chart: debug, charts and EPS series; x-axis ticks from 100K to 10M; y-axis from 0 to 12000]
  • 10. Baseline result (same chart). Callout: capacity
  • 11. Baseline result (same chart). Callouts: capacity; bottleneck: facets eat CPU
  • 12. Baseline result (same chart). Callouts: capacity; bottleneck: facets eat CPU; on average, CPU is OK
  • 13. Baseline result (same chart). Callouts: capacity; bottleneck: facets eat CPU; on average, CPU is OK; indexing limited because the Python script eats feeder CPU
  • 14. Indexing throughput: is it enough? “It depends”: how long do you keep your logs? 1M logs/day * 10 days <> 0.3M logs/day * 30 days. Both need 10M capacity. 1M logs/day * 30 days? Needs 3 servers, each getting 0.3M logs/day. Baseline run: 10M index fills up in <1/2h at 7K EPS.
  • 15. Indexing throughput: is it enough? “It depends”: how long do you keep your logs? 1M logs/day * 10 days <> 0.3M logs/day * 30 days. Both need 10M capacity. 1M logs/day * 30 days? Needs 3 servers, each getting 0.3M logs/day. How big are your spikes? (assumption: 10x regular load) 7K EPS is enough for 10M capacity if you keep logs >5h.
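The fill-time arithmetic on slides 14 and 15 can be sketched in a few lines of Python. This is only an illustration: the function name and the idea of dividing peak throughput by a spike factor are my reading of the slides, not something the deck spells out.

```python
def hours_to_fill(capacity_logs: int, peak_eps: float, spike_factor: float = 1.0) -> float:
    """Hours until an index holding capacity_logs documents fills up.

    peak_eps is the highest indexing rate the pipeline can sustain.
    spike_factor is how much larger a spike is than regular load, so the
    regular load that still leaves room for spikes is peak_eps / spike_factor.
    """
    regular_eps = peak_eps / spike_factor
    return capacity_logs / regular_eps / 3600


# Baseline run: 10M logs indexed at a constant 7K EPS.
print(round(hours_to_fill(10_000_000, 7_000), 2))

# Same capacity, but 7K EPS is kept in reserve for 10x spikes (700 EPS regular load).
print(round(hours_to_fill(10_000_000, 7_000, spike_factor=10), 2))
```

The first call gives well under half an hour, matching slide 14. The second gives roughly four hours, the same ballpark as the ">5h" rule of thumb on slide 15; the deck does not show the exact calculation behind that figure.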
  • 16. Rare commits: 10% above baseline [chart: charts, EPS and debug series; x-axis ticks from 1.5M to 11M; y-axis from 0 to 8000]. Auto soft commits every 5 seconds; auto hard commits every 30 minutes; RAMBufferSize=200MB; maxBufferedDocs=10M
  • 17. Same results with: even rarer commits (auto-soft every 30s, 500MB buffer); omitNorms + omitTermFreqAndPositions; larger caches; cache autowarming; THP disabled; mergeFactor 5; mergeFactor 20. Callouts: but indexing was cheaper; manually ran queries, too
  • 18. DocValues on IP and status code: 20% above baseline [chart: charts, EPS and debug series; x-axis ticks from 1.5M to 12M; y-axis from 0 to 8000]
  • 19. Detour: what if user agent was string? 3.6x baseline [chart: charts, EPS and debug series; x-axis ticks from 3M to 36M; y-axis from 0 to 8000]
  • 20. … and if user agent used DocValues? 6.7x baseline; reducing indexing adds 5% capacity [chart: charts, EPS and debug series; x-axis ticks from 8M to 70.5M; y-axis from 0 to 8000]
  • 21. Time based collections (1 minute): 2.7x baseline; OOM (150 collections) [chart: charts, EPS and debug series; x-axis ticks from 3M to 28M; y-axis from 0 to 35000]
  • 22. Time based collections (10 minutes): 21x baseline; still OOM (~100 collections) [chart: charts, EPS and debug series; x-axis ticks from 10M to 213M; y-axis from 0 to 8000]
  • 23. 10min collections: 20GB heap; optimize old [chart: charts, EPS and debug series; x-axis ticks from 50M to 340M; y-axis from 0 to 8000]. Callouts: 31x baseline, 5 days projected retention with 10x spikes; no more OOM, just slower queries; 34x baseline, 10 days projected retention (10x)
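As a rough illustration of the time-based collections used on slides 21 to 23, the sketch below maps a log timestamp to the collection covering its 10-minute window. The `logs_YYYYMMDD_HHMM` naming scheme and the function name are hypothetical; the slides do not say how the collections were named or created.

```python
from datetime import datetime, timezone


def collection_for(ts: datetime, bucket_minutes: int = 10, prefix: str = "logs") -> str:
    """Name of the time-based collection that a log with timestamp ts belongs to.

    ts should be timezone-aware. bucket_minutes is assumed to divide 60 evenly
    (1, 5, 10, 15, ...), so buckets never straddle an hour boundary.
    """
    ts = ts.astimezone(timezone.utc)
    # Floor the minute to the start of its bucket, e.g. 13:47 -> 13:40.
    floored_minute = ts.minute - ts.minute % bucket_minutes
    bucket_start = ts.replace(minute=floored_minute, second=0, microsecond=0)
    return f"{prefix}_{bucket_start:%Y%m%d_%H%M}"


print(collection_for(datetime(2014, 11, 7, 13, 47, 12, tzinfo=timezone.utc)))
```

With a scheme like this, the newest collection takes all the writes, while older ones can be optimized or dropped once they fall out of retention.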
  • 24. Software optimizations recap. Definitely worth it: time-based collections; DocValues; rare soft commits. Nice to have: noop I/O scheduler; omit norms, term frequencies and positions; optimize “old” collections. I wouldn't bother: merge policy tuning; autowarm; super-rare soft commits; disable THP.
  • 25. r3.2xlarge: +30GB RAM, +$0.14/h, 1x160GB SSD. 37x baseline, 9 days projected retention with 10x spikes; less indexing throughput than m3.2xlarge [chart: charts, EPS and debug series; x-axis ticks from 20M to 372M; y-axis from 0 to 7000]
  • 26. c3.2xlarge: -15GB RAM, -$0.14/h. 17x baseline, 5 days projected retention with 10x spikes [chart: charts, EPS and debug series; x-axis ticks from 20M to 177M; y-axis from 0 to 9000]
  • 27. Monthly EC2 cost per 1M logs*: m3.2xlarge: $1.3; r3.2xlarge: $1.33; c3.2xlarge: $1.78. TODO (a.k.a. truth always messes with simplicity): more/expensive facets => more CPU => c3 looks better; less/cheap facets => not enough instance storage => EBS (magnetic/SSD/provisioned IOPS)? => storage-optimized i2? => old-gen instances with magnetic instance storage?; use different instance types for “hot” and “cold” collections? *on-demand pricing at 2014-11-07
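The "monthly cost per 1M logs" metric on slide 27 is presumably the monthly instance bill divided by capacity in millions of logs. The sketch below shows that calculation under stated assumptions: the $0.56/h on-demand price for m3.2xlarge, the 720-hour month and the 310M capacity are my inputs, not figures the slide gives for this formula.

```python
def monthly_cost_per_million(hourly_usd: float, capacity_logs: int,
                             hours_per_month: int = 720) -> float:
    """Monthly instance cost in USD divided by capacity in millions of logs."""
    monthly_bill = hourly_usd * hours_per_month
    return monthly_bill / (capacity_logs / 1_000_000)


# ASSUMPTIONS: $0.56/h on-demand price and 310M logs capacity for m3.2xlarge.
print(round(monthly_cost_per_million(0.56, 310_000_000), 2))
```

With these inputs the result lands close to the $1.3 quoted for m3.2xlarge. The r3 and c3 figures depend on which capacity numbers the author plugged in, so they are not reproduced here.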
  • 28. How NOT to build an indexing pipeline. Custom script: reads apache logs from files; parses them using regex; takes 100% CPU and 100% RAM from a c3.2xlarge instance; maxes out at 7K EPS
  • 29. Enter Apache Flume* (*Or Logstash. Or rsyslog. Or syslog-ng. Or any other specialized event processing tool)
      # source
      agent.sources = spoolSrc
      agent.sources.spoolSrc.type = spooldir
      agent.sources.spoolSrc.spoolDir = /var/log
      agent.sources.spoolSrc.channels = solrChannel
      # channel
      agent.channels = solrChannel
      agent.channels.solrChannel.type = file
      # sink
      agent.sinks = solrSink
      agent.sinks.solrSink.channel = solrChannel
      agent.sinks.solrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
      agent.sinks.solrSink.morphlineFile = conf/morphline.conf
      agent.sinks.solrSink.morphlineId = 1
    Note: put Solr and Morphline jars in lib/
  • 30. morphline.conf (think Unix pipes)
      morphlines : [
        {
          # same ID as in the flume.conf sink definition
          id : 1
          commands : [
            # process one line at a time (there's also readMultiLine)
            { readLine { charset : UTF-8 } }
            {
              grok {
                # https://github.com/cloudera/search/blob/master/samples/solr-nrt/grok-dictionaries/grok-patterns
                dictionaryFiles : [conf/grok-patterns]
                # parses each property (eg: IP, status code) in its own field
                expressions : { message : """%{COMBINEDAPACHELOG}""" }
              }
            }
            # Solr can do it, too*
            { generateUUID { field : id } }
            {
              loadSolr {
                # use zkHost for SolrCloud
                solrLocator : {
                  collection : collection1
                  solrUrl : "http://10.233.54.118:8983/solr/"
                }
              }
            }
          ]
        }
      ]
    *http://solr.pl/en/2013/07/08/automatically-generate-document-identifiers-solr-4-x/
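To make "parses each property in its own field" concrete, here is a small Python sketch that splits one combined-format access log line into named fields. The regular expression is a simplified stand-in for `%{COMBINEDAPACHELOG}`, not the actual grok pattern, and the field names and sample line are assumptions chosen for illustration.

```python
import re

# Simplified stand-in for grok's %{COMBINEDAPACHELOG}; the real pattern is more permissive.
COMBINED = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line: str):
    """Return a dict of named fields, or None if the line does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


# Made-up sample line in Apache combined log format.
sample = ('10.0.0.1 - - [07/Nov/2014:13:47:12 +0000] '
          '"GET /index.html HTTP/1.1" 404 512 '
          '"http://example.com/" "Mozilla/5.0 (X11; Linux x86_64)"')

fields = parse_line(sample)
print(fields["clientip"], fields["response"], fields["agent"])
```

Once status code and client IP live in their own fields, queries such as `status:404` or a top-IPs facet become possible without touching the raw message.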
  • 31. Result: 2.4K EPS, feeder machine almost idle
  • 32. 2.4K EPS is typically enough for this [diagram: three “application server + Flume agent” boxes]. Scales nicely with # of servers, but all buffering and processing is done here
  • 33. but not for this [diagram: three “application server + Flume agent” boxes and two “Flume agent” boxes]. Centralized buffering and processing
  • 34. or this [diagram: three “application server + Flume agent” boxes and three “Flume agent” boxes]. Buffer, then process (separately)
  • 35. Increase throughput: batch sizes; memory channel
      agent.sources = spoolSrc
      agent.sources.spoolSrc.type = spooldir
      agent.sources.spoolSrc.spoolDir = /var/log
      agent.sources.spoolSrc.batchSize = 5000
      agent.sources.spoolSrc.channels = solrChannel
      agent.channels = solrChannel
      # was: file
      agent.channels.solrChannel.type = memory
      agent.channels.solrChannel.capacity = 1000000
      agent.channels.solrChannel.transactionCapacity = 5000
      agent.sinks = solrSink
      agent.sinks.solrSink.channel = solrChannel
      agent.sinks.solrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
      agent.sinks.solrSink.morphlineFile = conf/morphline.conf
      agent.sinks.solrSink.morphlineId = 1
      agent.sinks.solrSink.batchSize = 5000
    and in morphline.conf:
      solrLocator : {
        collection : collection1
        solrUrl : "http://10.233.54.118:8983/solr/"
        batchSize : 5000
      }
    Note: make sure you have enough heap
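The effect of `batchSize` can be pictured as grouping events before each hand-off, so that per-request overhead is paid once per 5000 events instead of once per event. The Python sketch below illustrates only the grouping idea; it is not how Flume implements batching, and the `batches` helper is a hypothetical name.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batches(events: Iterable[str], batch_size: int = 5000) -> Iterator[List[str]]:
    """Yield consecutive lists of at most batch_size events."""
    iterator = iter(events)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# 12,000 made-up log lines end up in three hand-offs instead of 12,000.
sizes = [len(batch) for batch in batches(f"line {i}" for i in range(12_000))]
print(sizes)
```

Larger batches hold more events in memory at once, which is why the slide pairs them with a reminder about heap.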
  • 36. Result: 10K EPS, 6%CPU usage (2x baseline)
  • 37. More throughput? Parallelize. Depends* on the bottleneck (*last time I use this word, I promise). Source: more threads (if applicable); more sources. Channel: multiplexing channel selector. Sink: load balancing sink processor; more threads (if applicable). [diagrams of sources (Source1, Source2), channels (C1, C2) and sinks (Sink1, Sink2) omitted]
  • 38. Result: default Solr install maxed out at 24K EPS
  • 39. TODO: log in JSON where you can. Then, in morphline.conf, replace the grok command with the much lighter: readJson {}. Easy with apache logs, maybe not for other apps:
      LogFormat "{ \"@timestamp\": \"%{%Y-%m-%dT%H:%M:%S%z}t\", \"message\": \"%h %l %u %t \\\"%r\\\" %>s %b\", ... \"method\": \"%m\", \"referer\": \"%{Referer}i\", \"useragent\": \"%{User-agent}i\" }" ls_apache_json
      CustomLog /var/log/apache2/logstash_test.ls_json ls_apache_json
    More details at: http://untergeek.com/2013/09/11/getting-apache-to-output-json-for-logstash-1-2-x/
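The reason JSON logging is lighter is that the consumer only has to decode a structure, not match a regular expression against free text. A minimal Python sketch of the consuming side follows; the sample line is made up, and its keys simply mirror the ones in the LogFormat above.

```python
import json


def parse_json_line(line: str):
    """Decode one JSON log line into a dict, or return None if it is not a JSON object."""
    try:
        doc = json.loads(line)
    except json.JSONDecodeError:
        return None
    return doc if isinstance(doc, dict) else None


# Made-up sample line using the field names from the LogFormat on slide 39.
sample = ('{"@timestamp": "2014-11-07T13:47:12+0000", "method": "GET", '
          '"referer": "-", "useragent": "Mozilla/5.0"}')

event = parse_json_line(sample)
print(event["method"], event["useragent"])
```

Each key is already a separate field, so no pattern dictionary is needed on the processing side.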
  • 40. Conclusions. Use time-based collections and DocValues. Rare soft & hard commits are good; pushing them too far is probably not worth it. Hardware: test and see what works for you; a balanced, SSD-backed machine (like m3) is a good start. Use specialized event processing tools; Apache Flume is a fine example. Processing and buffering on the application server side scales better. Buffer before [heavy] processing. Mind your batch sizes, buffer types and parallelization. Log in JSON where you can.
  • 41. Thank you! Feel free to poke me @radu0gheorghe Check us out at the booth, sematext.com and @sematext We're hiring, too!