SlideShare a Scribd company logo
1 of 45
Download to read offline
OCTOBER 11-14, 2016 • BOSTON, MA
Tuning Solr and its Pipeline for Logs
Rafał Kuć and Radu Gheorghe
Software Engineers, Sematext Group, Inc.
3
01
Agenda
Designing a Solr(Cloud) cluster for time-series data
Solr and operating system knobs to tune
Pipeline patterns and shipping options
4
01
Time-based collections, the single best improvement
14.10
indexing
5
01
Time-based collections, the single best improvement
14.10
indexing
15.10
6
01
Time-based collections, the single best improvement
14.10
indexing
15.10
7
01
Time-based collections, the single best improvement
14.10 15.10 ... 21.10
indexing
8
01
Time-based collections, the single best improvement
14.10 15.10 ... 21.10
indexing
Less merging ⇒ faster indexing
Quick deletes (of whole collections)
Search only some collections
Better use of caches
9
01
Load is uneven
Black
Friday
Saturday Sunday
1
0
01
Load is uneven
Need to “rotate” collections fast enough to
work with this (otherwise indexing and
queries will be slow)
These will be tiny
Black
Friday
Saturday Sunday
1
1
01
If load is uneven, daily/monthly/etc indices are suboptimal:
you either have poor performance or too many collections
Octi* is worried:
* this is Octi →
1
2
01
Solution: rotate by size
indexing
logs01
Size limit
1
3
01
Solution: rotate by size
indexing
logs01
Size limit
1
4
01
Solution: rotate by size
indexing
logs01
logs02
Size limit
1
5
01
Solution: rotate by size
indexing
logs01
logs02
Size limit
1
6
01
Solution: rotate by size
indexing
logs01
logs08...
logs02
1
7
01
Solution: rotate by size
indexingPredictable indexing and search performance
Fewer shards
logs01
logs08...
logs02
1
8
01
Dealing with size-based collections
logs01
logs08...
logs02
app (caches results)
stats
2016-10-11 2016-10-13 2016-10-12 2016-10-14
2016-10-18
doesn’t matter,
it’s the latest collection
1
9
01
Size-based collections handle spiky load better
Octi concludes:
2
0
01
Tiered cluster (a.k.a. hot-cold)
14 Oct
11 Oct
10 Oct
12 Oct
13 Oct
hot01 cold01 cold02
indexing, most searches longer-running (+cached) searches
2
1
01
Tiered cluster (a.k.a. hot-cold)
14 Oct
11 Oct
10 Oct
12 Oct
13 Oct
hot01 cold01 cold02
indexing, most searches longer-running (+cached) searches
⇒ Good CPU and IO* ⇒ Heap. Decent IO for replication&backup
* Ideally local SSD; avoid network storage unless it’s really good
2
2
01
Octi likes tiered clusters
Costs: you can use different hardware for different workloads
Performance (see costs): fewer shards, less overhead
Isolation: long-running searches don’t slow down indexing
2
3
01
AWS specifics
Hot tier:
c3 (compute optimized) + EBS and use local SSD as cache*
c4 (EBS only)
Cold tier:
d2 (big local HDDs + lots of RAM)
m4 (general purpose) + EBS
i2 (big local SSDs + lots of RAM)
General stuff:
EBS optimized
Enhanced Networking
VPC (to get access to c4&m4 instances)
* Use --cachemode writeback for async writing:
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/ht
ml/Logical_Volume_Manager_Administration/lvm_cache_volume_creation.html
PIOPS is best but expensive
HDD - too slow (unless cold=icy)
⇒ General purpose SSDs
2
4
01
EBS
Stay under 3TB. More smaller (<1TB) drives in RAID0 give better, but shorter IOPS bursts
Performance isn’t guaranteed ⇒ RAID0 will wait for the slowest disk
Check limits (e.g. 160MB/s per drive, instance-dependent IOPS and network)
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSPerformance.html
3 IOPS/GB up to 10K (~3TB), up to 256kb/IO, merges up to 4 consecutive IOs
2
5
01
Octi’s AWS top picks
c4s for the hot tier: cheaper than c3s with similar performance
m4s for the cold tier: well balanced, scale up to 10xl, flexible storage via EBS
EBS drives < 3TB, otherwise avoids RAID0: higher chances of performance drop
2
6
01
Scratching the surface of OS options
Say no to swap
Disk scheduler: CFQ for HDD, deadline for SSD
Mount options: noatime, nodiratime, data=writeback, nobarrier
For bare metal: check CPU governor and THP*
because strict ordering is for the weak
* often it’s enabled, but /proc/sys/vm/nr_hugepages is 0
2
7
01
Schema and solrconfig
Auto soft commit (5s?)
Auto commit (few minutes?)
RAM buffer size + Max buffered docs
Doc Values for faceting
+ retrieving those fields (stored=false)
Omit norms, frequencies and positions
Don’t store catch-all field(s)
2
8
01
Relaxing the merge policy*
Merges are the heaviest part of indexing
Facets are the heaviest part of searching
Facets (except method=enum) depend on data size more than # of segments
Knobs:
segmentsPerTier: more segments ⇒ less merging
maxMergeAtOnce < segmentsPerTier, to smooth out those IO spikes
maxMergedSegmentMB: lower to merge more small segments, less bigger ones
* unless you only do “grep”. YMMV, of course. Keep an eye on open files, though
⇒ fewer open files
2
9
01
Some numbers: more segments, more throughput (10-15% here)
10 segmentsPerTier
10 maxMergeAtOnce
50 segmentsPerTier
50 maxMergeAtOnce
need to rotate
before this drop
3
0
01
Lower max segment (500MB from 5GB default)
less CPU fewer segments
3
1
01
There’s more...
SPM screenshots from all tests + JMeter test plan here:
https://github.com/sematext/lucene-revolution-samples/tree/master/2016/solr_logging
We’d love to hear about your own results!
correct spelling: sematext.com/spm
3
2
01
Increasing segments per tier while decreasing max merged
segment (by an order of magnitude) makes indexing better and
search latency less spiky
Octi’s conclusions so far
3
3
01
Optimize I/O and CPU by not optimizing
Unless you have spare CPU & IO (why would you?)
And unless you run out of open files
Only do this on “old” indices!
3
4
01
Optimizing the pipeline*
logs
log shipper(s)
Ship using which protocol(s)?
Buffer
Route to other destinations?
Parse
* for availability and performance/costs
Or log to Solr directly from app
(i.e. implement a new, embedded log shipper)
3
5
01
A case for buffers
performance & availability
allows batches and threads when Solr is down or can’t keep up
3
6
01
Types of buffers
Disk*, memory or a combination
On the logging host or centralized
* doesn’t mean it fsync()s for every message
file or local log shipper
Easy scaling; fewer moving parts
Often requires a lightweight shipper
Kafka/Redis/etc or central log shipper
Extra features (e.g. TTL)
One place for changes
bufferapp
bufferapp
3
7
01
Multiple destinations
* or flat files, or S3 or...
input buffer
processing
Outputs need to be in sync
input Processing may
cause backpressure
processing
*
3
8
01
Multiple destinations
input
Solr
offset
HDFS
offset
input
processing
processing
3
9
01
Just Solr and maybe flat files? Go simple with a local shipper
Custom, fast-changing processing & multiple destinations? Kafka as a central buffer
Octi’s pipeline preferences
4
0
01
Parsing unstructured data
Ideally, log in JSON*
Otherwise, parse
* or another serialization format
For performance and maintenance
(i.e. no need to update parsing rules)
Regex-based (e.g. grok)
Easy to build rules
Rules are flexible
Typically slow & O(n) on # of rules, but:
Move matching patterns to the top of the list
Move broad patterns to the bottom
Skip patterns including others that didn’t match
Grammar-based
(e.g. liblognorm, PatternDB)
Faster. O(1) on # of rules
Numbers in our 2015 session:
sematext.com/blog/2015/10/16/large-scale-log-analytics-with-solr/
4
1
01
Decide what gives when buffers fill up
input shipper
Can drop data here, but better buffer
when shipper buffer is full, app can block or drop data
Check:
Local files: what happens when files are rotated/archived/deleted?
UDP: network buffers should handle spiky load*
TCP: what happens when connection breaks/times out?
UNIX sockets: what happens when socket blocks writes**?
* you’d normally increase net.core.rmem_max and rmem_default
** both DGRAM and STREAM local sockets are reliable (vs Internet ones, UDP and TCP)
4
2
01
Octi’s flow chart of where to log
critical?
UDP. Increase network
buffers on destination,
so it can handle spiky
traffic
Paying with
RAM or IO?
UNIX socket. Local
shipper with memory
buffers, that can
drop data if needed
Local files. Make sure
rotation is in place or
you’ll run out of disk!
no
IO RAM
yes
4
3
01
Protocols
UDP: cool for the app, but not reliable
TCP: more reliable, but not completely
Application-level ACKs may be needed:
No failure/backpressure handling needed
App gets ACK when OS buffer gets it
⇒ no retransmit if buffer is lost*
* more at blog.gerhards.net/2008/05/why-you-cant-build-reliable-tcp.html
sender receiver
ACKs
Protocol Example shippers
HTTP Logstash, rsyslog, Fluentd
RELP rsyslog, Logstash
Beats Filebeat, Logstash
Kafka Fluentd, Filebeat, rsyslog, Logstash
4
4
01
Octi’s top pipeline+shipper combos
rsyslog
app
UNIX
socket
HTTP
memory+disk buffer
can drop
app
fileKafka
consumer
Filebeat
HTTP
simple & reliable
4
5
01
Conclusions, questions, we’re hiring, thank you
The whole talk was pretty much only conclusions :)
Feels like there’s much more to discover. Please test & share your own nuggets
http://www.relatably.com/m/img/oh-my-god-memes/oh-my-god.jpg
Scary word, ha? Poke us:
@kucrafal @radu0gheorghe @sematext
...or @ our booth here

More Related Content

What's hot

What's New on AWS and What it Means to You
What's New on AWS and What it Means to YouWhat's New on AWS and What it Means to You
What's New on AWS and What it Means to YouAmazon Web Services
 
Call me maybe: Jepsen and flaky networks
Call me maybe: Jepsen and flaky networksCall me maybe: Jepsen and flaky networks
Call me maybe: Jepsen and flaky networksShalin Shekhar Mangar
 
NYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / SolrNYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / Solrthelabdude
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scalethelabdude
 
Introduction to SolrCloud
Introduction to SolrCloudIntroduction to SolrCloud
Introduction to SolrCloudVarun Thacker
 
Building a near real time search engine & analytics for logs using solr
Building a near real time search engine & analytics for logs using solrBuilding a near real time search engine & analytics for logs using solr
Building a near real time search engine & analytics for logs using solrlucenerevolution
 
Search-time Parallelism: Presented by Shikhar Bhushan, Etsy
Search-time Parallelism: Presented by Shikhar Bhushan, EtsySearch-time Parallelism: Presented by Shikhar Bhushan, Etsy
Search-time Parallelism: Presented by Shikhar Bhushan, EtsyLucidworks
 
How to make a simple cheap high availability self-healing solr cluster
How to make a simple cheap high availability self-healing solr clusterHow to make a simple cheap high availability self-healing solr cluster
How to make a simple cheap high availability self-healing solr clusterlucenerevolution
 
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...thelabdude
 
Best practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloudBest practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloudAnshum Gupta
 
Deploying and managing Solr at scale
Deploying and managing Solr at scaleDeploying and managing Solr at scale
Deploying and managing Solr at scaleAnshum Gupta
 
Boosting Machine Learning with Redis Modules and Spark
Boosting Machine Learning with Redis Modules and SparkBoosting Machine Learning with Redis Modules and Spark
Boosting Machine Learning with Redis Modules and SparkDvir Volk
 
Cross Datacenter Replication in Apache Solr 6
Cross Datacenter Replication in Apache Solr 6Cross Datacenter Replication in Apache Solr 6
Cross Datacenter Replication in Apache Solr 6Shalin Shekhar Mangar
 
Solrcloud Leader Election
Solrcloud Leader ElectionSolrcloud Leader Election
Solrcloud Leader Electionravikgiitk
 
DZone Java 8 Block Buster: Query Databases Using Streams
DZone Java 8 Block Buster: Query Databases Using StreamsDZone Java 8 Block Buster: Query Databases Using Streams
DZone Java 8 Block Buster: Query Databases Using StreamsSpeedment, Inc.
 
Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
Deploying and managing SolrCloud in the cloud using the Solr Scale ToolkitDeploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkitthelabdude
 
Solr Performance Monitoring with SPM
Solr Performance Monitoring with SPMSolr Performance Monitoring with SPM
Solr Performance Monitoring with SPMSematext Group, Inc.
 
To scale or not to scale: Key/Value, Document, SQL, JPA – What’s right for my...
To scale or not to scale: Key/Value, Document, SQL, JPA – What’s right for my...To scale or not to scale: Key/Value, Document, SQL, JPA – What’s right for my...
To scale or not to scale: Key/Value, Document, SQL, JPA – What’s right for my...Uri Cohen
 
Solr cluster with SolrCloud at lucenerevolution (tutorial)
Solr cluster with SolrCloud at lucenerevolution (tutorial)Solr cluster with SolrCloud at lucenerevolution (tutorial)
Solr cluster with SolrCloud at lucenerevolution (tutorial)searchbox-com
 

What's hot (20)

What's New on AWS and What it Means to You
What's New on AWS and What it Means to YouWhat's New on AWS and What it Means to You
What's New on AWS and What it Means to You
 
Call me maybe: Jepsen and flaky networks
Call me maybe: Jepsen and flaky networksCall me maybe: Jepsen and flaky networks
Call me maybe: Jepsen and flaky networks
 
NYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / SolrNYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / Solr
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scale
 
Introduction to SolrCloud
Introduction to SolrCloudIntroduction to SolrCloud
Introduction to SolrCloud
 
Building a near real time search engine & analytics for logs using solr
Building a near real time search engine & analytics for logs using solrBuilding a near real time search engine & analytics for logs using solr
Building a near real time search engine & analytics for logs using solr
 
Search-time Parallelism: Presented by Shikhar Bhushan, Etsy
Search-time Parallelism: Presented by Shikhar Bhushan, EtsySearch-time Parallelism: Presented by Shikhar Bhushan, Etsy
Search-time Parallelism: Presented by Shikhar Bhushan, Etsy
 
How to make a simple cheap high availability self-healing solr cluster
How to make a simple cheap high availability self-healing solr clusterHow to make a simple cheap high availability self-healing solr cluster
How to make a simple cheap high availability self-healing solr cluster
 
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...
 
Best practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloudBest practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloud
 
Deploying and managing Solr at scale
Deploying and managing Solr at scaleDeploying and managing Solr at scale
Deploying and managing Solr at scale
 
Boosting Machine Learning with Redis Modules and Spark
Boosting Machine Learning with Redis Modules and SparkBoosting Machine Learning with Redis Modules and Spark
Boosting Machine Learning with Redis Modules and Spark
 
Cross Datacenter Replication in Apache Solr 6
Cross Datacenter Replication in Apache Solr 6Cross Datacenter Replication in Apache Solr 6
Cross Datacenter Replication in Apache Solr 6
 
Solrcloud Leader Election
Solrcloud Leader ElectionSolrcloud Leader Election
Solrcloud Leader Election
 
DZone Java 8 Block Buster: Query Databases Using Streams
DZone Java 8 Block Buster: Query Databases Using StreamsDZone Java 8 Block Buster: Query Databases Using Streams
DZone Java 8 Block Buster: Query Databases Using Streams
 
Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
Deploying and managing SolrCloud in the cloud using the Solr Scale ToolkitDeploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
 
Solr4 nosql search_server_2013
Solr4 nosql search_server_2013Solr4 nosql search_server_2013
Solr4 nosql search_server_2013
 
Solr Performance Monitoring with SPM
Solr Performance Monitoring with SPMSolr Performance Monitoring with SPM
Solr Performance Monitoring with SPM
 
To scale or not to scale: Key/Value, Document, SQL, JPA – What’s right for my...
To scale or not to scale: Key/Value, Document, SQL, JPA – What’s right for my...To scale or not to scale: Key/Value, Document, SQL, JPA – What’s right for my...
To scale or not to scale: Key/Value, Document, SQL, JPA – What’s right for my...
 
Solr cluster with SolrCloud at lucenerevolution (tutorial)
Solr cluster with SolrCloud at lucenerevolution (tutorial)Solr cluster with SolrCloud at lucenerevolution (tutorial)
Solr cluster with SolrCloud at lucenerevolution (tutorial)
 

Viewers also liked

Cloud Camp: Infrastructure as a service advance workloads
Cloud Camp: Infrastructure as a service advance workloadsCloud Camp: Infrastructure as a service advance workloads
Cloud Camp: Infrastructure as a service advance workloadsAsaf Nakash
 
Trends at JavaOne 2016: Microservices, Docker and Cloud-Native Middleware
Trends at JavaOne 2016: Microservices, Docker and Cloud-Native MiddlewareTrends at JavaOne 2016: Microservices, Docker and Cloud-Native Middleware
Trends at JavaOne 2016: Microservices, Docker and Cloud-Native MiddlewareKai Wähner
 
Philips Big Data Expo
Philips Big Data ExpoPhilips Big Data Expo
Philips Big Data ExpoBigDataExpo
 
“Ūdens resursi. Saglabāsim ūdeni kopā!” Pasaules lielākā mācību stunda Daugav...
“Ūdens resursi. Saglabāsim ūdeni kopā!” Pasaules lielākā mācību stunda Daugav...“Ūdens resursi. Saglabāsim ūdeni kopā!” Pasaules lielākā mācību stunda Daugav...
“Ūdens resursi. Saglabāsim ūdeni kopā!” Pasaules lielākā mācību stunda Daugav...liela_stunda
 
Global Azure Bootcamp - Azure OMS
Global Azure Bootcamp - Azure OMSGlobal Azure Bootcamp - Azure OMS
Global Azure Bootcamp - Azure OMSBruno Lopes
 
(SEC320) Leveraging the Power of AWS to Automate Security & Compliance
(SEC320) Leveraging the Power of AWS to Automate Security & Compliance(SEC320) Leveraging the Power of AWS to Automate Security & Compliance
(SEC320) Leveraging the Power of AWS to Automate Security & ComplianceAmazon Web Services
 
KD2017_System Center in the "cloud first" era
KD2017_System Center in the "cloud first" eraKD2017_System Center in the "cloud first" era
KD2017_System Center in the "cloud first" eraTomica Kaniski
 
Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kira...
Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kira...Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kira...
Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kira...Lucidworks
 
Chapter 3 Computer Crimes
Chapter 3 Computer  CrimesChapter 3 Computer  Crimes
Chapter 3 Computer CrimesMar Soriano
 
Oracle cloud, private, public and hybrid
Oracle cloud, private, public and hybridOracle cloud, private, public and hybrid
Oracle cloud, private, public and hybridJohan Louwers
 
VMs All the Way Down (BSides Delaware 2016)
VMs All the Way Down (BSides Delaware 2016)VMs All the Way Down (BSides Delaware 2016)
VMs All the Way Down (BSides Delaware 2016)John Hubbard
 
Emerging Technologies: Heroku for ISVs (October 13, 2014)
Emerging Technologies: Heroku for ISVs (October 13, 2014)Emerging Technologies: Heroku for ISVs (October 13, 2014)
Emerging Technologies: Heroku for ISVs (October 13, 2014)Salesforce Partners
 
Status Quo on the automation support in SOA Suite OGhTech17
Status Quo on the automation support in SOA Suite OGhTech17Status Quo on the automation support in SOA Suite OGhTech17
Status Quo on the automation support in SOA Suite OGhTech17Jon Petter Hjulstad
 
Cleared Job Fair Job Seeker Handbook June 15, 2017, Dulles, VA
Cleared Job Fair Job Seeker Handbook June 15, 2017, Dulles, VACleared Job Fair Job Seeker Handbook June 15, 2017, Dulles, VA
Cleared Job Fair Job Seeker Handbook June 15, 2017, Dulles, VAClearedJobs.Net
 
Microsoft Big Data Expo
Microsoft Big Data ExpoMicrosoft Big Data Expo
Microsoft Big Data ExpoBigDataExpo
 
Opensource Search Engines
Opensource Search EnginesOpensource Search Engines
Opensource Search Enginescusy GmbH
 
Walmart Big Data Expo
Walmart Big Data ExpoWalmart Big Data Expo
Walmart Big Data ExpoBigDataExpo
 
Info qiy foundation digital me - dappre-eng-aug17
Info qiy foundation   digital me - dappre-eng-aug17Info qiy foundation   digital me - dappre-eng-aug17
Info qiy foundation digital me - dappre-eng-aug17BigDataExpo
 

Viewers also liked (20)

Cloud Camp: Infrastructure as a service advance workloads
Cloud Camp: Infrastructure as a service advance workloadsCloud Camp: Infrastructure as a service advance workloads
Cloud Camp: Infrastructure as a service advance workloads
 
Trends at JavaOne 2016: Microservices, Docker and Cloud-Native Middleware
Trends at JavaOne 2016: Microservices, Docker and Cloud-Native MiddlewareTrends at JavaOne 2016: Microservices, Docker and Cloud-Native Middleware
Trends at JavaOne 2016: Microservices, Docker and Cloud-Native Middleware
 
Philips Big Data Expo
Philips Big Data ExpoPhilips Big Data Expo
Philips Big Data Expo
 
“Ūdens resursi. Saglabāsim ūdeni kopā!” Pasaules lielākā mācību stunda Daugav...
“Ūdens resursi. Saglabāsim ūdeni kopā!” Pasaules lielākā mācību stunda Daugav...“Ūdens resursi. Saglabāsim ūdeni kopā!” Pasaules lielākā mācību stunda Daugav...
“Ūdens resursi. Saglabāsim ūdeni kopā!” Pasaules lielākā mācību stunda Daugav...
 
Global Azure Bootcamp - Azure OMS
Global Azure Bootcamp - Azure OMSGlobal Azure Bootcamp - Azure OMS
Global Azure Bootcamp - Azure OMS
 
(SEC320) Leveraging the Power of AWS to Automate Security & Compliance
(SEC320) Leveraging the Power of AWS to Automate Security & Compliance(SEC320) Leveraging the Power of AWS to Automate Security & Compliance
(SEC320) Leveraging the Power of AWS to Automate Security & Compliance
 
KD2017_System Center in the "cloud first" era
KD2017_System Center in the "cloud first" eraKD2017_System Center in the "cloud first" era
KD2017_System Center in the "cloud first" era
 
Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kira...
Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kira...Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kira...
Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kira...
 
Water resources
Water resourcesWater resources
Water resources
 
Chapter 3 Computer Crimes
Chapter 3 Computer  CrimesChapter 3 Computer  Crimes
Chapter 3 Computer Crimes
 
Oracle cloud, private, public and hybrid
Oracle cloud, private, public and hybridOracle cloud, private, public and hybrid
Oracle cloud, private, public and hybrid
 
VMs All the Way Down (BSides Delaware 2016)
VMs All the Way Down (BSides Delaware 2016)VMs All the Way Down (BSides Delaware 2016)
VMs All the Way Down (BSides Delaware 2016)
 
Oracle Cloud Café IoT 12-APR-2016
Oracle Cloud Café IoT 12-APR-2016Oracle Cloud Café IoT 12-APR-2016
Oracle Cloud Café IoT 12-APR-2016
 
Emerging Technologies: Heroku for ISVs (October 13, 2014)
Emerging Technologies: Heroku for ISVs (October 13, 2014)Emerging Technologies: Heroku for ISVs (October 13, 2014)
Emerging Technologies: Heroku for ISVs (October 13, 2014)
 
Status Quo on the automation support in SOA Suite OGhTech17
Status Quo on the automation support in SOA Suite OGhTech17Status Quo on the automation support in SOA Suite OGhTech17
Status Quo on the automation support in SOA Suite OGhTech17
 
Cleared Job Fair Job Seeker Handbook June 15, 2017, Dulles, VA
Cleared Job Fair Job Seeker Handbook June 15, 2017, Dulles, VACleared Job Fair Job Seeker Handbook June 15, 2017, Dulles, VA
Cleared Job Fair Job Seeker Handbook June 15, 2017, Dulles, VA
 
Microsoft Big Data Expo
Microsoft Big Data ExpoMicrosoft Big Data Expo
Microsoft Big Data Expo
 
Opensource Search Engines
Opensource Search EnginesOpensource Search Engines
Opensource Search Engines
 
Walmart Big Data Expo
Walmart Big Data ExpoWalmart Big Data Expo
Walmart Big Data Expo
 
Info qiy foundation digital me - dappre-eng-aug17
Info qiy foundation   digital me - dappre-eng-aug17Info qiy foundation   digital me - dappre-eng-aug17
Info qiy foundation digital me - dappre-eng-aug17
 

Similar to Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe, Sematext

Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]
Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]
Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]Kyle Hailey
 
UKOUG, Lies, Damn Lies and I/O Statistics
UKOUG, Lies, Damn Lies and I/O StatisticsUKOUG, Lies, Damn Lies and I/O Statistics
UKOUG, Lies, Damn Lies and I/O StatisticsKyle Hailey
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...javier ramirez
 
Elasticsearch for Logs & Metrics - a deep dive
Elasticsearch for Logs & Metrics - a deep diveElasticsearch for Logs & Metrics - a deep dive
Elasticsearch for Logs & Metrics - a deep diveSematext Group, Inc.
 
Best Practices with PostgreSQL on Solaris
Best Practices with PostgreSQL on SolarisBest Practices with PostgreSQL on Solaris
Best Practices with PostgreSQL on SolarisJignesh Shah
 
Low Level CPU Performance Profiling Examples
Low Level CPU Performance Profiling ExamplesLow Level CPU Performance Profiling Examples
Low Level CPU Performance Profiling ExamplesTanel Poder
 
Sql server 2016 it just runs faster sql bits 2017 edition
Sql server 2016 it just runs faster   sql bits 2017 editionSql server 2016 it just runs faster   sql bits 2017 edition
Sql server 2016 it just runs faster sql bits 2017 editionBob Ward
 
Oracle R12 EBS Performance Tuning
Oracle R12 EBS Performance TuningOracle R12 EBS Performance Tuning
Oracle R12 EBS Performance TuningScott Jenner
 
POLARDB: A database architecture for the cloud
POLARDB: A database architecture for the cloudPOLARDB: A database architecture for the cloud
POLARDB: A database architecture for the cloudoysteing
 
A presentaion on Panasas HPC NAS
A presentaion on Panasas HPC NASA presentaion on Panasas HPC NAS
A presentaion on Panasas HPC NASRahul Janghel
 
Hadoop Meetup Jan 2019 - Overview of Ozone
Hadoop Meetup Jan 2019 - Overview of OzoneHadoop Meetup Jan 2019 - Overview of Ozone
Hadoop Meetup Jan 2019 - Overview of OzoneErik Krogen
 
Presentation db2 best practices for optimal performance
Presentation   db2 best practices for optimal performancePresentation   db2 best practices for optimal performance
Presentation db2 best practices for optimal performancesolarisyougood
 
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014Amazon Web Services
 
Lamp Stack Optimization
Lamp Stack OptimizationLamp Stack Optimization
Lamp Stack OptimizationDave Ross
 
Presentation db2 best practices for optimal performance
Presentation   db2 best practices for optimal performancePresentation   db2 best practices for optimal performance
Presentation db2 best practices for optimal performancexKinAnx
 
Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)MongoDB
 
SQL Server It Just Runs Faster
SQL Server It Just Runs FasterSQL Server It Just Runs Faster
SQL Server It Just Runs FasterBob Ward
 

Similar to Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe, Sematext (20)

Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]
Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]
Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]
 
UKOUG, Lies, Damn Lies and I/O Statistics
UKOUG, Lies, Damn Lies and I/O StatisticsUKOUG, Lies, Damn Lies and I/O Statistics
UKOUG, Lies, Damn Lies and I/O Statistics
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
 
Elasticsearch for Logs & Metrics - a deep dive
Elasticsearch for Logs & Metrics - a deep diveElasticsearch for Logs & Metrics - a deep dive
Elasticsearch for Logs & Metrics - a deep dive
 
Best Practices with PostgreSQL on Solaris
Best Practices with PostgreSQL on SolarisBest Practices with PostgreSQL on Solaris
Best Practices with PostgreSQL on Solaris
 
Low Level CPU Performance Profiling Examples
Low Level CPU Performance Profiling ExamplesLow Level CPU Performance Profiling Examples
Low Level CPU Performance Profiling Examples
 
Sql server 2016 it just runs faster sql bits 2017 edition
Sql server 2016 it just runs faster   sql bits 2017 editionSql server 2016 it just runs faster   sql bits 2017 edition
Sql server 2016 it just runs faster sql bits 2017 edition
 
Oracle R12 EBS Performance Tuning
Oracle R12 EBS Performance TuningOracle R12 EBS Performance Tuning
Oracle R12 EBS Performance Tuning
 
POLARDB: A database architecture for the cloud
POLARDB: A database architecture for the cloudPOLARDB: A database architecture for the cloud
POLARDB: A database architecture for the cloud
 
Assignment 2 Theoretical
Assignment 2 TheoreticalAssignment 2 Theoretical
Assignment 2 Theoretical
 
A presentaion on Panasas HPC NAS
A presentaion on Panasas HPC NASA presentaion on Panasas HPC NAS
A presentaion on Panasas HPC NAS
 
Hadoop Meetup Jan 2019 - Overview of Ozone
Hadoop Meetup Jan 2019 - Overview of OzoneHadoop Meetup Jan 2019 - Overview of Ozone
Hadoop Meetup Jan 2019 - Overview of Ozone
 
Presentation db2 best practices for optimal performance
Presentation   db2 best practices for optimal performancePresentation   db2 best practices for optimal performance
Presentation db2 best practices for optimal performance
 
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
 
Lamp Stack Optimization
Lamp Stack OptimizationLamp Stack Optimization
Lamp Stack Optimization
 
Tuning Java Servers
Tuning Java Servers Tuning Java Servers
Tuning Java Servers
 
Presentation db2 best practices for optimal performance
Presentation   db2 best practices for optimal performancePresentation   db2 best practices for optimal performance
Presentation db2 best practices for optimal performance
 
Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)
 
Deep Dive on Amazon Aurora
Deep Dive on Amazon AuroraDeep Dive on Amazon Aurora
Deep Dive on Amazon Aurora
 
SQL Server It Just Runs Faster
SQL Server It Just Runs FasterSQL Server It Just Runs Faster
SQL Server It Just Runs Faster
 

More from Lucidworks

Search is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategySearch is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategyLucidworks
 
Drive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceDrive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceLucidworks
 
How Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsHow Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsLucidworks
 
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks
 
Connected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesConnected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesLucidworks
 
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Lucidworks
 
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...Lucidworks
 
Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Lucidworks
 
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Lucidworks
 
AI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteAI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteLucidworks
 
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentThe Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentLucidworks
 
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeWebinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeLucidworks
 
Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Lucidworks
 
Applying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchApplying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchLucidworks
 
Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Lucidworks
 
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyWebinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyLucidworks
 
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Lucidworks
 
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceApply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceLucidworks
 
Webinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchWebinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchLucidworks
 
Why Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondWhy Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondLucidworks
 

More from Lucidworks (20)

Search is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategySearch is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce Strategy
 
Drive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceDrive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in Salesforce
 
How Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsHow Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant Products
 
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
 
Connected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesConnected Experiences Are Personalized Experiences
Connected Experiences Are Personalized Experiences
 
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
 
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
 
Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020
 
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
 
AI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteAI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and Rosette
 
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentThe Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
 
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeWebinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - Europe
 
Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19
 
Applying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchApplying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 Research
 
Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1
 
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyWebinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
 
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
 
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceApply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
 
Webinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchWebinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise Search
 
Why Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondWhy Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and Beyond
 

Recently uploaded

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 

Recently uploaded (20)

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 

Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe, Sematext

  • 1. OCTOBER 11-14, 2016 • BOSTON, MA
  • 2. Tuning Solr and its Pipeline for Logs Rafał Kuć and Radu Gheorghe Software Engineers, Sematext Group, Inc.
  • 3. 3 01 Agenda Designing a Solr(Cloud) cluster for time-series data Solr and operating system knobs to tune Pipeline patterns and shipping options
  • 4. 4 01 Time-based collections, the single best improvement 14.10 indexing
  • 5. 5 01 Time-based collections, the single best improvement 14.10 indexing 15.10
  • 6. 6 01 Time-based collections, the single best improvement 14.10 indexing 15.10
  • 7. 7 01 Time-based collections, the single best improvement 14.10 15.10 ... 21.10 indexing
  • 8. 8 01 Time-based collections, the single best improvement 14.10 15.10 ... 21.10 indexing Less merging ⇒ faster indexing Quick deletes (of whole collections) Search only some collections Better use of caches
  • 10. 1 0 01 Load is uneven Need to “rotate” collections fast enough to work with this (otherwise indexing and queries will be slow) These will be tiny Black Friday Saturday Sunday
  • 11. 1 1 01 If load is uneven, daily/monthly/etc indices are suboptimal: you either have poor performance or too many collections Octi* is worried: * this is Octi →
  • 12. 1 2 01 Solution: rotate by size indexing logs01 Size limit
  • 13. 1 3 01 Solution: rotate by size indexing logs01 Size limit
  • 14. 1 4 01 Solution: rotate by size indexing logs01 logs02 Size limit
  • 15. 1 5 01 Solution: rotate by size indexing logs01 logs02 Size limit
  • 16. 1 6 01 Solution: rotate by size indexing logs01 logs08... logs02
  • 17. 1 7 01 Solution: rotate by size indexingPredictable indexing and search performance Fewer shards logs01 logs08... logs02
  • 18. 1 8 01 Dealing with size-based collections logs01 logs08... logs02 app (caches results) stats 2016-10-11 2016-10-13 2016-10-12 2016-10-14 2016-10-18 doesn’t matter, it’s the latest collection
  • 19. 1 9 01 Size-based collections handle spiky load better Octi concludes:
  • 20. 2 0 01 Tiered cluster (a.k.a. hot-cold) 14 Oct 11 Oct 10 Oct 12 Oct 13 Oct hot01 cold01 cold02 indexing, most searches longer-running (+cached) searches
  • 21. 2 1 01 Tiered cluster (a.k.a. hot-cold) 14 Oct 11 Oct 10 Oct 12 Oct 13 Oct hot01 cold01 cold02 indexing, most searches longer-running (+cached) searches ⇒ Good CPU and IO* ⇒ Heap. Decent IO for replication&backup * Ideally local SSD; avoid network storage unless it’s really good
  • 22. 2 2 01 Octi likes tiered clusters Costs: you can use different hardware for different workloads Performance (see costs): fewer shards, less overhead Isolation: long-running searches don’t slow down indexing
  • 23. 2 3 01 AWS specifics Hot tier: c3 (compute optimized) + EBS and use local SSD as cache* c4 (EBS only) Cold tier: d2 (big local HDDs + lots of RAM) m4 (general purpose) + EBS i2 (big local SSDs + lots of RAM) General stuff: EBS optimized Enhanced Networking VPC (to get access to c4&m4 instances) * Use --cachemode writeback for async writing: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/ht ml/Logical_Volume_Manager_Administration/lvm_cache_volume_creation.html
  • 24. PIOPS is best but expensive HDD - too slow (unless cold=icy) ⇒ General purpose SSDs 2 4 01 EBS Stay under 3TB. More smaller (<1TB) drives in RAID0 give better, but shorter IOPS bursts Performance isn’t guaranteed ⇒ RAID0 will wait for the slowest disk Check limits (e.g. 160MB/s per drive, instance-dependent IOPS and network) http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSPerformance.html 3 IOPS/GB up to 10K (~3TB), up to 256kb/IO, merges up to 4 consecutive IOs
  • 25. 2 5 01 Octi’s AWS top picks c4s for the hot tier: cheaper than c3s with similar performance m4s for the cold tier: well balanced, scale up to 10xl, flexible storage via EBS EBS drives < 3TB, otherwise avoids RAID0: higher chances of performance drop
  • 26. 2 6 01 Scratching the surface of OS options Say no to swap Disk scheduler: CFQ for HDD, deadline for SSD Mount options: noatime, nodiratime, data=writeback, nobarrier For bare metal: check CPU governor and THP* because strict ordering is for the weak * often it’s enabled, but /proc/sys/vm/nr_hugepages is 0
  • 27. 2 7 01 Schema and solrconfig Auto soft commit (5s?) Auto commit (few minutes?) RAM buffer size + Max buffered docs Doc Values for faceting + retrieving those fields (stored=false) Omit norms, frequencies and positions Don’t store catch-all field(s)
  • 28. 2 8 01 Relaxing the merge policy* Merges are the heaviest part of indexing Facets are the heaviest part of searching Facets (except method=enum) depend on data size more than # of segments Knobs: segmentsPerTier: more segments ⇒ less merging maxMergeAtOnce < segmentsPerTier, to smooth out those IO spikes maxMergedSegmentMB: lower to merge more small segments, less bigger ones * unless you only do “grep”. YMMV, of course. Keep an eye on open files, though ⇒ fewer open files
  • 29. 2 9 01 Some numbers: more segments, more throughput (10-15% here) 10 segmentsPerTier 10 maxMergeAtOnce 50 segmentsPerTier 50 maxMergeAtOnce need to rotate before this drop
  • 30. 3 0 01 Lower max segment (500MB from 5GB default) less CPU fewer segments
  • 31. 3 1 01 There’s more... SPM screenshots from all tests + JMeter test plan here: https://github.com/sematext/lucene-revolution-samples/tree/master/2016/solr_logging We’d love to hear about your own results! correct spelling: sematext.com/spm
  • 32. 3 2 01 Increasing segments per tier while decreasing max merged segment (by an order of magnitude) makes indexing better and search latency less spiky Octi’s conclusions so far
  • 33. 3 3 01 Optimize I/O and CPU by not optimizing Unless you have spare CPU & IO (why would you?) And unless you run out of open files Only do this on “old” indices!
  • 34. 3 4 01 Optimizing the pipeline* logs log shipper(s) Ship using which protocol(s)? Buffer Route to other destinations? Parse * for availability and performance/costs Or log to Solr directly from app (i.e. implement a new, embedded log shipper)
  • 35. 3 5 01 A case for buffers performance & availability allows batches and threads when Solr is down or can’t keep up
  • 36. 3 6 01 Types of buffers Disk*, memory or a combination On the logging host or centralized * doesn’t mean it fsync()s for every message file or local log shipper Easy scaling; fewer moving parts Often requires a lightweight shipper Kafka/Redis/etc or central log shipper Extra features (e.g. TTL) One place for changes bufferapp bufferapp
  • 37. 3 7 01 Multiple destinations * or flat files, or S3 or... input buffer processing Outputs need to be in sync input Processing may cause backpressure processing *
  • 39. 3 9 01 Just Solr and maybe flat files? Go simple with a local shipper Custom, fast-changing processing & multiple destinations? Kafka as a central buffer Octi’s pipeline preferences
  • 40. 4 0 01 Parsing unstructured data Ideally, log in JSON* Otherwise, parse * or another serialization format For performance and maintenance (i.e. no need to update parsing rules) Regex-based (e.g. grok) Easy to build rules Rules are flexible Typically slow & O(n) on # of rules, but: Move matching patterns to the top of the list Move broad patterns to the bottom Skip patterns including others that didn’t match Grammar-based (e.g. liblognorm, PatternDB) Faster. O(1) on # of rules Numbers in our 2015 session: sematext.com/blog/2015/10/16/large-scale-log-analytics-with-solr/
  • 41. 4 1 01 Decide what gives when buffers fill up input shipper Can drop data here, but better buffer when shipper buffer is full, app can block or drop data Check: Local files: what happens when files are rotated/archived/deleted? UDP: network buffers should handle spiky load* TCP: what happens when connection breaks/times out? UNIX sockets: what happens when socket blocks writes**? * you’d normally increase net.core.rmem_max and rmem_default ** both DGRAM and STREAM local sockets are reliable (vs Internet ones, UDP and TCP)
  • 42. 4 2 01 Octi’s flow chart of where to log critical? UDP. Increase network buffers on destination, so it can handle spiky traffic Paying with RAM or IO? UNIX socket. Local shipper with memory buffers, that can drop data if needed Local files. Make sure rotation is in place or you’ll run out of disk! no IO RAM yes
  • 43. 4 3 01 Protocols UDP: cool for the app, but not reliable TCP: more reliable, but not completely Application-level ACKs may be needed: No failure/backpressure handling needed App gets ACK when OS buffer gets it ⇒ no retransmit if buffer is lost* * more at blog.gerhards.net/2008/05/why-you-cant-build-reliable-tcp.html sender receiver ACKs Protocol Example shippers HTTP Logstash, rsyslog, Fluentd RELP rsyslog, Logstash Beats Filebeat, Logstash Kafka Fluentd, Filebeat, rsyslog, Logstash
  • 44. 4 4 01 Octi’s top pipeline+shipper combos rsyslog app UNIX socket HTTP memory+disk buffer can drop app fileKafka consumer Filebeat HTTP simple & reliable
  • 45. 4 5 01 Conclusions, questions, we’re hiring, thank you The whole talk was pretty much only conclusions :) Feels like there’s much more to discover. Please test & share your own nuggets http://www.relatably.com/m/img/oh-my-god-memes/oh-my-god.jpg Scary word, ha? Poke us: @kucrafal @radu0gheorghe @sematext ...or @ our booth here