SlideShare a Scribd company logo
1 of 13
Download to read offline
ElasticSearch 101
Setting up, configuring and tuning your ElasticSearch
cluster
Our ElasticSearch setup
Client
node
Client
node
Data
node
Data
node
Data
node
Apps
● 8 cores, 30GB
RAM, 2TB EBS
● Running in Docker
● Apache Mesos /
Marathon
● Dedicated DN
machines
https://aphyr.com/posts/323-call-me-maybe-elasticsearch-1-5-0
https://bugs.launchpad.net/ubuntu/+source/linux-lts-raring/+bug/1195474
It’s not a database
● You don’t get the same guarantees as
from databases (ACID)
● Writes acknowledged before flushed to
persistent storage
● Network partitions can lead to data loss:
- Long GC pauses
- Kernel bugs (!)
● Deletes take longer
Rolling your own ES Cluster (1)
● Name your cluster
● Disable multicast for discovery
● Set minimum master nodes (N/2+1) => Split-Brain!
● Check open file descriptors limit
● Disable swap (or mlockall)
● Configure gateway settings
○ recover_after_time
○ expected_nodes
● Avoid tribe nodes
Rolling your own ES Cluster (2)
Exhausting available JVM heap mem
Nodes will become
unresponsive!
Memory requirements
● Bottom peaks of the used
JVM heap after the GC
run mark the required
memory (add safety
buffer)
● At least 4GB per node
● 50% for JVM, 50% for FS
cache / Lucene
JVM settings
● Define heap memory (ES_HEAP_SIZE)
● Don’t tune JVM settings
● Don’t tune thread pool
■ In some case you might have to
■ Increasing will introduce memory pressure
● Don’t use G1 garbage collector
Indexing data
● Define data schemas and types ≠ Schemaless
○ Default: string mapping = analyzed = memory costly
○ Understand tokenizers and analyzers
● Prefer bulk indexing
● Refresh interval
● Time based indexes for log data
Querying for data
● Use filters as much as possible
● `Scan & scroll` for dumping large data, e.g. when
reindexing
● Transform data during indexing if possible
● ORMs make debugging a pain.
https://www.found.no/foundation/optimizing-elasticsearch-searches/
https://abhishek376.wordpress.com/2014/11/24/how-we-optimized-100-sec-elasticsearch-queries-to-be-under-a-sub-second/
Avoid high cardinality fields
● Aggregation => field data
● Often major consumer of heap
memory
● Use doc values (on disk field data)
● Avoid aggregation on analyzed
fields
More things to watch out
● Cluster health (duh!)
● Field data cache size
● Filter cache eviction
● Slow queries
● GC pauses
● Security settings
○ no authentication by default
● Backup
Tooling
● Use official SDKs
● For Go we use ElastiGo (not so great)
● Elastic HQ
● Inquisitor
● Sense

More Related Content

What's hot

Elastic search apache_solr
Elastic search apache_solrElastic search apache_solr
Elastic search apache_solr
macrochen
 

What's hot (19)

Intro to elasticsearch
Intro to elasticsearchIntro to elasticsearch
Intro to elasticsearch
 
What I learnt: Elastic search & Kibana : introduction, installtion & configur...
What I learnt: Elastic search & Kibana : introduction, installtion & configur...What I learnt: Elastic search & Kibana : introduction, installtion & configur...
What I learnt: Elastic search & Kibana : introduction, installtion & configur...
 
ElasticSearch AJUG 2013
ElasticSearch AJUG 2013ElasticSearch AJUG 2013
ElasticSearch AJUG 2013
 
Elasticsearch - DevNexus 2015
Elasticsearch - DevNexus 2015Elasticsearch - DevNexus 2015
Elasticsearch - DevNexus 2015
 
Elasticsearch Introduction
Elasticsearch IntroductionElasticsearch Introduction
Elasticsearch Introduction
 
BigData, NoSQL & ElasticSearch
BigData, NoSQL & ElasticSearchBigData, NoSQL & ElasticSearch
BigData, NoSQL & ElasticSearch
 
Solr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseSolr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by Case
 
ElasticSearch - DevNexus Atlanta - 2014
ElasticSearch - DevNexus Atlanta - 2014ElasticSearch - DevNexus Atlanta - 2014
ElasticSearch - DevNexus Atlanta - 2014
 
ElasticSearch in Production: lessons learned
ElasticSearch in Production: lessons learnedElasticSearch in Production: lessons learned
ElasticSearch in Production: lessons learned
 
Elastic Search
Elastic SearchElastic Search
Elastic Search
 
Elasticsearch: You know, for search! and more!
Elasticsearch: You know, for search! and more!Elasticsearch: You know, for search! and more!
Elasticsearch: You know, for search! and more!
 
ElasticSearch in action
ElasticSearch in actionElasticSearch in action
ElasticSearch in action
 
Introduction to elasticsearch
Introduction to elasticsearchIntroduction to elasticsearch
Introduction to elasticsearch
 
quick intro to elastic search
quick intro to elastic search quick intro to elastic search
quick intro to elastic search
 
Elastic search
Elastic searchElastic search
Elastic search
 
Elastic search apache_solr
Elastic search apache_solrElastic search apache_solr
Elastic search apache_solr
 
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to Elasticsearch
 
Managing Your Content with Elasticsearch
Managing Your Content with ElasticsearchManaging Your Content with Elasticsearch
Managing Your Content with Elasticsearch
 
ElasticSearch Basic Introduction
ElasticSearch Basic IntroductionElasticSearch Basic Introduction
ElasticSearch Basic Introduction
 

Viewers also liked

Sharding with MongoDB (Eliot Horowitz)
Sharding with MongoDB (Eliot Horowitz)Sharding with MongoDB (Eliot Horowitz)
Sharding with MongoDB (Eliot Horowitz)
MongoSF
 

Viewers also liked (20)

Scaling massive elastic search clusters - Rafał Kuć - Sematext
Scaling massive elastic search clusters - Rafał Kuć - SematextScaling massive elastic search clusters - Rafał Kuć - Sematext
Scaling massive elastic search clusters - Rafał Kuć - Sematext
 
03. ElasticSearch : Data In, Data Out
03. ElasticSearch : Data In, Data Out03. ElasticSearch : Data In, Data Out
03. ElasticSearch : Data In, Data Out
 
Elasticsearch cluster deep dive
Elasticsearch  cluster deep diveElasticsearch  cluster deep dive
Elasticsearch cluster deep dive
 
Elastic Search Meetup Special - Yann Cluchey, Cogenta
Elastic Search Meetup Special - Yann Cluchey, Cogenta Elastic Search Meetup Special - Yann Cluchey, Cogenta
Elastic Search Meetup Special - Yann Cluchey, Cogenta
 
Elasticsearch - OSDC France 2012
Elasticsearch - OSDC France 2012Elasticsearch - OSDC France 2012
Elasticsearch - OSDC France 2012
 
Getting the Most Out of Your NoSQL DB
Getting the Most Out of Your NoSQL DBGetting the Most Out of Your NoSQL DB
Getting the Most Out of Your NoSQL DB
 
Perl and Elasticsearch
Perl and ElasticsearchPerl and Elasticsearch
Perl and Elasticsearch
 
The Five Stages of Chef Grief: My First 6 months with Chef, and Getting Aroun...
The Five Stages of Chef Grief: My First 6 months with Chef, and Getting Aroun...The Five Stages of Chef Grief: My First 6 months with Chef, and Getting Aroun...
The Five Stages of Chef Grief: My First 6 months with Chef, and Getting Aroun...
 
2016 - IGNITE - An ElasticSearch Cluster Named George Armstrong Custer
2016 - IGNITE - An ElasticSearch Cluster Named George Armstrong Custer2016 - IGNITE - An ElasticSearch Cluster Named George Armstrong Custer
2016 - IGNITE - An ElasticSearch Cluster Named George Armstrong Custer
 
06. ElasticSearch : Mapping and Analysis
06. ElasticSearch : Mapping and Analysis06. ElasticSearch : Mapping and Analysis
06. ElasticSearch : Mapping and Analysis
 
Elasticsearch 1.x Cluster Installation (VirtualBox)
Elasticsearch 1.x Cluster Installation (VirtualBox)Elasticsearch 1.x Cluster Installation (VirtualBox)
Elasticsearch 1.x Cluster Installation (VirtualBox)
 
ElasticSearch on AWS - Real Estate portal case study (Spitogatos.gr)
ElasticSearch on AWS - Real Estate portal case study (Spitogatos.gr) ElasticSearch on AWS - Real Estate portal case study (Spitogatos.gr)
ElasticSearch on AWS - Real Estate portal case study (Spitogatos.gr)
 
An Introduction to Elasticsearch for Beginners
An Introduction to Elasticsearch for BeginnersAn Introduction to Elasticsearch for Beginners
An Introduction to Elasticsearch for Beginners
 
Sharding with MongoDB (Eliot Horowitz)
Sharding with MongoDB (Eliot Horowitz)Sharding with MongoDB (Eliot Horowitz)
Sharding with MongoDB (Eliot Horowitz)
 
Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2
 
Intro to Elasticsearch
Intro to ElasticsearchIntro to Elasticsearch
Intro to Elasticsearch
 
Elastic search overview
Elastic search overviewElastic search overview
Elastic search overview
 
Tuning Elasticsearch Indexing Pipeline for Logs
Tuning Elasticsearch Indexing Pipeline for LogsTuning Elasticsearch Indexing Pipeline for Logs
Tuning Elasticsearch Indexing Pipeline for Logs
 
Apache ignite Datagrid
Apache ignite DatagridApache ignite Datagrid
Apache ignite Datagrid
 
[2D1]Elasticsearch 성능 최적화
[2D1]Elasticsearch 성능 최적화[2D1]Elasticsearch 성능 최적화
[2D1]Elasticsearch 성능 최적화
 

Similar to Elasticsearch 101 - Cluster setup and tuning

Storage talk
Storage talkStorage talk
Storage talk
christkv
 
Elasticsearch Data Analyses
Elasticsearch Data AnalysesElasticsearch Data Analyses
Elasticsearch Data Analyses
Alaa Elhadba
 
Oracle Performance On Linux X86 systems
Oracle  Performance On Linux  X86 systems Oracle  Performance On Linux  X86 systems
Oracle Performance On Linux X86 systems
Baruch Osoveskiy
 
Deployment Strategy
Deployment StrategyDeployment Strategy
Deployment Strategy
MongoDB
 

Similar to Elasticsearch 101 - Cluster setup and tuning (20)

Caching in
Caching inCaching in
Caching in
 
MongoDB Operational Best Practices (mongosf2012)
MongoDB Operational Best Practices (mongosf2012)MongoDB Operational Best Practices (mongosf2012)
MongoDB Operational Best Practices (mongosf2012)
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
 
Journey through high performance django application
Journey through high performance django applicationJourney through high performance django application
Journey through high performance django application
 
Managing Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using ElasticsearchManaging Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using Elasticsearch
 
Caching in (DevoxxUK 2013)
Caching in (DevoxxUK 2013)Caching in (DevoxxUK 2013)
Caching in (DevoxxUK 2013)
 
Storage talk
Storage talkStorage talk
Storage talk
 
WiredTiger In-Memory vs WiredTiger B-Tree
WiredTiger In-Memory vs WiredTiger B-TreeWiredTiger In-Memory vs WiredTiger B-Tree
WiredTiger In-Memory vs WiredTiger B-Tree
 
The Accidental DBA
The Accidental DBAThe Accidental DBA
The Accidental DBA
 
My Database Skills Killed the Server
My Database Skills Killed the ServerMy Database Skills Killed the Server
My Database Skills Killed the Server
 
Mongo nyc nyt + mongodb
Mongo nyc nyt + mongodbMongo nyc nyt + mongodb
Mongo nyc nyt + mongodb
 
Investigate SQL Server Memory Like Sherlock Holmes
Investigate SQL Server Memory Like Sherlock HolmesInvestigate SQL Server Memory Like Sherlock Holmes
Investigate SQL Server Memory Like Sherlock Holmes
 
Elasticsearch Data Analyses
Elasticsearch Data AnalysesElasticsearch Data Analyses
Elasticsearch Data Analyses
 
Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)
 
Elasticsearch for Logs & Metrics - a deep dive
Elasticsearch for Logs & Metrics - a deep diveElasticsearch for Logs & Metrics - a deep dive
Elasticsearch for Logs & Metrics - a deep dive
 
Oracle Performance On Linux X86 systems
Oracle  Performance On Linux  X86 systems Oracle  Performance On Linux  X86 systems
Oracle Performance On Linux X86 systems
 
Deployment Strategy
Deployment StrategyDeployment Strategy
Deployment Strategy
 
Loadays MySQL
Loadays MySQLLoadays MySQL
Loadays MySQL
 
ScyllaDB: NoSQL at Ludicrous Speed
ScyllaDB: NoSQL at Ludicrous SpeedScyllaDB: NoSQL at Ludicrous Speed
ScyllaDB: NoSQL at Ludicrous Speed
 
Deployment Strategies
Deployment StrategiesDeployment Strategies
Deployment Strategies
 

Recently uploaded

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Recently uploaded (20)

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 

Elasticsearch 101 - Cluster setup and tuning

  • 1. ElasticSearch 101 Setting up, configuring and tuning your ElasticSearch cluster
  • 2. Our ElasticSearch setup Client node Client node Data node Data node Data node Apps ● 8 cores, 30GB RAM, 2TB EBS ● Running in Docker ● Apache Mesos / Marathon ● Dedicated DN machines
  • 3. https://aphyr.com/posts/323-call-me-maybe-elasticsearch-1-5-0 https://bugs.launchpad.net/ubuntu/+source/linux-lts-raring/+bug/1195474 It’s not a database ● You don’t get the same guarantees as from databases (ACID) ● Writes acknowledged before flushed to persistent storage ● Network partitions can lead to data loss: - Long GC pauses - Kernel bugs (!) ● Deletes take longer
  • 4. Rolling your own ES Cluster (1) ● Name your cluster ● Disable multicast for discovery ● Set minimum master nodes (N/2+1) => Split-Brain!
  • 5. ● Check open file descriptors limit ● Disable swap (or mlockall) ● Configure gateway settings ○ recover_after_time ○ expected_nodes ● Avoid tribe nodes Rolling your own ES Cluster (2)
  • 6. Exhausting available JVM heap mem Nodes will become unresponsive!
  • 7. Memory requirements ● Bottom peaks of the used JVM heap after the GC run mark the required memory (add safety buffer) ● At least 4GB per node ● 50% for JVM, 50% for FS cache / Lucene
  • 8. JVM settings ● Define heap memory (ES_HEAP_SIZE) ● Don’t tune JVM settings ● Don’t tune thread pool ■ In some case you might have to ■ Increasing will introduce memory pressure ● Don’t use G1 garbage collector
  • 9. Indexing data ● Define data schemas and types ≠ Schemaless ○ Default: string mapping = analyzed = memory costly ○ Understand tokenizers and analyzers ● Prefer bulk indexing ● Refresh interval ● Time based indexes for log data
  • 10. Querying for data ● Use filters as much as possible ● `Scan & scroll` for dumping large data, e.g. when reindexing ● Transform data during indexing if possible ● ORMs make debugging a pain. https://www.found.no/foundation/optimizing-elasticsearch-searches/ https://abhishek376.wordpress.com/2014/11/24/how-we-optimized-100-sec-elasticsearch-queries-to-be-under-a-sub-second/
  • 11. Avoid high cardinality fields ● Aggregation => field data ● Often major consumer of heap memory ● Use doc values (on disk field data) ● Avoid aggregation on analyzed fields
  • 12. More things to watch out ● Cluster health (duh!) ● Field data cache size ● Filter cache eviction ● Slow queries ● GC pauses ● Security settings ○ no authentication by default ● Backup
  • 13. Tooling ● Use official SDKs ● For Go we use ElastiGo (not so great) ● Elastic HQ ● Inquisitor ● Sense