Elasticsearch
“war stories”
Well Hello There! I am Arno Broekhof
Data Engineer (full stack) @Dataworkz

Working with Elasticsearch since 2011

Dutch National Police
History of Elasticsearch
• Created by Shay Banon

• Compass

• Elasticsearch == Compass 3.0

• First release in February 2010

• Abstraction layer on top of Lucene
Present Day
24 Elasticsearch Clusters
441 Nodes
5477 GB RAM
343 TB Used Data
3798 Indices
Zen Discovery
discovery.zen.ping.multicast.enabled: true
• Elasticsearch nodes use multicast traffic for discovery

• Default setting in ES < 2.0 (multicast moved to a plugin in 2.0 and was removed in 5.0)
Not a database
• Persistency
• Consistency
• Security
• SELECT * FROM pet WHERE name LIKE 'b%';
• Total amount of data < 512GB
Shard Sizing
“Too Many Shards or the Gazillion Shards Problem”
• A shard is a Lucene index under the covers, which uses file handles, memory, and CPU cycles.

• Every search request needs to hit a copy of every shard in the index. That’s fine if every shard is sitting on a different node, but not if many shards have to compete for the same resources.

• Term statistics, used to calculate relevance, are per shard. Having a small amount of data in many shards leads to poor relevance.
How many shards?
• 1,000,000 documents

• Index of 256GB

• 6 nodes

• Each node has 8 cores and a 30GB heap
256GB / (80% of one node's heap, i.e. 24GB) ≈ 10.7, so roughly 10 to 11 shards
curl -XGET http://localhost:9200/_cat/indices
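The arithmetic on this slide can be written down as a small helper. This is only a sketch of the rule of thumb shown above (target shard size = 80% of one node's heap); the class and method names are made up for illustration, and the result is a starting point to test against real data, not a guarantee. With the slide's numbers it computes 256 / 24 ≈ 10.7 and rounds up to 11.

```java
// Rule-of-thumb shard count, following the arithmetic on the slide.
// ShardEstimate and shardCount are hypothetical names, not part of any Elasticsearch API.
final class ShardEstimate {

    /**
     * @param indexSizeGb  expected size of the index in GB
     * @param nodeHeapGb   JVM heap of a single node in GB
     * @param heapFraction fraction of the heap used as the target shard size (0.8 on the slide)
     * @return index size divided by the target shard size, rounded up
     */
    static int shardCount(double indexSizeGb, double nodeHeapGb, double heapFraction) {
        if (indexSizeGb <= 0 || nodeHeapGb <= 0 || heapFraction <= 0) {
            throw new IllegalArgumentException("all inputs must be positive");
        }
        double targetShardSizeGb = nodeHeapGb * heapFraction;
        return (int) Math.ceil(indexSizeGb / targetShardSizeGb);
    }

    public static void main(String[] args) {
        // Numbers from the slide: 256GB index, 30GB heap per node, 80% of the heap.
        System.out.println(shardCount(256, 30, 0.8));
    }
}
```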
Disable _source field
If _source is disabled, you lose:
• The update, update_by_query, and reindex APIs.

• On-the-fly highlighting.

• The ability to reindex from one Elasticsearch index to another, either to change mappings or analysis, or to upgrade an index to a new major version.

• The ability to debug queries or aggregations by viewing the original document used at index time.

• Potentially in the future, the ability to repair index corruption automatically.
How many indices?
“remember that there is no rule that limits your application to using only a single index.”
Dynamic Mappings
• Not everything needs to be searchable
"avatarLink": {
  "type": "string",
  "index": "not_analyzed",
  "doc_values": true
},
• Use explicit mappings when possible
{
  "job": "Some job description",
  "date": "1-10-2017"
}
{
  "job": "Some job description",
  "date": "NO_DATE"
}
Where is my memory?
{
  "aggs": {
    "players": {
      "terms": {
        "field": "players",
        "size": 10
      },
      "aggs": {
        "other": {
          "terms": {
            "field": "players",
            "size": 5
          }
        }
      }
    }
  }
}
• The aggregation returns the top 10 players and, for each of them, the top five supporting players

• 50 results

• Minimal effort, maximum memory: the default depth_first mode builds the full bucket tree before trimming it
Where is my memory?
{
  "aggs": {
    "players": {
      "terms": {
        "field": "players",
        "size": 10,
        "collect_mode": "breadth_first"
      },
      "aggs": {
        "other": {
          "terms": {
            "field": "players",
            "size": 5
          }
        }
      }
    }
  }
}
• Use collect_mode breadth_first where possible

• Trims one level at a time, before descending to the next

• Minimal change, maximum performance
Where is my data?
public void insert(final JsonArray jsonArray) {
    if (jsonArray.size() == 0) {
        return;
    }
    BulkRequestBuilder bulkRequestBuilder = transportClient.prepareBulk();
    // Automatic refresh is switched off here and never switched back on in this method,
    // so newly indexed documents stay invisible to search until the index is refreshed.
    this.setEsRefreshInterval("-1");
    jsonArray.forEach(e -> {
        // toString() on a JSON string keeps the surrounding quotes, so they end up in the id.
        String id = e.getAsJsonObject().get("name").toString();
        bulkRequestBuilder.add(transportClient.prepareIndex(configuration.getEsIndex(),
                configuration.getEsTypeName()).setSource(e.toString(), XContentType.JSON).setId(id));
    });
    BulkResponse bulkResponse = bulkRequestBuilder.get();
    // Failures are only reported at debug level.
    LOGGER.debug("bulk inserted {} items took: {} with failures: {}",
            bulkResponse.getItems().length, bulkResponse.getTook(), bulkResponse.hasFailures());
}
Where is my data?
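public void insert(final JsonArray jsonArray) {
    if (jsonArray.size() == 0) {
        return;
    }
    BulkRequestBuilder bulkRequestBuilder = transportClient.prepareBulk();
    jsonArray.forEach(e -> {
        // getAsString() returns the bare value, without the JSON quotes.
        String id = e.getAsJsonObject().get("name").getAsString();
        bulkRequestBuilder.add(transportClient.prepareIndex(configuration.getEsIndex(),
                configuration.getEsTypeName()).setSource(e.toString(), XContentType.JSON).setId(id));
    });
    // Disable refresh only for the duration of the bulk request.
    this.setEsRefreshInterval("-1");
    try {
        BulkResponse bulkResponse = bulkRequestBuilder.get();
        LOGGER.debug("bulk inserted {} items took: {} with failures: {}",
                bulkResponse.getItems().length, bulkResponse.getTook(), bulkResponse.hasFailures());
        if (bulkResponse.hasFailures()) {
            LOGGER.warn("bulk insert failures: {}", bulkResponse.buildFailureMessage());
        }
    } finally {
        // Assumption: "1s" (the Elasticsearch default) is the interval the index should return to.
        this.setEsRefreshInterval("1s");
    }
}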
Query or Filter?
Queries -> should be used when performing a full-text search, or when scoring of results is required (think search results ranked by relevancy).

Filters -> are much faster than queries, mainly because they don’t score the results. If you just want to return all of the products that are blue, or that cost more than €50, use filters!
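As an illustration of the filter case, the sketch below builds a search body that puts both conditions from the slide into the filter clause of a bool query, so no scores are computed for them. It assumes Elasticsearch 2.0 or later (where bool accepts a filter clause); the field names color and price and the class name are hypothetical. Clauses listed under filter must all match, so this returns products that are blue and cost more than 50.

```java
// Builds a search request body as plain JSON text; no Elasticsearch client is needed.
// FilterQueryExample and the field names "color" and "price" are hypothetical.
final class FilterQueryExample {

    // Two non-scoring filter clauses: color equals "blue" AND price greater than 50.
    static String blueAndOverFifty() {
        return "{\n"
            + "  \"query\": {\n"
            + "    \"bool\": {\n"
            + "      \"filter\": [\n"
            + "        { \"term\":  { \"color\": \"blue\" } },\n"
            + "        { \"range\": { \"price\": { \"gt\": 50 } } }\n"
            + "      ]\n"
            + "    }\n"
            + "  }\n"
            + "}";
    }

    public static void main(String[] args) {
        // The printed body can be sent to the _search endpoint of an index.
        System.out.println(blueAndOverFifty());
    }
}
```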
_type == _type
• Use unique types

• Why WordPress post_type == _type is a bad idea

• When deleting a post, the document is identified by both its _id and its _type
Search limits
• By default a search returns 10 results

• The maximum result window (from + size) is 10,000

• If you want everything, use the scroll API
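A client can check up front whether a page request would cross the 10,000 limit and should switch to the scroll API instead. The sketch below assumes the index still uses the default index.max_result_window of 10,000; the class and method names are hypothetical.

```java
// Checks a from/size page request against the default result window.
// ResultWindow and exceedsWindow are hypothetical names, not part of any Elasticsearch API.
final class ResultWindow {

    // Default value of the index.max_result_window setting.
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    // True when from + size goes past the window, i.e. when scrolling is needed instead.
    static boolean exceedsWindow(int from, int size) {
        if (from < 0 || size < 0) {
            throw new IllegalArgumentException("from and size must not be negative");
        }
        // Widen to long so that very large arguments cannot overflow.
        return (long) from + size > DEFAULT_MAX_RESULT_WINDOW;
    }

    public static void main(String[] args) {
        System.out.println(exceedsWindow(0, 10));     // first page of the default size
        System.out.println(exceedsWindow(9_991, 10)); // last hit would be number 10,001
    }
}
```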
We have a distributed search engine, nodes can fail!
• We have shard replicas

• Single master

• Use dedicated masters
Slow recovery
curl -XPUT http://localhost:9200/_cluster/settings -d '{
  "transient": {
    "cluster.routing.allocation.cluster_concurrent_rebalance": "5",
    "cluster.routing.allocation.node_concurrent_recoveries": "5",
    "cluster.routing.allocation.node_initial_primaries_recoveries": "4",
    "indices.recovery.concurrent_streams": "4",
    "indices.recovery.max_bytes_per_sec": "200mb",
    "indices.store.throttle.max_bytes_per_sec": "100mb"
  }
}'
What does the future bring?
• Java Transport Client is deprecated, REST is the way to go

• Cross Cluster Searches

• Index sorting during indexing

• Only one type can exist

• Better use of transaction logs

• Sparse Doc Values
Questions?
