The talk covers how Elasticsearch, Lucene, and to some extent search engines in general actually work under the hood. We'll start at the "bottom" (or close enough!) of the many abstraction levels and gradually move upwards towards the user-visible layers, studying the various internal data structures and behaviors as we ascend. Elasticsearch provides APIs that are very easy to use, and they will get you started and take you far without much effort. However, to get the most out of it, it helps to have some knowledge about the underlying algorithms and data structures. This understanding enables you to make full use of its substantial feature set, so that you can improve your users' search experience while keeping your systems performant, reliable, and updated in (near) real time.
So, what is the ELK Stack? "ELK" is the acronym for three open source projects: Elasticsearch, Logstash, and Kibana. Elasticsearch is a search and analytics engine. Logstash is a server‑side data processing pipeline that ingests data from multiple sources simultaneously, transforms it, and then sends it to a "stash" like Elasticsearch. Kibana lets users visualize data with charts and graphs in Elasticsearch.
A brief presentation outlining the basics of Elasticsearch for beginners; it can be used to deliver a seminar on Elasticsearch. (P.S. I used it.) I would recommend that the presenter fiddle with Elasticsearch beforehand.
ELK: Elasticsearch, Logstash, and Kibana Stack for Log Management (El Mahdi Benzekri)
An introduction to the powerful Elasticsearch, Logstash, and Kibana stack. It has many use cases; the most popular one is server and application log management.
The ELK Stack workshop covers real-world use cases and works with the participants to implement them. This includes an Elastic overview, Logstash configuration, creation of dashboards in Kibana, guidelines and tips on processing custom log formats, designing a system to scale, choosing hardware, and managing the lifecycle of your logs.
Talk given for the #phpbenelux user group, March 27th in Gent (BE), with the goal of convincing developers who are used to building PHP/MySQL apps to broaden their horizons when adding search to their site. Be sure to also have a look at the notes for the slides; they explain some of the screenshots, etc.
An accompanying blog post about this subject can be found at http://www.jurriaanpersyn.com/archives/2013/11/18/introduction-to-elasticsearch/
In this presentation, we discuss how Elasticsearch handles various operations like insert, update, and delete. We also cover what an inverted index is and how segment merging works.
The evolution of the big data platform @ Netflix (OSCON 2015) (Eva Tse)
At Netflix, the big data platform is the foundation for analytics that drive all product decisions that directly impact our customer experience. As for scale, it is one of the top three largest services running at Netflix, in terms of compute power and data size.
In this talk, we will take the audience through a journey to understand how we scale the platform to handle the increasing amount of data (over 500 billion events generated daily), the increasing demand of analytics (which translates to compute power), and the increasing number of users dependent on our platform to make business decisions.
Deep Dive on ElasticSearch Meetup event on 23rd May '15 at www.meetup.com/abctalks
Agenda:
1) Introduction to NOSQL
2) What is ElasticSearch and why is it required
3) ElasticSearch architecture
4) Installation of ElasticSearch
5) Hands on session on ElasticSearch
An introduction to Elasticsearch with a short demonstration on Kibana to present the search API. The slides cover:
- Quick overview of the Elastic stack
- Indexation
- Analysers
- Relevance score
- One use case of Elasticsearch
The query used for the Kibana demonstration can be found here:
https://github.com/melvynator/elasticsearch_presentation
A presentation given at the Lucene/Solr Revolution 2014 conference to show Solr and Elasticsearch features side by side. The presentation time was only 30 minutes, so only the core usability features were compared. The full video is embedded on the last slide.
Elasticsearch is quite a common tool nowadays, usually as part of an ELK stack, but in some cases supporting a system's main feature as its search engine. Documentation on regular use cases and on usage in general is pretty good, but how does it really work, how does it behave beneath the surface of the API? This talk is about that: we will look under the hood of Elasticsearch and dive deep into the largely unknown implementation details. The talk covers cluster behaviour, communication with Lucene, and Lucene internals, literally down to bits and pieces. Come and see Elasticsearch dissected.
This slide deck talks about Elasticsearch and its features.
When you talk about the ELK stack, it just means you are talking
about Elasticsearch, Logstash, and Kibana. But when you talk
about the Elastic stack, other components such as Beats and X-Pack
are also included.
What is the ELK Stack?
ELK vs Elastic stack
What is Elasticsearch used for?
How does Elasticsearch work?
What is an Elasticsearch index?
Shards
Replicas
Nodes
Clusters
What programming languages does Elasticsearch support?
Amazon Elasticsearch, its use cases and benefits
ElasticSearch introduction talk. Overview of the API, functionality, and use cases. What can be achieved, and how to scale? What is Kibana, and how can it benefit your business?
Apache Iceberg: An Architectural Look Under the Covers (ScyllaDB)
Data Lakes have been built with a desire to democratize data - to allow more and more people, tools, and applications to make use of data. A key capability needed to achieve this is hiding the complexity of underlying data structures and physical data storage from users. The de facto standard, the Hive table format, addresses some of these problems but falls short at data, user, and application scale. So what is the answer? Apache Iceberg.
The Apache Iceberg table format is now in use and contributed to by many leading tech companies, like Netflix, Apple, Airbnb, LinkedIn, Dremio, Expedia, and AWS.
Watch Alex Merced, Developer Advocate at Dremio, as he describes the open architecture and performance-oriented capabilities of Apache Iceberg.
You will learn:
• The issues that arise when using the Hive table format at scale, and why we need a new table format
• How a straightforward, elegant change in table format structure has enormous positive effects
• The underlying architecture of an Apache Iceberg table, how a query against an Iceberg table works, and how the table’s underlying structure changes as CRUD operations are done on it
• The resulting benefits of this architectural design
Smartsheet’s Transition to Snowflake and Databricks: The Why and Immediate Im... (Databricks)
Join this session to hear why Smartsheet decided to transition from their entirely SQL-based system to Snowflake and Databricks, and learn how that transition has made an immediate impact on their team, company and customer experience through enabling faster, informed data decisions.
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train... (Edureka!)
( ELK Stack Training - https://www.edureka.co/elk-stack-trai... )
This Edureka Elasticsearch tutorial will help you understand the fundamentals of Elasticsearch along with its practical usage, and help you build a strong foundation in the ELK Stack. This video covers the following topics:
1. What Is Elasticsearch?
2. Why Elasticsearch?
3. Elasticsearch Advantages
4. Elasticsearch Installation
5. API Conventions
6. Elasticsearch Query DSL
7. Mapping
8. Analysis
9. Modules
After this tutorial, you will be equipped with the knowledge necessary to make informed decisions about the design and setup of a cloud-based Big Data platform using the so-called Modern Data Stack (MDS). Bogdan will introduce the main concepts of the MDS and explain its benefits and why it is such a compelling choice for companies of any size. He will also provide advice to help you navigate the ever-increasing number of vendors in this space. You will learn what you need to take into consideration when implementing the MDS in your company, and see how it works in a live demo.
DevOps for ETL processing at scale with MongoDB, Solr, AWS and Chef (Gaurav "GP" Pal)
Large-scale data processing for Extract, Transform, and Load (ETL) jobs is a very common practice. The stackArmor DevOps team developed a Chef-based automation solution to automate AWS environment provisioning, code deployment, and data ingestion, ingesting and processing over 2 TB of data.
This presentation covers the technologies used, the planning phase, AWS instance selection and optimizing the ETL processing for not only performance but also cost.
The target was to process 500 million rows within 72 hours with a processing rate of 5 million transactions per hour.
The presentation also provides pitfalls and automation optimizations performed to accomplish the targeted processing rates.
The presentation was delivered at the DevOpsDC Meetup on May 17, 2016
The Proto-Burst Buffer: Experience with the flash-based file system on SDSC's... (Glenn K. Lockwood)
Comparing the burst buffers of today, such as the Cray DataWarp-based burst buffer implemented on NERSC Cori, to the proto-burst buffer deployed on SDSC's Gordon supercomputer in 2012.
Scaling with sync_replication using Galera and EC2 (Marco Tusa)
Challenging architecture design and proof of concept on a real case study using a synchronous replication solution.
A customer asked me to investigate and design a MySQL architecture to support his application serving shops around the globe,
scaling out and scaling in based on sales seasons.
Organizations continue to adopt Solr because of its ability to scale to meet even the most demanding workflows. Recently, LucidWorks has been leading the effort to identify, measure, and expand the limits of Solr. As part of this effort, we've learned a few things along the way that should prove useful for any organization wanting to scale Solr. Attendees will come away with a better understanding of how sharding and replication impact performance. Also, no benchmark is useful without being repeatable; Tim will also cover how to perform similar tests using the Solr-Scale-Toolkit in Amazon EC2.
So you are deployed to production (or soon to be), with Elasticsearch running and powering important application features, or maybe used for centralized logging for effective debugging.
Was your Elastic cluster deployed correctly? Is it stable? Can it hold the throughput you expect it to?
How did you do capacity planning? How do you tell if the cluster is healthy, and what should you monitor? How do you apply effective multi-tenancy? And what would be an ideal cluster topology and data ingestion architecture?
This presentation will show how to create truly elastic Cassandra deployments on AWS, allowing you to scale and shrink your large Cassandra deployments multiple times a day.
Leveraging a combination of EBS-backed disks, JBOD, token pinning, and our previous work on bootstrapping from backups, you will be able to dramatically reduce costs per cluster by scaling to match your daily workloads.
Warning: This presentation will probably contain some references to late 2000's pop group LMFAO
About the Speaker
Ben Bromhead CTO, Instaclustr
Ben Bromhead is the CTO of Instaclustr, where he is responsible for working closely with his engineering team and customers to build highly available, scalable applications on top of Cassandra. Instaclustr is the only multi-cloud, self-service Cassandra-as-a-Service provider in the world and is dedicated to providing world-class support.
Co-Founder and CTO of Instaclustr, Ben Bromhead's presentation at the Cassandra Summit 2016, in San Jose.
This presentation will show how to create truly elastic Cassandra deployments on AWS, allowing you to scale and shrink your large Cassandra deployments multiple times a day. Leveraging a combination of EBS-backed disks, JBOD, token pinning, and our previous work on bootstrapping from backups, you will be able to dramatically reduce costs per cluster by scaling to match your daily workloads.
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi... (Fred de Villamil)
The talk I gave at the Snow Unix Event in the Netherlands about upgrading a massive production Elasticsearch cluster from one major version to another without downtime, and with a complete rollback plan.
A talk I gave at Aircall about scaling engineering teams in the context of fast-growing startups, before joining the company as interim VP of Engineering.
1. RUNNING & SCALING
LARGE ELASTICSEARCH
CLUSTERS
FRED DE VILLAMIL, DIRECTOR OF
INFRASTRUCTURE
@FDEVILLAMIL
OCTOBER 2017
2. BACKGROUND
• FRED DE VILLAMIL, 39 YEARS OLD, TEAM
COFFEE @SYNTHESIO,
• LINUX / (FREE)BSD USER SINCE 1996,
• OPEN SOURCE CONTRIBUTOR SINCE
1998,
• LOVES TENNIS, PHOTOGRAPHY, CUTE
OTTERS, INAPPROPRIATE HUMOR AND
ELASTICSEARCH CLUSTERS OF UNUSUAL
SIZE.
WRITES ABOUT ES AT
HTTPS://THOUGHTS.T37.NET
3. ELASTICSEARCH @SYNTHESIO
• 8 production clusters, 600 hosts, 1.7PB storage, 37.5TB
RAM, average 15k writes/s, 800 searches/s, some inputs >
200MB.
• Data nodes: 6-core Xeon E5v3, 64GB RAM, 4*800GB SSD
RAID0; sometimes dual Xeon E5-2687Wv4 12-core (160
watts!!!).
• We aggregate data from various cold storages and make them
searchable in a jiffy.
6. NEVER GONNA GIVE YOU UP
• NEVER GONNA LET YOU DOWN,
• NEVER GONNA RUN AROUND AND
DESERT YOU,
• NEVER GONNA MAKE YOU CRY,
• NEVER GONNA SAY GOODBYE,
• NEVER GONNA TELL A LIE & HURT YOU.
7. AVOIDING DOWNTIME & SPLIT BRAINS
• RUN AT LEAST 3 MASTER NODES INTO 3 DIFFERENT
LOCATIONS.
• NEVER RUN BULK QUERIES ON THE MASTER
NODES.
• ACTUALLY NEVER RUN ANYTHING BUT
ADMINISTRATIVE TASKS ON THE MASTER NODES.
• SPREAD YOUR DATA INTO 2 DIFFERENT LOCATIONS
WITH AT LEAST A REPLICATION FACTOR OF 1 (1
PRIMARY, 1 REPLICA).
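In the Zen-discovery era this deck targets (Elasticsearch 5.x), the usual guard against split brains is pinning the master quorum explicitly. A minimal sketch, assuming 3 master-eligible nodes, so quorum = 3/2 + 1 = 2 (from 7.x onwards the voting configuration is managed automatically):

```json
PUT /_cluster/settings
{
  "persistent": {
    "discovery.zen.minimum_master_nodes": 2
  }
}
```

The same value can also live in each node's elasticsearch.yml.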
9. RACK AWARENESS + QUERY NODES ==
MAGIC
ES PRIVILEGES THE
DATA NODES WITH
THE SAME RACK_ID
AS THE QUERY NODE.
REDUCES LATENCY
AND BALANCES THE
LOAD.
10. RACK AWARENESS + QUERY NODES + ZONE ==
MAGIC + FUN
ADD ZONES INTO THE
SAME RACK FOR
EVEN REPLICATION
WITH HIGHER
FACTOR.
11. USING ZONES FOR FUN & PROFIT
ALLOWING EVEN REPLICATION
WITH A HIGHER FACTOR WITHIN
THE SAME RACK.
ALLOWING MORE RESOURCES
TO THE MOST FREQUENTLY
ACCESSED INDEXES.
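The rack awareness behind these slides relies on two pieces: each node declares a rack attribute in its elasticsearch.yml (e.g. node.attr.rack_id: rack_one; the attribute name "rack_id" here is illustrative), and the cluster is told to allocate shards by that attribute. A sketch:

```json
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.awareness.attributes": "rack_id"
  }
}
```

With awareness enabled, primaries and replicas of the same shard are spread across distinct rack_id values, which is what makes the rack-local query routing possible.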
…
13. HOW ELASTICSEARCH USES THE MEMORY
• Starts with allocating memory for Java heap.
• The Java heap contains all Elasticsearch buffers
and caches + a few other things.
• Each Java thread maps a system thread: +128kB
off heap.
• The elected master uses 250kB to store each shard's
information inside the cluster.
14. ALLOCATING MEMORY
• Never allocate more than 31GB
heap to avoid the compressed
pointers issue.
• Use 1/2 of your memory up to
31GB.
• Feed your master and query
nodes, the more the better
(including CPU).
15. MEMORY LOCK
• Use memory_lock: true at
startup.
• Requires ulimit -l
unlimited.
• Allocates the whole heap at once.
• Uses contiguous memory regions.
• Avoids swapping (you should
disable swap anyway).
16. CHOSING THE RIGHT GARBAGE COLLECTOR
• ES runs with CMS as a default
garbage collector.
• CMS was designed for heaps <
4GB.
• Stop-the-world garbage
collections last too long & block
the cluster.
• Solution: switching to G1GC
(default in Java9, unsupported).
17. CMS VS G1GC
• CMS: SHARED CPU TIME WITH THE APPLICATION.
“STOPS THE WORLD” WHEN THERE IS TOO MUCH MEMORY
TO CLEAN, UNTIL IT SENDS AN OUTOFMEMORYERROR.
• G1GC: SHORT, MORE FREQUENT, PAUSES. WON’T
STOP A NODE UNTIL IT LEAVES THE CLUSTER.
• ELASTIC SAYS: DON’T USE G1GC FOR REASONS,
SO READ THE DOC.
18. G1GC OPTIONS
+USEG1GC: ACTIVATES G1GC
MAXGCPAUSEMILLIS: TARGET FOR MAX GC PAUSE
TIME.
GCPAUSEINTERVALMILLIS: TARGET FOR THE INTERVAL
BETWEEN COLLECTIONS.
INITIATINGHEAPOCCUPANCYPERCENT: WHEN TO START
COLLECTING?
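Put together, the options above would go into jvm.options. A hypothetical fragment, with illustrative thresholds rather than recommendations (and remember the previous slide: Elastic did not support G1GC at the time):

```properties
# disable the default CMS collector, enable G1GC
-XX:-UseConcMarkSweepGC
-XX:+UseG1GC
# target at most ~200ms pauses, spaced at least ~1s apart
-XX:MaxGCPauseMillis=200
-XX:GCPauseIntervalMillis=1000
# start a concurrent cycle once the heap is ~75% occupied
-XX:InitiatingHeapOccupancyPercent=75
```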
19. CHOOSING THE RIGHT STORAGE
• MMAPFS : MAPS LUCENE FILES ON
THE VIRTUAL MEMORY USING
MMAP. NEEDS AS MUCH MEMORY
AS THE FILE BEING MAPPED TO
AVOID ISSUES.
• NIOFS : APPLIES A SHARED LOCK
ON LUCENE FILES AND RELIES
ON VFS CACHE.
20. BUFFERS AND CACHES
• ELASTICSEARCH HAS MULTIPLE CACHES & BUFFERS, WITH DEFAULT VALUES,
KNOW THEM!
• BUFFERS + CACHE MUST BE < TOTAL JAVA HEAP (OBVIOUS BUT…).
• AUTOMATED EVICTION ON THE CACHE, BUT FORCING IT CAN SAVE YOUR LIFE
WITH A SMALL OVERHEAD.
• IF YOU HAVE OOM ISSUES, DISABLE THE CACHES!
• FROM A USER POV, CIRCUIT BREAKERS ARE A NO GO!
22. INDEX DESIGN
• VERSION YOUR INDEX BY MAPPING: 1_*, 2_* ETC.
• THE MORE SHARDS, THE BETTER ELASTICITY, BUT
THE MORE CPU AND MEMORY USED ON THE
MASTERS.
• PROVISIONING 10GB PER SHARD ALLOWS
FASTER RECOVERY & REALLOCATION.
23. REPLICATION TRICKS
• NUMBER OF REPLICAS MUST BE 0 OR ODD.
CONSISTENCY QUORUM: INT( (PRIMARY +
NUMBER_OF_REPLICAS) / 2 ) + 1.
• RAISE THE REPLICATION FACTOR TO SCALE FOR
READING
UP TO 100% OF THE DATA / DATA NODE.
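number_of_replicas is a dynamic index setting, so the read-scaling trick above is a one-liner (index name hypothetical). With 1 primary and 3 replicas, the quorum formula gives int((1 + 3) / 2) + 1 = 3:

```json
PUT /my_index/_settings
{
  "index": {
    "number_of_replicas": 3
  }
}
```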
24. ALIASES
• ACCESS MULTIPLE INDICES AT ONCE.
• READ MULTIPLE, WRITE ONLY ONE.
EXAMPLE ON TIMESTAMPED INDICES:
"18_20171020": { "aliases": { "2017": {}, "201710": {}, "20171020": {} } }
"18_20171021": { "aliases": { "2017": {}, "201710": {}, "20171021": {} } }
Queries:
/2017/_search
/201710/_search
AFTER A MAPPING CHANGE & REINDEX, CHANGE THE ALIAS:
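The swap itself is a single atomic _aliases call. A sketch following the deck's versioned-index convention, with hypothetical index names:

```json
POST /_aliases
{
  "actions": [
    { "remove": { "index": "18_20171020", "alias": "20171020" } },
    { "add":    { "index": "19_20171020", "alias": "20171020" } }
  ]
}
```

Because both actions are applied atomically, readers of the 20171020 alias never see a window where the index is missing.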
25. ROLLOVER
• CREATE A NEW INDEX WHEN TOO OLD OR
TOO BIG.
• SUPPORT DATE MATH: DAILY INDEX
CREATION.
• USE ALIASES TO QUERY ALL
ROLLED-OVER INDEXES.
PUT "logs-000001" { "aliases": { "logs": {} } }
POST /logs/_rollover { "conditions": { "max_docs": 10000000 } }
27. CONFIGURATION CHANGES
• PREFER CONFIGURATION FILE UPDATES TO API CALL FOR
PERMANENT CHANGES.
• VERSION YOUR CONFIGURATION CHANGES SO YOU CAN
ROLLBACK, ES REQUIRES LOTS OF FINE TUNING.
• WHEN USING _SETTINGS API, PREFER TRANSIENT TO
PERSISTENT, THEY’RE EASIER TO GET RID OF.
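Transient settings are easy to get rid of because setting one to null clears it. A sketch, using the allocation setting from the next slide:

```json
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable": "none"
  }
}
```

And to remove it once done:

```json
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable": null
  }
}
```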
28. RECONFIGURING THE WHOLE CLUSTER
LOCK SHARD REALLOCATION & RECOVERY:
"cluster.routing.allocation.enable" : "none"
OPTIMIZE FOR RECOVERY:
"cluster.routing.allocation.node_initial_primaries_recoveries": 50
"indices.recovery.max_bytes_per_sec": "2048mb"
RESTART A FULL RACK, WAIT FOR NODES TO COME BACK.
29. THE REINDEX API
• IN CLUSTER AND CLUSTER TO CLUSTER REINDEX API.
• ALLOWS CROSS VERSION INDEXING: 1.7 TO 5.1…
• SLICED SCROLLS ONLY AVAILABLE STARTING 6.0.
• ACCEPTS ES QUERIES TO FILTER THE DATA TO REINDEX.
• MERGES MULTIPLE INDEXES INTO 1.
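A minimal _reindex sketch that filters the source with a query (index and field names hypothetical); cluster-to-cluster reindex adds a remote block under source:

```json
POST /_reindex
{
  "source": {
    "index": "18_20171020",
    "query": { "range": { "date": { "gte": "2017-10-01" } } }
  },
  "dest": { "index": "19_20171020" }
}
```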
30. BULK INDEXING TRICKS
LIMIT REBALANCE:
"cluster.routing.allocation.cluster_concurrent_rebalance": 1
"cluster.routing.allocation.balance.shard": "0.15f"
"cluster.routing.allocation.balance.threshold": "10.0f"
DISABLE REFRESH:
"index.refresh_interval": "-1"
NO REPLICA:
"index.number_of_replicas": 0 // a replica indexes the data n more times in Lucene; adding one after the bulk just "rsyncs" the data.
ALLOCATE ON DEDICATED HARDWARE:
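After the bulk load, the refresh interval and replica count have to be put back, or the new documents stay invisible to searches and unreplicated. A sketch against a hypothetical index:

```json
PUT /logs-000001/_settings
{
  "index": {
    "refresh_interval": "1s",
    "number_of_replicas": 1
  }
}
```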
31. OPTIMIZING FOR SPACE & PERFORMANCES
• LUCENE SEGMENTS ARE IMMUTABLE, THE MORE YOU
WRITE, THE MORE SEGMENTS YOU GET.
• DELETING DOCUMENTS DOES COPY ON WRITE SO NO
REAL DELETE.
index.merge.scheduler.max_thread_count: default CPU/2 with min 4
POST /_force_merge?only_expunge_deletes: faster, only merges segments with deleted documents
POST /_force_merge?max_num_segments: don’t use on indexes you write on!
WARNING: _FORCE_MERGE HAS A COST IN CPU AND I/OS.
32. MINOR VERSION UPGRADES
• CHECK YOUR PLUGINS COMPATIBILITY, PLUGINS
MUST BE COMPILED FOR YOUR MINOR VERSION.
• START UPGRADING THE MASTER NODES.
• UPGRADE THE DATA NODES ON A WHOLE RACK AT
ONCE.
33. OS LEVEL UPGRADES
• ENSURE THE WHOLE CLUSTER RUNS THE SAME JAVA
VERSION.
• WHEN UPGRADING JAVA, CHECK IF YOU DON’T HAVE
TO UPGRADE THE KERNEL.
• PER NODE JAVA / KERNEL VERSION AVAILABLE IN THE
_STATS API.
35. CAPTAIN OBVIOUS, YOU’RE MY ONLY HOPE!
• GOOD MONITORING IS BUSINESS ORIENTED MONITORING.
• GOOD ALERTING IS ACTIONABLE ALERTING.
• DON’T MONITOR THE CLUSTER ONLY, BUT THE WHOLE PROCESSING
CHAIN.
• USELESS METRICS ARE USELESS.
• LOSING A DATACENTER: OK. LOSING DATA: NOT OK!
37. LIFE, DEATH & _CLUSTER/HEALTH
• A RED CLUSTER MEANS AT LEAST 1
INDEX HAS MISSING DATA. DON’T
PANIC!
• USING LEVEL={INDEX,SHARD} AND AN
INDEX ID PROVIDES SPECIFIC
INFORMATION.
• LOTS OF PENDING TASKS MEANS YOUR
CLUSTER IS UNDER HEAVY LOAD AND
SOME NODES CAN’T PROCESS THEM
FAST ENOUGH.
• LONG WAITING TASKS MEANS YOU
HAVE A CAPACITY PLANNING PROBLEM.
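The level parameter mentioned above narrows the health report from cluster-wide down to a specific index or its shards. A sketch (index name hypothetical):

```json
GET /_cluster/health?level=indices
GET /_cluster/health/18_20171020?level=shards
```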
38. USE THE _CAT API
• PROVIDES GENERAL INFORMATION
ABOUT YOUR NODES, SHARDS,
INDICES AND THREAD POOLS.
• HIT THE WHOLE CLUSTER; WHEN
IT TIMES OUT, YOU PROBABLY
HAVE A NODE STUCK IN
GARBAGE COLLECTION.
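A couple of _cat calls in that spirit; v prints column headers and h picks the columns:

```json
GET /_cat/nodes?v&h=name,heap.percent,cpu,load_1m
GET /_cat/thread_pool?v&h=node_name,name,active,queue,rejected
```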
39. MONITORING AT THE CLUSTER LEVEL
• USE THE _STATS API FOR PRECISE INFORMATION.
• MONITOR THE SHARDS REALLOCATION; TOO MANY
MEANS A DESIGN PROBLEM.
• MONITOR THE WRITES CLUSTER-WIDE; IF THEY
FALL TO 0 AND IT’S UNUSUAL, A NODE IS STUCK IN
GC.
40. MONITORING AT THE NODE LEVEL
• USE THE _NODES/{NODE}, _NODES/{NODE}/STATS
AND _CAT/THREAD_POOL API.
• THE GARBAGE COLLECTION DURATION &
FREQUENCY IS A GOOD METRIC OF YOUR NODE
HEALTH.
• CACHE AND BUFFERS ARE MONITORED ON A NODE
LEVEL.
• MONITORING I/OS, SPACE, OPEN FILES & CPU IS
CRITICAL.
41. MONITORING AT THE INDEX LEVEL
• USE THE {INDEX}/_STATS API.
• MONITOR THE DOCUMENTS / SHARD RATIO.
• MONITOR THE MERGES AND QUERY TIME.
• TOO MANY EVICTIONS MEANS YOU HAVE A CACHE
CONFIGURATION PROBLEM.
43. WHAT’S REALLY GOING ON IN YOUR CLUSTER?
• THE _NODES/{NODE}/HOT_THREADS API TELLS WHAT HAPPENS ON
THE HOST.
• THE ELECTED MASTER NODE TELLS YOU MOST THINGS YOU NEED
TO KNOW.
• ENABLE THE SLOW LOGS TO UNDERSTAND YOUR BOTTLENECK &
OPTIMIZE THE QUERIES. DISABLE THE SLOW LOGS WHEN YOU’RE
DONE!!!
• WHEN NOT ENOUGH, MEMORY PROFILING IS YOUR FRIEND.
44. MEMORY PROFILING
• LIVE MEMORY OR HPROF FILE AFTER A CRASH.
• ALLOWS YOU TO KNOW WHAT IS / WAS IN YOUR
BUFFERS AND CACHES.
• YOURKIT JAVA PROFILER AS A TOOL.
45. TRACING
• KNOW WHAT’S REALLY HAPPENING IN YOUR JVM.
• LINUX 4.X PROVIDES GREAT PERF TOOLS, LINUX 4.9 EVEN
BETTER:
• LINUX-PERF,
• JAVA PERF MAP.
• VECTOR BY NETFLIX (NOT VEKTOR THE TRASH METAL
BAND).