RUNNING & SCALING
LARGE ELASTICSEARCH
CLUSTERS
FRED DE VILLAMIL, DIRECTOR OF
INFRASTRUCTURE
@FDEVILLAMIL
OCTOBER 2017
BACKGROUND
• FRED DE VILLAMIL, 39 YEARS OLD, TEAM COFFEE @SYNTHESIO,
• LINUX / (FREE)BSD USER SINCE 1996,
• OPEN SOURCE CONTRIBUTOR SINCE 1998,
• LOVES TENNIS, PHOTOGRAPHY, CUTE OTTERS, INAPPROPRIATE HUMOR AND ELASTICSEARCH CLUSTERS OF UNUSUAL SIZE.
WRITES ABOUT ES AT
HTTPS://THOUGHTS.T37.NET
ELASTICSEARCH @SYNTHESIO
• 8 production clusters, 600 hosts, 1.7PB of storage, 37.5TB of RAM, an average of 15k writes/s and 800 searches/s, some inputs > 200MB.
• Data nodes: 6-core Xeon E5v3, 64GB RAM, 4*800GB SSD in RAID0. Sometimes dual 12-core Xeon E5-2687Wv4 (160 watts!!!).
• We aggregate data from various cold storage systems and make it searchable in a jiffy.
AN ELASTICSEARCH CLUSTER OF UNUSUAL
SIZE
ENSURING HIGH
AVAILABILITY
• NEVER GONNA GIVE YOU UP,
• NEVER GONNA LET YOU DOWN,
• NEVER GONNA RUN AROUND AND
DESERT YOU,
• NEVER GONNA MAKE YOU CRY,
• NEVER GONNA SAY GOODBYE,
• NEVER GONNA TELL A LIE & HURT YOU.
AVOIDING DOWNTIME & SPLIT BRAINS
• RUN AT LEAST 3 MASTER NODES IN 3 DIFFERENT LOCATIONS.
• NEVER RUN BULK QUERIES ON THE MASTER NODES.
• ACTUALLY, NEVER RUN ANYTHING BUT ADMINISTRATIVE TASKS ON THE MASTER NODES.
• SPREAD YOUR DATA ACROSS 2 DIFFERENT LOCATIONS WITH A REPLICATION FACTOR OF AT LEAST 1 (1 PRIMARY, 1 REPLICA).
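With the zen discovery of Elasticsearch 5.x/6.x, the split-brain guard is `discovery.zen.minimum_master_nodes`, which should be a strict majority of the master-eligible nodes (7.0 replaced it with automatic voting configurations). The sketch below is only that arithmetic, and the function names are ours. It shows why 3 masters in 3 locations is the minimum: losing any single location still leaves a majority of 2.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Strict majority of the master-eligible nodes: n // 2 + 1."""
    if master_eligible < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible // 2 + 1


def tolerated_master_losses(master_eligible: int) -> int:
    """How many master-eligible nodes (one per location) can go down."""
    return master_eligible - minimum_master_nodes(master_eligible)


for n in (1, 3, 5):
    print(f"{n} masters: minimum_master_nodes={minimum_master_nodes(n)}, "
          f"survives {tolerated_master_losses(n)} loss(es)")
```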
RACK AWARENESS
ALLOCATE A RACK_ID TO THE DATA NODES FOR EVEN REPLICATION.
RESTART A WHOLE DATA CENTER AT ONCE WITHOUT DOWNTIME.
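In `elasticsearch.yml` this is typically a custom node attribute on each data node (for example `node.attr.rack_id: dc1`) plus `cluster.routing.allocation.awareness.attributes: rack_id`. The toy model below is not Elasticsearch's allocator; it only illustrates the invariant awareness aims for, assuming one replica and two racks: copies of the same shard land in different racks, so losing a whole rack still leaves a copy of every shard. Rack and node names are hypothetical.

```python
from itertools import cycle


def allocate(num_shards: int, replicas: int,
             racks: dict[str, list[str]]) -> dict[int, list[tuple[str, str]]]:
    """Toy rack-aware placement: every copy of a shard goes to a different rack.

    Returns shard number -> [(rack_id, node), ...] for the primary and replicas.
    This toy refuses configurations with more copies than racks.
    """
    copies = replicas + 1
    rack_ids = sorted(racks)
    if copies > len(rack_ids):
        raise ValueError("more copies than racks: some copies would share a rack")
    node_cycles = {r: cycle(racks[r]) for r in rack_ids}
    placement = {}
    for shard in range(num_shards):
        # Rotate the starting rack so primaries are spread evenly as well.
        chosen = [rack_ids[(shard + i) % len(rack_ids)] for i in range(copies)]
        placement[shard] = [(r, next(node_cycles[r])) for r in chosen]
    return placement


def survives_rack_loss(placement: dict[int, list[tuple[str, str]]], lost_rack: str) -> bool:
    """True if every shard keeps at least one copy outside lost_rack."""
    return all(any(r != lost_rack for r, _ in copies) for copies in placement.values())


racks = {"dc1": ["es-1", "es-2"], "dc2": ["es-3", "es-4"]}
plan = allocate(num_shards=4, replicas=1, racks=racks)
print(plan[0])  # one copy in dc1, one in dc2
print(survives_rack_loss(plan, "dc1"), survives_rack_loss(plan, "dc2"))
```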
RACK AWARENESS + QUERY NODES ==
MAGIC
ES FAVORS THE DATA NODES WITH THE SAME RACK_ID AS THE QUERY NODE.
REDUCES LATENCY AND BALANCES THE LOAD.
RACK AWARENESS + QUERY NODES + ZONE ==
MAGIC + FUN
ADD ZONES INSIDE THE SAME RACK FOR EVEN REPLICATION WITH A HIGHER FACTOR.
USING ZONES FOR FUN & PROFIT
ALLOWS EVEN REPLICATION WITH A HIGHER FACTOR WITHIN THE SAME RACK.
ALLOWS MORE RESOURCES FOR THE MOST FREQUENTLY ACCESSED INDEXES.
…
AVOIDING MEMORY
NIGHTMARE
HOW ELASTICSEARCH USES THE MEMORY
• Starts by allocating memory for the Java heap.
• The Java heap contains all the Elasticsearch buffers and caches, plus a few other things.
• Each Java thread maps to a system thread: +128kB off heap per thread.
• The elected master uses 250kB per shard to keep track of every shard in the cluster.
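The two per-unit figures above make for a quick back-of-the-envelope estimate. A minimal sketch: the 250kB and 128kB defaults are the slide's figures and will vary with version and mappings, and the shard and thread counts in the usage lines are made up.

```python
KB = 1000  # decimal units, matching the "kB" / "GB" figures on the slide


def master_shard_overhead_bytes(total_shards: int, per_shard_kb: int = 250) -> int:
    """Heap the elected master needs just to track shards."""
    return total_shards * per_shard_kb * KB


def thread_stack_overhead_bytes(threads: int, per_thread_kb: int = 128) -> int:
    """Off-heap memory used by Java threads mapped to OS threads."""
    return threads * per_thread_kb * KB


# Hypothetical cluster: 20,000 shards in total, 500 threads on one node.
print(master_shard_overhead_bytes(20_000) / 1e9, "GB of heap on the master")
print(thread_stack_overhead_bytes(500) / 1e6, "MB off heap on the node")
```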
ALLOCATING MEMORY
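• Never allocate more than 31GB of heap: above roughly 32GB the JVM can no longer use compressed object pointers.
• Give the heap half of your memory, up to 31GB; the rest goes to the OS file cache Lucene relies on.
• Feed your master and query nodes: the more memory (and CPU), the better.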
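A minimal sketch of the sizing rule above, assuming whole gigabytes and the 31GB cap from the slide; the function names are illustrative. It emits identical `-Xms` and `-Xmx` values, as Elastic recommends, so the heap is sized once at startup.

```python
def heap_size_gb(ram_gb: int, cap_gb: int = 31) -> int:
    """Half of the RAM, capped below the compressed-pointers threshold."""
    return min(ram_gb // 2, cap_gb)


def jvm_heap_flags(ram_gb: int) -> list[str]:
    """Same value for -Xms and -Xmx."""
    heap = heap_size_gb(ram_gb)
    return [f"-Xms{heap}g", f"-Xmx{heap}g"]


print(jvm_heap_flags(64))  # ['-Xms31g', '-Xmx31g'] for the 64GB data nodes
print(jvm_heap_flags(16))  # ['-Xms8g', '-Xmx8g']
```

On a given JVM, `java -Xmx31g -XX:+PrintFlagsFinal -version | grep UseCompressedOops` shows whether compressed pointers are still enabled at that heap size.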
MEMORY LOCK
• Set bootstrap.memory_lock: true at startup.
• Requires ulimit -l unlimited.
• Allocates the whole heap at once.
• Uses contiguous memory regions.
• Avoids swapping (you should disable swap anyway).
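A minimal preflight check, assuming a Unix host and Python's standard `resource` module, that the current process is allowed to lock unlimited memory, which is what `bootstrap.memory_lock` needs. It only inspects the process it runs in: the Elasticsearch service gets its own limit (for example `LimitMEMLOCK=infinity` in a systemd unit), and each node reports whether locking actually worked under `GET _nodes?filter_path=**.mlockall`.

```python
import resource


def memlock_is_unlimited() -> bool:
    """True when both soft and hard RLIMIT_MEMLOCK are unlimited (ulimit -l unlimited)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return soft == resource.RLIM_INFINITY and hard == resource.RLIM_INFINITY


if not memlock_is_unlimited():
    print("ulimit -l is not unlimited: bootstrap.memory_lock would fail here")
```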
CHOOSING THE RIGHT GARBAGE COLLECTOR
• ES runs with CMS as its default garbage collector.
• CMS was designed for heaps < 4GB.
• Stop-the-world garbage collections last too long and block the cluster.
• Solution: switching to G1GC (default in Java 9, unsupported by Elastic).
CMS VS G1GC
• CMS: SHARES CPU TIME WITH THE APPLICATION. “STOPS THE WORLD” WHEN THERE IS TOO MUCH MEMORY TO CLEAN, UNTIL IT THROWS AN OUTOFMEMORYERROR.
• G1GC: SHORTER, MORE FREQUENT PAUSES. WON’T STALL A NODE SO LONG THAT IT LEAVES THE CLUSTER.
• ELASTIC SAYS: DON’T USE G1GC, FOR REASONS. SO READ THE DOC.
G1GC OPTIONS
-XX:+UseG1GC: ACTIVATES G1GC.
-XX:MaxGCPauseMillis: TARGET FOR THE MAXIMUM GC PAUSE TIME.
-XX:GCPauseIntervalMillis: TARGET INTERVAL BETWEEN TWO GC PAUSES.
-XX:InitiatingHeapOccupancyPercent: WHEN TO START COLLECTING, AS A PERCENTAGE OF HEAP OCCUPANCY.
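Switching collectors means editing `jvm.options`, which in Elasticsearch 5.x ships with the CMS flags enabled. A minimal sketch that drops those CMS lines and appends the G1 flags listed above; the numeric values are placeholders to tune (200ms and 45% happen to be the JDK 8 G1 defaults), and it assumes plain flag lines without JDK-version prefixes.

```python
from typing import Optional

# CMS flags from the stock Elasticsearch 5.x jvm.options; lines starting
# with any of them are dropped.
CMS_FLAGS = (
    "-XX:+UseConcMarkSweepGC",
    "-XX:CMSInitiatingOccupancyFraction",
    "-XX:+UseCMSInitiatingOccupancyOnly",
)


def switch_to_g1(jvm_options: list[str], max_pause_ms: int = 200,
                 ihop_percent: int = 45,
                 pause_interval_ms: Optional[int] = None) -> list[str]:
    """Return jvm.options lines with the CMS flags replaced by G1 flags."""
    kept = [line for line in jvm_options if not line.strip().startswith(CMS_FLAGS)]
    g1 = [
        "-XX:+UseG1GC",
        f"-XX:MaxGCPauseMillis={max_pause_ms}",
        f"-XX:InitiatingHeapOccupancyPercent={ihop_percent}",
    ]
    if pause_interval_ms is not None:
        g1.append(f"-XX:GCPauseIntervalMillis={pause_interval_ms}")
    return kept + g1


current = ["-Xms31g", "-Xmx31g", "-XX:+UseConcMarkSweepGC",
           "-XX:CMSInitiatingOccupancyFraction=75", "-XX:+UseCMSInitiatingOccupancyOnly"]
print("\n".join(switch_to_g1(current)))
```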
CHOOSING THE RIGHT STORAGE
• MMAPFS: MAPS LUCENE FILES INTO VIRTUAL MEMORY USING MMAP. NEEDS AS MUCH MEMORY AS THE FILES BEING MAPPED TO AVOID ISSUES.
• NIOFS: APPLIES A SHARED LOCK ON LUCENE FILES AND RELIES ON THE VFS CACHE.
BUFFERS AND CACHES
• ELASTICSEARCH HAS MULTIPLE CACHES & BUFFERS WITH DEFAULT VALUES: KNOW THEM!
• BUFFERS + CACHES MUST BE < TOTAL JAVA HEAP (OBVIOUS, BUT…).
• CACHE EVICTION IS AUTOMATIC, BUT FORCING IT CAN SAVE YOUR LIFE AT A SMALL COST.
• IF YOU HAVE OOM ISSUES, DISABLE THE CACHES!
• FROM A USER’S POV, CIRCUIT BREAKERS ARE A NO-GO!
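A minimal sketch of the "buffers + caches < heap" check, assuming every cache is expressed as a percentage of the heap. The setting names are Elasticsearch 5.x/6.x settings; the percentages are example inputs (the first three are, to our knowledge, the shipped defaults, while fielddata is unbounded unless you cap it), and the 30% headroom is an arbitrary assumption. When a cache does need forcing out, `POST /_cache/clear` evicts it on demand.

```python
def heap_budget(settings_pct: dict[str, float], heap_gb: float,
                headroom_pct: float = 30.0) -> dict:
    """Sum cache/buffer percentages and check they leave headroom on the heap.

    headroom_pct is a hypothetical margin for everything else living on the heap.
    """
    used_pct = sum(settings_pct.values())
    return {
        "used_pct": used_pct,
        "used_gb": heap_gb * used_pct / 100,
        "fits": used_pct + headroom_pct < 100,
    }


settings = {
    "indices.memory.index_buffer_size": 10,
    "indices.queries.cache.size": 10,
    "indices.requests.cache.size": 1,
    "indices.fielddata.cache.size": 20,  # unbounded by default: cap it yourself
}
print(heap_budget(settings, heap_gb=31))
```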
MANAGING LARGE INDEXES
INDEX DESIGN
• VERSION YOUR INDEXES BY MAPPING: 1_*, 2_* ETC.
• THE MORE SHARDS, THE BETTER THE ELASTICITY, BUT THE MORE CPU AND MEMORY USED ON THE MASTERS.
• PROVISIONING 10GB PER SHARD ALLOWS FASTER RECOVERY & REALLOCATION.
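The 10GB-per-shard rule has to be applied when the index is created, since that is when its number of primary shards is set. A minimal sketch, assuming you can estimate the final index size; `versioned_index_name` is a hypothetical helper for the `1_*`, `2_*` convention.

```python
import math


def shard_count(expected_size_gb: float, target_shard_gb: float = 10.0) -> int:
    """Primary shards needed so that each one holds about target_shard_gb."""
    if expected_size_gb <= 0:
        return 1
    return max(1, math.ceil(expected_size_gb / target_shard_gb))


def versioned_index_name(mapping_version: int, suffix: str) -> str:
    """Prefix the index name with its mapping version."""
    return f"{mapping_version}_{suffix}"


print(shard_count(250))                     # 25
print(versioned_index_name(2, "20171020"))  # 2_20171020
```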
REPLICATION TRICKS
• THE NUMBER OF REPLICAS MUST BE 0 OR ODD. WRITE CONSISTENCY QUORUM: INT( (PRIMARY + NUMBER_OF_REPLICAS) / 2 ) + 1.
• RAISE THE REPLICATION FACTOR TO SCALE READS, UP TO A FULL COPY OF THE DATA ON EVERY DATA NODE.
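The quorum formula above (the write `consistency` check of Elasticsearch 2.x, replaced by `wait_for_active_shards` in 5.0) and the read-scaling ceiling fit in a few lines; the names are ours. The ceiling comes from Elasticsearch never placing two copies of the same shard on one node: with N data nodes, replicas beyond N - 1 stay unassigned.

```python
def write_quorum(number_of_replicas: int, primaries: int = 1) -> int:
    """int((primary + number_of_replicas) / 2) + 1 copies."""
    return (primaries + number_of_replicas) // 2 + 1


def max_useful_replicas(data_nodes: int) -> int:
    """Beyond data_nodes - 1 replicas, every node already holds all the data."""
    return max(0, data_nodes - 1)


for replicas in (0, 1, 2, 3):
    print(replicas, "replicas -> quorum of", write_quorum(replicas), "copies")
print("10 data nodes ->", max_useful_replicas(10), "replicas at most")
```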
ALIASES
• ACCESS MULTIPLE INDICES AT ONCE.
• READ FROM SEVERAL, WRITE TO ONLY ONE.
EXAMPLE ON TIMESTAMPED INDICES:
"18_20171020": { "aliases": { "2017": {}, "201710": {}, "20171020": {} } }
"18_20171021": { "aliases": { "2017": {}, "201710": {}, "20171021": {} } }
Queries:
/2017/_search
/201710/_search
AFTER A MAPPING CHANGE & REINDEX, CHANGE THE ALIAS:
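The alias switch is done with the `_aliases` API, whose actions are applied atomically: removing the old index and adding the new one in the same request means readers never see the alias missing. A minimal sketch that builds that request body; the index names `18_20171020` and `19_20171020` are hypothetical and follow the mapping-versioned naming above.

```python
import json


def swap_alias_actions(alias: str, old_index: str, new_index: str) -> dict:
    """Body for POST /_aliases moving `alias` from old_index to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


body = swap_alias_actions("20171020", "18_20171020", "19_20171020")
print(json.dumps(body, indent=2))
```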
ROLLOVER
• CREATES A NEW INDEX WHEN THE CURRENT ONE IS TOO OLD OR TOO BIG.
• SUPPORTS DATE MATH: DAILY INDEX CREATION.
• USE ALIASES TO QUERY ALL THE ROLLED-OVER INDEXES.
PUT /logs-000001 { "aliases": { "logs": {} } }
POST /logs/_rollover { "conditions": { "max_docs": 10000000 } }
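When the index name ends in a dash and a number, as `logs-000001` does, `_rollover` derives the next name by incrementing that number, zero-padded to 6 digits. A minimal sketch of that naming rule and of the `max_docs` condition, handy for predicting index names in scripts or dashboards; both helpers are ours, not part of any Elasticsearch client.

```python
import re


def next_rollover_name(index: str) -> str:
    """Increment the trailing number, zero-padded to 6 digits."""
    match = re.fullmatch(r"(.*-)(\d+)", index)
    if match is None:
        raise ValueError(f"{index!r} does not end with -<number>; name the new index explicitly")
    prefix, number = match.groups()
    return f"{prefix}{int(number) + 1:06d}"


def should_roll_over(doc_count: int, max_docs: int = 10_000_000) -> bool:
    """Mirror of the max_docs condition in the _rollover call."""
    return doc_count >= max_docs


print(next_rollover_name("logs-000001"))  # logs-000002
print(should_roll_over(12_000_000))       # True
```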
DAILY OPERATIONS
CONFIGURATION CHANGES
• PREFER CONFIGURATION FILE UPDATES TO API CALLS FOR PERMANENT CHANGES.
• VERSION YOUR CONFIGURATION CHANGES SO YOU CAN ROLL BACK: ES REQUIRES LOTS OF FINE TUNING.
• WHEN USING THE _SETTINGS API, PREFER TRANSIENT TO PERSISTENT SETTINGS: THEY’RE EASIER TO GET RID OF.
RECONFIGURING THE WHOLE CLUSTER
LOCK SHARD REALLOCATION & RECOVERY:
"cluster.routing.allocation.enable" : "none"
OPTIMIZE FOR RECOVERY:
"cluster.routing.allocation.node_initial_primaries_recoveries": 50
"indices.recovery.max_bytes_per_sec": "2048mb"
RESTART A FULL RACK, WAIT FOR NODES TO COME
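The settings above are sent to `PUT /_cluster/settings`. A minimal sketch using only the Python standard library that keeps each step as a transient settings body; `base_url` is an assumption (for example `http://localhost:9200`), and the unlock body sets allocation back to `all`, which is how Elastic's documented rolling-restart procedure re-enables allocation once the restarted nodes have rejoined.

```python
import json
import urllib.request


def transient(settings: dict) -> dict:
    """Wrap settings as transient: they vanish on a full cluster restart."""
    return {"transient": settings}


LOCK_ALLOCATION = transient({"cluster.routing.allocation.enable": "none"})
FAST_RECOVERY = transient({
    "cluster.routing.allocation.node_initial_primaries_recoveries": 50,
    "indices.recovery.max_bytes_per_sec": "2048mb",
})
UNLOCK_ALLOCATION = transient({"cluster.routing.allocation.enable": "all"})


def put_cluster_settings(base_url: str, body: dict, timeout: float = 30.0) -> dict:
    """Send one settings body and return the decoded JSON response."""
    request = urllib.request.Request(
        f"{base_url}/_cluster/settings",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A typical sequence sends `LOCK_ALLOCATION` and `FAST_RECOVERY` before restarting the rack, and `UNLOCK_ALLOCATION` once its nodes are back in the cluster.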
THE REINDEX API
• IN-CLUSTER AND CLUSTER-TO-CLUSTER REINDEX API.
• ALLOWS CROSS-VERSION INDEXING: 1.7 TO 5.1…
• SLICED SCROLLS ONLY AVAILABLE STARTING WITH 6.0.
• ACCEPTS ES QUERIES TO FILTER THE DATA TO REINDEX.
• MERGES MULTIPLE INDEXES INTO 1.
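A minimal sketch of `POST /_reindex` request bodies covering the cases above: several source indexes merged into one destination, a query filtering what gets copied, and an optional remote source for a cluster-to-cluster reindex (the remote host must also be listed in `reindex.remote.whitelist` on the destination cluster). Index names, the `date` field and the remote host are hypothetical.

```python
import json
from typing import Optional


def reindex_body(sources: list[str], dest: str, query: Optional[dict] = None,
                 remote_host: Optional[str] = None) -> dict:
    """Build a body for POST /_reindex."""
    source: dict = {"index": sources if len(sources) > 1 else sources[0]}
    if query is not None:
        source["query"] = query  # only matching documents are copied
    if remote_host is not None:
        source["remote"] = {"host": remote_host}  # cluster-to-cluster reindex
    return {"source": source, "dest": {"index": dest}}


merged = reindex_body(["1_20171020", "1_20171021"], "2_201710",
                      query={"range": {"date": {"gte": "2017-10-20"}}})
print(json.dumps(merged, indent=2))
```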
BULK INDEXING TRICKS
LIMIT REBALANCE:
"cluster.routing.allocation.cluster_concurrent_rebalance": 1
"cluster.routing.allocation.balance.shard": "0.15f"
"cluster.routing.allocation.balance.threshold": "10.0f"
DISABLE REFRESH:
"index.refresh_interval": "-1"
NO REPLICA:
"index.number_of_replicas": "0" // with replicas, every document is indexed n times by Lucene; adding replicas after the load just copies ("rsyncs") the data.
ALLOCATE ON DEDICATED HARDWARE:
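The bulk-loading settings above can be bundled into two `PUT /<index>/_settings` bodies, one applied before the load and one after. A minimal sketch: `box_type` is a hypothetical custom node attribute (set as `node.attr.box_type: bulk` in `elasticsearch.yml` on the dedicated hardware), the post-load values are placeholders, and setting a value to `null` resets it. It also builds the newline-delimited body the `_bulk` API expects: one action line, then one source line per document, with a trailing newline.

```python
import json
from typing import Optional


def bulk_mode_settings(box_type: str = "bulk") -> dict:
    """Settings applied before the load."""
    return {
        "index.refresh_interval": "-1",
        "index.number_of_replicas": 0,
        "index.routing.allocation.require.box_type": box_type,
    }


def serving_settings(replicas: int = 1, refresh: str = "1s") -> dict:
    """Settings applied after the load; None (JSON null) drops the allocation filter."""
    return {
        "index.refresh_interval": refresh,
        "index.number_of_replicas": replicas,
        "index.routing.allocation.require.box_type": None,
    }


def bulk_body(index: str, docs: list[dict], doc_type: Optional[str] = None) -> str:
    """Newline-delimited JSON for POST /_bulk."""
    lines = []
    for doc in docs:
        action = {"_index": index}
        if doc_type is not None:
            action["_type"] = doc_type  # 5.x/6.x bulk requests also carry a mapping type
        lines.append(json.dumps({"index": action}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline


payload = bulk_body("2_201710", [{"msg": "a"}, {"msg": "b"}])
print(payload, end="")
```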
OPTIMIZING FOR SPACE & PERFORMANCES
• LUCENE SEGMENTS ARE IMMUTABLE: THE MORE YOU WRITE, THE MORE SEGMENTS YOU GET.
• DELETING DOCUMENTS ONLY MARKS THEM AS DELETED (COPY ON WRITE), SO NOTHING IS REALLY DELETED UNTIL SEGMENTS ARE MERGED.
index.merge.scheduler.max_thread_count: default max(1, min(4, CPU/2))
POST /_forcemerge?only_expunge_deletes: faster, only merges segments with deleted documents
POST /_forcemerge?max_num_segments: don’t use on indexes you write to!
WARNING: _FORCEMERGE HAS A COST IN CPU AND I/O.
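The default merge thread count, as we understand the Elasticsearch 5.x/6.x merge scheduler, is half the processors clamped between 1 and 4. A minimal sketch of that formula, to see what a host gets before overriding `index.merge.scheduler.max_thread_count`:

```python
import os


def default_merge_threads(cpus: int) -> int:
    """max(1, min(4, cpus // 2))."""
    return max(1, min(4, cpus // 2))


for cpus in (1, 2, 6, 12, 24):
    print(cpus, "CPUs ->", default_merge_threads(cpus), "merge threads")
print("this host:", default_merge_threads(os.cpu_count() or 1))
```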
MINOR VERSION UPGRADES
• CHECK YOUR PLUGINS’ COMPATIBILITY: PLUGINS MUST BE COMPILED FOR YOUR MINOR VERSION.
• START BY UPGRADING THE MASTER NODES.
• UPGRADE THE DATA NODES ONE WHOLE RACK AT A TIME.
OS LEVEL UPGRADES
• ENSURE THE WHOLE CLUSTER RUNS THE SAME JAVA VERSION.
• WHEN UPGRADING JAVA, CHECK WHETHER YOU ALSO NEED TO UPGRADE THE KERNEL.
• THE JAVA / KERNEL VERSION OF EACH NODE IS AVAILABLE IN THE _NODES API.
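A minimal sketch that spots mixed Java or kernel versions by grouping the nodes of a `GET /_nodes/jvm,os` response on `jvm.version` and `os.version`. The sample response is hypothetical and trimmed to the fields the function reads; `os.version` is read defensively in case a version does not report it.

```python
from collections import defaultdict


def version_groups(nodes_info: dict) -> dict[tuple[str, str], list[str]]:
    """Map (JVM version, OS version) -> node names."""
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for node in nodes_info["nodes"].values():
        key = (node["jvm"]["version"], node.get("os", {}).get("version", "unknown"))
        groups[key].append(node["name"])
    return dict(groups)


sample = {"nodes": {
    "a1": {"name": "es-1", "jvm": {"version": "1.8.0_144"}, "os": {"version": "4.9.0"}},
    "b2": {"name": "es-2", "jvm": {"version": "1.8.0_144"}, "os": {"version": "4.9.0"}},
    "c3": {"name": "es-3", "jvm": {"version": "1.8.0_131"}, "os": {"version": "4.4.0"}},
}}
groups = version_groups(sample)
if len(groups) > 1:
    for (jvm, kernel), names in sorted(groups.items()):
        print(f"JVM {jvm} / kernel {kernel}: {', '.join(names)}")
```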
MONITORING
CAPTAIN OBVIOUS, YOU’RE MY ONLY HOPE!
• GOOD MONITORING IS BUSINESS ORIENTED MONITORING.
• GOOD ALERTING IS ACTIONABLE ALERTING.
• DON’T MONITOR THE CLUSTER ONLY, BUT THE WHOLE PROCESSING CHAIN.
• USELESS METRICS ARE USELESS.
• LOSING A DATACENTER: OK. LOSING DATA: NOT OK!
MONITORING TOOLING
• ELASTICSEARCH X-PACK,
• GRAFANA…
LIFE, DEATH & _CLUSTER/HEALTH
• A RED CLUSTER MEANS AT LEAST 1 INDEX HAS MISSING DATA (AN UNASSIGNED PRIMARY SHARD). DON’T PANIC!
• USING level=indices OR level=shards AND AN INDEX NAME PROVIDES SPECIFIC INFORMATION.
• LOTS OF PENDING TASKS MEAN YOUR CLUSTER IS UNDER HEAVY LOAD AND SOME NODES CAN’T PROCESS THEM FAST ENOUGH.
• LONG-WAITING TASKS MEAN YOU HAVE A CAPACITY PLANNING PROBLEM.
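The three signals above map to fields of the `GET /_cluster/health` response: `status`, `number_of_pending_tasks` and `task_max_waiting_in_queue_millis`. A minimal sketch that turns them into actionable alerts; the thresholds are hypothetical and need tuning per cluster.

```python
def health_alerts(health: dict, max_pending: int = 100,
                  max_wait_ms: int = 10_000) -> list[str]:
    """Turn a _cluster/health response into alert messages."""
    alerts = []
    if health.get("status") == "red":
        alerts.append(
            "red: at least one primary shard is unassigned "
            f"({health.get('unassigned_shards', '?')} unassigned shards in total); "
            "drill down with GET /_cluster/health?level=indices"
        )
    pending = health.get("number_of_pending_tasks", 0)
    if pending > max_pending:
        alerts.append(f"load: {pending} pending tasks, some nodes can't keep up")
    waited = health.get("task_max_waiting_in_queue_millis", 0)
    if waited > max_wait_ms:
        alerts.append(f"capacity: a task has waited {waited} ms in the queue")
    return alerts


sample = {"status": "red", "unassigned_shards": 4,
          "number_of_pending_tasks": 250, "task_max_waiting_in_queue_millis": 42_000}
for alert in health_alerts(sample):
    print(alert)
```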
USE THE _CAT API
• PROVIDES GENERAL INFORMATION ABOUT YOUR NODES, SHARDS, INDICES AND THREAD POOLS.
• HITS THE WHOLE CLUSTER: WHEN IT TIMES OUT, YOU PROBABLY HAVE A NODE STUCK IN GARBAGE COLLECTION.
MONITORING AT THE CLUSTER LEVEL
• USE THE _STATS API FOR PRECISE INFORMATION.
• MONITOR SHARD REALLOCATIONS: TOO MANY OF THEM MEANS A DESIGN PROBLEM.
• MONITOR WRITES CLUSTER-WIDE: IF THEY FALL TO 0 AND IT’S UNUSUAL, A NODE IS STUCK IN GC.
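Cluster-wide writes can be tracked from two samples of `GET /_stats/indexing`, whose `_all.primaries.indexing.index_total` counter only grows. A minimal sketch computing documents indexed per second between samples; the sample values are made up, and what counts as "unusual" is up to your baseline.

```python
def indexing_rate(before: dict, after: dict, seconds: float) -> float:
    """Primary documents indexed per second between two _stats samples."""
    def total(stats: dict) -> int:
        return stats["_all"]["primaries"]["indexing"]["index_total"]
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (total(after) - total(before)) / seconds


def sample(index_total: int) -> dict:
    """Hypothetical, trimmed-down _stats response."""
    return {"_all": {"primaries": {"indexing": {"index_total": index_total}}}}


rate = indexing_rate(sample(1_000_000), sample(1_900_000), seconds=60)
print(rate, "docs/s")
if rate == 0:
    print("writes dropped to 0: look for a node stuck in GC")
```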
MONITORING AT THE NODE LEVEL
• USE THE _NODES/{NODE}, _NODES/{NODE}/STATS AND _CAT/THREAD_POOL APIS.
• GARBAGE COLLECTION DURATION & FREQUENCY ARE GOOD INDICATORS OF YOUR NODE’S HEALTH.
• CACHES AND BUFFERS ARE MONITORED AT THE NODE LEVEL.
• MONITORING I/O, DISK SPACE, OPEN FILES & CPU IS CRITICAL.
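GC duration and frequency come from `jvm.gc.collectors` in `GET /_nodes/stats/jvm`, which exposes cumulative `collection_count` and `collection_time_in_millis` counters per collector. A minimal sketch computing old-generation collections per minute and the share of wall time spent in them between two samples; the node id and numbers are hypothetical.

```python
def old_gc_stats(before: dict, after: dict, node_id: str, seconds: float) -> dict:
    """Old-generation GC frequency and time share between two samples."""
    def old(stats: dict) -> dict:
        return stats["nodes"][node_id]["jvm"]["gc"]["collectors"]["old"]
    count = old(after)["collection_count"] - old(before)["collection_count"]
    millis = old(after)["collection_time_in_millis"] - old(before)["collection_time_in_millis"]
    return {
        "collections_per_min": count * 60 / seconds,
        "time_in_gc_pct": 100 * millis / (seconds * 1000),
    }


def snap(count: int, millis: int) -> dict:
    """Hypothetical, trimmed-down _nodes/stats response for node n1."""
    old = {"collection_count": count, "collection_time_in_millis": millis}
    return {"nodes": {"n1": {"jvm": {"gc": {"collectors": {"old": old}}}}}}


print(old_gc_stats(snap(10, 5_000), snap(13, 20_000), "n1", seconds=60))
```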
MONITORING AT THE INDEX LEVEL
• USE THE {INDEX}/_STATS API.
• MONITOR THE DOCUMENTS / SHARD RATIO.
• MONITOR MERGES AND QUERY TIME.
• TOO MANY EVICTIONS MEAN YOU HAVE A CACHE CONFIGURATION PROBLEM.
TROUBLESHOOTING
WHAT’S REALLY GOING ON IN YOUR CLUSTER?
• THE _NODES/{NODE}/HOT_THREADS API TELLS YOU WHAT IS HAPPENING ON THE HOST.
• THE ELECTED MASTER NODE TELLS YOU MOST OF WHAT YOU NEED TO KNOW.
• ENABLE THE SLOW LOGS TO UNDERSTAND YOUR BOTTLENECKS & OPTIMIZE THE QUERIES. DISABLE THE SLOW LOGS WHEN YOU’RE DONE!!!
• WHEN THAT’S NOT ENOUGH, MEMORY PROFILING IS YOUR FRIEND.
MEMORY PROFILING
• LIVE MEMORY OR AN HPROF FILE AFTER A CRASH.
• ALLOWS YOU TO KNOW WHAT IS / WAS IN YOUR BUFFERS AND CACHES.
• YOURKIT JAVA PROFILER AS A TOOL.
TRACING
• KNOW WHAT’S REALLY HAPPENING IN YOUR JVM.
• LINUX 4.X PROVIDES GREAT PERF TOOLS, LINUX 4.9 EVEN
BETTER:
• LINUX-PERF,
• JAVA PERF MAP.
• VECTOR BY NETFLIX (NOT VEKTOR THE THRASH METAL BAND).
QUESTIONS
?
@FDEVILLAMIL
@SYNTHESIO
SLIDES: HTTP://BIT.DO/ELASTICSEARCH-SYSADMIN-201

Recycled Concrete Aggregate in Construction Part III
 
Unbalanced Three Phase Systems and circuits.pptx
Unbalanced Three Phase Systems and circuits.pptxUnbalanced Three Phase Systems and circuits.pptx
Unbalanced Three Phase Systems and circuits.pptx
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
 
14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
 

Running & Scaling Large Elasticsearch Clusters

  • 1. RUNNING & SCALING LARGE ELASTICSEARCH CLUSTERS FRED DE VILLAMIL, DIRECTOR OF INFRASTRUCTURE @FDEVILLAMIL OCTOBER 2017
  • 2. BACKGROUND • FRED DE VILLAMIL, 39 YEARS OLD, TEAM COFFEE @SYNTHESIO, • LINUX / (FREE)BSD USER SINCE 1996, • OPEN SOURCE CONTRIBUTOR SINCE 1998, • LOVES TENNIS, PHOTOGRAPHY, CUTE OTTERS, INAPPROPRIATE HUMOR AND ELASTICSEARCH CLUSTERS OF UNUSUAL SIZE. WRITES ABOUT ES AT HTTPS://THOUGHTS.T37.NET
  • 3. ELASTICSEARCH @SYNTHESIO • 8 production clusters, 600 hosts, 1.7PB storage, 37.5TB RAM, an average of 15k writes/s and 800 searches/s, some inputs > 200MB. • Data nodes: 6-core Xeon E5v3, 64GB RAM, 4*800GB SSD in RAID0. Sometimes dual 12-core Xeon E5-2687Wv4 (160 watts!!!). • We aggregate data from various cold storage sources and make it searchable in a jiffy.
  • 4. AN ELASTICSEARCH CLUSTER OF UNUSUAL SIZE
  • 6. NEVER GONNA GIVE YOU UP • NEVER GONNA LET YOU DOWN, • NEVER GONNA RUN AROUND AND DESERT YOU, • NEVER GONNA MAKE YOU CRY, • NEVER GONNA SAY GOODBYE, • NEVER GONNA TELL A LIE & HURT YOU.
  • 7. AVOIDING DOWNTIME & SPLIT BRAINS • RUN AT LEAST 3 MASTER NODES IN 3 DIFFERENT LOCATIONS. • NEVER RUN BULK QUERIES ON THE MASTER NODES. • ACTUALLY, NEVER RUN ANYTHING BUT ADMINISTRATIVE TASKS ON THE MASTER NODES. • SPREAD YOUR DATA ACROSS 2 DIFFERENT LOCATIONS WITH A REPLICATION FACTOR OF AT LEAST 1 (1 PRIMARY, 1 REPLICA).
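Split brains are avoided by requiring a strict majority of master-eligible nodes before a master can be elected; on ES 5.x/6.x that majority goes into `discovery.zen.minimum_master_nodes`. The sketch below computes it and checks whether losing any single location still leaves a quorum. The helper names are hypothetical, and the per-location layout is an assumed input.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Strict majority of master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def survives_location_loss(masters_per_location: list[int]) -> bool:
    """True if losing any one location still leaves enough masters for a quorum."""
    total = sum(masters_per_location)
    quorum = minimum_master_nodes(total)
    return all(total - lost >= quorum for lost in masters_per_location)


if __name__ == "__main__":
    print(minimum_master_nodes(3))            # 2
    print(survives_location_loss([1, 1, 1]))  # True: losing one site leaves 2 of 3
    print(survives_location_loss([2, 1]))     # False: losing the site with 2 leaves 1
```

This is why 3 masters in 3 locations beat 3 masters in 2: any single site can go away and the remaining two still form a majority.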
  • 8. RACK AWARENESS ALLOCATE A RACK_ID TO THE DATA NODES FOR EVEN REPLICATION. RESTART A WHOLE DATA CENTER AT ONCE WITHOUT DOWNTIME.
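On ES 5.x, rack awareness means tagging each data node with a custom attribute (for example `node.attr.rack_id: dc1` in `elasticsearch.yml`) and telling the allocator about it with `cluster.routing.allocation.awareness.attributes: rack_id`. The toy model below only illustrates the goal: every copy of a shard lands on a different rack, so a whole rack can restart while each shard keeps a live copy. `place_copies` and `whole_rack_can_restart` are hypothetical helpers; the real allocator also weighs disk usage, shard counts and other deciders.

```python
def place_copies(num_shards: int, replicas: int, racks: list[str]) -> dict[int, list[str]]:
    """Toy rack-aware placement: each copy of a shard goes to a different rack."""
    copies = replicas + 1  # 1 primary + N replicas
    if copies > len(racks):
        raise ValueError("more copies than racks: some copies would share a rack")
    placement = {}
    for shard in range(num_shards):
        offset = shard % len(racks)  # rotate so primaries don't all land on racks[0]
        placement[shard] = [racks[(offset + i) % len(racks)] for i in range(copies)]
    return placement


def whole_rack_can_restart(placement: dict[int, list[str]], rack: str) -> bool:
    """True if every shard keeps at least one copy outside `rack`."""
    return all(any(r != rack for r in shard_racks) for shard_racks in placement.values())


if __name__ == "__main__":
    layout = place_copies(4, 1, ["dc1", "dc2"])
    print(layout)  # {0: ['dc1', 'dc2'], 1: ['dc2', 'dc1'], 2: ['dc1', 'dc2'], 3: ['dc2', 'dc1']}
    print(whole_rack_can_restart(layout, "dc1"))  # True
```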
  • 9. RACK AWARENESS + QUERY NODES == MAGIC ES FAVORS THE DATA NODES WITH THE SAME RACK_ID AS THE QUERY NODE. REDUCES LATENCY AND BALANCES THE LOAD.
  • 10. RACK AWARENESS + QUERY NODES + ZONE == MAGIC + FUN ADD ZONES WITHIN THE SAME RACK FOR EVEN REPLICATION WITH A HIGHER FACTOR.
  • 11. USING ZONES FOR FUN & PROFIT ALLOWING EVEN REPLICATION WITH A HIGHER FACTOR WITHIN THE SAME RACK. GIVING MORE RESOURCES TO THE MOST FREQUENTLY ACCESSED INDEXES. …
  • 13. HOW ELASTICSEARCH USES THE MEMORY • Starts by allocating memory for the Java heap. • The Java heap contains all Elasticsearch buffers and caches, plus a few other things. • Each Java thread maps to a system thread: +128kB off heap. • The elected master uses 250kB to store information about each shard in the cluster.
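The per-shard and per-thread figures above turn into quick back-of-the-envelope numbers. The sketch just multiplies them out; the constants are the slide's figures and the helper names are hypothetical.

```python
SHARD_STATE_KB = 250   # per shard, held by the elected master (slide figure)
THREAD_STACK_KB = 128  # off-heap memory per Java thread (slide figure)


def master_shard_state_mb(total_shards: int) -> float:
    """Approximate memory the elected master spends on shard information."""
    return total_shards * SHARD_STATE_KB / 1024


def thread_overhead_mb(threads: int) -> float:
    """Approximate off-heap memory used by thread stacks."""
    return threads * THREAD_STACK_KB / 1024


if __name__ == "__main__":
    print(f"{master_shard_state_mb(20_000):.1f} MB")  # 4882.8 MB for 20k shards
    print(f"{thread_overhead_mb(500):.1f} MB")        # 62.5 MB for 500 threads
```

With tens of thousands of shards, the master needs several gigabytes just for shard information, which is one reason to keep shard counts under control.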
  • 14. ALLOCATING MEMORY • Never allocate more than 31GB of heap, so the JVM keeps using compressed object pointers. • Use 1/2 of your memory, up to 31GB. • Feed your master and query nodes generously: the more, the better (CPU included).
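The sizing rule fits in one line: half the RAM, capped at 31GB, with the other half left to the OS page cache for Lucene. The sketch also emits matching `-Xms`/`-Xmx` flags, since setting both to the same value sizes the heap once at startup. Function names are hypothetical.

```python
COMPRESSED_OOPS_CEILING_GB = 31  # stay below ~32GB so the JVM keeps compressed oops


def data_node_heap_gb(ram_gb: int) -> int:
    """Half of the RAM, capped at 31GB; the rest is left to the OS page cache."""
    return min(ram_gb // 2, COMPRESSED_OOPS_CEILING_GB)


def jvm_heap_flags(ram_gb: int) -> list[str]:
    heap = data_node_heap_gb(ram_gb)
    # Same initial and maximum size: the heap is allocated once, never resized.
    return [f"-Xms{heap}g", f"-Xmx{heap}g"]


if __name__ == "__main__":
    print(jvm_heap_flags(64))  # ['-Xms31g', '-Xmx31g'] for the 64GB data nodes
    print(jvm_heap_flags(32))  # ['-Xms16g', '-Xmx16g']
```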
  • 15. MEMORY LOCK • Use bootstrap.memory_lock: true at startup. • Requires ulimit -l unlimited. • Allocates the whole heap at once. • Uses contiguous memory regions. • Avoids swapping (you should disable swap anyway).
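Before turning on the memory lock, it can help to verify the two host prerequisites from a provisioning script: an unlimited memlock limit (what `ulimit -l unlimited` sets) and no active swap. This is a hypothetical pre-flight sketch for Linux hosts, not something Elasticsearch ships; it reads the limit with Python's `resource` module and the active swap devices from `/proc/swaps`.

```python
import resource
from pathlib import Path


def memlock_unlimited() -> bool:
    """True if both soft and hard RLIMIT_MEMLOCK are unlimited for this process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return soft == resource.RLIM_INFINITY and hard == resource.RLIM_INFINITY


def active_swap_devices(swaps_file: str = "/proc/swaps") -> list[str]:
    """Names of active swap devices; an empty list means swap is off."""
    path = Path(swaps_file)
    if not path.exists():
        return []
    rows = path.read_text().splitlines()[1:]  # the first line is a column header
    return [row.split()[0] for row in rows if row.strip()]


if __name__ == "__main__":
    print("memlock unlimited:", memlock_unlimited())
    print("active swap:", active_swap_devices() or "none")
```

Note that the limit must apply to the user running Elasticsearch, so run the check under that account.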
  • 16. CHOOSING THE RIGHT GARBAGE COLLECTOR • ES runs with CMS as its default garbage collector. • CMS was designed for heaps < 4GB. • Stop-the-world garbage collections last too long & block the cluster. • Solution: switching to G1GC (the default in Java 9, unsupported by Elastic).
  • 17. CMS VS G1GC • CMS: SHARES CPU TIME WITH THE APPLICATION. “STOPS THE WORLD” WHEN THERE IS TOO MUCH MEMORY TO CLEAN, UNTIL IT THROWS AN OUTOFMEMORYERROR. • G1GC: SHORTER, MORE FREQUENT PAUSES. WON’T STOP A NODE LONG ENOUGH FOR IT TO LEAVE THE CLUSTER. • ELASTIC SAYS: DON’T USE G1GC FOR REASONS, SO READ THE DOC.
  • 18. G1GC OPTIONS -XX:+UseG1GC: ACTIVATES G1GC. -XX:MaxGCPauseMillis: TARGET FOR THE MAXIMUM GC PAUSE TIME. -XX:GCPauseIntervalMillis: TARGET TIME SLICE FOR COLLECTIONS. -XX:InitiatingHeapOccupancyPercent: HEAP OCCUPANCY AT WHICH TO START COLLECTING.
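The CMS and G1 flags conflict, so switching collectors means commenting out the CMS lines in `jvm.options` (ES 5.x ships `-XX:+UseConcMarkSweepGC` and related `CMS*` flags) before adding the G1 ones. The sketch below does that text rewrite. The helper is hypothetical; 200ms and 45% are the JVM's own G1 defaults and serve only as placeholders to tune, and `GCPauseIntervalMillis` is only emitted when you pass a value.

```python
from typing import Optional


def switch_to_g1(jvm_options: str, max_pause_ms: int = 200,
                 initiating_occupancy_pct: int = 45,
                 pause_interval_ms: Optional[int] = None) -> str:
    """Comment out CMS flags in a jvm.options text and append G1 flags."""
    out = []
    for line in jvm_options.splitlines():
        flag = line.strip()
        is_cms = flag.startswith("-XX:") and ("CMS" in flag or "ConcMarkSweep" in flag)
        out.append(f"# {line}" if is_cms else line)
    out.append("-XX:+UseG1GC")
    out.append(f"-XX:MaxGCPauseMillis={max_pause_ms}")
    out.append(f"-XX:InitiatingHeapOccupancyPercent={initiating_occupancy_pct}")
    if pause_interval_ms is not None:
        out.append(f"-XX:GCPauseIntervalMillis={pause_interval_ms}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    current = "-Xms31g\n-Xmx31g\n-XX:+UseConcMarkSweepGC\n-XX:CMSInitiatingOccupancyFraction=75\n"
    print(switch_to_g1(current))
```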
  • 19. CHOOSING THE RIGHT STORAGE • MMAPFS: MAPS LUCENE FILES INTO VIRTUAL MEMORY USING MMAP. NEEDS AS MUCH MEMORY AS THE FILES BEING MAPPED TO AVOID ISSUES. • NIOFS: APPLIES A SHARED LOCK ON LUCENE FILES AND RELIES ON THE VFS CACHE.
  • 20. BUFFERS AND CACHES • ELASTICSEARCH HAS MULTIPLE CACHES & BUFFERS WITH DEFAULT VALUES: KNOW THEM! • BUFFERS + CACHES MUST BE < TOTAL JAVA HEAP (OBVIOUS, BUT…). • CACHE EVICTION IS AUTOMATIC, BUT FORCING IT CAN SAVE YOUR LIFE FOR A SMALL OVERHEAD. • IF YOU HAVE OOM ISSUES, DISABLE THE CACHES! • FROM A USER POV, CIRCUIT BREAKERS ARE A NO GO!
  • 22. INDEX DESIGN • VERSION YOUR INDEXES BY MAPPING: 1_*, 2_* ETC. • THE MORE SHARDS, THE BETTER THE ELASTICITY, BUT THE MORE CPU AND MEMORY USED ON THE MASTERS. • PROVISIONING 10GB PER SHARD ALLOWS FASTER RECOVERY & REALLOCATION.
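Since the primary shard count is set when the index is created, the 10GB-per-shard rule is best applied up front from the expected index size. The sketch rounds up to whole shards and builds the mapping-versioned name. Both helpers are hypothetical, and the expected size is an input you have to estimate.

```python
import math

TARGET_SHARD_GB = 10  # per-shard size suggested on the slide


def shards_for(expected_index_gb: float, target_shard_gb: float = TARGET_SHARD_GB) -> int:
    """Number of primary shards so that each holds about target_shard_gb."""
    return max(1, math.ceil(expected_index_gb / target_shard_gb))


def versioned_index(mapping_version: int, name: str) -> str:
    """'1_*', '2_*': bump the prefix whenever the mapping changes."""
    return f"{mapping_version}_{name}"


if __name__ == "__main__":
    print(shards_for(95))                    # 10 shards of ~9.5GB each
    print(versioned_index(2, "20171020"))    # 2_20171020
```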
  • 23. REPLICATION TRICKS • NUMBER OF REPLICAS MUST BE 0 OR ODD. CONSISTENCY QUORUM: INT( (PRIMARY + NUMBER_OF_REPLICAS) / 2 ) + 1. • RAISE THE REPLICATION FACTOR TO SCALE READS, UP TO 100% OF THE DATA ON EVERY DATA NODE.
  • 24. ALIASES • ACCESS MULTIPLE INDICES AT ONCE. • READ FROM MULTIPLE, WRITE TO ONLY ONE. EXAMPLE ON TIMESTAMPED INDICES: "18_20171020": { "aliases": { "2017": {}, "201710": {}, "20171020": {} } } "18_20171021": { "aliases": { "2017": {}, "201710": {}, "20171021": {} } } Queries: /2017/_search /201710/_search AFTER A MAPPING CHANGE & REINDEX, CHANGE THE ALIAS:
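The timestamped example above follows a fixed pattern, so it can be generated rather than typed. The hypothetical helper below reproduces the slide's structure: a mapping-version prefix plus year, month and day aliases. In an actual create-index request the index name goes in the URL (`PUT /18_20171020`) and only the inner `{"aliases": ...}` object is the body.

```python
from datetime import date


def timestamped_index(mapping_version: int, day: date) -> dict:
    """Index name plus year / month / day aliases, as in the slide's example."""
    name = f"{mapping_version}_{day:%Y%m%d}"
    aliases = {f"{day:%Y}": {}, f"{day:%Y%m}": {}, f"{day:%Y%m%d}": {}}
    return {name: {"aliases": aliases}}


if __name__ == "__main__":
    print(timestamped_index(18, date(2017, 10, 21)))
    # {'18_20171021': {'aliases': {'2017': {}, '201710': {}, '20171021': {}}}}
```

Querying `/201710/_search` then hits every daily index of October 2017 without listing them.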
  • 25. ROLLOVER • CREATES A NEW INDEX WHEN THE CURRENT ONE IS TOO OLD OR TOO BIG. • SUPPORTS DATE MATH: DAILY INDEX CREATION. • USE ALIASES TO QUERY ALL ROLLED-OVER INDEXES. PUT /logs-000001 { "aliases": { "logs": {} } } POST /logs/_rollover { "conditions": { "max_docs": 10000000 } }
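When the write index name ends with `-` and a number, the rollover API names the next index by incrementing that number and zero-padding it to 6 digits, so `logs-000001` becomes `logs-000002`. The sketch mimics that naming and the `max_docs` condition locally, for example to predict index names in tooling. It is an approximation with hypothetical helper names, not the server-side logic.

```python
import re


def next_rollover_name(index: str) -> str:
    """Bump a trailing '-<number>' and zero-pad it to 6 digits."""
    match = re.fullmatch(r"(.*-)(\d+)", index)
    if match is None:
        raise ValueError(f"{index!r} does not end with '-' and a number")
    prefix, counter = match.groups()
    return f"{prefix}{int(counter) + 1:06d}"


def should_roll_over(doc_count: int, max_docs: int = 10_000_000) -> bool:
    """The slide's condition: roll over once the index holds max_docs documents."""
    return doc_count >= max_docs


if __name__ == "__main__":
    print(next_rollover_name("logs-000001"))  # logs-000002
    print(should_roll_over(12_000_000))       # True
```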
  • 27. CONFIGURATION CHANGES • PREFER CONFIGURATION FILE UPDATES TO API CALLS FOR PERMANENT CHANGES. • VERSION YOUR CONFIGURATION CHANGES SO YOU CAN ROLL BACK: ES REQUIRES LOTS OF FINE TUNING. • WHEN USING THE _SETTINGS API, PREFER TRANSIENT TO PERSISTENT SETTINGS: THEY’RE EASIER TO GET RID OF.
  • 28. RECONFIGURING THE WHOLE CLUSTER LOCK SHARD REALLOCATION & RECOVERY: "cluster.routing.allocation.enable" : "none" OPTIMIZE FOR RECOVERY: "cluster.routing.allocation.node_initial_primaries_recoveries": 50 "indices.recovery.max_bytes_per_sec": "2048mb" RESTART A FULL RACK, WAIT FOR NODES TO COME
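The settings on this slide are sent as `transient` bodies to `PUT /_cluster/settings`, following the previous slide's advice. The sketch builds those bodies with the slide's values. The final `"all"` body, which re-enables allocation once the nodes are back, is an assumption about the last step of the procedure, and the helper names are hypothetical.

```python
import json


def lock_allocation() -> dict:
    """Stop shard reallocation and recovery while nodes restart."""
    return {"transient": {"cluster.routing.allocation.enable": "none"}}


def recovery_boost(initial_primaries: int = 50, max_bytes_per_sec: str = "2048mb") -> dict:
    """Speed up recovery with the slide's values."""
    return {"transient": {
        "cluster.routing.allocation.node_initial_primaries_recoveries": initial_primaries,
        "indices.recovery.max_bytes_per_sec": max_bytes_per_sec,
    }}


def unlock_allocation() -> dict:
    """Assumed final step: let shards allocate again."""
    return {"transient": {"cluster.routing.allocation.enable": "all"}}


if __name__ == "__main__":
    for body in (lock_allocation(), recovery_boost(), unlock_allocation()):
        print("PUT /_cluster/settings", json.dumps(body))
```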
  • 29. THE REINDEX API • IN-CLUSTER AND CLUSTER-TO-CLUSTER REINDEX API. • ALLOWS CROSS-VERSION INDEXING: 1.7 TO 5.1… • SLICED SCROLLS ONLY AVAILABLE STARTING WITH 6.0. • ACCEPTS ES QUERIES TO FILTER THE DATA TO REINDEX. • MERGES MULTIPLE INDEXES INTO 1.
  • 30. BULK INDEXING TRICKS LIMIT REBALANCE: "cluster.routing.allocation.cluster_concurrent_rebalance": 1 "cluster.routing.allocation.balance.shard": "0.15f" "cluster.routing.allocation.balance.threshold": "10.0f" DISABLE REFRESH: "index.refresh_interval": "-1" NO REPLICA: "index.number_of_replicas": "0" // with replicas, Lucene indexes every document n times; adding replicas afterwards just copies ("rsyncs") the data. ALLOCATE ON DEDICATED HARDWARE:
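The bulk-load recipe is two sets of index settings: one applied before loading (`PUT /{index}/_settings`) and one restoring normal behavior afterwards, plus the cluster-level rebalance limits. The sketch builds them as dictionaries. The restore values (1 replica, the default 1s refresh) are assumptions to adapt, and the helper names are hypothetical.

```python
def bulk_rebalance_limits() -> dict:
    """Cluster settings from the slide, limiting rebalancing during the load."""
    return {"transient": {
        "cluster.routing.allocation.cluster_concurrent_rebalance": 1,
        "cluster.routing.allocation.balance.shard": 0.15,
        "cluster.routing.allocation.balance.threshold": 10.0,
    }}


def bulk_load_index_settings() -> dict:
    """Before loading: no refresh ("-1" disables it) and no replicas."""
    return {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}


def after_bulk_index_settings(replicas: int = 1, refresh_interval: str = "1s") -> dict:
    """After loading: re-enable refresh; new replicas copy the finished segments."""
    return {"index": {"refresh_interval": refresh_interval, "number_of_replicas": replicas}}


if __name__ == "__main__":
    print(bulk_load_index_settings())
    print(after_bulk_index_settings())
```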
  • 31. OPTIMIZING FOR SPACE & PERFORMANCE • LUCENE SEGMENTS ARE IMMUTABLE: THE MORE YOU WRITE, THE MORE SEGMENTS YOU GET. • DELETING DOCUMENTS ONLY MARKS THEM AS DELETED (COPY ON WRITE), SO NOTHING IS REALLY REMOVED UNTIL SEGMENTS ARE MERGED. index.merge.scheduler.max_thread_count: default CPU cores / 2, between 1 and 4 POST /_forcemerge?only_expunge_deletes=true: faster, only merges segments containing deleted documents POST /_forcemerge?max_num_segments=N: don’t use on indexes you write to! WARNING: _FORCEMERGE HAS A COST IN CPU AND I/O.
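The merge thread default documented by Elasticsearch is `max(1, min(4, processors / 2))`, and force merges are plain `POST` requests on `/{index}/_forcemerge`. The sketch computes the default and builds request paths for the two variants above. Helper names are hypothetical, and `logs-000001` is just an example index.

```python
from urllib.parse import urlencode


def default_merge_threads(cores: int) -> int:
    """Documented default for index.merge.scheduler.max_thread_count."""
    return max(1, min(4, cores // 2))


def forcemerge_path(index: str, **params) -> str:
    """Path for POST /{index}/_forcemerge; booleans are rendered as true/false."""
    query = urlencode({k: str(v).lower() if isinstance(v, bool) else v
                       for k, v in params.items()})
    return f"/{index}/_forcemerge" + (f"?{query}" if query else "")


if __name__ == "__main__":
    print(default_merge_threads(12))  # 4: 24 hyperthreads on the bi-Xeon would also give 4
    print(forcemerge_path("logs-000001", only_expunge_deletes=True))
    print(forcemerge_path("logs-000001", max_num_segments=1))  # read-only indexes only
```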
  • 32. MINOR VERSION UPGRADES • CHECK YOUR PLUGINS’ COMPATIBILITY: PLUGINS MUST BE COMPILED FOR YOUR MINOR VERSION. • START BY UPGRADING THE MASTER NODES. • UPGRADE THE DATA NODES ONE WHOLE RACK AT A TIME.
  • 33. OS LEVEL UPGRADES • ENSURE THE WHOLE CLUSTER RUNS THE SAME JAVA VERSION. • WHEN UPGRADING JAVA, CHECK WHETHER YOU ALSO NEED TO UPGRADE THE KERNEL. • PER-NODE JAVA / KERNEL VERSIONS ARE AVAILABLE IN THE _STATS API.
  • 35. CAPTAIN OBVIOUS, YOU’RE MY ONLY HOPE! • GOOD MONITORING IS BUSINESS-ORIENTED MONITORING. • GOOD ALERTING IS ACTIONABLE ALERTING. • DON’T MONITOR ONLY THE CLUSTER, BUT THE WHOLE PROCESSING CHAIN. • USELESS METRICS ARE USELESS. • LOSING A DATACENTER: OK. LOSING DATA: NOT OK!
  • 36. MONITORING TOOLING • ELASTICSEARCH X-PACK, • GRAFANA…
  • 37. LIFE, DEATH & _CLUSTER/HEALTH • A RED CLUSTER MEANS AT LEAST 1 INDEX HAS MISSING DATA. DON’T PANIC! • USING LEVEL={INDICES,SHARDS} AND AN INDEX NAME PROVIDES SPECIFIC INFORMATION. • LOTS OF PENDING TASKS MEAN YOUR CLUSTER IS UNDER HEAVY LOAD AND SOME NODES CAN’T PROCESS THEM FAST ENOUGH. • TASKS WAITING A LONG TIME MEAN YOU HAVE A CAPACITY PLANNING PROBLEM.
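The three signals on this slide map to fields of the `GET /_cluster/health` response: `status`, `number_of_pending_tasks` and `task_max_waiting_in_queue_millis`. The sketch turns a parsed response into actionable findings. The thresholds are arbitrary placeholders to tune for your cluster, and `triage` is a hypothetical helper.

```python
def triage(health: dict, max_pending: int = 100, max_wait_ms: int = 60_000) -> list[str]:
    """Turn a parsed _cluster/health response into human-readable findings."""
    findings = []
    if health.get("status") == "red":
        findings.append("red: at least one index is missing data, check level=indices")
    if health.get("number_of_pending_tasks", 0) > max_pending:
        findings.append("many pending tasks: the cluster is under heavy load")
    if health.get("task_max_waiting_in_queue_millis", 0) > max_wait_ms:
        findings.append("tasks wait too long: likely a capacity planning problem")
    return findings


if __name__ == "__main__":
    sample = {"status": "red", "number_of_pending_tasks": 250,
              "task_max_waiting_in_queue_millis": 5_000}
    for finding in triage(sample):
        print(finding)
```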
  • 38. USE THE _CAT API • PROVIDES GENERAL INFORMATION ABOUT YOUR NODES, SHARDS, INDICES AND THREAD POOLS. • HITS THE WHOLE CLUSTER: WHEN IT TIMES OUT, YOU PROBABLY HAVE A NODE STUCK IN GARBAGE COLLECTION.
  • 39. MONITORING AT THE CLUSTER LEVEL • USE THE _STATS API FOR PRECISE INFORMATION. • MONITOR SHARD REALLOCATIONS: TOO MANY MEANS A DESIGN PROBLEM. • MONITOR THE WRITES CLUSTER-WIDE: IF THEY FALL TO 0 AND IT’S UNUSUAL, A NODE IS STUCK IN GC.
  • 40. MONITORING AT THE NODE LEVEL • USE THE _NODES/{NODE}, _NODES/{NODE}/STATS AND _CAT/THREAD_POOL APIS. • GARBAGE COLLECTION DURATION & FREQUENCY ARE GOOD METRICS OF YOUR NODE’S HEALTH. • CACHES AND BUFFERS ARE MONITORED AT THE NODE LEVEL. • MONITORING I/O, SPACE, OPEN FILES & CPU IS CRITICAL.
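GC duration and frequency come from cumulative counters in `_nodes/{node}/stats`, under `jvm.gc.collectors.{young,old}.collection_count` and `collection_time_in_millis`. Two snapshots taken a known interval apart give a rate. The sketch computes the share of wall time spent in a collector and the number of collections between snapshots. Helper names are hypothetical, and the sample dictionaries only contain the fields used.

```python
def _collector(stats: dict, collector: str) -> dict:
    return stats["jvm"]["gc"]["collectors"][collector]


def gc_time_ratio(before: dict, after: dict, interval_ms: int, collector: str = "old") -> float:
    """Fraction of the interval the node spent in the given collector."""
    spent = (_collector(after, collector)["collection_time_in_millis"]
             - _collector(before, collector)["collection_time_in_millis"])
    return spent / interval_ms


def gc_count(before: dict, after: dict, collector: str = "old") -> int:
    """Number of collections between the two snapshots."""
    return (_collector(after, collector)["collection_count"]
            - _collector(before, collector)["collection_count"])


if __name__ == "__main__":
    t0 = {"jvm": {"gc": {"collectors": {"old": {"collection_count": 10, "collection_time_in_millis": 1000}}}}}
    t1 = {"jvm": {"gc": {"collectors": {"old": {"collection_count": 13, "collection_time_in_millis": 4000}}}}}
    print(gc_time_ratio(t0, t1, 60_000), gc_count(t0, t1))  # 0.05 3
```

An old-generation ratio that keeps climbing is usually the first sign of a node heading for a long stop-the-world pause.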
  • 41. MONITORING AT THE INDEX LEVEL • USE THE {INDEX}/_STATS API. • MONITOR THE DOCUMENTS / SHARD RATIO. • MONITOR MERGES AND QUERY TIME. • TOO MANY EVICTIONS MEAN YOU HAVE A CACHE CONFIGURATION PROBLEM.
  • 43. WHAT’S REALLY GOING ON IN YOUR CLUSTER? • THE _NODES/{NODE}/HOT_THREADS API TELLS YOU WHAT HAPPENS ON THE HOST. • THE ELECTED MASTER NODE TELLS YOU MOST OF WHAT YOU NEED TO KNOW. • ENABLE THE SLOW LOGS TO UNDERSTAND YOUR BOTTLENECKS & OPTIMIZE THE QUERIES. DISABLE THE SLOW LOGS WHEN YOU’RE DONE!!! • WHEN THAT’S NOT ENOUGH, MEMORY PROFILING IS YOUR FRIEND.
  • 44. MEMORY PROFILING • LIVE MEMORY OR AN HPROF FILE AFTER A CRASH. • LETS YOU KNOW WHAT IS / WAS IN YOUR BUFFERS AND CACHES. • YOURKIT JAVA PROFILER AS A TOOL.
  • 45. TRACING • KNOW WHAT’S REALLY HAPPENING IN YOUR JVM. • LINUX 4.X PROVIDES GREAT PERF TOOLS, LINUX 4.9 EVEN BETTER: • LINUX-PERF, • JAVA PERF MAP. • VECTOR BY NETFLIX (NOT VEKTOR THE TRASH METAL BAND).