SlideShare a Scribd company logo
1 of 46
RUNNING & SCALING
LARGE ELASTICSEARCH
CLUSTERS
FRED DE VILLAMIL, DIRECTOR OF
INFRASTRUCTURE
@FDEVILLAMIL
OCTOBER 2017
BACKGROUND
• FRED DE VILLAMIL, 39 ANS, TEAM
COFFEE @SYNTHESIO,
• LINUX / (FREE)BSD USER SINCE 1996,
• OPEN SOURCE CONTRIBUTOR SINCE
1998,
• LOVES TENNIS, PHOTOGRAPHY, CUTE
OTTERS, INAPPROPRIATE HUMOR AND
ELASTICSEARCH CLUSTERS OF UNUSUAL
SIZE.
WRITES ABOUT ES AT
HTTPS://THOUGHTS.T37.NET
ELASTICSEARCH @SYNTHESIO
• 8 production clusters, 600 hosts, 1.7PB storage, 37.5TB
RAM, average 15k writes / s, 800 search /s, some inputs >
200MB.
• Data nodes: 6 core Xeon E5v3, 64GB RAM, 4*800GB SSD
RAID0. Sometimes bi Xeon E5-2687Wv4 12 core (160
watts!!!).
• We agregate data from various cold storage and make them
searchable in a giffy.
AN ELASTICSEARCH CLUSTER OF UNUSUAL
SIZE
ENSURING HIGH
AVAILABILITY
NEVER GONNA GIVE YOU UP
• NEVER GONNA LET YOU DOWN,
• NEVER GONNA RUN AROUND AND
DESERT YOU,
• NEVER GONNA MAKE YOU CRY,
• NEVER GONNA SAY GOODBYE,
• NEVER GONNA TELL A LIE & HURT YOU.
AVOIDING DOWNTIME & SPLIT BRAINS
• RUN AT LEAST 3 MASTER NODES INTO 3 DIFFERENT
LOCATIONS.
• NEVER RUN BULK QUERIES ON THE MASTER
NODES.
• ACTUALLY NEVER RUN ANYTHING BUT
ADMINISTRATIVE TASKS ON THE MASTER NODES.
• SPREAD YOUR DATA INTO 2 DIFFERENT LOCATION
WITH AT LEAST A REPLICATION FACTOR OF 1 (1
PRIMARY, 1 REPLICA).
RACK AWARENESS
ALLOCATE A
RACK_ID TO THE
DATA NODES FOR
EVEN REPLICATION.
RESTART A WHOLE
DATA CENTER
@ONCE WITHOUT
DOWNTIME.
RACK AWARENESS + QUERY NODES ==
MAGIC
ES PRIVILEGES THE
DATA NODES WITH
THE SAME RACK_ID
AS THE QUERY.
REDUCES LATENCY
AND BALANCES THE
LOAD.
RACK AWARENESS + QUERY NODES + ZONE ==
MAGIC + FUN
ADD ZONES INTO THE
SAME RACK FOR
EVEN REPLICATION
WITH HIGHER
FACTOR.
USING ZONES FOR FUN & PROFIT
ALLOWING EVEN REPLICATION
WITH A HIGHER FACTOR WITHIN
THE SAME RACK.
ALLOWING MORE RESOURCES
TO THE MOST FREQUENTLY
ACCESSED INDEXES.
…
AVOIDING MEMORY
NIGHTMARE
HOW ELASTICSEARCH USES THE MEMORY
• Starts with allocating memory for Java heap.
• The Java heap contains all Elasticsearch buffers
and caches + a few other things.
• Each Java thread maps a system thread: +128kB
off heap.
• Elected master uses 250kB to store each shard
information inside the cluster.
ALLOCATING MEMORY
• Never allocate more than 31GB
heap to avoid the compressed
pointers issue.
• Use 1/2 of your memory up to
31GB.
• Feed your master and query
nodes, the more the better
(including CPU).
MEMORY LOCK
• Use memory_lock: true at
startup.
• Requires ulimit -l
unlimited.
• Allocates the whole heap at once.
• Uses contiguous memory regions.
• Avoids swapping (you should
disable swap anyway).
CHOSING THE RIGHT GARBAGE COLLECTOR
• ES runs with CMS as a default
garbage collector.
• CMS was designed for heaps <
4GB.
• Stop the world garbage
collection last too long & blocks
the cluster.
• Solution: switching to G1GC
(default in Java9, unsupported).
CMS VS G1GC
• CMS: SHARED CPU TIME WITH THE APPLICATION.
“STOPS THE WORLD” WHEN TOO MANY MEMORY TO
CLEAN UNTIL IT SENDS AN OUTOFMEMORYERROR.
• G1GC: SHORT, MORE FREQUENT, PAUSES. WON’T
STOP A NODE UNTIL IT LEAVES THE CLUSTER.
• ELASTIC SAYS: DON’T USE G1GC FOR REASONS,
SO READ THE DOC.
G1GC OPTIONS
+USEG1GC: ACTIVATES G1GC
MAXGCPAUSEMILLIS: TARGET FOR MAX GC PAUSE
TIME.
GCPAUSEINTERVALMILLIS:TARGET FOR COLLECTION
TIME SPACE
INITIATINGHEAPOCCUPANCYPERCENT: WHEN TO START
COLLECTING?
CHO0SING THE RIGHT STORAGE
• MMAPFS : MAPS LUCENE FILES ON
THE VIRTUAL MEMORY USING
MMAP. NEEDS AS MUCH MEMORY
AS THE FILE BEING MAPPED TO
AVOID ISSUES.
• NIOFS : APPLIES A SHARED LOCK
ON LUCENE FILES AND RELIES
ON VFS CACHE.
BUFFERS AND CACHES
• ELASTICSEARCH HAS MULTIPLE CACHES & BUFFERS, WITH DEFAULT VALUES,
KNOW THEM!
• BUFFERS + CACHE MUST BE < TOTAL JAVA HEAP (OBVIOUS BUT…).
• AUTOMATED EVICTION ON THE CACHE, BUT FORCING IT CAN SAVE YOUR LIFE
WITH A SMALL OVERHEAD.
• IF YOU HAVE OOM ISSUES, DISABLE THE CACHES!
• FROM A USER POV, CIRCUIT BREAKERS ARE A NO GO!
MANAGING LARGE INDEXES
INDEX DESIGN
• VERSION YOUR INDEX BY MAPPING: 1_*, 2_* ETC.
• THE MORE SHARDS, THE BETTER ELASTICITY, BUT
THE MORE CPU AND MEMORY USED ON THE
MASTERS.
• PROVISIONNING 10GB PER SHARDS ALLOWS A
FASTER RECOVERY & REALLOCATION.
REPLICATION TRICKS
• NUMBER OF REPLICAS MUST BE 0 OR ODD.
CONSISTENCY QUORUM: INT( (PRIMARY +
NUMBER_OF_REPLICAS) / 2 ) + 1.
• RAISE THE REPLICATION FACTOR TO SCALE FOR
READING
UP TO 100% OF THE DATA / DATA NODE.
ALIASES
• ACCESS MULTIPLE INDICES AT ONCE.
• READ MULTIPLE, WRITE ONLY ONE.
EXAMPLE ON TIMESTAMPED INDICES:
"18_20171020": { "aliases": { "2017": {}, "201710": {}, "20171020": {} } }
"18_20171021": { "aliases": { "2017": {}, "201710": {}, "20171021": {} } }
Queries:
/2017/_search
/201710/_search
AFTER A MAPPING CHANGE & REINDEX, CHANGE THE ALIAS:
ROLLOVER
• CREATE A NEW INDEX WHEN TOO OLD OR
TOO BIG.
• SUPPORT DATE MATH: DAILY INDEX
CREATION.
• USE ALIASES TO QUERY ALL
ROLLOVERED INDEXES.
PUT "logs-000001" { "aliases": { "logs": {} } }
POST /logs/_rollover { "conditions": { "max_docs": 10000000 } }
DAILY OPERATIONS
CONFIGURATION CHANGES
• PREFER CONFIGURATION FILE UPDATES TO API CALL FOR
PERMANENT CHANGES.
• VERSION YOUR CONFIGURATION CHANGES SO YOU CAN
ROLLBACK, ES REQUIRES LOTS OF FINE TUNING.
• WHEN USING _SETTINGS API, PREFER TRANSIENT TO
PERSISTENT, THEY’RE EASIER TO GET RID OF.
RECONFIGURING THE WHOLE CLUSTER
LOCK SHARD REALLOCATION & RECOVERY:
"cluster.routing.allocation.enable" : "none"
OPTIMIZE FOR RECOVERY:
"cluster.routing.allocation.node_initial_primaries_recoveries": 50
"indices.recovery.max_bytes_per_sec": "2048mb"
RESTART A FULL RACK, WAIT FOR NODES TO COME
THE REINDEX API
• IN CLUSTER AND CLUSTER TO CLUSTER REINDEX API.
• ALLOWS CROSS VERSION INDEXING: 1.7 TO 5.1…
• SLICED SCROLLS ONLY AVAILABLE STARTING 6.0.
• ACCEPT ES QUERIES TO FILTER THE DATA TO REINDEX.
• MERGE MULTIPLE INDEXES INTO 1.
BULK INDEXING TRICKS
LIMIT REBALANCE:
"cluster.routing.allocation.cluster_concurrent_rebalance": 1
"cluster.routing.allocation.balance.shard": "0.15f"
"cluster.routing.allocation.balance.threshold": "10.0f"
DISABLE REFRESH:
"index.refresh_interval:" "0"
NO REPLICA:
"index.number_of_replicas:" "0" // having replica index n times in Lucene, adding one just "rsync" the data.
ALLOCATE ON DEDICATED HARDWARE:
OPTIMIZING FOR SPACE & PERFORMANCES
• LUCENE SEGMENTS ARE IMMUTABLE, THE MORE YOU
WRITE, THE MORE SEGMENTS YOU GET.
• DELETING DOCUMENTS DOES COPY ON WRITE SO NO
REAL DELETE.
index.merge.scheduler.max_thread_count: default CPU/2 with min 4
POST /_force_merge?only_expunge_deletes: faster, only merge segments with deleted
POST /_force_merge?max_num_segments: don’t use on indexes you write on!
WARNING: _FORCE_MERGE HAS A COST IN CPU AND I/OS.
MINOR VERSION UPGRADES
• CHECK YOUR PLUGINS COMPATIBILITY, PLUGINS
MUST BE COMPILED FOR YOUR MINOR VERSION.
• START UPGRADING THE MASTER NODES.
• UPGRADE THE DATA NODES ON A WHOLE RACK AT
ONCE.
OS LEVEL UPGRADES
• ENSURE THE WHOLE CLUSTER RUNS THE SAME JAVA
VERSION.
• WHEN UPGRADING JAVA, CHECK IF YOU DON’T HAVE
TO UPGRADE THE KERNEL.
• PER NODE JAVA / KERNEL VERSION AVAILABLE IN THE
_STATS API.
MONITORING
CAPTAIN OBVIOUS, YOU’RE MY ONLY HOPE!
• GOOD MONITORING IS BUSINESS ORIENTED MONITORING.
• GOOD ALERTING IS ACTIONABLE ALERTING.
• DON’T MONITORE THE CLUSTER ONLY, BUT THE WHOLE PROCESSING
CHAIN.
• USELESS METRICS ARE USELESS.
• LOSING A DATACENTER: OK. LOSING DATA: NOT OK!
MONITORING TOOLING
• ELASTICSEARCH X-
PACK,
• GRAFANA…
LIFE, DEATH & _CLUSTER/HEALTH
• A RED CLUSTER MEANS AT LEAST 1
INDEX HAS MISSING DATA. DON’T
PANIC!
• USING LEVEL={INDEX,SHARD} AND AN
INDEX ID PROVIDES SPECIFIC
INFORMATION.
• LOTS OF PENDING TASKS MEANS YOUR
CLUSTER IS UNDER HEAVY LOAD AND
SOME NODES CAN’T PROCESS THEM
FAST ENOUGH.
• LONG WAITING TASKS MEANS YOU
HAVE A CAPACITY PLANNING PROBLEM.
USE THE _CAT API
• PROVIDES GENERAL INFORMATION
ABOUT YOUR NODES, SHARDS,
INDICES AND THREAD POOLS.
• HIT THE WHOLE CLUSTER, WHEN
IT TIMEOUTS YOU’VE PROBABLY
HAVING A NODE STUCK IN
GARBAGE COLLECTION.
MONITORING AT THE CLUSTER LEVEL
• USE THE _STATS API FOR PRECISE INFORMATION.
• MONITORE THE SHARDS REALLOCATION, TOO MANY
MEANS A DESIGN PROBLEM.
• MONITORE THE WRITES AND CLUSTER WIDE, IF THEY
FALL TO 0 AND IT’S UNUSUAL, A NODE IS STUCK IN
GC.
MONITORING AT THE NODE LEVEL
• USE THE _NODES/{NODE}, _NODES/{NODE}/STATS
AND _CAT/THREAD_POOL API.
• THE GARBAGE COLLECTION DURATION &
FREQUENCY IS A GOOD METRIC OF YOUR NODE
HEALTH.
• CACHE AND BUFFERS ARE MONITORED ON A NODE
LEVEL.
• MONITORING I/OS, SPACE, OPEN FILES & CPU IS
CRITICAL.
MONITORING AT THE INDEX LEVEL
• USE THE {INDEX}/_STATS API.
• MONITORE THE DOCUMENTS / SHARD RATIO.
• MONITORE THE MERGES, QUERY TIME.
• TOO MANY EVICTIONS MEANS YOU HAVE A CACHE
CONFIGURATION LEVEL.
TROUBLESHOOTING
WHAT’S REALLY GOING ON IN YOUR CLUSTER?
• THE _NODES/{NODE}/HOT_THREADS API TELLS WHAT HAPPENS ON
THE HOST.
• THE ELECTED MASTER NODES TELLS YOU MOST THING YOU NEED
TO KNOW.
• ENABLE THE SLOW LOGS TO UNDERSTAND YOUR BOTTLENECK &
OPTIMIZE THE QUERIES. DISABLE THE SLOW LOGS WHEN YOU’RE
DONE!!!
• WHEN NOT ENOUGH, MEMORY PROFILING IS YOUR FRIEND.
MEMORY PROFILING
• LIVE MEMORY OR HPROF FILE AFTER A CRASH.
• ALLOWS YOU TO TO KNOW WHAT IS / WAS IN YOUR
BUFFERS AND CACHES.
• YOURKIT JAVA PROFILER AS A TOOL.
TRACING
• KNOW WHAT’S REALLY HAPPENING IN YOUR JVM.
• LINUX 4.X PROVIDES GREAT PERF TOOLS, LINUX 4.9 EVEN
BETTER:
• LINUX-PERF,
• JAVA PERF MAP.
• VECTOR BY NETFLIX (NOT VEKTOR THE TRASH METAL
BAND).
QUESTIONS
?
@FDEVILLAMI
L
@SYNTHESIO
SLIDES: HTTP://BIT.DO/ELASTICSEARCH-SYSADMIN-201

More Related Content

What's hot

Apache Kafka® Security Overview
Apache Kafka® Security OverviewApache Kafka® Security Overview
Apache Kafka® Security Overviewconfluent
 
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...confluent
 
Centralized Logging System Using ELK Stack
Centralized Logging System Using ELK StackCentralized Logging System Using ELK Stack
Centralized Logging System Using ELK StackRohit Sharma
 
Solr for Indexing and Searching Logs
Solr for Indexing and Searching LogsSolr for Indexing and Searching Logs
Solr for Indexing and Searching LogsSematext Group, Inc.
 
Solr consistency and recovery internals
Solr consistency and recovery internalsSolr consistency and recovery internals
Solr consistency and recovery internalsCloudera, Inc.
 
Performance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla ClusterPerformance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla ClusterScyllaDB
 
RedisConf17- Using Redis at scale @ Twitter
RedisConf17- Using Redis at scale @ TwitterRedisConf17- Using Redis at scale @ Twitter
RedisConf17- Using Redis at scale @ TwitterRedis Labs
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidHortonworks
 
Apache kafka performance(latency)_benchmark_v0.3
Apache kafka performance(latency)_benchmark_v0.3Apache kafka performance(latency)_benchmark_v0.3
Apache kafka performance(latency)_benchmark_v0.3SANG WON PARK
 
PromQL Deep Dive - The Prometheus Query Language
PromQL Deep Dive - The Prometheus Query Language PromQL Deep Dive - The Prometheus Query Language
PromQL Deep Dive - The Prometheus Query Language Weaveworks
 
Introduction to memcached
Introduction to memcachedIntroduction to memcached
Introduction to memcachedJurriaan Persyn
 
Apache Solr-Webinar
Apache Solr-WebinarApache Solr-Webinar
Apache Solr-WebinarEdureka!
 
Oracle RAC features on Exadata
Oracle RAC features on ExadataOracle RAC features on Exadata
Oracle RAC features on ExadataAnil Nair
 
Getting started with Amazon DynamoDB
Getting started with Amazon DynamoDBGetting started with Amazon DynamoDB
Getting started with Amazon DynamoDBAmazon Web Services
 
Hadoop World 2011: Advanced HBase Schema Design
Hadoop World 2011: Advanced HBase Schema DesignHadoop World 2011: Advanced HBase Schema Design
Hadoop World 2011: Advanced HBase Schema DesignCloudera, Inc.
 
Introduction to Apache ZooKeeper | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Apache ZooKeeper | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to Apache ZooKeeper | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Apache ZooKeeper | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
 
(DAT401) Amazon DynamoDB Deep Dive
(DAT401) Amazon DynamoDB Deep Dive(DAT401) Amazon DynamoDB Deep Dive
(DAT401) Amazon DynamoDB Deep DiveAmazon Web Services
 
A Technical Introduction to WiredTiger
A Technical Introduction to WiredTigerA Technical Introduction to WiredTiger
A Technical Introduction to WiredTigerMongoDB
 

What's hot (20)

Apache Kafka® Security Overview
Apache Kafka® Security OverviewApache Kafka® Security Overview
Apache Kafka® Security Overview
 
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
 
Centralized Logging System Using ELK Stack
Centralized Logging System Using ELK StackCentralized Logging System Using ELK Stack
Centralized Logging System Using ELK Stack
 
Solr for Indexing and Searching Logs
Solr for Indexing and Searching LogsSolr for Indexing and Searching Logs
Solr for Indexing and Searching Logs
 
Fluent Bit: Log Forwarding at Scale
Fluent Bit: Log Forwarding at ScaleFluent Bit: Log Forwarding at Scale
Fluent Bit: Log Forwarding at Scale
 
Solr consistency and recovery internals
Solr consistency and recovery internalsSolr consistency and recovery internals
Solr consistency and recovery internals
 
Performance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla ClusterPerformance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla Cluster
 
RedisConf17- Using Redis at scale @ Twitter
RedisConf17- Using Redis at scale @ TwitterRedisConf17- Using Redis at scale @ Twitter
RedisConf17- Using Redis at scale @ Twitter
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache Druid
 
Apache kafka performance(latency)_benchmark_v0.3
Apache kafka performance(latency)_benchmark_v0.3Apache kafka performance(latency)_benchmark_v0.3
Apache kafka performance(latency)_benchmark_v0.3
 
PromQL Deep Dive - The Prometheus Query Language
PromQL Deep Dive - The Prometheus Query Language PromQL Deep Dive - The Prometheus Query Language
PromQL Deep Dive - The Prometheus Query Language
 
Introduction to memcached
Introduction to memcachedIntroduction to memcached
Introduction to memcached
 
Apache Solr-Webinar
Apache Solr-WebinarApache Solr-Webinar
Apache Solr-Webinar
 
Oracle RAC features on Exadata
Oracle RAC features on ExadataOracle RAC features on Exadata
Oracle RAC features on Exadata
 
Getting started with Amazon DynamoDB
Getting started with Amazon DynamoDBGetting started with Amazon DynamoDB
Getting started with Amazon DynamoDB
 
Hadoop World 2011: Advanced HBase Schema Design
Hadoop World 2011: Advanced HBase Schema DesignHadoop World 2011: Advanced HBase Schema Design
Hadoop World 2011: Advanced HBase Schema Design
 
Introduction to Apache ZooKeeper | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Apache ZooKeeper | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to Apache ZooKeeper | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Apache ZooKeeper | Big Data Hadoop Spark Tutorial | CloudxLab
 
(DAT401) Amazon DynamoDB Deep Dive
(DAT401) Amazon DynamoDB Deep Dive(DAT401) Amazon DynamoDB Deep Dive
(DAT401) Amazon DynamoDB Deep Dive
 
Amazon EMR Masterclass
Amazon EMR MasterclassAmazon EMR Masterclass
Amazon EMR Masterclass
 
A Technical Introduction to WiredTiger
A Technical Introduction to WiredTigerA Technical Introduction to WiredTiger
A Technical Introduction to WiredTiger
 

Similar to Running & Scaling Large Elasticsearch Clusters

Managing Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using ElasticsearchManaging Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using ElasticsearchJoe Alex
 
CASSANDRA MEETUP - Choosing the right cloud instances for success
CASSANDRA MEETUP - Choosing the right cloud instances for successCASSANDRA MEETUP - Choosing the right cloud instances for success
CASSANDRA MEETUP - Choosing the right cloud instances for successErick Ramirez
 
M6d cassandrapresentation
M6d cassandrapresentationM6d cassandrapresentation
M6d cassandrapresentationEdward Capriolo
 
stackArmor presentation for DevOpsDC ver 4
stackArmor presentation for DevOpsDC ver 4stackArmor presentation for DevOpsDC ver 4
stackArmor presentation for DevOpsDC ver 4Gaurav "GP" Pal
 
DevOps for ETL processing at scale with MongoDB, Solr, AWS and Chef
DevOps for ETL processing at scale with MongoDB, Solr, AWS and ChefDevOps for ETL processing at scale with MongoDB, Solr, AWS and Chef
DevOps for ETL processing at scale with MongoDB, Solr, AWS and ChefGaurav "GP" Pal
 
HBase Operations and Best Practices
HBase Operations and Best PracticesHBase Operations and Best Practices
HBase Operations and Best PracticesVenu Anuganti
 
Scaling Elasticsearch at Synthesio
Scaling Elasticsearch at SynthesioScaling Elasticsearch at Synthesio
Scaling Elasticsearch at SynthesioFred de Villamil
 
The Proto-Burst Buffer: Experience with the flash-based file system on SDSC's...
The Proto-Burst Buffer: Experience with the flash-based file system on SDSC's...The Proto-Burst Buffer: Experience with the flash-based file system on SDSC's...
The Proto-Burst Buffer: Experience with the flash-based file system on SDSC's...Glenn K. Lockwood
 
Scaling with sync_replication using Galera and EC2
Scaling with sync_replication using Galera and EC2Scaling with sync_replication using Galera and EC2
Scaling with sync_replication using Galera and EC2Marco Tusa
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scalethelabdude
 
Storm presentation
Storm presentationStorm presentation
Storm presentationShyam Raj
 
HBase Sizing Guide
HBase Sizing GuideHBase Sizing Guide
HBase Sizing Guidelarsgeorge
 
Is your Elastic Cluster Stable and Production Ready?
Is your Elastic Cluster Stable and Production Ready?Is your Elastic Cluster Stable and Production Ready?
Is your Elastic Cluster Stable and Production Ready?DoiT International
 
Everyday I’m scaling... Cassandra
Everyday I’m scaling... CassandraEveryday I’m scaling... Cassandra
Everyday I’m scaling... CassandraInstaclustr
 
Everyday I'm Scaling... Cassandra (Ben Bromhead, Instaclustr) | C* Summit 2016
Everyday I'm Scaling... Cassandra (Ben Bromhead, Instaclustr) | C* Summit 2016Everyday I'm Scaling... Cassandra (Ben Bromhead, Instaclustr) | C* Summit 2016
Everyday I'm Scaling... Cassandra (Ben Bromhead, Instaclustr) | C* Summit 2016DataStax
 
Scaling Ceph at CERN - Ceph Day Frankfurt
Scaling Ceph at CERN - Ceph Day Frankfurt Scaling Ceph at CERN - Ceph Day Frankfurt
Scaling Ceph at CERN - Ceph Day Frankfurt Ceph Community
 
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...Fred de Villamil
 
MySQL Performance Tuning London Meetup June 2017
MySQL Performance Tuning London Meetup June 2017MySQL Performance Tuning London Meetup June 2017
MySQL Performance Tuning London Meetup June 2017Ivan Zoratti
 

Similar to Running & Scaling Large Elasticsearch Clusters (20)

Managing Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using ElasticsearchManaging Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using Elasticsearch
 
CASSANDRA MEETUP - Choosing the right cloud instances for success
CASSANDRA MEETUP - Choosing the right cloud instances for successCASSANDRA MEETUP - Choosing the right cloud instances for success
CASSANDRA MEETUP - Choosing the right cloud instances for success
 
M6d cassandrapresentation
M6d cassandrapresentationM6d cassandrapresentation
M6d cassandrapresentation
 
stackArmor presentation for DevOpsDC ver 4
stackArmor presentation for DevOpsDC ver 4stackArmor presentation for DevOpsDC ver 4
stackArmor presentation for DevOpsDC ver 4
 
DevOps for ETL processing at scale with MongoDB, Solr, AWS and Chef
DevOps for ETL processing at scale with MongoDB, Solr, AWS and ChefDevOps for ETL processing at scale with MongoDB, Solr, AWS and Chef
DevOps for ETL processing at scale with MongoDB, Solr, AWS and Chef
 
HBase Operations and Best Practices
HBase Operations and Best PracticesHBase Operations and Best Practices
HBase Operations and Best Practices
 
Scaling Elasticsearch at Synthesio
Scaling Elasticsearch at SynthesioScaling Elasticsearch at Synthesio
Scaling Elasticsearch at Synthesio
 
The Proto-Burst Buffer: Experience with the flash-based file system on SDSC's...
The Proto-Burst Buffer: Experience with the flash-based file system on SDSC's...The Proto-Burst Buffer: Experience with the flash-based file system on SDSC's...
The Proto-Burst Buffer: Experience with the flash-based file system on SDSC's...
 
Scaling with sync_replication using Galera and EC2
Scaling with sync_replication using Galera and EC2Scaling with sync_replication using Galera and EC2
Scaling with sync_replication using Galera and EC2
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scale
 
Storm presentation
Storm presentationStorm presentation
Storm presentation
 
HBase Sizing Guide
HBase Sizing GuideHBase Sizing Guide
HBase Sizing Guide
 
Is your Elastic Cluster Stable and Production Ready?
Is your Elastic Cluster Stable and Production Ready?Is your Elastic Cluster Stable and Production Ready?
Is your Elastic Cluster Stable and Production Ready?
 
Everyday I’m scaling... Cassandra
Everyday I’m scaling... CassandraEveryday I’m scaling... Cassandra
Everyday I’m scaling... Cassandra
 
Everyday I'm Scaling... Cassandra (Ben Bromhead, Instaclustr) | C* Summit 2016
Everyday I'm Scaling... Cassandra (Ben Bromhead, Instaclustr) | C* Summit 2016Everyday I'm Scaling... Cassandra (Ben Bromhead, Instaclustr) | C* Summit 2016
Everyday I'm Scaling... Cassandra (Ben Bromhead, Instaclustr) | C* Summit 2016
 
Hadoop at datasift
Hadoop at datasiftHadoop at datasift
Hadoop at datasift
 
Scaling Ceph at CERN - Ceph Day Frankfurt
Scaling Ceph at CERN - Ceph Day Frankfurt Scaling Ceph at CERN - Ceph Day Frankfurt
Scaling Ceph at CERN - Ceph Day Frankfurt
 
Running MySQL on Linux
Running MySQL on LinuxRunning MySQL on Linux
Running MySQL on Linux
 
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...
 
MySQL Performance Tuning London Meetup June 2017
MySQL Performance Tuning London Meetup June 2017MySQL Performance Tuning London Meetup June 2017
MySQL Performance Tuning London Meetup June 2017
 

More from Fred de Villamil

Scaling your Engineering Team
Scaling your Engineering TeamScaling your Engineering Team
Scaling your Engineering TeamFred de Villamil
 
Hiring and Managing Happy Engineers - CTO Pizza #3
Hiring and Managing Happy Engineers - CTO Pizza #3Hiring and Managing Happy Engineers - CTO Pizza #3
Hiring and Managing Happy Engineers - CTO Pizza #3Fred de Villamil
 
Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Without Downtime
Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Without DowntimeMigrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Without Downtime
Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Without DowntimeFred de Villamil
 
Devops commando - Paris Devops 2016-04
Devops commando - Paris Devops 2016-04Devops commando - Paris Devops 2016-04
Devops commando - Paris Devops 2016-04Fred de Villamil
 
Applications Web En Entreprise Avec Ruby On Rails Benefices Et Limitations Gu...
Applications Web En Entreprise Avec Ruby On Rails Benefices Et Limitations Gu...Applications Web En Entreprise Avec Ruby On Rails Benefices Et Limitations Gu...
Applications Web En Entreprise Avec Ruby On Rails Benefices Et Limitations Gu...Fred de Villamil
 

More from Fred de Villamil (9)

Scaling your Engineering Team
Scaling your Engineering TeamScaling your Engineering Team
Scaling your Engineering Team
 
Hiring and Managing Happy Engineers - CTO Pizza #3
Hiring and Managing Happy Engineers - CTO Pizza #3Hiring and Managing Happy Engineers - CTO Pizza #3
Hiring and Managing Happy Engineers - CTO Pizza #3
 
Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Without Downtime
Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Without DowntimeMigrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Without Downtime
Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Without Downtime
 
Devops commando - Paris Devops 2016-04
Devops commando - Paris Devops 2016-04Devops commando - Paris Devops 2016-04
Devops commando - Paris Devops 2016-04
 
The Commando Devops
The Commando DevopsThe Commando Devops
The Commando Devops
 
How People Use Iphone
How People Use IphoneHow People Use Iphone
How People Use Iphone
 
Zendcon Performance Oci8
Zendcon Performance Oci8Zendcon Performance Oci8
Zendcon Performance Oci8
 
Applications Web En Entreprise Avec Ruby On Rails Benefices Et Limitations Gu...
Applications Web En Entreprise Avec Ruby On Rails Benefices Et Limitations Gu...Applications Web En Entreprise Avec Ruby On Rails Benefices Et Limitations Gu...
Applications Web En Entreprise Avec Ruby On Rails Benefices Et Limitations Gu...
 
Presentation Rails
Presentation RailsPresentation Rails
Presentation Rails
 

Recently uploaded

Industrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptIndustrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptNarmatha D
 
Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating SystemRashmi Bhat
 
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...Sumanth A
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONjhunlian
 
11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdf11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdfHafizMudaserAhmad
 
Python Programming for basic beginners.pptx
Python Programming for basic beginners.pptxPython Programming for basic beginners.pptx
Python Programming for basic beginners.pptxmohitesoham12
 
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...Erbil Polytechnic University
 
Engineering Drawing section of solid
Engineering Drawing     section of solidEngineering Drawing     section of solid
Engineering Drawing section of solidnamansinghjarodiya
 
Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substationstephanwindworld
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptSAURABHKUMAR892774
 
Immutable Image-Based Operating Systems - EW2024.pdf
Immutable Image-Based Operating Systems - EW2024.pdfImmutable Image-Based Operating Systems - EW2024.pdf
Immutable Image-Based Operating Systems - EW2024.pdfDrew Moseley
 
Class 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm SystemClass 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm Systemirfanmechengr
 
Ch10-Global Supply Chain - Cadena de Suministro.pdf
Ch10-Global Supply Chain - Cadena de Suministro.pdfCh10-Global Supply Chain - Cadena de Suministro.pdf
Ch10-Global Supply Chain - Cadena de Suministro.pdfChristianCDAM
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catcherssdickerson1
 
Cooling Tower SERD pH drop issue (11 April 2024) .pptx
Cooling Tower SERD pH drop issue (11 April 2024) .pptxCooling Tower SERD pH drop issue (11 April 2024) .pptx
Cooling Tower SERD pH drop issue (11 April 2024) .pptxmamansuratman0253
 
Crushers to screens in aggregate production
Crushers to screens in aggregate productionCrushers to screens in aggregate production
Crushers to screens in aggregate productionChinnuNinan
 
Energy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxEnergy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxsiddharthjain2303
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ
 

Recently uploaded (20)

Industrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptIndustrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.ppt
 
Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating System
 
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
 
11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdf11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdf
 
Python Programming for basic beginners.pptx
Python Programming for basic beginners.pptxPython Programming for basic beginners.pptx
Python Programming for basic beginners.pptx
 
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
 
Engineering Drawing section of solid
Engineering Drawing     section of solidEngineering Drawing     section of solid
Engineering Drawing section of solid
 
Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substation
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.ppt
 
Immutable Image-Based Operating Systems - EW2024.pdf
Immutable Image-Based Operating Systems - EW2024.pdfImmutable Image-Based Operating Systems - EW2024.pdf
Immutable Image-Based Operating Systems - EW2024.pdf
 
Class 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm SystemClass 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm System
 
Ch10-Global Supply Chain - Cadena de Suministro.pdf
Ch10-Global Supply Chain - Cadena de Suministro.pdfCh10-Global Supply Chain - Cadena de Suministro.pdf
Ch10-Global Supply Chain - Cadena de Suministro.pdf
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
 
Cooling Tower SERD pH drop issue (11 April 2024) .pptx
Cooling Tower SERD pH drop issue (11 April 2024) .pptxCooling Tower SERD pH drop issue (11 April 2024) .pptx
Cooling Tower SERD pH drop issue (11 April 2024) .pptx
 
Crushers to screens in aggregate production
Crushers to screens in aggregate productionCrushers to screens in aggregate production
Crushers to screens in aggregate production
 
Energy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxEnergy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptx
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...
 
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
 

Running & Scaling Large Elasticsearch Clusters

  • 1. RUNNING & SCALING LARGE ELASTICSEARCH CLUSTERS FRED DE VILLAMIL, DIRECTOR OF INFRASTRUCTURE @FDEVILLAMIL OCTOBER 2017
  • 2. BACKGROUND • FRED DE VILLAMIL, 39 ANS, TEAM COFFEE @SYNTHESIO, • LINUX / (FREE)BSD USER SINCE 1996, • OPEN SOURCE CONTRIBUTOR SINCE 1998, • LOVES TENNIS, PHOTOGRAPHY, CUTE OTTERS, INAPPROPRIATE HUMOR AND ELASTICSEARCH CLUSTERS OF UNUSUAL SIZE. WRITES ABOUT ES AT HTTPS://THOUGHTS.T37.NET
  • 3. ELASTICSEARCH @SYNTHESIO • 8 production clusters, 600 hosts, 1.7PB storage, 37.5TB RAM, average 15k writes / s, 800 search /s, some inputs > 200MB. • Data nodes: 6 core Xeon E5v3, 64GB RAM, 4*800GB SSD RAID0. Sometimes bi Xeon E5-2687Wv4 12 core (160 watts!!!). • We agregate data from various cold storage and make them searchable in a giffy.
  • 4. AN ELASTICSEARCH CLUSTER OF UNUSUAL SIZE
  • 6. NEVER GONNA GIVE YOU UP • NEVER GONNA LET YOU DOWN, • NEVER GONNA RUN AROUND AND DESERT YOU, • NEVER GONNA MAKE YOU CRY, • NEVER GONNA SAY GOODBYE, • NEVER GONNA TELL A LIE & HURT YOU.
  • 7. AVOIDING DOWNTIME & SPLIT BRAINS • RUN AT LEAST 3 MASTER NODES INTO 3 DIFFERENT LOCATIONS. • NEVER RUN BULK QUERIES ON THE MASTER NODES. • ACTUALLY NEVER RUN ANYTHING BUT ADMINISTRATIVE TASKS ON THE MASTER NODES. • SPREAD YOUR DATA INTO 2 DIFFERENT LOCATION WITH AT LEAST A REPLICATION FACTOR OF 1 (1 PRIMARY, 1 REPLICA).
  • 8. RACK AWARENESS ALLOCATE A RACK_ID TO THE DATA NODES FOR EVEN REPLICATION. RESTART A WHOLE DATA CENTER @ONCE WITHOUT DOWNTIME.
  • 9. RACK AWARENESS + QUERY NODES == MAGIC ES PRIVILEGES THE DATA NODES WITH THE SAME RACK_ID AS THE QUERY. REDUCES LATENCY AND BALANCES THE LOAD.
  • 10. RACK AWARENESS + QUERY NODES + ZONE == MAGIC + FUN ADD ZONES INTO THE SAME RACK FOR EVEN REPLICATION WITH HIGHER FACTOR.
  • 11. USING ZONES FOR FUN & PROFIT ALLOWING EVEN REPLICATION WITH A HIGHER FACTOR WITHIN THE SAME RACK. ALLOWING MORE RESOURCES TO THE MOST FREQUENTLY ACCESSED INDEXES. …
  • 13. HOW ELASTICSEARCH USES THE MEMORY • Starts with allocating memory for Java heap. • The Java heap contains all Elasticsearch buffers and caches + a few other things. • Each Java thread maps a system thread: +128kB off heap. • Elected master uses 250kB to store each shard information inside the cluster.
  • 14. ALLOCATING MEMORY • Never allocate more than 31GB heap to avoid the compressed pointers issue. • Use 1/2 of your memory up to 31GB. • Feed your master and query nodes, the more the better (including CPU).
  • 15. MEMORY LOCK • Use memory_lock: true at startup. • Requires ulimit -l unlimited. • Allocates the whole heap at once. • Uses contiguous memory regions. • Avoids swapping (you should disable swap anyway).
  • 16. CHOSING THE RIGHT GARBAGE COLLECTOR • ES runs with CMS as a default garbage collector. • CMS was designed for heaps < 4GB. • Stop the world garbage collection last too long & blocks the cluster. • Solution: switching to G1GC (default in Java9, unsupported).
  • 17. CMS VS G1GC • CMS: SHARED CPU TIME WITH THE APPLICATION. “STOPS THE WORLD” WHEN TOO MANY MEMORY TO CLEAN UNTIL IT SENDS AN OUTOFMEMORYERROR. • G1GC: SHORT, MORE FREQUENT, PAUSES. WON’T STOP A NODE UNTIL IT LEAVES THE CLUSTER. • ELASTIC SAYS: DON’T USE G1GC FOR REASONS, SO READ THE DOC.
  • 18. G1GC OPTIONS +USEG1GC: ACTIVATES G1GC MAXGCPAUSEMILLIS: TARGET FOR MAX GC PAUSE TIME. GCPAUSEINTERVALMILLIS:TARGET FOR COLLECTION TIME SPACE INITIATINGHEAPOCCUPANCYPERCENT: WHEN TO START COLLECTING?
  • 19. CHO0SING THE RIGHT STORAGE • MMAPFS : MAPS LUCENE FILES ON THE VIRTUAL MEMORY USING MMAP. NEEDS AS MUCH MEMORY AS THE FILE BEING MAPPED TO AVOID ISSUES. • NIOFS : APPLIES A SHARED LOCK ON LUCENE FILES AND RELIES ON VFS CACHE.
  • 20. BUFFERS AND CACHES • ELASTICSEARCH HAS MULTIPLE CACHES & BUFFERS, WITH DEFAULT VALUES, KNOW THEM! • BUFFERS + CACHE MUST BE < TOTAL JAVA HEAP (OBVIOUS BUT…). • AUTOMATED EVICTION ON THE CACHE, BUT FORCING IT CAN SAVE YOUR LIFE WITH A SMALL OVERHEAD. • IF YOU HAVE OOM ISSUES, DISABLE THE CACHES! • FROM A USER POV, CIRCUIT BREAKERS ARE A NO GO!
  • 22. INDEX DESIGN • VERSION YOUR INDEX BY MAPPING: 1_*, 2_* ETC. • THE MORE SHARDS, THE BETTER ELASTICITY, BUT THE MORE CPU AND MEMORY USED ON THE MASTERS. • PROVISIONNING 10GB PER SHARDS ALLOWS A FASTER RECOVERY & REALLOCATION.
  • 23. REPLICATION TRICKS • NUMBER OF REPLICAS MUST BE 0 OR ODD. CONSISTENCY QUORUM: INT( (PRIMARY + NUMBER_OF_REPLICAS) / 2 ) + 1. • RAISE THE REPLICATION FACTOR TO SCALE FOR READING UP TO 100% OF THE DATA / DATA NODE.
  • 24. ALIASES • ACCESS MULTIPLE INDICES AT ONCE. • READ MULTIPLE, WRITE ONLY ONE. EXAMPLE ON TIMESTAMPED INDICES: "18_20171020": { "aliases": { "2017": {}, "201710": {}, "20171020": {} } } "18_20171021": { "aliases": { "2017": {}, "201710": {}, "20171021": {} } } Queries: /2017/_search /201710/_search AFTER A MAPPING CHANGE & REINDEX, CHANGE THE ALIAS:
  • 25. ROLLOVER • CREATE A NEW INDEX WHEN TOO OLD OR TOO BIG. • SUPPORT DATE MATH: DAILY INDEX CREATION. • USE ALIASES TO QUERY ALL ROLLOVERED INDEXES. PUT "logs-000001" { "aliases": { "logs": {} } } POST /logs/_rollover { "conditions": { "max_docs": 10000000 } }
  • 27. CONFIGURATION CHANGES • PREFER CONFIGURATION FILE UPDATES TO API CALL FOR PERMANENT CHANGES. • VERSION YOUR CONFIGURATION CHANGES SO YOU CAN ROLLBACK, ES REQUIRES LOTS OF FINE TUNING. • WHEN USING _SETTINGS API, PREFER TRANSIENT TO PERSISTENT, THEY’RE EASIER TO GET RID OF.
  • 28. RECONFIGURING THE WHOLE CLUSTER LOCK SHARD REALLOCATION & RECOVERY: "cluster.routing.allocation.enable" : "none" OPTIMIZE FOR RECOVERY: "cluster.routing.allocation.node_initial_primaries_recoveries": 50 "indices.recovery.max_bytes_per_sec": "2048mb" RESTART A FULL RACK, WAIT FOR NODES TO COME
  • 29. THE REINDEX API • IN CLUSTER AND CLUSTER TO CLUSTER REINDEX API. • ALLOWS CROSS VERSION INDEXING: 1.7 TO 5.1… • SLICED SCROLLS ONLY AVAILABLE STARTING 6.0. • ACCEPT ES QUERIES TO FILTER THE DATA TO REINDEX. • MERGE MULTIPLE INDEXES INTO 1.
  • 30. BULK INDEXING TRICKS LIMIT REBALANCE: "cluster.routing.allocation.cluster_concurrent_rebalance": 1 "cluster.routing.allocation.balance.shard": "0.15f" "cluster.routing.allocation.balance.threshold": "10.0f" DISABLE REFRESH: "index.refresh_interval:" "0" NO REPLICA: "index.number_of_replicas:" "0" // having replica index n times in Lucene, adding one just "rsync" the data. ALLOCATE ON DEDICATED HARDWARE:
  • 31. OPTIMIZING FOR SPACE & PERFORMANCES • LUCENE SEGMENTS ARE IMMUTABLE, THE MORE YOU WRITE, THE MORE SEGMENTS YOU GET. • DELETING DOCUMENTS DOES COPY ON WRITE SO NO REAL DELETE. index.merge.scheduler.max_thread_count: default CPU/2 with min 4 POST /_force_merge?only_expunge_deletes: faster, only merge segments with deleted POST /_force_merge?max_num_segments: don’t use on indexes you write on! WARNING: _FORCE_MERGE HAS A COST IN CPU AND I/OS.
  • 32. MINOR VERSION UPGRADES • CHECK YOUR PLUGINS COMPATIBILITY, PLUGINS MUST BE COMPILED FOR YOUR MINOR VERSION. • START UPGRADING THE MASTER NODES. • UPGRADE THE DATA NODES ON A WHOLE RACK AT ONCE.
  • 33. OS LEVEL UPGRADES • ENSURE THE WHOLE CLUSTER RUNS THE SAME JAVA VERSION. • WHEN UPGRADING JAVA, CHECK IF YOU DON’T HAVE TO UPGRADE THE KERNEL. • PER NODE JAVA / KERNEL VERSION AVAILABLE IN THE _STATS API.
  • 35. CAPTAIN OBVIOUS, YOU’RE MY ONLY HOPE! • GOOD MONITORING IS BUSINESS ORIENTED MONITORING. • GOOD ALERTING IS ACTIONABLE ALERTING. • DON’T MONITORE THE CLUSTER ONLY, BUT THE WHOLE PROCESSING CHAIN. • USELESS METRICS ARE USELESS. • LOSING A DATACENTER: OK. LOSING DATA: NOT OK!
  • 36. MONITORING TOOLING • ELASTICSEARCH X- PACK, • GRAFANA…
  • 37. LIFE, DEATH & _CLUSTER/HEALTH • A RED CLUSTER MEANS AT LEAST 1 INDEX HAS MISSING DATA. DON’T PANIC! • USING LEVEL={INDEX,SHARD} AND AN INDEX ID PROVIDES SPECIFIC INFORMATION. • LOTS OF PENDING TASKS MEANS YOUR CLUSTER IS UNDER HEAVY LOAD AND SOME NODES CAN’T PROCESS THEM FAST ENOUGH. • LONG WAITING TASKS MEANS YOU HAVE A CAPACITY PLANNING PROBLEM.
  • 38. USE THE _CAT API • PROVIDES GENERAL INFORMATION ABOUT YOUR NODES, SHARDS, INDICES AND THREAD POOLS. • HIT THE WHOLE CLUSTER, WHEN IT TIMEOUTS YOU’VE PROBABLY HAVING A NODE STUCK IN GARBAGE COLLECTION.
  • 39. MONITORING AT THE CLUSTER LEVEL • USE THE _STATS API FOR PRECISE INFORMATION. • MONITORE THE SHARDS REALLOCATION, TOO MANY MEANS A DESIGN PROBLEM. • MONITORE THE WRITES AND CLUSTER WIDE, IF THEY FALL TO 0 AND IT’S UNUSUAL, A NODE IS STUCK IN GC.
  • 40. MONITORING AT THE NODE LEVEL • USE THE _NODES/{NODE}, _NODES/{NODE}/STATS AND _CAT/THREAD_POOL API. • THE GARBAGE COLLECTION DURATION & FREQUENCY IS A GOOD METRIC OF YOUR NODE HEALTH. • CACHE AND BUFFERS ARE MONITORED ON A NODE LEVEL. • MONITORING I/OS, SPACE, OPEN FILES & CPU IS CRITICAL.
  • 41. MONITORING AT THE INDEX LEVEL • USE THE {INDEX}/_STATS API. • MONITORE THE DOCUMENTS / SHARD RATIO. • MONITORE THE MERGES, QUERY TIME. • TOO MANY EVICTIONS MEANS YOU HAVE A CACHE CONFIGURATION LEVEL.
  • 43. WHAT’S REALLY GOING ON IN YOUR CLUSTER? • THE _NODES/{NODE}/HOT_THREADS API TELLS WHAT HAPPENS ON THE HOST. • THE ELECTED MASTER NODES TELLS YOU MOST THING YOU NEED TO KNOW. • ENABLE THE SLOW LOGS TO UNDERSTAND YOUR BOTTLENECK & OPTIMIZE THE QUERIES. DISABLE THE SLOW LOGS WHEN YOU’RE DONE!!! • WHEN NOT ENOUGH, MEMORY PROFILING IS YOUR FRIEND.
  • 44. MEMORY PROFILING • LIVE MEMORY OR HPROF FILE AFTER A CRASH. • ALLOWS YOU TO TO KNOW WHAT IS / WAS IN YOUR BUFFERS AND CACHES. • YOURKIT JAVA PROFILER AS A TOOL.
  • 45. TRACING • KNOW WHAT’S REALLY HAPPENING IN YOUR JVM. • LINUX 4.X PROVIDES GREAT PERF TOOLS, LINUX 4.9 EVEN BETTER: • LINUX-PERF, • JAVA PERF MAP. • VECTOR BY NETFLIX (NOT VEKTOR THE TRASH METAL BAND).