RUNNING & SCALING LARGE ELASTICSEARCH CLUSTERS
FRED DE VILLAMIL, DIRECTOR OF INFRASTRUCTURE
@FDEVILLAMIL
OCTOBER 2017
Running & Scaling Large Elasticsearch Clusters

9,602 views

A few tips and tricks on operating 100TB+ Elasticsearch clusters from our experience at Synthesio. Covers Elasticsearch 5+

Published in: Engineering
  • Thank you for this great presentation. You were indexing inputs > 200MB. Do you store these inputs in the _source field too? Have you run into issues (CPU, memory...) with large source sizes?

  1. RUNNING & SCALING LARGE ELASTICSEARCH CLUSTERS
     FRED DE VILLAMIL, DIRECTOR OF INFRASTRUCTURE
     @FDEVILLAMIL
     OCTOBER 2017
  2. BACKGROUND
     • FRED DE VILLAMIL, 39 YEARS OLD, TEAM COFFEE @SYNTHESIO,
     • LINUX / (FREE)BSD USER SINCE 1996,
     • OPEN SOURCE CONTRIBUTOR SINCE 1998,
     • LOVES TENNIS, PHOTOGRAPHY, CUTE OTTERS, INAPPROPRIATE HUMOR AND ELASTICSEARCH CLUSTERS OF UNUSUAL SIZE.
     WRITES ABOUT ES AT HTTPS://THOUGHTS.T37.NET
  3. ELASTICSEARCH @SYNTHESIO
     • 8 production clusters, 600 hosts, 1.7PB storage, 37.5TB RAM, average 15k writes/s, 800 searches/s, some inputs > 200MB.
     • Data nodes: 6-core Xeon E5v3, 64GB RAM, 4*800GB SSD RAID0. Sometimes dual Xeon E5-2687Wv4 12-core (160 watts!!!).
     • We aggregate data from various cold storage systems and make it searchable in a jiffy.
  4. AN ELASTICSEARCH CLUSTER OF UNUSUAL SIZE
  5. ENSURING HIGH AVAILABILITY
  6. NEVER GONNA GIVE YOU UP
     • NEVER GONNA LET YOU DOWN,
     • NEVER GONNA RUN AROUND AND DESERT YOU,
     • NEVER GONNA MAKE YOU CRY,
     • NEVER GONNA SAY GOODBYE,
     • NEVER GONNA TELL A LIE & HURT YOU.
  7. AVOIDING DOWNTIME & SPLIT BRAINS
     • RUN AT LEAST 3 MASTER NODES IN 3 DIFFERENT LOCATIONS.
     • NEVER RUN BULK QUERIES ON THE MASTER NODES.
     • ACTUALLY, NEVER RUN ANYTHING BUT ADMINISTRATIVE TASKS ON THE MASTER NODES.
     • SPREAD YOUR DATA ACROSS 2 DIFFERENT LOCATIONS WITH A REPLICATION FACTOR OF AT LEAST 1 (1 PRIMARY, 1 REPLICA).
  8. RACK AWARENESS
     ALLOCATE A RACK_ID TO THE DATA NODES FOR EVEN REPLICATION.
     RESTART A WHOLE DATA CENTER AT ONCE WITHOUT DOWNTIME.
  9. RACK AWARENESS + QUERY NODES == MAGIC
     ES FAVORS THE DATA NODES WITH THE SAME RACK_ID AS THE QUERY NODE.
     REDUCES LATENCY AND BALANCES THE LOAD.
  10. RACK AWARENESS + QUERY NODES + ZONE == MAGIC + FUN
      ADD ZONES INSIDE THE SAME RACK FOR EVEN REPLICATION WITH A HIGHER FACTOR.
  11. USING ZONES FOR FUN & PROFIT
      ALLOWING EVEN REPLICATION WITH A HIGHER FACTOR WITHIN THE SAME RACK.
      ALLOWING MORE RESOURCES TO THE MOST FREQUENTLY ACCESSED INDEXES.
      …
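As a rough illustration of slides 8 to 11, the sketch below renders the elasticsearch.yml lines that tag a data node with a rack (and optionally a zone) and tell the allocator to spread shard copies across them. `node.attr.*` and `cluster.routing.allocation.awareness.attributes` are the Elasticsearch 5+ setting names; the attribute names `rack_id` and `zone`, the helper name and the sample values are assumptions made for the example, not the deck's actual configuration.

```python
def awareness_settings(rack_id, zone=None):
    """Return elasticsearch.yml lines enabling shard allocation awareness.

    rack_id / zone values are hypothetical labels chosen by the operator.
    """
    attrs = {"rack_id": rack_id}
    if zone is not None:
        attrs["zone"] = zone
    # One custom attribute per line, attached to the node.
    lines = [f"node.attr.{name}: {value}" for name, value in attrs.items()]
    # Tell the allocator which attributes to balance shard copies across.
    lines.append(
        "cluster.routing.allocation.awareness.attributes: " + ",".join(attrs)
    )
    return lines


if __name__ == "__main__":
    print("\n".join(awareness_settings("dc1", zone="hot")))
```

Generating the lines from one function keeps the node attributes and the awareness attribute list in sync when a zone is added later.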
  12. AVOIDING MEMORY NIGHTMARES
  13. HOW ELASTICSEARCH USES THE MEMORY
      • Starts by allocating memory for the Java heap.
      • The Java heap contains all Elasticsearch buffers and caches + a few other things.
      • Each Java thread maps to a system thread: +128kB off heap.
      • The elected master uses 250kB to store the information of each shard in the cluster.
  14. ALLOCATING MEMORY
      • Never allocate more than 31GB of heap, to avoid the compressed pointers issue.
      • Give the heap 1/2 of your memory, up to 31GB.
      • Feed your master and query nodes, the more the better (including CPU).
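The heap-sizing rule on slide 14 can be sketched as a small helper: half of the host's RAM, capped at 31GB. The function names are hypothetical, and setting `-Xms` equal to `-Xmx` is the usual Elasticsearch recommendation rather than something stated on the slide.

```python
MAX_HEAP_GB = 31  # stay below the compressed-pointers threshold from the slide


def heap_size_gb(total_ram_gb):
    """Half of the RAM, capped at 31GB; the rest is left to the OS file cache."""
    return min(total_ram_gb // 2, MAX_HEAP_GB)


def jvm_heap_flags(total_ram_gb):
    """JVM flags with identical min and max heap (assumption: fixed-size heap)."""
    heap = heap_size_gb(total_ram_gb)
    return [f"-Xms{heap}g", f"-Xmx{heap}g"]


if __name__ == "__main__":
    for ram in (16, 32, 64, 128):
        print(ram, "GB RAM ->", " ".join(jvm_heap_flags(ram)))
```

On the 64GB data nodes described earlier in the deck, this yields a 31GB heap and leaves the remainder to the operating system.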
  15. MEMORY LOCK
      • Use bootstrap.memory_lock: true at startup.
      • Requires ulimit -l unlimited.
      • Allocates the whole heap at once.
      • Uses contiguous memory regions.
      • Avoids swapping (you should disable swap anyway).
  16. CHOOSING THE RIGHT GARBAGE COLLECTOR
      • ES runs with CMS as its default garbage collector.
      • CMS was designed for heaps < 4GB.
      • Stop-the-world garbage collections last too long & block the cluster.
      • Solution: switching to G1GC (default in Java 9, unsupported).
  17. CMS VS G1GC
      • CMS: SHARES CPU TIME WITH THE APPLICATION. “STOPS THE WORLD” WHEN THERE IS TOO MUCH MEMORY TO CLEAN, UNTIL IT THROWS AN OUTOFMEMORYERROR.
      • G1GC: SHORT, MORE FREQUENT PAUSES. WON’T STALL A NODE LONG ENOUGH FOR IT TO LEAVE THE CLUSTER.
      • ELASTIC SAYS: DON’T USE G1GC FOR REASONS, SO READ THE DOC.
  18. G1GC OPTIONS
      -XX:+UseG1GC: ACTIVATES G1GC.
      -XX:MaxGCPauseMillis: TARGET FOR MAX GC PAUSE TIME.
      -XX:GCPauseIntervalMillis: TARGET FOR THE TIME BETWEEN COLLECTIONS.
      -XX:InitiatingHeapOccupancyPercent: WHEN TO START COLLECTING?
  19. CHOOSING THE RIGHT STORAGE
      • MMAPFS: MAPS LUCENE FILES INTO VIRTUAL MEMORY USING MMAP. NEEDS AS MUCH MEMORY AS THE FILE BEING MAPPED TO AVOID ISSUES.
      • NIOFS: APPLIES A SHARED LOCK ON LUCENE FILES AND RELIES ON THE VFS CACHE.
  20. BUFFERS AND CACHES
      • ELASTICSEARCH HAS MULTIPLE CACHES & BUFFERS, WITH DEFAULT VALUES, KNOW THEM!
      • BUFFERS + CACHE MUST BE < TOTAL JAVA HEAP (OBVIOUS BUT…).
      • AUTOMATED EVICTION ON THE CACHE, BUT FORCING IT CAN SAVE YOUR LIFE WITH A SMALL OVERHEAD.
      • IF YOU HAVE OOM ISSUES, DISABLE THE CACHES!
      • FROM A USER POV, CIRCUIT BREAKERS ARE A NO GO!
  21. MANAGING LARGE INDEXES
  22. INDEX DESIGN
      • VERSION YOUR INDEX BY MAPPING: 1_*, 2_* ETC.
      • THE MORE SHARDS, THE BETTER THE ELASTICITY, BUT THE MORE CPU AND MEMORY USED ON THE MASTERS.
      • PROVISIONING 10GB PER SHARD ALLOWS FASTER RECOVERY & REALLOCATION.
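The 10GB-per-shard guideline on slide 22 translates into simple arithmetic. The sketch below assumes you can estimate the final size of an index up front; the function name and the sample sizes are illustrative only.

```python
import math


def shard_count(index_size_gb, target_shard_gb=10):
    """Number of primary shards needed to keep each shard near the target size."""
    return max(1, math.ceil(index_size_gb / target_shard_gb))


if __name__ == "__main__":
    for size in (3, 95, 100, 1200):
        print(f"{size}GB index -> {shard_count(size)} primary shards")
```

Keep in mind the trade-off stated on the same slide: every extra shard costs CPU and memory on the master nodes.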
  23. REPLICATION TRICKS
      • NUMBER OF REPLICAS MUST BE 0 OR ODD.
        CONSISTENCY QUORUM: INT( (PRIMARY + NUMBER_OF_REPLICAS) / 2 ) + 1.
      • RAISE THE REPLICATION FACTOR TO SCALE FOR READING, UP TO 100% OF THE DATA / DATA NODE.
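To see what the quorum formula on slide 23 gives for small replica counts, here is a direct transcription into Python. It only evaluates the formula as written on the slide; the function name is hypothetical, and how a given Elasticsearch version actually enforces write consistency is outside the scope of this sketch.

```python
def write_quorum(number_of_replicas, primaries=1):
    """Quorum as written on the slide: int((primary + replicas) / 2) + 1."""
    return int((primaries + number_of_replicas) / 2) + 1


if __name__ == "__main__":
    for replicas in range(4):
        copies = 1 + replicas
        print(f"{replicas} replica(s): {copies} copies, quorum {write_quorum(replicas)}")
```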
  24. ALIASES
      • ACCESS MULTIPLE INDICES AT ONCE.
      • READ MULTIPLE, WRITE ONLY ONE.
      EXAMPLE ON TIMESTAMPED INDICES:
        "18_20171020": { "aliases": { "2017": {}, "201710": {}, "20171020": {} } }
        "18_20171021": { "aliases": { "2017": {}, "201710": {}, "20171021": {} } }
      QUERIES:
        /2017/_search
        /201710/_search
      AFTER A MAPPING CHANGE & REINDEX, CHANGE THE ALIAS:
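The year / month / day alias scheme from slide 24 can be derived mechanically from the index name. This sketch assumes index names of the form `<mapping version>_<YYYYMMDD>`, as in the slide's example; the helper names are hypothetical, and the returned dictionary follows the `actions` / `add` body shape of the `_aliases` API.

```python
def date_aliases(index_name):
    """'18_20171020' -> ['2017', '201710', '20171020']."""
    _, _, stamp = index_name.partition("_")
    if len(stamp) != 8 or not stamp.isdigit():
        raise ValueError(f"expected <version>_<YYYYMMDD>, got {index_name!r}")
    return [stamp[:4], stamp[:6], stamp]


def alias_actions(index_name):
    """Request body for POST /_aliases adding every date alias to the index."""
    return {
        "actions": [
            {"add": {"index": index_name, "alias": alias}}
            for alias in date_aliases(index_name)
        ]
    }


if __name__ == "__main__":
    print(alias_actions("18_20171020"))
```

Because queries go through the aliases, a reindex into `19_20171020` after a mapping change only needs the aliases moved, not the clients updated.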
  25. ROLLOVER
      • CREATE A NEW INDEX WHEN THE CURRENT ONE IS TOO OLD OR TOO BIG.
      • SUPPORTS DATE MATH: DAILY INDEX CREATION.
      • USE ALIASES TO QUERY ALL ROLLED-OVER INDEXES.
        PUT /logs-000001 { "aliases": { "logs": {} } }
        POST /logs/_rollover { "conditions": { "max_docs": 10000000 } }
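When an index name ends with a dash and a number, the rollover API names the next index by incrementing that number and zero-padding it to six digits. The sketch below mimics that naming rule offline and builds the `conditions` body from the slide; the function names are hypothetical and no cluster is contacted.

```python
import re


def next_rollover_name(index_name):
    """'logs-000001' -> 'logs-000002' (counter zero-padded to 6 digits)."""
    match = re.fullmatch(r"(.*-)(\d+)", index_name)
    if match is None:
        raise ValueError(f"{index_name!r} does not end with -<number>")
    prefix, counter = match.groups()
    return f"{prefix}{int(counter) + 1:06d}"


def rollover_body(max_docs):
    """Request body for POST /<alias>/_rollover with a document-count condition."""
    return {"conditions": {"max_docs": max_docs}}


if __name__ == "__main__":
    print(next_rollover_name("logs-000001"), rollover_body(10_000_000))
```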
  26. DAILY OPERATIONS
  27. CONFIGURATION CHANGES
      • PREFER CONFIGURATION FILE UPDATES TO API CALLS FOR PERMANENT CHANGES.
      • VERSION YOUR CONFIGURATION CHANGES SO YOU CAN ROLL BACK, ES REQUIRES LOTS OF FINE TUNING.
      • WHEN USING THE _SETTINGS API, PREFER TRANSIENT TO PERSISTENT SETTINGS, THEY’RE EASIER TO GET RID OF.
  28. RECONFIGURING THE WHOLE CLUSTER
      LOCK SHARD REALLOCATION & RECOVERY:
        "cluster.routing.allocation.enable": "none"
      OPTIMIZE FOR RECOVERY:
        "cluster.routing.allocation.node_initial_primaries_recoveries": 50
        "indices.recovery.max_bytes_per_sec": "2048mb"
      RESTART A FULL RACK, WAIT FOR NODES TO COME
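A minimal sketch of the settings calls around a rack restart, following the advice to prefer transient settings. It only builds the request bodies for `PUT /_cluster/settings`; the helper names are hypothetical, and re-enabling allocation at the end follows the usual rolling-restart procedure, which is an assumption here since the slide text is cut off.

```python
import json


def allocation_settings(enabled):
    """Transient cluster settings body that locks or unlocks shard allocation."""
    value = "all" if enabled else "none"
    return {"transient": {"cluster.routing.allocation.enable": value}}


def rack_restart_plan():
    """Ordered (method, path, body) calls to issue before and after the restart."""
    return [
        ("PUT", "/_cluster/settings", allocation_settings(False)),  # before restart
        ("PUT", "/_cluster/settings", allocation_settings(True)),   # nodes are back
    ]


if __name__ == "__main__":
    for method, path, body in rack_restart_plan():
        print(method, path, json.dumps(body))
```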
  29. THE REINDEX API
      • IN CLUSTER AND CLUSTER TO CLUSTER REINDEX API.
      • ALLOWS CROSS VERSION INDEXING: 1.7 TO 5.1…
      • SLICED SCROLLS ONLY AVAILABLE STARTING 6.0.
      • ACCEPTS ES QUERIES TO FILTER THE DATA TO REINDEX.
      • MERGES MULTIPLE INDEXES INTO 1.
  30. BULK INDEXING TRICKS
      LIMIT REBALANCE:
        "cluster.routing.allocation.cluster_concurrent_rebalance": 1
        "cluster.routing.allocation.balance.shard": "0.15f"
        "cluster.routing.allocation.balance.threshold": "10.0f"
      DISABLE REFRESH:
        "index.refresh_interval": "0"
      NO REPLICA:
        "index.number_of_replicas": "0"
        // with replicas, each document is indexed n times in Lucene; adding a replica afterwards just "rsyncs" the data.
      ALLOCATE ON DEDICATED HARDWARE:
  31. OPTIMIZING FOR SPACE & PERFORMANCE
      • LUCENE SEGMENTS ARE IMMUTABLE, THE MORE YOU WRITE, THE MORE SEGMENTS YOU GET.
      • DELETING DOCUMENTS DOES COPY ON WRITE, SO NO REAL DELETE.
      index.merge.scheduler.max_thread_count: default CPU/2, capped at 4
      POST /_forcemerge?only_expunge_deletes: faster, only merges segments with deleted documents
      POST /_forcemerge?max_num_segments: don’t use on indexes you write to!
      WARNING: _FORCEMERGE HAS A COST IN CPU AND I/OS.
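The merge-thread default and the two force-merge variants from slide 31 can be sketched as follows. The default mirrors the documented formula max(1, min(4, CPU / 2)), and the path uses `_forcemerge`, the endpoint name in the Elasticsearch 5+ REST API; the helper names are hypothetical and nothing is sent to a cluster.

```python
def default_merge_threads(cpu_count):
    """Documented default for index.merge.scheduler.max_thread_count."""
    return max(1, min(4, cpu_count // 2))


def forcemerge_path(index, only_expunge_deletes=False, max_num_segments=None):
    """Build the POST path for a force merge on one index."""
    params = []
    if only_expunge_deletes:
        params.append("only_expunge_deletes=true")
    if max_num_segments is not None:
        params.append(f"max_num_segments={max_num_segments}")
    query = "?" + "&".join(params) if params else ""
    return f"/{index}/_forcemerge{query}"


if __name__ == "__main__":
    print(default_merge_threads(12))
    print(forcemerge_path("18_20171020", only_expunge_deletes=True))
```

As the slide warns, `max_num_segments` is best reserved for indexes that no longer receive writes.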
  32. MINOR VERSION UPGRADES
      • CHECK YOUR PLUGINS’ COMPATIBILITY, PLUGINS MUST BE COMPILED FOR YOUR MINOR VERSION.
      • START BY UPGRADING THE MASTER NODES.
      • UPGRADE THE DATA NODES ONE WHOLE RACK AT A TIME.
  33. OS LEVEL UPGRADES
      • ENSURE THE WHOLE CLUSTER RUNS THE SAME JAVA VERSION.
      • WHEN UPGRADING JAVA, CHECK WHETHER YOU ALSO HAVE TO UPGRADE THE KERNEL.
      • PER NODE JAVA / KERNEL VERSION AVAILABLE IN THE _STATS API.
  34. MONITORING
  35. CAPTAIN OBVIOUS, YOU’RE MY ONLY HOPE!
      • GOOD MONITORING IS BUSINESS ORIENTED MONITORING.
      • GOOD ALERTING IS ACTIONABLE ALERTING.
      • DON’T MONITOR THE CLUSTER ONLY, BUT THE WHOLE PROCESSING CHAIN.
      • USELESS METRICS ARE USELESS.
      • LOSING A DATACENTER: OK. LOSING DATA: NOT OK!
  36. MONITORING TOOLING
      • ELASTICSEARCH X-PACK,
      • GRAFANA…
  37. LIFE, DEATH & _CLUSTER/HEALTH
      • A RED CLUSTER MEANS AT LEAST 1 INDEX HAS MISSING DATA. DON’T PANIC!
      • USING LEVEL={INDEX,SHARD} AND AN INDEX ID PROVIDES SPECIFIC INFORMATION.
      • LOTS OF PENDING TASKS MEAN YOUR CLUSTER IS UNDER HEAVY LOAD AND SOME NODES CAN’T PROCESS THEM FAST ENOUGH.
      • LONG WAITING TASKS MEAN YOU HAVE A CAPACITY PLANNING PROBLEM.
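The reading of `_cluster/health` described on slide 37 could be turned into alert messages along these lines. The sketch works on an already-parsed response; `status`, `number_of_pending_tasks` and `task_max_waiting_in_queue_millis` are fields of the health response, while the function name and both thresholds are arbitrary assumptions to tune per cluster.

```python
def health_findings(health, max_pending=100, max_wait_ms=30_000):
    """Translate a parsed _cluster/health response into human-readable findings."""
    findings = []
    if health.get("status") == "red":
        findings.append("red: at least one index has missing data")
    if health.get("number_of_pending_tasks", 0) > max_pending:
        findings.append("many pending tasks: cluster under heavy load")
    if health.get("task_max_waiting_in_queue_millis", 0) > max_wait_ms:
        findings.append("long waiting tasks: capacity planning problem")
    return findings


if __name__ == "__main__":
    sample = {
        "status": "red",
        "number_of_pending_tasks": 250,
        "task_max_waiting_in_queue_millis": 1200,
    }
    for finding in health_findings(sample):
        print(finding)
```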
  38. USE THE _CAT API
      • PROVIDES GENERAL INFORMATION ABOUT YOUR NODES, SHARDS, INDICES AND THREAD POOLS.
      • HITS THE WHOLE CLUSTER: WHEN IT TIMES OUT, YOU PROBABLY HAVE A NODE STUCK IN GARBAGE COLLECTION.
  39. MONITORING AT THE CLUSTER LEVEL
      • USE THE _STATS API FOR PRECISE INFORMATION.
      • MONITOR SHARD REALLOCATIONS, TOO MANY MEANS A DESIGN PROBLEM.
      • MONITOR THE WRITES CLUSTER WIDE: IF THEY FALL TO 0 AND IT’S UNUSUAL, A NODE IS STUCK IN GC.
  40. MONITORING AT THE NODE LEVEL
      • USE THE _NODES/{NODE}, _NODES/{NODE}/STATS AND _CAT/THREAD_POOL APIS.
      • THE GARBAGE COLLECTION DURATION & FREQUENCY IS A GOOD METRIC OF YOUR NODE HEALTH.
      • CACHES AND BUFFERS ARE MONITORED AT THE NODE LEVEL.
      • MONITORING I/OS, SPACE, OPEN FILES & CPU IS CRITICAL.
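Garbage collection duration and frequency can be derived from two successive node stats samples. The sketch below assumes each argument is the parsed stats of one node, with the `jvm.gc.collectors.<name>.collection_count` and `collection_time_in_millis` counters found in `_nodes/{node}/stats`; the function name and the sample numbers are made up for the example.

```python
def gc_delta(before, after, collector="old"):
    """Return (collections, total_ms, avg_ms) between two node stats samples."""
    b = before["jvm"]["gc"]["collectors"][collector]
    a = after["jvm"]["gc"]["collectors"][collector]
    count = a["collection_count"] - b["collection_count"]
    millis = a["collection_time_in_millis"] - b["collection_time_in_millis"]
    average = millis / count if count else 0.0
    return count, millis, average


if __name__ == "__main__":
    first = {"jvm": {"gc": {"collectors": {"old": {
        "collection_count": 10, "collection_time_in_millis": 4000}}}}}
    second = {"jvm": {"gc": {"collectors": {"old": {
        "collection_count": 14, "collection_time_in_millis": 9000}}}}}
    print(gc_delta(first, second))
```

Both counters are cumulative since node start, which is why the difference between two samples is the useful number to graph.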
  41. MONITORING AT THE INDEX LEVEL
      • USE THE {INDEX}/_STATS API.
      • MONITOR THE DOCUMENTS / SHARD RATIO.
      • MONITOR THE MERGES, QUERY TIME.
      • TOO MANY EVICTIONS MEANS YOU HAVE A CACHE CONFIGURATION PROBLEM.
  42. TROUBLESHOOTING
  43. WHAT’S REALLY GOING ON IN YOUR CLUSTER?
      • THE _NODES/{NODE}/HOT_THREADS API TELLS YOU WHAT HAPPENS ON THE HOST.
      • THE ELECTED MASTER NODE TELLS YOU MOST THINGS YOU NEED TO KNOW.
      • ENABLE THE SLOW LOGS TO UNDERSTAND YOUR BOTTLENECK & OPTIMIZE THE QUERIES. DISABLE THE SLOW LOGS WHEN YOU’RE DONE!!!
      • WHEN THAT’S NOT ENOUGH, MEMORY PROFILING IS YOUR FRIEND.
  44. MEMORY PROFILING
      • LIVE MEMORY OR HPROF FILE AFTER A CRASH.
      • ALLOWS YOU TO KNOW WHAT IS / WAS IN YOUR BUFFERS AND CACHES.
      • YOURKIT JAVA PROFILER AS A TOOL.
  45. TRACING
      • KNOW WHAT’S REALLY HAPPENING IN YOUR JVM.
      • LINUX 4.X PROVIDES GREAT PERF TOOLS, LINUX 4.9 EVEN BETTER:
        • LINUX-PERF,
        • JAVA PERF MAP.
      • VECTOR BY NETFLIX (NOT VEKTOR THE THRASH METAL BAND).
  46. QUESTIONS ?
      @FDEVILLAMIL
      @SYNTHESIO
      SLIDES: HTTP://BIT.DO/ELASTICSEARCH-SYSADMIN-201
