
Elasticsearch 101 - Cluster setup and tuning

Setting up, configuring and tuning your Elasticsearch cluster, from JVM settings to useful plugins.

Published in: Technology

  1. Elasticsearch 101: Setting up, configuring and tuning your Elasticsearch cluster
  2. Our Elasticsearch setup
     [Diagram: Apps, 2 client nodes, 3 data nodes]
     ● 8 cores, 30GB RAM, 2TB EBS
     ● Running in Docker
     ● Apache Mesos / Marathon
     ● Dedicated DN (data node) machines
  3. It’s not a database
     ● You don’t get the same guarantees as from databases (ACID)
     ● Writes are acknowledged before they are flushed to persistent storage
     ● Network partitions can lead to data loss:
       - Long GC pauses
       - Kernel bugs (!)
     ● Deletes take longer
     https://aphyr.com/posts/323-call-me-maybe-elasticsearch-1-5-0
     https://bugs.launchpad.net/ubuntu/+source/linux-lts-raring/+bug/1195474
  4. Rolling your own ES Cluster (1)
     ● Name your cluster
     ● Disable multicast for discovery
     ● Set minimum master nodes to N/2+1 (N = master-eligible nodes), otherwise you risk split-brain!
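The three settings above can be sketched as a small helper. This is only an illustration: the setting names are the Elasticsearch 1.x `elasticsearch.yml` keys, and the sketch assumes that every host in the (hypothetical) unicast list is master-eligible.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum size N/2+1 (integer division) for N master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def discovery_settings(cluster_name, unicast_hosts):
    """Build elasticsearch.yml-style settings using 1.x setting names.

    Assumption: every host in unicast_hosts is master-eligible.
    """
    hosts = list(unicast_hosts)
    return {
        "cluster.name": cluster_name,
        "discovery.zen.ping.multicast.enabled": False,
        "discovery.zen.ping.unicast.hosts": hosts,
        "discovery.zen.minimum_master_nodes": minimum_master_nodes(len(hosts)),
    }


if __name__ == "__main__":
    # Hypothetical cluster name and host names, for illustration only.
    settings = discovery_settings("my-cluster", ["es-1", "es-2", "es-3"])
    for key, value in settings.items():
        print(f"{key}: {value}")
```

With three master-eligible nodes the quorum is 2, so a single isolated node cannot elect itself master.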
  5. Rolling your own ES Cluster (2)
     ● Check open file descriptors limit
     ● Disable swap (or use mlockall)
     ● Configure gateway settings
       ○ recover_after_time
       ○ expected_nodes
     ● Avoid tribe nodes
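One way to check the open file descriptors limit from inside a process is Python's standard `resource` module (Unix only). The threshold of 64000 below is an assumption chosen for illustration, not a figure from the slides.

```python
import resource

# Assumption: illustrative floor for a node's descriptor limit, not from the slides.
RECOMMENDED_NOFILE = 64000


def open_file_limit():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def limit_is_sufficient(soft, required=RECOMMENDED_NOFILE):
    """True if the soft limit is unlimited or at least `required`."""
    return soft == resource.RLIM_INFINITY or soft >= required


if __name__ == "__main__":
    soft, hard = open_file_limit()
    print(f"soft={soft} hard={hard} sufficient={limit_is_sufficient(soft)}")
```

The check only reflects the process that runs it, so it is most useful when run as the same user and in the same container as the Elasticsearch node.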
  6. Exhausting available JVM heap memory: nodes will become unresponsive!
  7. Memory requirements
     ● The low points of used JVM heap right after each GC run mark the required memory (add a safety buffer)
     ● At least 4GB per node
     ● 50% for JVM, 50% for FS cache / Lucene
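The sizing rule above can be sketched in a few lines: find the low points of a heap-usage series (the values right after a GC run), take the highest of them, and add a buffer. The 25% buffer and the sample values are assumptions for illustration only.

```python
def post_gc_troughs(samples):
    """Return the local minima of a heap-usage series (values right after a GC run)."""
    return [
        cur
        for prev, cur, nxt in zip(samples, samples[1:], samples[2:])
        if cur < prev and cur <= nxt
    ]


def required_heap_gb(samples, buffer=0.25):
    """Highest post-GC trough plus a safety buffer (buffer size is an assumption)."""
    troughs = post_gc_troughs(samples)
    if not troughs:
        raise ValueError("no GC trough found in samples")
    return max(troughs) * (1 + buffer)


def heap_size_gb(total_ram_gb):
    """50% of RAM for the JVM heap, the rest stays free for FS cache / Lucene."""
    return total_ram_gb // 2


if __name__ == "__main__":
    used_heap_gb = [4, 6, 8, 3, 5, 7, 9, 4, 6]  # made-up sawtooth samples
    print(post_gc_troughs(used_heap_gb))   # [3, 4]
    print(required_heap_gb(used_heap_gb))  # 5.0
    print(heap_size_gb(30))                # 15
```

For the 30GB machines described earlier, the 50% rule gives a 15GB heap, which would be passed to the node via `ES_HEAP_SIZE`.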
  8. JVM settings
     ● Define heap memory (ES_HEAP_SIZE)
     ● Don’t tune JVM settings
     ● Don’t tune thread pool
       ■ In some cases you might have to
       ■ Increasing it will introduce memory pressure
     ● Don’t use G1 garbage collector
  9. Indexing data
     ● Define data schemas and types (≠ schemaless)
       ○ Default string mapping is analyzed, which is memory costly
       ○ Understand tokenizers and analyzers
     ● Prefer bulk indexing
     ● Refresh interval
     ● Time based indexes for log data
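Bulk indexing into time-based indexes can be sketched as follows. The body format is the newline-delimited JSON expected by the `_bulk` endpoint (one action line, then one source line per document). The `logs` prefix, the `event` type and the daily `YYYY.MM.DD` suffix are hypothetical choices, not taken from the slides.

```python
import json
from datetime import date


def daily_index(prefix, day):
    """Name of the time-based index for a given day, e.g. logs-2015.06.01."""
    return f"{prefix}-{day:%Y.%m.%d}"


def bulk_body(docs, prefix="logs", doc_type="event"):
    """Build a _bulk request body from (day, document) pairs.

    Each document contributes an action line and a source line;
    the body must end with a newline.
    """
    lines = []
    for day, doc in docs:
        action = {"index": {"_index": daily_index(prefix, day), "_type": doc_type}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    body = bulk_body([
        (date(2015, 6, 1), {"level": "info", "msg": "started"}),
        (date(2015, 6, 2), {"level": "warn", "msg": "slow query"}),
    ])
    print(body, end="")
```

Sending many documents in one request like this avoids the per-request overhead of indexing them one at a time, and daily indexes let old log data be dropped by deleting a whole index.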
  10. Querying for data
     ● Use filters as much as possible
     ● `Scan & scroll` for dumping large data, e.g. when reindexing
     ● Transform data during indexing if possible
     ● ORMs make debugging a pain
     https://www.found.no/foundation/optimizing-elasticsearch-searches/
     https://abhishek376.wordpress.com/2014/11/24/how-we-optimized-100-sec-elasticsearch-queries-to-be-under-a-sub-second/
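As a rough sketch of "use filters as much as possible", the helper below builds a request body with the Elasticsearch 1.x `filtered` query, keeping the scored full-text part small and moving exact-match conditions into term filters. The field names in the usage example are hypothetical, and the `filtered` syntax assumes a 1.x cluster.

```python
def filtered_query(text_field, text, filters):
    """Build an ES 1.x `filtered` query body.

    The match query is scored; the term filters only include or exclude
    documents and do not contribute to scoring.
    """
    return {
        "query": {
            "filtered": {
                "query": {"match": {text_field: text}},
                "filter": {
                    "bool": {
                        "must": [{"term": {f: v}} for f, v in filters.items()]
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    import json

    # Hypothetical field names, for illustration only.
    body = filtered_query("message", "timeout", {"level": "error", "service": "api"})
    print(json.dumps(body, indent=2))
```

Building the body as a plain dictionary also keeps the generated query easy to inspect when debugging.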
  11. Avoid high cardinality fields
     ● Aggregations use field data
     ● Field data is often a major consumer of heap memory
     ● Use doc values (on disk field data)
     ● Avoid aggregation on analyzed fields
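A mapping that follows these points could look like the sketch below: fields used for aggregation are mapped as `not_analyzed` strings with doc values enabled, while free-text fields keep the default analyzed mapping. It uses the Elasticsearch 1.x mapping syntax, and the type and field names are hypothetical.

```python
def aggregatable_string():
    """1.x mapping for a string field that is aggregated on: not analyzed, doc values on disk."""
    return {"type": "string", "index": "not_analyzed", "doc_values": True}


def build_mapping(doc_type, aggregatable_fields, text_fields):
    """Build an index-creation body with explicit mappings for one document type."""
    properties = {name: aggregatable_string() for name in aggregatable_fields}
    # Free-text fields stay analyzed (the default) and should not be aggregated on.
    properties.update({name: {"type": "string"} for name in text_fields})
    return {"mappings": {doc_type: {"properties": properties}}}


if __name__ == "__main__":
    import json

    # Hypothetical type and field names, for illustration only.
    print(json.dumps(build_mapping("event", ["level", "service"], ["message"]), indent=2))
```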
  12. More things to watch out for
     ● Cluster health (duh!)
     ● Field data cache size
     ● Filter cache eviction
     ● Slow queries
     ● GC pauses
     ● Security settings
       ○ no authentication by default
     ● Backup
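For the first item, a monitoring check might interpret the JSON returned by the `_cluster/health` endpoint. The sketch below only inspects an already-parsed response; fetching it and alerting are left out, and the rule of flagging anything other than green is an assumption.

```python
def health_problems(health):
    """Return human-readable problems found in a parsed _cluster/health response."""
    problems = []
    status = health.get("status")
    if status != "green":
        problems.append(f"cluster status is {status}")
    unassigned = health.get("unassigned_shards", 0)
    if unassigned > 0:
        problems.append(f"{unassigned} unassigned shards")
    return problems


if __name__ == "__main__":
    # Made-up response, for illustration only.
    sample = {"cluster_name": "my-cluster", "status": "yellow", "unassigned_shards": 5}
    for problem in health_problems(sample):
        print(problem)
```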
  13. Tooling
     ● Use official SDKs
     ● For Go we use ElastiGo (not so great)
     ● Elastic HQ
     ● Inquisitor
     ● Sense
