Elasticsearch in production


Video available at http://www.youtube.com/watch?v=gkdfNl0WL-A

Original slides at http://presentations.found.no/berlin-buzzwords-2013/

This talk covers some of the lessons we've learned from securing and herding hundreds of Elasticsearch clusters. It applies whether you operate Elasticsearch on your own infrastructure or in the cloud, or are a developer who wants a better understanding of Elasticsearch's various failure modes.

Elasticsearch makes it easy to develop amazing things, and it has gone to great lengths to make Lucene's features readily available in a distributed setting. However, when it comes to running Elasticsearch in production, you still have a fairly complicated system on your hands: one with high expectations of network stability, a huge appetite for memory, and an assumption that all users are trustworthy.

Instead of delving deeply into a few specifics, we give a brief overview of problems you are likely to run into and suggested solutions to these problems. We cover topics that are applicable to both developers and users with Elasticsearch clusters of every shape and size – with an emphasis on resiliency and security.

Basic familiarity with Elasticsearch is assumed.

Published in: Technology

Elasticsearch in production

  1. Elasticsearch in production
     • Alex Brasetvik
     • @alexbrasetvik
  2. How marketing thinks our users feel
  3. How we developers sometimes feel
  4. Who?
     • Co-founder of Found AS
     • 7+ years of search, 2+ Elasticsearch
     • We manage hundreds of Elasticsearch clusters …
     • … on Amazon's cloud
  5. Agenda
     • Memory (and stability)
     • Security (and multi-tenancy)
     • Networking (and reliability)
     • Client (and resiliency)
  6. Memory
     • Search engines crave memory
     • Caches, caches, caches
     • Field and filter caches
     • Page cache
     • Index building
  7. PostgreSQL
     • Verifies resource usage
     • Safe >>> fast
     • Uses disk if necessary
  8. Elasticsearch trusts you
     • Built for speed
     • It'll jump if you ask it to
     • What could possibly go wrong?
  9. OutOfMemoryError
     • Woah there
     • I ate all the memories
     • Your cluster may or may not work any more
  10. May or may not work?
     • What else was happening at the time?
     • Corrupt cluster state, crashed Netty, …
     • In short: Don't end up there
  11. Warning signs?
     • Monitor cache sizes and heap space
     • Outgrowing page cache: gradual slowdown
     • Outgrowing heap space: sudden crash
  12. Understand the memory profile
     • Test realistically
     • Bound cache sizes and flush thresholds
     • v0.90+ takes you longer with field filters, etc.
  13. Large heaps are expensive to garbage collect
     • Keep heap < 32 GiB (But test!)
     • Lots of page cache is good, though!
  14. Security
     • Elasticsearch trusts everyone
     • Not its job to do auth(z)
     • You're the gatekeeper
  15. _search
     • Read only?
     • Limit indexes / wrap with filters?
     • Protect the field caches
  16. Arbitrary code execution
     • Elasticsearch has powerful scripting
     • Not sandboxed
     • On by default
  17. Any website can reach your machine
     • http://…
     • Run in a virtual machine
  18. Networking
     • Elasticsearch is distributed
     • Easy (for a distributed system)
     • Supports many usage patterns
  19. Quite common topology
     • High availability, right?
  20. Obey or risk split brains …
     • … and irrecoverable data loss
  21. +1 is a "tie breaker"
  22. Stormy clouds
     • Zone vs instance failure
     • Thundering herds
     • Optimizing MTTR is not HA
  23. Client considerations
     • Idempotent/retryable requests
     • Use a connection pool
     • _bulk / _msearch
  24. Have enough memory
     • Have a majority of nodes
     • Don't allow arbitrary search requests
     • Use retryable requests
  25. Alex over Trondheim, Tore Helgedagsrud
     • Elephant, Roy Costello
     • Wingsuit, Richard Schneider
     • Lightning Storm and Stars, Justin Ennis
     • Wingsuit flock, Richard Schneider
     • Oh salad, you so funny, Eatliver
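
The networking slides' advice to have a majority of nodes, with the extra node acting as a "tie breaker", comes down to a little arithmetic. The Python sketch below is not from the talk. Its function names are hypothetical, and it assumes the usual strict-majority rule of floor(n/2) + 1 master-eligible nodes, which is the value commonly recommended for the `discovery.zen.minimum_master_nodes` setting in Elasticsearch releases of that era.

```python
from typing import Sequence


def minimum_master_nodes(master_eligible: int) -> int:
    """Smallest node count that is a strict majority of master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def split_brain_possible(partition_sizes: Sequence[int], quorum: int) -> bool:
    """True if two or more network partitions could each elect their own master.

    partition_sizes holds the number of master-eligible nodes on each side
    of a (hypothetical) network split; quorum is the configured minimum.
    """
    sides_with_quorum = sum(1 for size in partition_sizes if size >= quorum)
    return sides_with_quorum >= 2


if __name__ == "__main__":
    # Two nodes with a quorum of 1: each side of a split can elect a master.
    print(split_brain_possible([1, 1], quorum=1))
    # Two nodes with a majority quorum: no split brain, but neither side
    # can elect a master, so the cluster is unavailable during the split.
    print(split_brain_possible([1, 1], quorum=minimum_master_nodes(2)))
    # Three nodes: the third node breaks the tie, so one side keeps working.
    print(split_brain_possible([2, 1], quorum=minimum_master_nodes(3)))
```

With a strict majority as the quorum, at most one side of any split can reach it, which is why an odd number of master-eligible nodes lets one side keep operating while an even two-node setup must choose between split-brain risk and unavailability.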