
Elasticsearch in Production (London version)

Elasticsearch in production, or an overview of the things you want to know about before running into them in production.


  1. Elasticsearch in Production • Alex Brasetvik • alex@found.no • @alexbrasetvik
  2. Elasticsearch in Production • Alex Brasetvik • alex@found.no • @alexbrasetvik
  3. Who? • Co-founder of Found AS • 8+ years of search, 3+ with Elasticsearch • Herding hundreds of Elasticsearch clusters
  4. Agenda
  5. Agenda • Anti-patterns • Memory / Resource Usage • Distributed problems • Security • Client concerns • Changing a cluster
  6. found.no/foundation • Elasticsearch in Production • Elasticsearch as a NoSQL Database • Intro to Function Scoring • All About Analyzers • Securing your Elasticsearch Cluster
  7. Snapshot / Restore • Circuit breakers • Document values • Aggregations • Distributed percolation • Suggesters • …
  8. Anti-Patterns
  9. Arbitrary Keys • "Schema Free" • One field per value • Ever-growing cluster state • e.g. acls: { 1234: READ, 42: WRITE }
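
The slide's acls example is shorthand for documents where each ID becomes its own field. A sketch of what that looks like, and one of several alternatives that keeps the mapping fixed (everything except the acls values from the slide is made up for illustration):

    # Anti-pattern: each new ID adds a new field to the mapping and cluster state
    { "acls": { "1234": "READ", "42": "WRITE" } }

    # Sketch of an alternative: fixed field names, IDs become values
    { "acls": [ { "id": 1234, "level": "READ" }, { "id": 42, "level": "WRITE" } ] }
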
  10. Heavy Updating • Update = Delete + Reindex • Be careful with counters
  11. Slow queries • WHERE foo ILIKE '%bar%' • {"query_string": {"query": "foo:*bar*"}} • Don't ask for 3300 results :)
  12. Arbitrary searches • query: { filtered: { filter: { term: { user_id: 42 } }, query: [user's query here] } }
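
Written out as a full search request, the pattern on the slide looks roughly like the sketch below (index name and user_id value are placeholders). The filter scopes the results, but the embedded user query can still be arbitrarily expensive, which is why accepting arbitrary searches is listed as an anti-pattern:

    curl -XPOST 'http://localhost:9200/my_index/_search' -d '{
      "query": {
        "filtered": {
          "filter": { "term": { "user_id": 42 } },
          "query": { "query_string": { "query": "<whatever the user sent>" } }
        }
      }
    }'
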
  13. Memory
  14. Memory • Field caches • Filter caches • Page caches • Aggregations • Index building
  15. Page Cache • Keeps index pages in memory • Can't have too much • Outgrow it: gradual slowdown
  16. Heap Space • Memory used by the Elasticsearch process • Field / Filter caches • Aggregations
  17. Time Bomb
  18. Time Bomb
  19. OutOfMemoryError • "Whoa there, I ate all the memories" • Your cluster may or may not work any more
  20. OutOfMemory • Growing too big • Selecting too large a timespan in Kibana • Document ingestion peaks
  21. Preventing OOMs • Have enough memory :-) • Understand your search's memory profile • Bulk / circuit breaker settings • Monitoring • Document values
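
A sketch of the circuit-breaker side of this, assuming a 1.x cluster; the exact setting names changed between 1.x minor releases, so treat these as illustrative rather than authoritative:

    # elasticsearch.yml (1.x-era names; verify against your version's docs)
    indices.fielddata.cache.size: 30%       # cap how much heap fielddata may occupy
    indices.breaker.fielddata.limit: 60%    # refuse to load fielddata past this point
    indices.breaker.request.limit: 40%      # per-request breaker, e.g. for aggregations
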
  22. Marvel ( /_stats )
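
Marvel aside, the plain stats APIs already show where fielddata memory goes; a sketch assuming a node listening on localhost:9200:

    # Fielddata memory per field, summed across the cluster
    curl 'http://localhost:9200/_stats/fielddata?fields=*&pretty'

    # The same, broken down per node
    curl 'http://localhost:9200/_nodes/stats/indices/fielddata?fields=*&pretty'
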
  23. 23. "my_field": { "type": "string", "fielddata": { "format": "doc_values" } }
  24. Document Values • Rely on page cache • Only caches doc values actually used
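
A sketch of applying the mapping from the previous slide when creating an index (index, type, and field names are placeholders); note that in 1.x, string fields need to be not_analyzed to use doc values:

    curl -XPUT 'http://localhost:9200/my_index' -d '{
      "mappings": {
        "my_type": {
          "properties": {
            "my_field": {
              "type": "string",
              "index": "not_analyzed",
              "fielddata": { "format": "doc_values" }
            }
          }
        }
      }
    }'
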
  25. Sizing
  26. Sizing • Test, don't guess • Start big, scale down • Index, search, monitor
  27. Glitch Meltdown
  28. Glitch Meltdown
  29. • The tie-breaker can be a cheap master-only node • Applies to data centers / availability zones too
  30. Data-only nodes • Master-only nodes
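
A sketch of how the tie-breaker and the role split might look in elasticsearch.yml on a 1.x cluster with three master-eligible nodes (the quorum value follows from that count):

    # Master-only nodes (one of these can be the cheap tie-breaker):
    node.master: true
    node.data: false

    # Data-only nodes:
    node.master: false
    node.data: true

    # On every node: require a majority of the 3 master-eligible nodes,
    # so a network glitch cannot elect two masters at once.
    discovery.zen.minimum_master_nodes: 2
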
  31. Jepsen
  32. Jepsen • Kyle Kingsbury's series on distributed systems • Distributed systems are hard • aphyr.com
  33. Security
  34. Security • "Not my job!" – Elasticsearch • That's fine!
  35. Dynamic Scripts • Scoring • Aggregations • Updating
  36. Dynamic Scripts • Runtime.getRuntime().exec(…)
  37. Dynamic Scripts • Runtime.getRuntime().exec(…) • <script src="http://127.0.0.1:9200/_search?callback=capture&…
  38. Security • Disable dynamic scripts (on by default in ≤1.1) • Mind index patterns • Even then, don't accept arbitrary requests
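
A sketch of the corresponding 1.x settings; script.disable_dynamic turns off scripts embedded in requests, and disabling JSONP closes the <script src=…> trick shown two slides earlier (check your version, defaults moved around during 1.x):

    # elasticsearch.yml
    script.disable_dynamic: true   # reject scripts sent in request bodies
    http.jsonp.enable: false       # no JSONP callbacks from the HTTP API
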
  39. Client Concerns
  40. Client Concerns • Connection pools • Idempotent requests • Have sane syncing/indexing strategies
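
On idempotent requests: indexing with IDs you control means a retried request overwrites the same document instead of creating a duplicate. A sketch with placeholder index, type, and ID:

    # Idempotent: PUT with an explicit ID; safe to retry on timeouts
    curl -XPUT 'http://localhost:9200/my_index/my_type/order-1001' -d '{ "status": "shipped" }'

    # Not idempotent: POST without an ID creates a new document on every retry
    curl -XPOST 'http://localhost:9200/my_index/my_type/' -d '{ "status": "shipped" }'
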
  41. # BOOM !
  42. Cluster changes
  43. Cluster changes • Make new nodes join the existing cluster • No rolling restarts • Easy rollback if things go bad
  44. v1.0.0 → v1.0.1
  45. Cluster changes • Test first • Mind the recover_* settings
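
The recover_* settings keep a freshly (re)started cluster from re-allocating shards while nodes are still joining; a sketch with made-up numbers for a ten-node cluster:

    # elasticsearch.yml
    gateway.recover_after_nodes: 8   # don't start recovery before 8 nodes are present
    gateway.expected_nodes: 10       # start right away once all 10 have joined
    gateway.recover_after_time: 5m   # otherwise wait at most 5 minutes
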
  46. Multi-Cluster Workflows • Snapshot/Restore • Operations across clusters • Swap clusters! • Works well with a good syncing strategy
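
A sketch of moving an index between clusters with snapshot/restore, assuming both clusters can reach the same repository location and using placeholder names:

    # Register the repository (filesystem here; S3/Azure/HDFS via plugins)
    curl -XPUT 'http://localhost:9200/_snapshot/my_backup' -d '{
      "type": "fs",
      "settings": { "location": "/mnt/es-backups" }
    }'

    # Snapshot on the source cluster ...
    curl -XPUT 'http://localhost:9200/_snapshot/my_backup/snap_1?wait_for_completion=true'

    # ... restore on the target cluster, which has the same repository registered
    curl -XPOST 'http://target-cluster:9200/_snapshot/my_backup/snap_1/_restore'
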
  47. • Rolling restarts: risky, fast • Grow and shrink: less risky, copies lots of data • Multiple clusters: least risky, copies lots of data
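
For grow and shrink, allocation filtering drains the old nodes once the new ones have joined; a sketch with a placeholder node name:

    # Move all shards off the old node, then watch /_cat/recovery or cluster
    # health until relocation finishes before shutting it down
    curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
      "transient": { "cluster.routing.allocation.exclude._name": "old-node-1" }
    }'
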
  48. Misc • Same JVM version everywhere • ulimits • Unicast discovery • Kernel settings like the IO scheduler
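
A sketch covering the unicast and ulimit items (1.x setting names, placeholder addresses):

    # elasticsearch.yml: explicit unicast discovery instead of multicast
    discovery.zen.ping.multicast.enabled: false
    discovery.zen.ping.unicast.hosts: ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

    # Shell: raise the open-files limit before starting the node
    ulimit -n 65536
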
  49. Questions? @foundsays
