Elasticsearch in Production

Published in: Engineering, Technology, Business
  • To learn more, see https://www.found.no/foundation/elasticsearch-in-production/

  1. Elasticsearch in production. Alex Brasetvik, alex@found.no, @alexbrasetvik
  3. Who? Co-founder of Found AS; 8+ years search, 3+ Elasticsearch; herding hundreds of Elasticsearch clusters
  4. Agenda
  5. Agenda
       • Anti-patterns
       • Memory / Resource Usage
       • Distributed problems
       • Security
       • Client concerns
       • Changing a cluster
  6. found.no/foundation
       • Elasticsearch in Production
       • Elasticsearch as a NoSQL Database
       • Intro to Function Scoring
       • All About Analyzers
       • Securing your Elasticsearch Cluster
  7. Snapshot / Restore, Circuit breakers, Document values, Aggregations, Distributed percolation, Suggesters, …
  8. Anti-Patterns
  9. Arbitrary Keys
       • “Schema Free”
       • One field per value
       • Ever-growing cluster state

       acls:
         1234: READ
         42: WRITE
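
One way to avoid the one-field-per-key pattern on the slide above is to move the key into a value, so the mapping keeps a fixed set of fields no matter how many ACL entries exist. The following is a minimal sketch, not something shown in the talk; the function name and the field names `id` and `permission` are assumptions chosen for illustration.

```python
def acls_to_entries(acls):
    """Turn {"1234": "READ", "42": "WRITE"} into fixed-shape entries.

    The field names "id" and "permission" are hypothetical.
    """
    return [
        {"id": key, "permission": permission}
        for key, permission in sorted(acls.items())
    ]


# Document shaped like the slide: every new key would add a new field.
doc = {"acls": {"1234": "READ", "42": "WRITE"}}

# Reshaped document: two fields in the mapping, however many entries exist.
reshaped = {"acls": acls_to_entries(doc["acls"])}
```

If entries must be matched as pairs (this id with this permission), the array would likely need to be mapped as a `nested` type, since plain object arrays are flattened at index time.
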
  10. Heavy Updating
       • Update = Delete + Reindex
       • Be careful with counters
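
Because every update is a delete plus a reindex, incrementing a counter once per event rewrites the whole document each time. One possible mitigation, sketched here under assumptions of our own (the class name, the threshold, and the idea of buffering client-side are not from the slides), is to accumulate increments in the application and send one combined update per document.

```python
from collections import Counter


class CounterBuffer:
    """Accumulate counter increments so each document is updated once per flush."""

    def __init__(self, flush_threshold=1000):
        self.pending = Counter()
        self.flush_threshold = flush_threshold
        self.seen = 0

    def increment(self, doc_id, amount=1):
        """Record an increment; return True when the caller should flush."""
        self.pending[doc_id] += amount
        self.seen += 1
        return self.seen >= self.flush_threshold

    def drain(self):
        """Return {doc_id: total_increment} and reset the buffer."""
        batch = dict(self.pending)
        self.pending.clear()
        self.seen = 0
        return batch


buf = CounterBuffer(flush_threshold=3)
buf.increment("page-1")
buf.increment("page-1")
should_flush = buf.increment("page-2", 5)
batch = buf.drain()  # one update per document instead of one per event
```

The trade-off is that buffered increments are lost if the process dies before a flush, so this only suits counters that tolerate some loss.
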
  11. Slow queries
       • WHERE foo ILIKE '%bar%'
       • {"query_string": {"query": "foo:*bar*"}}
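
Both queries on the slide above use a leading wildcard, which has to examine a large share of the indexed terms at search time. A commonly used alternative, which the slide itself does not name, is to index n-grams so that a substring match becomes an ordinary term lookup. The pure-Python sketch below only illustrates what an n-gram tokenizer emits; the function name and defaults are illustrative.

```python
def ngrams(text, min_gram=3, max_gram=3):
    """Return all substrings of text with length min_gram..max_gram, in order."""
    tokens = []
    for size in range(min_gram, max_gram + 1):
        for start in range(len(text) - size + 1):
            tokens.append(text[start:start + size])
    return tokens


# "bar" is one of the indexed tokens, so searching for it needs no wildcard.
tokens = ngrams("foobar")
```

The cost moves to index time and index size: more grams per term means a larger index.
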
  12. Arbitrary searches

       query:
         filtered:
           filter:
             term:
               user_id: 42
           query: [user’s query here]
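
The structure on the slide above scopes results to one user, but it still lets the user supply an arbitrary query body, which can be as expensive as they like. A more defensive sketch is to accept only plain text and build the inner query on the server. This keeps the slide's `filtered` syntax; the function name and the `body` field are assumptions for illustration.

```python
def scoped_search(user_id, user_text, field="body"):
    """Build a search body scoped to one user from plain text input.

    A match query treats the input as text to analyze, not as query
    syntax, so wildcards typed by the user are not interpreted.
    """
    return {
        "query": {
            "filtered": {
                "filter": {"term": {"user_id": user_id}},
                "query": {"match": {field: user_text}},
            }
        }
    }


search_body = scoped_search(42, "foo:*bar*")
```
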
  13. Time Bomb
  14. Memory
  15. Memory
       • Field caches
       • Filter caches
       • Page caches
       • Aggregations
       • Index building
  16. Page Cache
       • Keeping index pages in memory
       • Can’t have too much
       • Outgrow: Gradual slowdown
  17. Heap Space
       • Memory used by Elasticsearch process
       • Field / Filter caches
       • Aggregations
  18. Time Bomb
  19. Time Bomb
  20. OutOfMemoryError: Woah there, I ate all the memories. Your cluster may or may not work any more.
  21. OutOfMemory
       • Growing too big
       • Selecting too big timespan in Kibana
       • Document ingestion peak
  22. Preventing OOMs
       • Have enough memory :-)
       • Understand your search’s memory profile
       • Bulk / Circuit breaker settings
       • Monitoring
       • Document values
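
On the bulk side, one client-level precaution is to cap the size of each bulk request so that an ingestion peak does not arrive as a single huge body. The sketch below splits actions into newline-delimited bodies in the format the bulk API expects (an action line followed by a source line). The function name and the 5 MB default are assumptions, not values from the talk.

```python
import json


def bulk_bodies(actions, max_bytes=5 * 1024 * 1024):
    """Yield newline-delimited bulk bodies of at most max_bytes each.

    actions is an iterable of (action_metadata, source) pairs. A single
    pair larger than max_bytes is still sent, alone in its own body.
    """
    lines, size = [], 0
    for meta, source in actions:
        pair = json.dumps(meta) + "\n" + json.dumps(source) + "\n"
        pair_size = len(pair.encode("utf-8"))
        if lines and size + pair_size > max_bytes:
            yield "".join(lines)
            lines, size = [], 0
        lines.append(pair)
        size += pair_size
    if lines:
        yield "".join(lines)


actions = [({"index": {"_id": str(i)}}, {"n": i}) for i in range(10)]
bodies = list(bulk_bodies(actions, max_bytes=100))
```

A sensible limit depends on document size and cluster capacity, so it is worth measuring rather than guessing.
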
  23. Marvel (/_stats)
  24. Document Values
  25. "my_field": {
        "type": "string",
        "fielddata": { "format": "doc_values" }
      }
  26. Sizing
  27. Sizing
       • Test, don’t guess
       • Start big, scale down
       • Index, search, monitor
  28. Glitch Meltdown
  29. Glitch Meltdown
  30.
       • Tie-breaker can be a cheap master-node
       • Applies to data centers / availability zones too
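
The tie-breaker advice follows from majority arithmetic: a master can only be elected when a strict majority of master-eligible nodes can see each other. The sketch below works through the numbers. Reading the slide as being about the majority setting (`discovery.zen.minimum_master_nodes` in Elasticsearch of this era) is an assumption, and the function names are illustrative.

```python
def quorum(master_eligible_nodes):
    """Smallest strict majority of the master-eligible nodes."""
    return master_eligible_nodes // 2 + 1


def tolerated_failures(master_eligible_nodes):
    """How many master-eligible nodes can be lost while a majority remains."""
    return master_eligible_nodes - quorum(master_eligible_nodes)


# Two nodes need both present for a majority, so losing either one blocks
# master election. A cheap third, master-only node makes the majority 2 of 3.
two_nodes = tolerated_failures(2)
with_tie_breaker = tolerated_failures(3)
```

The same arithmetic applies when the units are data centers or availability zones rather than single nodes.
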
  31. Data-only nodes, Master-only nodes
  32. Jepsen
  33. Jepsen
       • Kyle Kingsbury’s series on distributed systems
       • Distributed systems are hard
       • aphyr.com
  34. Security
  35. Security
       • “Not my job!” – Elasticsearch
       • That’s fine!
  36. Dynamic Scripts
       • Scoring
       • Aggregations
       • Updating
  37. Dynamic Scripts

       Runtime.getRuntime().exec(…)
  38. Dynamic Scripts

       Runtime.getRuntime().exec(…)
       <script src="http://127.0.0.1:9200/_search?callback=capture&…
  39. Security
       • Disable dynamic scripts
       • Mind index patterns
       • Even then, don’t accept arbitrary requests
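
For "mind index patterns", one approach in a proxy or application layer is to validate index names before building the request path, so that wildcards, comma-separated lists, and names such as `_all` cannot widen a search beyond what the caller is allowed to see. This is a sketch under our own assumptions: the function name, the allow-list, and the character rule are hypothetical, not a complete account of valid Elasticsearch index names.

```python
import re

# Conservative rule (an assumption): lowercase letters, digits, "_" and "-",
# not starting with "_" or "-". Rejects "*", ",", "/" and "_all".
_SAFE_INDEX = re.compile(r"[a-z0-9][a-z0-9_-]*")


def checked_index(name, allowed):
    """Return name if it is a plain index name on the allow-list, else raise."""
    if not _SAFE_INDEX.fullmatch(name) or name not in allowed:
        raise ValueError("index not permitted: %r" % (name,))
    return name


allowed_indices = {"logs-2014", "users"}
index = checked_index("logs-2014", allowed_indices)
```

Even with this check in place, the last bullet still applies: the request body needs the same scrutiny as the path.
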
  40. Client Concerns
  41. Client Concerns
       • Connection pools
       • Idempotent requests
       • Have sane syncing/indexing strategies
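
One way to make indexing requests idempotent is to derive the document ID from the source record instead of letting Elasticsearch generate one, so a retried request overwrites the same document rather than creating a duplicate. The sketch below is an illustration only; the function name and the choice of SHA-1 over a JSON-encoded key are assumptions.

```python
import hashlib
import json


def stable_id(source_system, natural_key):
    """Derive a deterministic document ID from where a record came from."""
    raw = json.dumps([source_system, natural_key], sort_keys=True)
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()


# Retrying the same record yields the same ID, so the retry is an overwrite.
first_attempt = stable_id("orders", 42)
retry = stable_id("orders", 42)
other_record = stable_id("orders", 43)
```

If the source already has a unique primary key, using it directly as the ID achieves the same thing with less machinery.
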
  42. # BOOM!
  43. Cluster changes
  44. Cluster changes
       • Make new nodes join existing cluster
       • No rolling restarts
       • Easy rollback if things go bad
  45. v1.0.0, v1.0.1
  46. Cluster changes
       • Test first
       • Mind recover_*-settings
  47. Multi-Cluster Workflows
       • Snapshot/Restore
       • Operations across clusters
       • Swap clusters!
       • Works well with good syncing strategy
  48. Misc
       • Same JVM
       • ulimits
       • Unicast
       • SSD? noop-scheduler
  49. ? @foundsays
