Elasticsearch in Production
Alex Brasetvik
alex@found.no
@alexbrasetvik
Who?
Co-founder of Found AS
8+ years search, 3+ Elasticsearch
Herding hundreds of Elasticsearch clusters
Agenda
• Anti-patterns
• Memory / Resource Usage
• Distributed problems
• Security
• Client concerns
• Changing a cluster
found.no/foundation
• Elasticsearch in Production
• Elasticsearch as a NoSQL Database
• Intro to Function Scoring
• All About Analyzers
• Securing your Elasticsearch Cluster
Snapshot / Restore
Circuit breakers
Document values
Aggregations
Distributed percolation
Suggesters
…
Anti-Patterns
Arbitrary Keys
• “Schema Free”
• One field per value
• Ever-growing cluster state
acls:
  1234: READ
  42: WRITE
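One way to avoid one field per value is to move the variable part of the key into a value, so the mapping keeps a fixed set of fields however many users appear. A minimal Python sketch, assuming the `acls` shape shown above; the field names `user_id` and `permission` and the helper `normalize_acls` are hypothetical:

```python
def normalize_acls(acls):
    """Turn {1234: "READ", 42: "WRITE"} into a list of fixed-key objects.

    Every entry uses the same two field names, so indexing new users
    adds values, not new fields, to the mapping.
    """
    return [
        {"user_id": str(user_id), "permission": permission}
        for user_id, permission in sorted(acls.items(), key=lambda kv: str(kv[0]))
    ]


doc = {"acls": normalize_acls({1234: "READ", 42: "WRITE"})}
print(doc)
```

Matching a user together with its permission inside one entry would likely need a nested mapping or a combined value; that is left out of this sketch.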
Heavy Updating
• Update = Delete + Reindex
• Be careful with counters
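Because every update is a delete plus a reindex, bumping a counter once per event rewrites the whole document each time. A possible mitigation, sketched here under the assumption that losing a few in-memory increments on a crash is acceptable, is to batch increments in the client and send one update per document per flush. The class `CounterBuffer` is hypothetical, and how the flushed pairs are sent to Elasticsearch is deliberately left out:

```python
from collections import Counter


class CounterBuffer:
    """Accumulate increments in memory and flush them as one update per document."""

    def __init__(self):
        self._pending = Counter()

    def increment(self, doc_id, amount=1):
        self._pending[doc_id] += amount

    def flush(self):
        """Return one (doc_id, total_increment) pair per touched document and reset."""
        batch = sorted(self._pending.items())
        self._pending.clear()
        return batch


pending_views = CounterBuffer()
for doc_id in ["a", "b", "a", "a"]:
    pending_views.increment(doc_id)
print(pending_views.flush())  # four events collapse into two updates
```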
Slow queries
• WHERE foo ILIKE '%bar%'
• {"query_string": {"query": "foo:*bar*"}}
• Don’t ask for 3300 results :)
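Both problems can be caught before a request reaches the cluster. The following is a naive sketch, not a full query-string parser: it only splits on whitespace, and the cap `MAX_PAGE_SIZE` and the helper `check_search_request` are hypothetical names chosen for illustration.

```python
MAX_PAGE_SIZE = 100  # hypothetical cap; pick one that fits your application


def check_search_request(query_text, size):
    """Reject leading-wildcard terms and cap the number of requested hits."""
    for term in query_text.split():
        # Strip an optional "field:" prefix before looking at the term itself.
        value = term.split(":", 1)[-1]
        if value.startswith(("*", "?")):
            raise ValueError("leading wildcards are not allowed: %r" % term)
    return min(max(size, 0), MAX_PAGE_SIZE)
```

With these assumptions, `check_search_request("foo:bar*", 3300)` returns the capped size, while `foo:*bar*` is rejected.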
Arbitrary searches
query:
  filtered:
    filter:
      term:
        user_id: 42
    query:
      [user's query here]
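If the concern here is that the user's query is passed through as-is, one alternative is to accept only search text and build the query on the server. A minimal sketch using the same `filtered` query as above (Elasticsearch 1.x syntax); the searched field `body` and the helper `scoped_text_search` are assumptions for illustration:

```python
import json


def scoped_text_search(user_id, text):
    """Build the whole request body server-side from plain search text.

    The user controls only the text; the query type, the searched field
    and the user_id filter are fixed by the application.
    """
    return {
        "query": {
            "filtered": {
                "filter": {"term": {"user_id": user_id}},
                "query": {"match": {"body": text}},
            }
        }
    }


request_body = scoped_text_search(42, "elasticsearch in production")
print(json.dumps(request_body, indent=2))
```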
Memory
• Field caches
• Filter caches
• Page caches
• Aggregations
• Index building
Page Cache
• Keeping index pages in memory
• Can’t have too much
• Outgrow: Gradual slowdown
Heap Space
• Memory used by Elasticsearch process
• Field / Filter caches
• Aggregations
Time Bomb
OutOfMemoryError
Woah there
I ate all the memories
Your cluster may or may not work any more
OutOfMemory
• Growing too big
• Selecting too big timespan in Kibana
• Document ingestion peak
Preventing OOMs
• Have enough memory :-)
• Understand your search’s memory profile
• Bulk / Circuit breaker settings
• Monitoring
• Document values
Marvel
( /_stats )
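Without a dashboard, a small script over the stats output can flag nodes that are close to running out of heap. This sketch assumes a parsed response shaped like the JVM section of the nodes stats API (`nodes.<id>.jvm.mem.heap_used_in_bytes` and `heap_max_in_bytes`); check the field names against your version. The sample data is hand-written, and the threshold and the helper `nodes_over_heap_threshold` are hypothetical.

```python
def nodes_over_heap_threshold(nodes_stats, threshold=0.75):
    """Return (node name, heap fraction) for nodes whose heap usage exceeds threshold."""
    hot = []
    for node in nodes_stats.get("nodes", {}).values():
        mem = node["jvm"]["mem"]
        used = mem["heap_used_in_bytes"] / float(mem["heap_max_in_bytes"])
        if used > threshold:
            hot.append((node.get("name", "?"), round(used, 2)))
    return sorted(hot)


sample = {  # hand-written sample, not real cluster output
    "nodes": {
        "id1": {"name": "node-1", "jvm": {"mem": {
            "heap_used_in_bytes": 900, "heap_max_in_bytes": 1000}}},
        "id2": {"name": "node-2", "jvm": {"mem": {
            "heap_used_in_bytes": 300, "heap_max_in_bytes": 1000}}},
    }
}
print(nodes_over_heap_threshold(sample))
```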
"my_field": {
  "type": "string",
  "fielddata": {
    "format": "doc_values"
  }
}
Document Values
• Rely on page cache
• Only caches doc values actually used
Sizing
• Test, don’t guess
• Start big, scale down
• Index, search, monitor
Glitch Meltdown
• Tie-breaker can be a cheap master-node
• Applies to data centers / availability zones too
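The reason a tie-breaker helps comes down to majority arithmetic: with two master-eligible nodes a majority needs both, so losing either one stops the cluster from electing a master, while a third node lets any two form a majority. A small sketch of that arithmetic, assuming the majority is what you configure as `discovery.zen.minimum_master_nodes`; the helper names are hypothetical:

```python
def quorum(master_eligible_nodes):
    """Smallest majority of the master-eligible nodes."""
    return master_eligible_nodes // 2 + 1


def survives_loss(master_eligible_nodes, lost):
    """True if a majority remains after losing `lost` master-eligible nodes."""
    return master_eligible_nodes - lost >= quorum(master_eligible_nodes)


for total in (2, 3):
    print(total, quorum(total), survives_loss(total, 1))
```

The same reasoning applies when the units are data centers or availability zones instead of single nodes.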
Data-only nodes
Master-only nodes
Jepsen
• Kyle Kingsbury’s series on distributed systems
• Distributed systems are hard
• aphyr.com
Security
• “Not my job!” – Elasticsearch
• That’s fine!
Dynamic Scripts
• Scoring
• Aggregations
• Updating
Dynamic Scripts
Runtime.getRuntime().exec(…)
<script src="http://127.0.0.1:9200/_search?callback=capture&…
Security
• Disable dynamic scripts (On by default in ≤1.1)
• Mind index patterns
• Even then, don’t accept arbitrary requests
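For index patterns, one defensive approach is to never pass a client-supplied index name straight into a request path, since wildcards, comma-separated lists and `_all` can reach indices the caller should not see. A minimal sketch, assuming a fixed allowlist; the index names and the helper `resolve_index` are hypothetical:

```python
import re

ALLOWED_INDICES = {"products", "articles"}  # hypothetical index names
SAFE_NAME = re.compile(r"[a-z0-9][a-z0-9_-]*")


def resolve_index(requested):
    """Return the index name if it is a plain, allowlisted name; raise otherwise."""
    if not SAFE_NAME.fullmatch(requested):
        # Rejects wildcards (*), comma-separated lists and names like _all.
        raise ValueError("invalid index name: %r" % requested)
    if requested not in ALLOWED_INDICES:
        raise ValueError("index not allowed: %r" % requested)
    return requested
```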
Client Concerns
• Connection pools
• Idempotent requests
• Have sane syncing/indexing strategies
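One way to make indexing requests idempotent is to derive the document ID from the record itself, so a retried or replayed request overwrites the same document instead of creating a duplicate. A minimal sketch; the helpers `document_id` and `index_action` and the shape of the returned dict are illustrative assumptions, not a specific client's API:

```python
import hashlib


def document_id(source, natural_key):
    """Derive a stable document ID from where the record came from and its key."""
    raw = ("%s:%s" % (source, natural_key)).encode("utf-8")
    return hashlib.sha1(raw).hexdigest()


def index_action(index, doc_type, source, natural_key, doc):
    """Describe an index request whose retry overwrites rather than duplicates."""
    return {
        "_index": index,
        "_type": doc_type,
        "_id": document_id(source, natural_key),
        "_source": doc,
    }
```

Stable IDs also help with syncing: re-running a full sync from the primary data store converges on the same set of documents.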
# BOOM !
Cluster changes
• Make new nodes join existing cluster
• No rolling restarts
• Easy rollback if things go bad
v1.0.0 v1.0.1
Cluster changes
• Test first
• Mind recover_*-settings
Multi-Cluster Workflows
• Snapshot/Restore
• Operations across clusters
• Swap clusters!
• Works well with good syncing strategy
• Rolling restarts: Risky, fast
• Grow and shrink: Less risky, copies lots of data
• Multiple clusters: Least risky, copies lots of data
Misc
• Same JVM
• ulimits
• Unicast
• Kernel-settings like IO-scheduler
?
@foundsays
Elasticsearch in Production (London version)
Elasticsearch in production, or an overview of things you want to know about before happening upon them in production.
