Taking Elasticsearch From 0 to 88mph

Everyone wants their Elasticsearch cluster to index and search faster, but optimizing both and finding the balance between the two can be tricky. At Kenna Security, we use Elasticsearch to store over 3 billion vulnerabilities for our clients. All that data needs to be quickly accessible so clients can assess their cyber security risk. At the same time, the data is constantly changing. On average, we update 200+ million documents a day, which means indexing speed is also a top priority.

In the early days, our cluster could barely keep up. Nodes would fall over constantly, indexing queues would get backed up for days, and searches timed out about 50% of the time. Fixing all of these issues did not happen overnight. However, with a lot of testing, tweaking, and a few “OH crap!” moments, we were able to build a stable, 21-node cluster that now meets all of our indexing and searching demands. In this talk I will share the insights we gained and the strategies we used to scale our cluster. Hopefully that advice will save others some time and frustration as they grow their own.

Published in: Engineering

  1. Taking Elasticsearch from 0 to 88 mph. By: Molly Struve
  2. 500 million documents, 1 million processed daily
  3. Over 3 billion documents, 200 million processed daily
  4. The average company has... 60 thousand assets, 24 million vulnerabilities?
  5. MySQL Elasticsearch
  6. Refresh Interval
  7. In-memory buffer
  8. In-memory buffer
  9. Toggle the Refresh Interval: curl -XPUT 'localhost:9200/my_index/_settings' -d '{ "index" : { "refresh_interval" : "30s" } }'
  10. In addition... ● ?refresh=wait_for option ● Manual refresh: curl -XPOST 'localhost:9200/my_index/_refresh'
  11. Speed up Indexing: (1) Toggle the refresh interval (2) (3) | Speed Up Searching: (4) (5) (6) (7)
  12. ● 200 thousand assets ● 100 million vulnerabilities
  13. Bulk Processing
  14. POST _bulk
  15. When bulk processing your data... ● Start with batches of 100 and double the size from there until indexing time plateaus ● Requests that are too large can put memory pressure on Elasticsearch, so keep each request under a couple of tens of megabytes
  16. Speed up Indexing: (1) Toggle the refresh interval (2) Bulk process data (3) | Speed Up Searching: (4) (5) (6) (7)
  17. MySQL Elasticsearch Elasticsearch MySQL
  18. 429 Too Many Requests
  19. Route Your Documents
  20. [Diagram: search threads (8, 4, and 2 per node) spread across Shards 1 to 4]
  21. Routing: shard = hash(_routing) % number_of_primary_shards. By default _routing is the document _id. Custom routing: PUT my_index/_doc/1?routing=custom { "title": "This is a document" }
  22. [Diagram: search threads (4 and 2 per node) across Shards 1 to 4, with Routes 1 to 4 mapped onto the shards]
  23. Parent -> Child, Asset -> Vulnerabilities: PUT my_index/_vulnerability/1?parent=2 { "title": "This is a document" }
  24. Speed up Indexing: (1) Toggle the refresh interval (2) Bulk process data (3) Route your documents | Speed Up Searching: (4) (5) (6) (7)
  25. Speed up Searching
  26. Typical Logging Cluster: logstash_2018.09.01, logstash_2018.09.02, logstash_2018.09.03, logstash_2018.09.04 (Search)
  27. Group Your Data
  28. Client 1, Client 2, Client 3, Client 4
  29. Speed up Indexing: (1) Toggle the refresh interval (2) Bulk process data (3) Route your documents | Speed Up Searching: (4) Group your data (5) (6) (7)
  30. Queries: score documents based on how well they match the search criteria. Filters: do NOT score documents and only care whether the document matches the search criteria or not. (Each shown on a scale from Easy/Fast to Hard/Slow.)
  31. 2.x 5.x
  32. Use Filters Whenever Possible
  33. Filters Are Friends!
  34. GET /_search { "query": { "bool" : { "must" : { "term" : { "user" : "kimchy" } }, "filter": { "term" : { "tag" : "tech" } } } } }
  35. Speed up Indexing: (1) Toggle the refresh interval (2) Bulk process data (3) Route your documents | Speed Up Searching: (4) Group your data (5) Use filters whenever possible (6) (7)
  36. Store IDs as keywords
  37. Why? ● Numeric mapping types are optimized for RANGE queries ● Keyword mapping types are optimized for TERM queries
  38. 30% increase in search speed
  39. Speed up Indexing: (1) Toggle the refresh interval (2) Bulk process data (3) Route your documents | Speed Up Searching: (4) Group your data (5) Use filters whenever possible (6) Store IDs as keywords (7)
  40. Don’t let your users slow you down
  41. Define keywords for searching
  42. Speed up Indexing: (1) Toggle the refresh interval (2) Bulk process data (3) Route your documents | Speed Up Searching: (4) Group your data (5) Use filters whenever possible (6) Store IDs as keywords (7) Don’t let your users slow you down
  43. Questions?
  44. Contact: https://www.linkedin.com/in/mollystruve/ https://github.com/mstruve @molly_struve molly.struve@gmail.com
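
The indexing advice on slides 14, 15, and 21 can be sketched in a few lines of Python. This is an illustrative sketch, not code from the talk: the function names (`shard_for`, `bulk_body`, `tune_batch_size`), the `min_gain` threshold, and the `max_size` cap are all hypothetical, and `zlib.crc32` is only a stand-in for Elasticsearch's internal routing hash, so the shard numbers it produces will not match a real cluster. The bulk action line omits a mapping type, which assumes a recent Elasticsearch version.

```python
import json
import zlib
from typing import Callable, Iterable, List


def shard_for(routing: str, number_of_primary_shards: int) -> int:
    """Slide 21: shard = hash(_routing) % number_of_primary_shards.

    ASSUMPTION: zlib.crc32 stands in for Elasticsearch's own hash function.
    """
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


def bulk_body(index: str, docs: Iterable[dict]) -> str:
    """Build a newline-delimited body for POST _bulk (slide 14).

    Each document becomes an action line followed by a source line.
    """
    lines: List[str] = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


def tune_batch_size(
    time_per_doc: Callable[[int], float],
    start: int = 100,
    max_size: int = 12800,    # hypothetical upper bound on batch size
    min_gain: float = 0.05,   # hypothetical: stop when improvement < 5%
) -> int:
    """Slide 15: start at 100 and double until indexing time plateaus.

    `time_per_doc(size)` is a caller-supplied measurement (hypothetical)
    returning the observed indexing time per document for that batch size.
    """
    best_size = start
    best_cost = time_per_doc(start)
    size = start * 2
    while size <= max_size:
        cost = time_per_doc(size)
        if cost > best_cost * (1 - min_gain):
            break  # no meaningful improvement: the curve has plateaued
        best_size, best_cost = size, cost
        size *= 2
    return best_size
```

In practice `time_per_doc` would send a real bulk request of the given size and time it, and the byte length of `bulk_body(...)` could be checked against the request-size guidance on slide 15 before sending.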
