Taking Elasticsearch from 0 to 88 mph
By: Molly Struve

Everyone wants their Elasticsearch cluster to index and search faster, but optimizing both and finding the balance between the two can be tricky. At Kenna Security, we use Elasticsearch to store over 3 billion vulnerabilities for our clients. All that data needs to be quickly accessible so clients can assess their cybersecurity risk. At the same time, the data is constantly changing. On average, we update 200+ million documents a day, which means indexing speed is also a top priority.

In the early days, our cluster could barely keep up. Nodes would fall over constantly, indexing queues would get backed up for days, and searches timed out about 50% of the time. Fixing all of these issues did not happen overnight. However, with a lot of testing, tweaking, and a few “OH crap!” moments, we were able to build a stable, 21-node cluster that now meets all of our indexing and searching demands. In this talk, I will share the insights we gained and the strategies we used to scale our cluster. Hopefully that advice will save others some time and frustration as they grow their own.

Published in: Engineering

  1. Taking Elasticsearch from 0 to 88 mph. By: Molly Struve
  2. 500 million documents, 1 million processed daily
  3. Over 3 billion documents, 200 million processed daily
  4. The average company has... 60 thousand assets, 24 million vulnerabilities?
  5. MySQL Elasticsearch
  6. Refresh Interval
  7. In-memory buffer
  8. In-memory buffer
  9. Toggle the Refresh Interval: curl -XPUT 'localhost:9200/my_index/_settings' -d '{ "index" : { "refresh_interval" : "30s" } }'
  10. In addition... ● ?refresh=wait_for option ● Manual refresh: curl -XPOST 'localhost:9200/my_index/_refresh'
  11. Speed up Indexing: (1) Toggle the refresh interval; (2) and (3) still blank. Speed Up Searching: (4) to (7) still blank.
  12. ● 200 thousand assets ● 100 million vulnerabilities
  13. Bulk Processing
  14. POST _bulk
  15. When bulk processing your data... ● Start with batches of 100 and double the size from there until indexing time plateaus ● Requests that are too large can put memory pressure on Elasticsearch, so keep each request under a couple tens of megabytes
  16. Speed up Indexing: (1) Toggle the refresh interval, (2) Bulk process data; (3) still blank. Speed Up Searching: (4) to (7) still blank.
  17. MySQL Elasticsearch Elasticsearch MySQL
  18. 429 Too Many Requests
  19. Route Your Documents
  20. [Diagram: groups of 8, 4, and 2 indexing threads spread across Shard 1 to Shard 4]
  21. Routing: shard = hash(_routing) % number_of_primary_shards, where _routing is the document _id by default or a custom value. PUT my_index/_doc/1?routing=custom { "title": "This is a document" }
  22. [Diagram: groups of 4 and 2 indexing threads, Shard 1 to Shard 4, and Route 1 to Route 4]
  23. Parent -> Child, Asset -> Vulnerabilities: PUT my_index/_vulnerability/1?parent=2 { "title": "This is a document" }
  24. Speed up Indexing: (1) Toggle the refresh interval, (2) Bulk process data, (3) Route your documents. Speed Up Searching: (4) to (7) still blank.
  25. Speed up Searching
  26. Typical Logging Cluster: logstash_2018.09.01, logstash_2018.09.02, logstash_2018.09.03, logstash_2018.09.04. Search
  27. Group Your Data
  28. Client 1, Client 2, Client 3, Client 4
  29. Speed up Indexing: (1) Toggle the refresh interval, (2) Bulk process data, (3) Route your documents. Speed Up Searching: (4) Group your data; (5) to (7) still blank.
  30. Queries: Score documents based on how well they match the search criteria. Filters: Do NOT score documents and only care if the document matches the search criteria or not. [Each shown on a scale from Easy/Fast to Hard/Slow]
  31. 2.x 5.x
  32. Use Filters Whenever Possible
  33. Filters Are Friends! Filters
  34. GET /_search { "query": { "bool" : { "must" : { "term" : { "user" : "kimchy" } }, "filter": { "term" : { "tag" : "tech" } } } } }
  35. Speed up Indexing: (1) Toggle the refresh interval, (2) Bulk process data, (3) Route your documents. Speed Up Searching: (4) Group your data, (5) Use filters whenever possible; (6) and (7) still blank.
  36. Store IDs as keywords
  37. Why? ● Numeric mapping types are optimized for RANGE queries ● Keyword mapping types are optimized for TERM queries
  38. 30% increase in search speed
  39. Speed up Indexing: (1) Toggle the refresh interval, (2) Bulk process data, (3) Route your documents. Speed Up Searching: (4) Group your data, (5) Use filters whenever possible, (6) Store IDs as keywords; (7) still blank.
  40. Don’t let your users slow you down
  41. Define keywords for searching
  42. Speed up Indexing: (1) Toggle the refresh interval, (2) Bulk process data, (3) Route your documents. Speed Up Searching: (4) Group your data, (5) Use filters whenever possible, (6) Store IDs as keywords, (7) Don’t let your users slow you down.
  43. Questions?
  44. Contact: https://www.linkedin.com/in/mollystruve/ https://github.com/mstruve @molly_struve molly.struve@gmail.com
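
The refresh-interval calls on slides 9 and 10 can be sketched in Python as follows. This is a minimal illustration, not the speaker's code: the helper names and the `base_url` default are assumptions, and the requests are only built here, not sent. The `Content-Type` header is added because newer Elasticsearch versions reject JSON bodies without it.

```python
import json
import urllib.request


def refresh_interval_request(index: str, interval: str,
                             base_url: str = "http://localhost:9200") -> urllib.request.Request:
    """Build (but do not send) the PUT that changes an index's refresh_interval."""
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def manual_refresh_request(index: str,
                           base_url: str = "http://localhost:9200") -> urllib.request.Request:
    """Build (but do not send) the POST that forces an immediate refresh."""
    return urllib.request.Request(url=f"{base_url}/{index}/_refresh", data=b"", method="POST")


# Hypothetical usage: lengthen the interval for heavy indexing, refresh by hand when needed.
slow_down = refresh_interval_request("my_index", "30s")
refresh_now = manual_refresh_request("my_index")
print(slow_down.get_method(), slow_down.full_url)
print(refresh_now.get_method(), refresh_now.full_url)
```

Passing either request to `urllib.request.urlopen` would send it to a running cluster.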
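
Slide 15's advice (start at 100 documents, double until indexing time plateaus, stay under a couple tens of megabytes) might look like this in Python. It is a sketch under stated assumptions: `send` is a hypothetical callable that posts a body to `_bulk` and returns the elapsed seconds, the 20 MB cap and the 10% gain threshold are illustrative values, and the tuning run should target a test index because it re-sends the same documents.

```python
import json
from typing import Callable, Iterable, List


def build_bulk_body(index: str, docs: Iterable[dict]) -> str:
    """Serialize docs into the newline-delimited JSON that POST _bulk expects."""
    lines: List[str] = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                            # document source line
    return "\n".join(lines) + "\n"  # a bulk body must end with a newline


def find_batch_size(docs: List[dict], send: Callable[[str], float],
                    index: str = "my_index", start: int = 100,
                    max_bytes: int = 20 * 1024 * 1024, min_gain: float = 0.10) -> int:
    """Double the batch size until throughput stops improving by at least min_gain."""
    best_size, best_rate = 0, 0.0
    size = start
    while size <= len(docs):
        body = build_bulk_body(index, docs[:size])
        if len(body.encode("utf-8")) > max_bytes:
            break  # request too large: risks memory pressure on the cluster
        seconds = send(body)  # hypothetical: sends the request, returns elapsed seconds
        rate = size / seconds  # documents indexed per second
        if best_size and rate < best_rate * (1 + min_gain):
            break  # throughput has plateaued
        best_size, best_rate = size, rate
        size *= 2
    return best_size
```

The function returns the last batch size that still gave a meaningful throughput gain, or 0 if no batch could be sent.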
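
To illustrate the routing formula on slide 21, the sketch below shows why documents that share a routing value always land on the same shard. It assumes `zlib.crc32` as a stand-in hash (Elasticsearch uses its own hash function internally), and the `asset_id` field and helper names are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, List


def pick_shard(routing: str, number_of_primary_shards: int) -> int:
    """shard = hash(_routing) % number_of_primary_shards, with CRC32 as a stand-in hash."""
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


def group_by_shard(vulns: List[dict], number_of_primary_shards: int) -> Dict[int, List[dict]]:
    """Route each vulnerability by its asset ID so one asset's documents stay together."""
    shards: Dict[int, List[dict]] = defaultdict(list)
    for vuln in vulns:
        shard = pick_shard(str(vuln["asset_id"]), number_of_primary_shards)
        shards[shard].append(vuln)
    return dict(shards)


# Hypothetical data: three vulnerabilities on asset 2, two on asset 7.
vulns = [{"asset_id": 2, "title": f"vuln {i}"} for i in range(3)]
vulns += [{"asset_id": 7, "title": f"vuln {i}"} for i in range(2)]
for shard, members in sorted(group_by_shard(vulns, 4).items()):
    print(shard, [v["asset_id"] for v in members])
```

Because the shard depends only on the routing value and the number of primary shards, changing the shard count would change where existing routing values map.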
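
The query on slide 34 can be assembled programmatically. This is a small sketch with a hypothetical helper name: the scored clause goes under `must`, and the yes/no clause goes under `filter`, where it is matched without scoring.

```python
import json


def bool_search(scored: dict, filtered: dict) -> dict:
    """Build a bool query: `must` clauses are scored, `filter` clauses only match or not."""
    return {"query": {"bool": {"must": scored, "filter": filtered}}}


# Same clauses as the slide: score on user, filter on tag.
body = bool_search({"term": {"user": "kimchy"}}, {"term": {"tag": "tech"}})
print(json.dumps(body, indent=2))
```

Any clause that does not need to influence ranking is a candidate for the `filter` side.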
