JavaFest. Philipp Krenn. Scale Elasticsearch for Your Java Applications

Elasticsearch is a highly scalable system if you use it to its full potential. This talk covers the most important steps for making it scale with your Java applications:
monitor your setup for performance, optimize with bulk requests,
distribute the load in your cluster, and pick the right index and shard strategy.

  1. Scale for Your Java Applications, Philipp Krenn @xeraa
  2. Developer
  3. @xeraa
  4. Terminology: Cluster, Node, Index, Shard
  5. Disclaimer: This is not a benchmark
  6. Injector
  7. Download: http://download.elastic.co/workshops/basic-kibana/injector/injector-7.0.jar
  8. Use: $ java -jar injector-7.0.jar --nb 1000 --bulk 100 --debug --elasticsearch --es.host http://localhost:9200 --es.user elastic --es.pass changeme --es.index person --console --cs.pretty
  9. { "name": "Mamadou Colombe", "dateofbirth": "2009-12-26", "gender": "male", "children": 4, "marketing": { "cars": null, "shoes": null, "toys": 1688, "fashion": null, "music": 243, "garden": null, "electronic": 1222, "hifi": null, "food": null }, "address": { "country": "France", "zipcode": "75000", "city": "Paris", "countrycode": "FR", "location": { "lon": 2.3250817954678284, "lat": 48.857148516731385 } } }
  10. Monitoring
  11. time: $ time java -jar injector-7.0.jar ...
  12. Stack Monitoring
  13. Bulk Requests
  14. 1,000 Docs & Bulk Size 1: Executed in 37.87 secs; usr time 13.08 secs (fish 135.00 micros, external 13.08 secs); sys time 1.00 secs (fish 752.00 micros, external 1.00 secs)
  15. 1,000 Docs & Bulk Size 100: Executed in 4.01 secs; usr time 7.49 secs (fish 134.00 micros, external 7.49 secs); sys time 0.43 secs (fish 738.00 micros, external 0.43 secs)
  16. 1,000 Docs & Bulk Size 100: Executed in 3.91 secs; usr time 6.34 secs (fish 138.00 micros, external 6.34 secs); sys time 0.38 secs (fish 843.00 micros, external 0.38 secs)
  17. Optimal Bulk Size?
  18. It Depends: Document size (100B, 1KB, 100KB, 1MB,...); Number of nodes & shards; Node size; Load on the cluster
  19. Refresh
  20. index.refresh_interval: default 1s; alternatives 30s or -1
  21. 10,000 Docs & Refresh 1s: Executed in 10.23 secs; usr time 11.50 secs (fish 124.00 micros, external 11.50 secs); sys time 0.77 secs (fish 845.00 micros, external 0.77 secs)
  22. 10,000 Docs & Refresh 30s: Executed in 10.02 secs; usr time 11.87 secs (fish 120.00 micros, external 11.87 secs); sys time 0.74 secs (fish 730.00 micros, external 0.74 secs)
  23. Load Distribution
  24. Nodes: Master, Data, Coordinating
  25. Topology: Sniffing, Cloud, Proxy
  26. Distribution: Round Robin; Adaptive Replica Selection for searches
  27. Index & Shard Strategy
  28. Defaults: 7.0: 1 primary, 1 replica; before: 5 primary, 1 replica
  29. Index Patterns: Static vs Time-Based
  30. Time-Based: Don't _delete_by_query; do daily / weekly / ... patterns; or even better: Rollover
  31. Shard Size: >10GB, maybe ~50GB; it depends
  32. Shards per GB Heap: <20
  33. Pagination
  34. Pagination: GET person/_search { "from" : 0, "size" : 10, "sort": [ { "dateofbirth": { "order": "asc" } } ] }
  35. Deep Pagination: GET person/_search { "from" : 5000, "size" : 10, "sort": [ { "dateofbirth": { "order": "asc" } } ] }
  36. Error: GET person/_search { "from" : 10000, "size" : 10, "sort": [ { "dateofbirth": { "order": "asc" } } ] }
  37. For Your Protection: index.max_result_window: 10000; Search After; Scroll
  38. Profiler
  39. Benchmarks
  40. Rally: Macrobenchmarking Framework for Elasticsearch, https://github.com/elastic/rally
  41. Nightly Benchmarks: https://benchmarks.elastic.co
  42. Conclusion
  43. Topics: Monitoring; Bulk & Refresh; Load Distribution
  44. Topics: Index & Shard Strategy; Pagination; Benchmarks
  45. It Depends
  46. Free & Open
  47. Scale for Your Java Applications, Philipp Krenn @xeraa
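
As a rough illustration of the bulk slides (13 to 18), the sketch below groups documents into batches and renders one newline-delimited JSON body per batch, which is the format the `_bulk` endpoint expects. The class and method names are hypothetical and not from the talk. It assumes each document is already serialized as single-line JSON and that the index name needs no JSON escaping. It only builds the bodies and does not send them.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: renders request bodies for the Elasticsearch _bulk endpoint. */
class BulkBodySketch {

    /**
     * Splits jsonDocs into batches of at most bulkSize documents and returns
     * one NDJSON body per batch. Each document is preceded by an "index"
     * action line, and every line (including the last) ends with a newline.
     * Assumption: documents are single-line JSON, index needs no escaping.
     */
    static List<String> buildBulkBodies(String index, List<String> jsonDocs, int bulkSize) {
        if (bulkSize < 1) {
            throw new IllegalArgumentException("bulkSize must be at least 1");
        }
        List<String> bodies = new ArrayList<>();
        for (int start = 0; start < jsonDocs.size(); start += bulkSize) {
            int end = Math.min(start + bulkSize, jsonDocs.size());
            StringBuilder body = new StringBuilder();
            for (String doc : jsonDocs.subList(start, end)) {
                // Action line: index this document into the given index.
                body.append("{\"index\":{\"_index\":\"").append(index).append("\"}}\n");
                // Source line: the document itself.
                body.append(doc).append('\n');
            }
            bodies.add(body.toString());
        }
        return bodies;
    }

    public static void main(String[] args) {
        // Illustrative documents, trimmed down from the shape shown on slide 9.
        List<String> docs = List.of(
            "{\"name\":\"Mamadou Colombe\",\"children\":4}",
            "{\"name\":\"Example Person\",\"children\":0}",
            "{\"name\":\"Another Person\",\"children\":2}");
        List<String> bodies = buildBulkBodies("person", docs, 2);
        for (int i = 0; i < bodies.size(); i++) {
            System.out.println("--- bulk request " + (i + 1) + " ---");
            System.out.print(bodies.get(i));
        }
    }
}
```

Each body would be sent as a POST to `_bulk` with `Content-Type: application/x-ndjson`. As slide 18 says, the right batch size depends on document size, cluster shape, and load, so `bulkSize` is a value to tune rather than a constant.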

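The pagination slides (34 to 37) can be sketched in a similar way. With `index.max_result_window` at 10000, a request is rejected once `from + size` exceeds the window, which is why the request on slide 36 fails while the one on slide 35 still works. The sketch below checks that condition and builds a `search_after` body as the alternative. The class and method names are hypothetical, and the sort values passed in are assumed to be the raw JSON `sort` array copied from the last hit of the previous page.

```java
/** Hypothetical helper illustrating the result window and search_after. */
class PaginationSketch {

    /** Value shown on slide 37 for index.max_result_window. */
    static final int MAX_RESULT_WINDOW = 10_000;

    /** True if a from/size request stays inside the result window. */
    static boolean fitsResultWindow(int from, int size, int maxResultWindow) {
        // Use long arithmetic so very large inputs cannot overflow.
        return (long) from + size <= maxResultWindow;
    }

    /**
     * Builds a search body that continues after the last hit of the previous
     * page. lastSortValuesJson is assumed to be the raw JSON "sort" array of
     * that hit, for example "[1261785600000]". No "from" is set, because
     * search_after replaces the offset.
     */
    static String searchAfterBody(int size, String lastSortValuesJson) {
        return "{\"size\":" + size
            + ",\"sort\":[{\"dateofbirth\":{\"order\":\"asc\"}}]"
            + ",\"search_after\":" + lastSortValuesJson + "}";
    }

    public static void main(String[] args) {
        System.out.println(fitsResultWindow(5000, 10, MAX_RESULT_WINDOW));   // slide 35
        System.out.println(fitsResultWindow(10000, 10, MAX_RESULT_WINDOW));  // slide 36
        // The sort value here is purely illustrative.
        System.out.println(searchAfterBody(10, "[1261785600000]"));
    }
}
```

Because `dateofbirth` is not unique, a real sort would usually add a unique tiebreaker field so that no documents are skipped or repeated between pages. That field is left out here to stay close to the queries on the slides.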