
Cache is King: Get the Most Bang for Your Buck From Ruby


Sometimes your fastest queries can cause the most problems. I will take you beyond slow query optimization and instead zero in on the performance impact of the sheer quantity of your datastore hits. Using real-world examples involving Elasticsearch, MySQL, and Redis, I will demonstrate how many fast queries can wreak just as much havoc as a few big slow ones. With each example I will make use of the simple tools available in Ruby to decrease, and even eliminate, the need for these fast but seemingly innocuous datastore hits.

Published in: Engineering


  1. Cache is King: Get the Most Bang for Your Buck From Ruby
  2. Site Reliability Engineer
  3. Adding Indexes / Using SELECT statements / Batch Processing
  4. Elasticsearch::Transport::Errors::GatewayTimeout 504 { "statusCode": 200, "took": "100ms" }
  5. Resque
  6. Demo Time!
  7. Quantity of Datastore Hits
  8. The average company has... 60 thousand assets, 24 million vulnerabilities
  9. MySQL / Elasticsearch Cluster
  10. Serialization
  11. MySQL / Elasticsearch Cluster / ActiveModelSerializers
  12. module Beehive
        module Serializers
          class Vulnerability < ActiveModel::Serializer
            attributes :id, :client_id, :created_at, :updated_at,
                       :priority, :details, :notes, :asset_id,
                       :solution_id, :owner_id, :ticket_id
          end
        end
      end
  13. 200 MILLION
  14. 11 hours and counting...
  15. (1.6ms) (0.9ms) (4.1ms) (5.2ms) (5.2ms) (1.3ms) (3.1ms) (2.9ms) (2.2ms) (4.9ms) (6.0ms) (0.3ms) (1.6ms) (0.9ms) (2.2ms) (3.0ms) (2.1ms) (1.3ms) (2.1ms) (8.1ms) (1.4ms)
  16. MySQL
  17. Bulk Serialization
  18. class BulkVulnerabilityCache
        attr_accessor :vulnerabilities, :client, :vulnerability_ids

        def initialize(vulns, client)
          self.vulnerabilities = vulns
          self.vulnerability_ids = vulns.map(&:id)
          self.client = client
        end

        # MySQL lookups
      end
  19. module Serializers
        class Vulnerability
          attr_accessor :vulnerability, :cache

          def initialize(vuln, bulk_cache)
            self.cache = bulk_cache
            self.vulnerability = vuln
          end
        end
      end
  20. class Vulnerability
        has_many :custom_fields
      end
  21. CustomField.where(:vulnerability_id => vuln.id) becomes cache.fetch('custom_fields', vuln.id)
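The slides above replace per-record CustomField queries with lookups against a preloaded cache. A minimal plain-Ruby sketch of that bulk-cache pattern (the rows hash here is a hypothetical stand-in for the real MySQL lookups):

```ruby
# Sketch of the bulk-cache idea: run ONE query per association up front,
# group the rows by foreign key, then serve every record from the hash.
class BulkCache
  def initialize(rows_by_table)
    # rows_by_table simulates the result of one MySQL query per table,
    # e.g. CustomField.where(:vulnerability_id => vuln_ids)
    @cache = {}
    rows_by_table.each do |table, rows|
      @cache[table] = rows.group_by { |row| row[:vulnerability_id] }
    end
  end

  # Per-record fetches are now hash lookups, not datastore hits.
  def fetch(table, vuln_id)
    @cache.fetch(table, {}).fetch(vuln_id, [])
  end
end

rows = [
  { :vulnerability_id => 1, :name => "cvss" },
  { :vulnerability_id => 1, :name => "owner" },
  { :vulnerability_id => 2, :name => "cvss" }
]
cache = BulkCache.new("custom_fields" => rows)
cache.fetch("custom_fields", 1).size # => 2
cache.fetch("custom_fields", 3)      # => []
```

The serializer then asks the cache instead of the database, so the query count depends on the number of associations, not the number of records.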
  22. The Result...
      (pry)> vulns = Vulnerability.limit(300);
      (pry)> Benchmark.realtime { vulns.each(&:serialize) }
      => 6.022452222998254
      (pry)> Benchmark.realtime do
           >   BulkVulnerability.new(vulns, [], client).serialize
           > end
      => 0.7267019419959979
  23. Decrease in database hits. Individual Serialization: 2,100 / Bulk Serialization: 7
  24. Vulnerability Batches: 1k vulns / 1k vulns / 1k vulns
  25. Vulnerability Batches: 1k vulns / 1k vulns / 1k vulns (7k queries vs 7 per batch)
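The batch counts above can be sketched with `each_slice`: serialize 1k records at a time, paying a fixed handful of queries per batch instead of several queries per record. The counter below is a stand-in for real MySQL queries, and the per-batch constant of 7 comes from the slides:

```ruby
QUERIES_PER_BULK_CACHE = 7  # per the slides: 7 MySQL queries per batch

query_count = 0
vulns = (1..3_000).to_a  # stand-in for 3k Vulnerability records

vulns.each_slice(1_000) do |batch|
  # Build one bulk cache for the whole batch...
  query_count += QUERIES_PER_BULK_CACHE
  # ...then serialize each record from the in-memory cache (no queries).
  batch.each { |vuln| vuln.to_s }
end

query_count # => 21, versus thousands when each record queries on its own
```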
  26. MySQL Queries: Bulk Serialization Deployed
  27. RDS CPU Utilization: Bulk Serialization Deployed
  28. Process in Bulk
  29. Elasticsearch Cluster + Redis + MySQL: Vulnerabilities
  30. Redis.get: Client 1 Index / Client 2 Index / Client 3 & 4 Index
  31. indexing_hashes = vulnerability_hashes.map do |hash|
        {
          :_index => Redis.get("elasticsearch_index_#{hash[:client_id]}"),
          :_type  => hash[:doc_type],
          :_id    => hash[:id],
          :data   => hash[:data]
        }
      end
  32. (same code as above)
  33. (pry)> index_name = Redis.get("elasticsearch_index_#{client_id}")
      DEBUG -- : [Redis] command=GET args="elasticsearch_index_1234"
      DEBUG -- : [Redis] call_time=1.07 ms
  34. client_indexes = Hash.new do |h, client_id|
        h[client_id] = Redis.get("elasticsearch_index_#{client_id}")
      end
  35. indexing_hashes = vuln_hashes.map do |hash|
        {
          :_index => client_indexes[hash[:client_id]],
          :_type  => hash[:doc_type],
          :_id    => hash[:id],
          :data   => hash[:data]
        }
      end
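A `Hash` with a default block, as on slide 34, computes each missing key once and caches the result, so repeated lookups for the same client never go back to the store. A runnable sketch, with a lambda standing in for `Redis.get`:

```ruby
# A Hash default block runs only on a cache miss; the result is stored
# under that key, so subsequent lookups are pure in-memory hash reads.
redis_gets = 0
fake_redis = lambda do |key|
  redis_gets += 1          # count trips to the backing store
  "index_for_#{key}"
end

client_indexes = Hash.new do |hash, client_id|
  hash[client_id] = fake_redis.call("elasticsearch_index_#{client_id}")
end

# 3k lookups spread across 3 clients...
3_000.times { |i| client_indexes[i % 3] }

redis_gets # => 3 store hits instead of 3,000
```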
  36. Client 1 / Client 2 / Client 3: 1k vulns each, 1 + 1 + 1 Redis hits
  37. 1000x
  38. 65% job speed up
  39. Local Cache
  40. Redis
  41. Process in Bulk / Hash Cache
  42. Sharded Databases: CLIENT 1 / CLIENT 2 / CLIENT 3
  43. Asset.with_shard(client_id).find(1)
  44. Sharding Configuration
      {
        'client_123' => 'shard_123',
        'client_456' => 'shard_456',
        'client_789' => 'shard_789'
      }
  45. Sharding Configuration Size: 20 bytes, then 1kb, then 13kb
  46. 285 Workers
  47. 7.8 MB/second
  48. ActiveRecord::Base.connection
  49. (pry)> ActiveRecord::Base.connection
      => #<Octopus::Proxy:0x000055b38c697d10 @proxy_config= #<Octopus::ProxyConfig:0x000055b38c694ae8
  50. module Octopus
        class Proxy
          attr_accessor :proxy_config

          delegate :current_shard, :current_shard=,
                   :current_slave_group, :current_slave_group=,
                   :shard_names, :shards_for_group, :shards, :sharded,
                   :config, :initialize_shards, :shard_name,
                   to: :proxy_config, prefix: false
        end
      end
  51. Know your gems
  52. Process in Bulk / Framework Cache / Hash Cache
  53. Avoid making datastore hits you don't need
  54. User.where(:id => user_ids).each do |user|
        # Lots of user processing
      end
  55. FALSE
  56. (pry)> User.where(:id => [])
      User Load (1.0ms) SELECT `users`.* FROM `users` WHERE 1=0
      => []
  57. return unless user_ids.any?
      User.where(:id => user_ids).each do |user|
        # Lots of user processing
      end
  58. (pry)> Benchmark.realtime do
           >   10_000.times { User.where(:id => []) }
           > end
      => 0.5508159045130014
      (pry)> Benchmark.realtime do
           >   10_000.times do
           >     next unless ids.any?
           >     User.where(:id => [])
           >   end
           > end
      => 0.0006368421018123627
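The guard clause on slide 57 skips the datastore entirely when there is nothing to look up. A runnable sketch of the same idea, with a counter and a lambda standing in for `User.where`:

```ruby
# Guard clause: bail out before touching the datastore when the input
# guarantees an empty result.
db_hits = 0
find_users = lambda do |ids|
  db_hits += 1               # stand-in for a real User.where(:id => ids) query
  []
end

process_users = lambda do |ids|
  return [] unless ids.any?  # the guard from the slide
  find_users.call(ids)
end

10_000.times { process_users.call([]) }  # never reaches the "database"
process_users.call([1, 2])               # a real lookup still goes through

db_hits # => 1
```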
  59. (pry)> Benchmark.realtime do
           >   10_000.times { User.where(:id => []) }
           > end
      => 0.5508159045130014
      "Ruby is slow"? No, hitting the database is slow!
  60. User.where(:id => user_ids).each do |user|
        # Lots of user processing
      end
  61. User.where(:id => user_ids).each do |user|
        # Lots of user processing
      end
      users = User.where(:id => user_ids).active.short.single
  62. .none
  63. .none in action...
      (pry)> User.where(:id => []).active.tall.single
      User Load (0.7ms) SELECT `users`.* FROM `users` WHERE 1=0 AND `users`.`active` = 1 AND `users`.`short` = 0 AND `users`.`single` = 1
      => []
      (pry)> User.none.active.tall.single
      => []
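ActiveRecord's `.none` returns a chainable null relation that never issues a query. A toy plain-Ruby version of that null-object pattern (this is an illustration of the idea, not ActiveRecord's actual implementation):

```ruby
# Toy null relation: every scope chains back to self, and resolving the
# relation is a no-op instead of a database hit.
class NullRelation
  def method_missing(_name, *_args)
    self            # .active, .tall, .single, etc. all chain harmlessly
  end

  def respond_to_missing?(_name, _include_private = false)
    true
  end

  def to_a
    []              # resolving the relation never touches the database
  end
end

relation = NullRelation.new
relation.active.tall.single.to_a # => []
```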
  64. Logging
      pry(main)> Rails.logger.level = 0
      $ redis-cli monitor > commands-redis-2018-10-01.txt
      pry(main)> Search.connection.transport.logger = Logger.new(STDOUT)
  65. Preventing useless datastore hits
  66. Report: Elasticsearch / MySQL / Redis
  67. Investigating Existing Reports
      (pry)> Report.blank_reports.count
      => 10805
      (pry)> Report.active.count
      => 25842
      (pry)> Report.average_asset_count
      => 1657
  68. Report: Elasticsearch / MySQL / Redis
  69. 10+ hrs
  70. 3 hrs
  71. Process in Bulk / Framework Cache / Database Guards / Hash Cache
  72. Resque Workers + Redis
  73. 45 workers / 45 workers / 45 workers
  74. 70 workers / 70 workers / 70 workers
  75. 48 MB / 16 MB
  76. Redis Requests: 70 workers, 100k / 200k
  77. ?
  78. Resque Throttling
  79. Redis Requests: 100k / 200k
  80. Redis Network Traffic: 48MB / 16MB
  81. Process in Bulk / Framework Cache / Database Guards / Remove Datastore Hits / Hash Cache
  82. Every datastore hit COUNTS
  83. Questions
  84. Contact
      https://www.linkedin.com/in/mollystruve/
      https://github.com/mstruve
      @molly_struve
      molly.struve@gmail.com
