
RabbitMQ Operations

In this talk, we cover best practices for running RabbitMQ, things you should avoid doing, lesser-known features, recent operational improvements, and a bit of what's ahead.

Given in October 2015.



1. RabbitMQ Operations
2. About me
3. About me • RabbitMQ staff engineer at Pivotal
4. About me • RabbitMQ staff engineer at Pivotal • @michaelklishin just about everywhere
5. About this talk
6. About this talk • Brain dump from years of answering questions
7. About this talk • Brain dump from years of answering questions • Focuses on the most recent release (3.5.6)
8. Provisioning
9. Provisioning • Be aware of mirrors: GitHub, Bintray, …
10. Provisioning • Be aware of mirrors: GitHub, Bintray, … • Looking into community-hosted mirrors
11. Provisioning • Be aware of mirrors: GitHub, Bintray, … • Looking into community-hosted mirrors • Use packages + Chef/Puppet/…
12. OS resources
13. OS resources • Modern Linux defaults are absolutely inadequate for servers
14. ulimit -n default: 1024
15. Set ulimit -n and fs.file-max to 500K and forget about it
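A minimal sketch of how these limits might be made persistent; the 500000 figure comes from the slide above, while the drop-in file names and the systemd override path are assumptions that depend on the distribution and on how RabbitMQ is started.

  # /etc/sysctl.d/10-rabbitmq.conf (assumed file name): system-wide file handle ceiling
  fs.file-max = 500000

  # /etc/security/limits.conf: per-process open file limit for the rabbitmq user
  rabbitmq  soft  nofile  500000
  rabbitmq  hard  nofile  500000

  # If rabbitmq-server runs under systemd, the unit limit is what actually applies:
  # /etc/systemd/system/rabbitmq-server.service.d/limits.conf
  [Service]
  LimitNOFILE=500000

  # Verify from a running node:
  rabbitmqctl status | grep -A 3 file_descriptors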
16. TCP keepalive timeout: from 11 minutes to over 2 hours by default
17. net.ipv4.tcp_keepalive_time = 6
    net.ipv4.tcp_keepalive_intvl = 3
    net.ipv4.tcp_keepalive_probes = 3
18. enable client heartbeats, e.g. with an interval of 6-12 seconds
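One way to make the keepalive values above stick across reboots, plus the matching server-side heartbeat setting; the sysctl drop-in file name and the 10-second value are assumptions (any interval in the 6-12 second range fits the advice above), and rabbitmq.config is shown in the classic Erlang-term format used by 3.5.x.

  # /etc/sysctl.d/20-tcp-keepalive.conf (assumed file name)
  net.ipv4.tcp_keepalive_time = 6
  net.ipv4.tcp_keepalive_intvl = 3
  net.ipv4.tcp_keepalive_probes = 3

  # Apply without a reboot:
  sysctl -p /etc/sysctl.d/20-tcp-keepalive.conf

  %% rabbitmq.config: heartbeat interval (in seconds) the server suggests to clients
  [
    {rabbit, [{heartbeat, 10}]}
  ].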
19. OS resources • Modern Linux defaults are absolutely inadequate for servers • Tuning for throughput vs. high number of concurrent connections
20. Throughput: larger TCP buffers
21. net.core.rmem_max = 16777216
    net.core.wmem_max = 16777216
22. rabbit.hipe_compile = true (only on Erlang 17.x or 18.x)
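The slide above uses the newer dotted key notation; in a 3.5.x rabbitmq.config the same setting looks like the sketch below. Note that HiPE compilation noticeably lengthens node start-up, so benchmark before enabling it.

  %% rabbitmq.config: HiPE-compile RabbitMQ on boot (Erlang 17.x/18.x only)
  [
    {rabbit, [{hipe_compile, true}]}
  ].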
23. Concurrent connections: smaller TCP buffers, low tcp_fin_timeout, tcp_tw_reuse = 1, …
24. rabbit.tcp_listen_options.sndbuf
    rabbit.tcp_listen_options.recbuf
    rabbit.tcp_listen_options.backlog
25. Reduce per-connection RAM use by 10x:
    rabbit.tcp_listen_options.sndbuf = 16384
    rabbit.tcp_listen_options.recbuf = 16384
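A sketch of the same listener options in classic rabbitmq.config form. The 16384-byte buffers match the slide; the backlog value is an assumption borrowed from the somaxconn slide further down, and on some releases overriding tcp_listen_options may require restating the default socket options as well.

  %% rabbitmq.config: smaller per-connection buffers, larger accept backlog
  [
    {rabbit, [
      {tcp_listen_options, [
        {backlog, 4096},   %% assumed value; keep it in line with net.core.somaxconn
        {sndbuf,  16384},
        {recbuf,  16384}
      ]}
    ]}
  ].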
26. Reduce per-connection RAM use by 10x • Throughput drops by a comparable amount
27. net.ipv4.tcp_fin_timeout = 5
28. net.ipv4.tcp_tw_reuse = 1
29. Careful with tcp_tw_reuse behind NAT* (* http://vincent.bernat.im/en/blog/2014-tcp-time-wait-state-linux.html)
30. net.core.somaxconn = 4096
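The connection-churn sysctls from the last few slides gathered into one drop-in; the file name is an assumption, and remember the NAT caveat above before enabling tcp_tw_reuse.

  # /etc/sysctl.d/40-rabbitmq-many-connections.conf (assumed file name)
  net.ipv4.tcp_fin_timeout = 5
  net.ipv4.tcp_tw_reuse = 1
  net.core.somaxconn = 4096

  sysctl -p /etc/sysctl.d/40-rabbitmq-many-connections.conf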
31. http://www.rabbitmq.com/networking.html
32. Disk space
33. Disk space • Pay attention to what partition /var/lib ends up on
34. Disk space • Pay attention to what partition /var/lib ends up on • Transient messages can be paged to disk
35. Disk space • Pay attention to what partition /var/lib ends up on • Transient messages can be paged to disk • RabbitMQ’s disk monitor isn’t supported on all platforms
36. RAM usage
37. RAM usage • rabbit.vm_memory_high_watermark
38. RAM usage • rabbit.vm_memory_high_watermark • rabbit.vm_memory_high_watermark_paging_ratio
39. rabbitmqctl status
    rabbitmqctl report
40. RAM usage • rabbit.vm_memory_high_watermark • rabbit.vm_memory_high_watermark_paging_ratio • Significant paging efficiency improvements in 3.5.5-3.5.6
41. RAM usage • rabbit.vm_memory_high_watermark • rabbit.vm_memory_high_watermark_paging_ratio • Significant paging efficiency improvements in 3.5.5-3.5.6 • Disable rabbit.fhc_read_buffering (3.5.6+)
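The same keys in classic rabbitmq.config form; 0.4 and 0.5 are the usual defaults, shown here only to illustrate the syntax, and fhc_read_buffering applies to 3.5.6 and later as noted above.

  %% rabbitmq.config: RAM watermark, paging threshold, read buffering (3.5.6+)
  [
    {rabbit, [
      {vm_memory_high_watermark, 0.4},               %% fraction of installed RAM
      {vm_memory_high_watermark_paging_ratio, 0.5},  %% start paging at 50% of the watermark
      {fhc_read_buffering, false}
    ]}
  ].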
42. rabbitmqctl eval 'file_handle_cache:clear_read_cache().'
43. recon
44. Ability to set the VM RAM watermark as an absolute value is coming in 3.6
45. Stats collector falls behind
46. Stats collector falls behind • Management DB stats collector can get overwhelmed
47. Stats collector falls behind • Management DB stats collector can get overwhelmed • Key symptom: disproportionately higher RAM use on the node that hosts the management DB
48. rabbitmqctl eval 'P = whereis(rabbit_mgmt_db), erlang:process_info(P).'
49. [{registered_name,rabbit_mgmt_db},
     {current_function,{erlang,hibernate,3}},
     {initial_call,{proc_lib,init_p,5}},
     {status,waiting},
     {message_queue_len,0},
     {messages,[]},
     {links,[<5477.358.0>]},
     {dictionary,[{'$ancestors',[<5477.358.0>,rabbit_mgmt_sup,rabbit_mgmt_sup_sup,<5477.338.0>]},
                  {'$initial_call',{gen,init_it,7}}]},
     {trap_exit,false},
     {error_handler,error_handler},
     {priority,high},
     {group_leader,<5477.337.0>},
     {total_heap_size,167},
     {heap_size,167},
     {stack_size,0},
     {reductions,318},
     {garbage_collection,[{min_bin_vheap_size,46422},
                          {min_heap_size,233},
                          {fullsweep_after,65535},
                          {minor_gcs,0}]},
     {suspending,[]}]
50. rabbit.collect_statistics_interval = 30000
51. rabbitmq_management.rates_mode = none
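Both knobs in classic config form: a 30-second stats emission interval and no rate calculations in the management plugin.

  %% rabbitmq.config: lighten the load on the management stats database
  [
    {rabbit,              [{collect_statistics_interval, 30000}]},  %% milliseconds
    {rabbitmq_management, [{rates_mode, none}]}
  ].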
52. rabbitmqctl eval 'P = whereis(rabbit_mgmt_db), erlang:exit(P, please_crash).'
53. Parallel stats collector is coming in 3.7
54. Cluster formation
55. Cluster formation • Node restart order dependency
56. Cluster formation • Node restart order dependency • github.com/rabbitmq/rabbitmq-clusterer
57. Cluster formation • Node restart order dependency • github.com/rabbitmq/rabbitmq-clusterer • github.com/aweber/rabbitmq-autocluster
58. Backups
59. How do I back up? • cp $RABBITMQ_MNESIA_DIR + tar
60. How do I back up? • cp $RABBITMQ_MNESIA_DIR + tar • Replicate everything off-site with exchange federation + set message TTL via a policy
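A minimal sketch of the cp/tar approach. The paths and the stop_app/start_app around the copy are assumptions: copying the Mnesia directory of a live node can produce an inconsistent snapshot, so stop at least the rabbit application first.

  # Assumed location; confirm with `rabbitmqctl status` or $RABBITMQ_MNESIA_DIR
  MNESIA_DIR=/var/lib/rabbitmq/mnesia

  rabbitmqctl stop_app   # stops RabbitMQ but leaves the Erlang VM running
  tar czf /backups/rabbitmq-$(hostname)-$(date +%F).tar.gz -C "$MNESIA_DIR" .
  rabbitmqctl start_app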
61. Hostname changes
62. rabbitmqctl rename_cluster_node [old name] [new name]
63. Network partition handling
64. Network partition handling • When in doubt, use “autoheal”
65. Network partition handling • When in doubt, use “autoheal” • “Merge” is coming but has very real downsides, too
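The "use autoheal" advice as it would appear in rabbitmq.config (the other documented values are ignore and pause_minority):

  %% rabbitmq.config: recover from network partitions automatically
  [
    {rabbit, [{cluster_partition_handling, autoheal}]}
  ].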
66. Misc
67. Misc • Don’t use default vhost and/or credentials
68. Misc • Don’t use default vhost and/or credentials • Don’t use 32-bit Erlang
69. Misc • Don’t use default vhost and/or credentials • Don’t use 32-bit Erlang • Use reasonably up-to-date releases
70. Misc • Don’t use default vhost and/or credentials • Don’t use 32-bit Erlang • Use reasonably up-to-date releases • Participate in rabbitmq-users
71. Misc • OCF resource template from Fuel (by Mirantis)
72. Misc • OCF resource template from Fuel (by Mirantis) • Use TLS
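A minimal TLS listener sketch in the classic config format; the certificate paths are placeholders, and whether to require client certificates (verify/fail_if_no_peer_cert) depends on your setup.

  %% rabbitmq.config: TLS listener on 5671 (certificate paths are placeholders)
  [
    {rabbit, [
      {ssl_listeners, [5671]},
      {ssl_options, [
        {cacertfile, "/path/to/ca_certificate.pem"},
        {certfile,   "/path/to/server_certificate.pem"},
        {keyfile,    "/path/to/server_key.pem"},
        {verify,     verify_peer},
        {fail_if_no_peer_cert, false}
      ]}
    ]}
  ].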
73. Coming in 3.6
74. Coming in 3.6 • In-process file buffering disabled by default
75. Coming in 3.6 • In-process file buffering disabled by default • Queue master-to-node distribution strategies
76. Coming in 3.6 • In-process file buffering disabled by default • Queue master-to-node distribution strategies • SHA-256 (or 512) for password hashing
77. Coming in 3.6 • In-process file buffering disabled by default • Queue master-to-node distribution strategies • SHA-256 (or 512) for password hashing • More responsive management UI with pagination
78. Coming in 3.6 • In-process file buffering disabled by default • Queue master-to-node distribution strategies • SHA-256 (or 512) for password hashing • More responsive management UI with pagination • Streaming rabbitmqctl
79. Coming past 3.6
80. Coming past 3.6 • Pluggable cluster formation (à la ElasticSearch)
81. Coming past 3.6 • Pluggable cluster formation (à la ElasticSearch) • On-disk data recovery tools
82. Coming past 3.6 • Pluggable cluster formation (à la ElasticSearch) • On-disk data recovery tools • Better CLI tools
83. Coming past 3.6 • Pluggable cluster formation (à la ElasticSearch) • On-disk data recovery tools • Better CLI tools • Easier off-site replication
84. Coming past 3.6 • Pluggable cluster formation (à la ElasticSearch) • On-disk data recovery tools • Better CLI tools • Easier off-site replication • “Merge” partition handling strategy (no earlier than 3.8)
85. Thank you
86. Thank you • @michaelklishin • github.com/michaelklishin • rabbitmq-users • Our team is hiring!
