
Troubleshooting RabbitMQ and services that use it

Designing a system in terms of [micro]services is the hype du jour, but it is not without trade-offs: debugging a distributed system can be challenging. This talk covers how to start troubleshooting a distributed, service-oriented system.

Published in: Technology

  1. 1. Troubleshooting RabbitMQ and services that use it
  2. 2. Who am I? • Staff Engineer, RabbitMQ @ Pivotal
  3. 3. Who am I? • Staff Engineer, RabbitMQ @ Pivotal • @michaelklishin, github.com/michaelklishin
  4. 4. The monolith problem
  5. 5. Troubleshooting publishers
  6. 6. Troubleshooting publishers • I/O exceptions (shutdown handlers)
  7. 7. Troubleshooting publishers • I/O exceptions (shutdown handlers) • Publisher confirms
  8. 8. When in doubt, borrow ideas from TCP
  9. 9. Troubleshooting publishers • I/O exceptions (shutdown handlers) • Publisher confirms • Returned message handlers
  10. 10. Troubleshooting publishers • I/O exceptions (shutdown handlers) • Publisher confirms • Returned message handlers • Invalid payload (e.g. fails to deserialize or decrypt)
  11. 11. Troubleshooting publishers • I/O exceptions (shutdown handlers) • Publisher confirms • Returned message handlers • Invalid payload (e.g. fails to deserialize or decrypt) • Identifying publisher instances
  12. 12. Troubleshooting publishers • identifying blocked (throttled) publishers
  13. 13. Client-provided connection names in RabbitMQ 3.6.3+
  14. 14. Troubleshooting publishers • identifying blocked (throttled) publishers • retries
  15. 15. Troubleshooting publishers • spring-amqp can cover all of the above
  16. 16. Troubleshooting consumers
  17. 17. Troubleshooting consumers • I/O exceptions
  18. 18. Troubleshooting consumers • I/O exceptions • Inadequate delivery QoS
  19. 19. Troubleshooting consumers • I/O exceptions • Inadequate delivery QoS • Lack of confirmations; double-confirming
  21. 21. Troubleshooting consumers • I/O exceptions • Inadequate delivery QoS • Lack of confirmations; double-confirming • Redelivery metrics
  22. 22. Troubleshooting consumers • I/O exceptions • Inadequate delivery QoS • Lack of confirmations; double-confirming • Redelivery metrics • Identifying consumer instances
  23. 23. Troubleshooting consumers • Consumer utilization (reported by HTTP API)
  24. 24. Troubleshooting consumers • spring-amqp can help with some of the above
  25. 25. “In God we trust, all others must bring data…” (W. Edwards Deming)
  27. 27. — What do you do for a living?
  28. 28. — What do you do for a living? — Tell people to read the logs.
  29. 29. Sources of data useful for debugging
  30. 30. Sources of data useful for debugging • Metrics
  31. 31. Sources of data useful for debugging • Metrics • Your logs
  32. 32. Sources of data useful for debugging • Metrics • Your logs • Someone else's logs
  33. 33. Sources of data useful for debugging • Metrics • Your logs • Someone else's logs • Tracing data
  34. 34. Sources of data useful for debugging • Metrics • Your logs • Someone else's logs • Tracing data • Wireshark (tcpdump, libpcap)
  35. 35. Collecting data from RabbitMQ
  36. 36. Collecting data from RabbitMQ • Logs
  37. 37. Collecting data from RabbitMQ • Logs • rabbitmqctl status
  38. 38. Collecting data from RabbitMQ • Logs • rabbitmqctl status • rabbitmqctl environment
  39. 39. Collecting data from RabbitMQ • Logs • rabbitmqctl status • rabbitmqctl environment • rabbitmq-top (ships with RabbitMQ as of 3.6.3)
  40. 40. Collecting data from RabbitMQ • Logs • rabbitmqctl status • rabbitmqctl environment • rabbitmq-top (ships with RabbitMQ as of 3.6.3) • HTTP API (lots of metrics)
  41. 41. http://{hostname}:15672/api
  42. 42. curl -u guest:guest http://127.0.0.1:15672/api/overview | python -m json.tool
          curl -u guest:guest http://127.0.0.1:15672/api/nodes/{node} | python -m json.tool
          curl -u guest:guest http://127.0.0.1:15672/api/queues | python -m json.tool
  43. 43. Collecting data from RabbitMQ • Logs • rabbitmqctl status • rabbitmqctl environment • rabbitmq-top (ships with RabbitMQ as of 3.6.3) • HTTP API (lots of metrics) • Message tracing ("firehose")
  44. 44. Collecting data from RabbitMQ • HTTP API (lots of metrics) • Message tracing ("firehose") • Infrastructure metrics
  45. 45. Common theme?
  46. 46. Common theme? • Collect logs system-wide
  47. 47. Common theme? • Collect logs system-wide • Collect metrics system-wide
  48. 48. Common theme? • Collect logs system-wide • Collect metrics system-wide • Collect exceptions system-wide
  49. 49. Common theme? • Collect logs system-wide • Collect metrics system-wide • Collect exceptions system-wide • Trace requests (e.g. with Zipkin)
  50. 50. Common theme? • Collect logs system-wide • Collect metrics system-wide • Collect exceptions system-wide • Trace requests (e.g. with Zipkin) • Analyze
  51. 51. Common theme? • Collect logs system-wide • Collect metrics system-wide • Collect exceptions system-wide • Trace requests (e.g. with Zipkin) • Analyze • Sounds like something a structured platform can help with!
  52. 52. Distributed system debugging is a problem far from being solved.
  53. 53. Thank you
  54. 54. Thank you • @michaelklishin
  55. 55. Thank you • @michaelklishin • github.com/michaelklishin
  56. 56. Thank you • @michaelklishin • github.com/michaelklishin • mklishin@pivotal.io
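
The publisher slides pair publisher confirms with the advice to borrow ideas from TCP: number what you send and keep it until it is acknowledged. The sketch below illustrates that bookkeeping in plain Python, without a broker connection. `ConfirmTracker` and its method names are hypothetical and are not part of any client library; the only broker behaviour assumed is that confirm sequence numbers start at 1 on a channel and that an ack or nack may carry a `multiple` flag covering every tag up to and including the given one.

```python
class ConfirmTracker:
    """Tracks published messages that the broker has not yet confirmed.

    Hypothetical helper for illustration only; a real client library
    would call handle_ack/handle_nack from its confirm listener.
    """

    def __init__(self):
        self._next_seq = 1       # confirm sequence numbers start at 1
        self._outstanding = {}   # sequence number -> message body

    def published(self, body):
        """Record a message right after it is published; return its sequence number."""
        seq = self._next_seq
        self._outstanding[seq] = body
        self._next_seq += 1
        return seq

    def handle_ack(self, delivery_tag, multiple=False):
        """Broker accepted the message(s): forget them and return their bodies."""
        return self._settle(delivery_tag, multiple)

    def handle_nack(self, delivery_tag, multiple=False):
        """Broker rejected the message(s): return their bodies so the caller can retry."""
        return self._settle(delivery_tag, multiple)

    def _settle(self, delivery_tag, multiple):
        if multiple:
            # multiple=True covers every outstanding tag up to and including this one
            tags = sorted(t for t in self._outstanding if t <= delivery_tag)
        else:
            tags = [delivery_tag] if delivery_tag in self._outstanding else []
        return [self._outstanding.pop(t) for t in tags]

    def unconfirmed(self):
        """Bodies still waiting for a confirm, in publish order."""
        return [self._outstanding[t] for t in sorted(self._outstanding)]


tracker = ConfirmTracker()
for body in ["a", "b", "c", "d"]:
    tracker.published(body)

acked = tracker.handle_ack(2, multiple=True)   # confirms sequence numbers 1 and 2
to_retry = tracker.handle_nack(3)              # message 3 was rejected
still_pending = tracker.unconfirmed()          # message 4 has no confirm yet
```

Anything left in `unconfirmed()` when a connection drops is a candidate for republishing, which is where the retries bullet comes in.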

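The HTTP API slides show `curl` calls against `/api/queues`; the same data can be polled from a script and checked for the consumer problems listed earlier. The sketch below is one possible approach, not an official tool. It assumes the management plugin is reachable at the address and with the `guest:guest` credentials used on the slides, and that queue objects carry the fields `messages_ready`, `messages_unacknowledged`, `consumers` and `consumer_utilisation` (field names can differ between RabbitMQ versions). The function names and thresholds are hypothetical.

```python
import base64
import json
import urllib.request


def fetch_queues(base_url="http://127.0.0.1:15672", user="guest", password="guest"):
    """GET /api/queues from the management plugin and return the parsed JSON list."""
    request = urllib.request.Request(base_url + "/api/queues")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def suspicious_queues(queues, max_unacked=1000, min_utilisation=0.5):
    """Return (queue name, reasons) pairs for queues that deserve a closer look.

    The thresholds are arbitrary illustration values, not recommendations.
    """
    findings = []
    for q in queues:
        reasons = []
        if q.get("messages_ready", 0) > 0 and q.get("consumers", 0) == 0:
            reasons.append("messages ready but no consumers")
        if q.get("messages_unacknowledged", 0) > max_unacked:
            reasons.append("many unacknowledged deliveries")
        utilisation = q.get("consumer_utilisation")
        # The field may be missing or non-numeric when a queue has no consumers.
        if isinstance(utilisation, (int, float)) and utilisation < min_utilisation:
            reasons.append("low consumer utilisation")
        if reasons:
            findings.append((q.get("name"), reasons))
    return findings


# Made-up sample data in the assumed shape, so the check runs without a broker.
sample = [
    {"name": "orders", "messages_ready": 10, "messages_unacknowledged": 0, "consumers": 0},
    {"name": "emails", "messages_ready": 0, "messages_unacknowledged": 5000,
     "consumers": 2, "consumer_utilisation": 0.2},
    {"name": "audit", "messages_ready": 0, "messages_unacknowledged": 3,
     "consumers": 1, "consumer_utilisation": 1.0},
]
findings = suspicious_queues(sample)
```

Against a live node, `suspicious_queues(fetch_queues())` would run the same checks on real data.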