Graphite, an introduction

An introduction to graphite and why it's so great.

Published in: Technology

  1. 1. Graphite: An Introduction. Scaling real-time monitoring
  2. 2. The purpose today
  3. 3. What is graphite
  4. 4. Why it’s so great
  5. 5. How to graph (It’s really easy!)
  6. 6. How we use graphite
  7. 7. First, a definition
  8. 8. Alerts + Metrics = Monitoring. Metrics: Graphite, Cacti, Munin. Alerting: Nagios, Icinga. Both: Zenoss, Hyperic, Zabbix, PNP4Nagios.
  9. 9. What is graphite
  10. 10. About graphite
        ● Django web application consisting of 3 parts:
          ○ carbon (relays, caches, aggregates metrics)
          ○ whisper (graphite’s equivalent of RRD files)
          ○ Web UI (graph composer, simple dashboard)
  11. 11. Why graphite?
  12. 12. Why graphing?
        Discover trends and patterns
          What time of the day do we get the most users?
          When x happened, what was the effect on y?
          How many hits am I getting per hour?
          How does this compare to last week? Last month?
        Predict future events
          When will we need to add more servers? Databases?
        Negative feedback
          Did the release into production fix problem x?
  13. 13. Cacti SUCKS. A few reasons: ancient user interface (no JavaScript/AJAX), terrible workflow, cannot push metrics, no formulas, no graph introspection, cannot feed out-of-sequence metrics, ugly graphs, no API, system/OS metrics must be exposed on the host via SNMP, no graph composer, no custom graphs, metrics and graphs must be predefined, static polling interval, unscalable, tons of work to create one graph, no 3rd party ecosystem, etc.
  14. 14. Graphite ++
  15. 15. Simple
  16. 16. Powerful
  17. 17. Functions (sum, derivatives, integrals, timeshift, mostDeviant, scale, averages, etc.)
  18. 18. API (Nagios integration, 3rd party custom dashboards)
  19. 19. Scalable
  20. 20. Easy to feed data
  21. 21. Wide ecosystem of 3rd party tools and dashboards http://graphite.readthedocs.org/en/latest/tools.html
  22. 22. Tools
  23. 23. StatsD
  24. 24. Logster
  25. 25. Skyline
  26. 26. Collectd
  27. 27. Dashboards
  28. 28. Graphite --
  29. 29. No poller
  30. 30. Not an all-in-one solution
  31. 31. No easy backups
  32. 32. It probably will become business critical
  33. 33. How to graph
  34. 34. There are tons of ways to feed graphite your data
  35. 35. Bash
        #!/bin/bash
        # Plaintext protocol: "<metric.path> <value> <unix timestamp>", sent to port 2003
        timestamp=$(date +%s)
        value=10
        echo "dot.delimited.metric.name $value $timestamp" | nc -w 1 graphite.host.name 2003
      Python
        import socket
        def send_msg(message, HOST, PORT):
            # message must be a newline-terminated "metric value timestamp" string
            sock = socket.create_connection((HOST, PORT))
            sock.sendall(message.encode())
            sock.close()
      Python using graphite-pymetrics
        from metrics import timing
        @timing("heavy.task")
        def heavy_task(x, y, z):
            # do heavy stuff here
            pass
  36. 36. Ruby
        require 'socket'
        Host = 'somegraphitehost'
        conn = TCPSocket.new Host, 2003
        # puts appends the newline that terminates the metric line
        conn.puts 'metrics value timestamp'
        conn.close
      Java
        import java.io.DataOutputStream;
        import java.net.Socket;
        Socket conn = new Socket("somegraphitehost", 2003);
        DataOutputStream dos = new DataOutputStream(conn.getOutputStream());
        // each metric line must end with a newline
        dos.writeBytes("metrics value timestamp\n");
        conn.close();
  37. 37. How we use graphite
  38. 38. 700K+ metrics per minute
  39. 39. A Common Graphite Stack (diagram). Components shown: Graphite-web, Collectd, Poller(s), Applications, Carbon, Whisper, Dashboards, Statsd, Scripts, Nagios
  40. 40. Collectd: Agent for system/hardware-level metrics. Growing repository of plugins for a wide variety of applications: disk I/O, disk space, CPU, memory, MySQL, JMX, Java, Redis, file sizes, load, etc. (https://collectd.org/wiki/index.php/Table_of_Plugins). You can write your own custom plugin in Python.
  41. 41. Nagios integration: You can write Nagios plugins that alert on metric values. Nagios can also feed graphite performance data, events (e.g. updating a counter each time an email is sent), etc.
  42. 42. What to collect?
  43. 43. Hardware/OS metrics
  44. 44. Load
  45. 45. Disk space
  46. 46. Disk I/O
  47. 47. Network data
  48. 48. Application metrics
  49. 49. How often function x is called
  50. 50. Average value of function x
  51. 51. Average running time of function x
  52. 52. Database/Datastore
  53. 53. performance metrics
  54. 54. number of records with value == ?
  55. 55. number of slow queries
  56. 56. Events
  57. 57. Deployments
  58. 58. send a 1, draw as infinite
  59. 59. Log files
  60. 60. HTTP access logs (2xx, 3xx, 4xx, 5xx)
  61. 61. Application logs: exception counts, results, important events, hits
  62. 62. Final Musings
  63. 63. Treat graphite like ‘Big Data’
  64. 64. You don’t know what metrics you need until you need them
  65. 65. Get RAID 10 SSDs once you decide to scale
  66. 66. More devopsy
  67. 67. You can start graphing today!
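
The feeding examples on slides 35 and 36 and the collection ideas on slides 56 to 60 (deployment events, HTTP access logs) could be combined roughly as follows. This is a minimal sketch, not code from the deck: the function names (format_metric, status_class_counts, build_batch, send_lines), the metric paths web.access.* and events.deploy.myapp, and the host graphite.host.name are all hypothetical. It assumes carbon's plaintext listener on port 2003 and an access log in the common/combined format, where the status code is the 9th whitespace-separated field.

```python
import socket
import time
from collections import Counter


def format_metric(path, value, timestamp=None):
    """Build one plaintext-protocol line: '<path> <value> <timestamp>\\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def status_class_counts(log_lines):
    """Count HTTP responses per status class (2xx, 3xx, 4xx, 5xx).

    Assumption: the status code is the 9th whitespace-separated field,
    as in the common/combined access log format. Other lines are skipped.
    """
    counts = Counter({"2xx": 0, "3xx": 0, "4xx": 0, "5xx": 0})
    for line in log_lines:
        fields = line.split()
        if len(fields) < 9:
            continue
        status = fields[8]
        if len(status) == 3 and status.isdigit() and status[0] in "2345":
            counts[status[0] + "xx"] += 1
    return dict(counts)


def build_batch(log_lines, timestamp, prefix="web.access", deployed=False):
    """Turn log lines (and an optional deploy event) into metric lines."""
    counts = status_class_counts(log_lines)
    lines = [format_metric(f"{prefix}.{cls}", n, timestamp)
             for cls, n in sorted(counts.items())]
    if deployed:
        # "Send a 1": the hypothetical series events.deploy.myapp can later
        # be rendered with Graphite's drawAsInfinite() function.
        lines.append(format_metric("events.deploy.myapp", 1, timestamp))
    return lines


def send_lines(lines, host="graphite.host.name", port=2003):
    """Send a batch to carbon over TCP (host name is a placeholder)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("ascii"))
```

Run once a minute from cron or a small daemon, something like send_lines(build_batch(new_log_lines, int(time.time()))) would keep the four status-class series updated. Passing deployed=True from a release script adds the event point, which can then be overlaid on any graph as a vertical line with the target drawAsInfinite(events.deploy.myapp).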
