Highly Available Graphite


Initially presented at the OpenWest 2014 conference.

Graphite and StatsD gather time-series data and offer a robust set of APIs to access that data. While the tools are robust, the dashboards are straight from 1992 and alerting on the data is nonexistent. Nark, an open source project, solves both of these problems. It provides easy-to-use dashboards and readily available alerts and notifications to users. It has been used in production at Lucid Software for almost a year. Related to Nark are the tools required to make Graphite highly available.

Published in: Software, Technology, Business

  1. GRAPHITE: HIGHLY AVAILABLE
     Alyssa Stringham & Matthew Barlocker
  2. About Alyssa
     • Software Developer at Lucid Software Inc
     • BYU graduate with a Bachelor's in Computer Science
     • I love
       • Playing the carillon and piano
       • Fast-paced board games
       • Hats
       • Traveling
       • Playing foosball
  3. About “The Barlocker”
     • Chief Architect at Lucid Software Inc
     • Bachelor's degree from BYU in Computer Science
     • I love to
       • play board games
       • go 4-wheeling
       • wrestle my sons
       • fly airplanes
     • Follow me on
  4. Tools
  5. Graphite
     • Graphite is a highly scalable real-time graphing system
     • Initially developed by Chris Davis
     • Comprised of 3 related projects
       • Carbon – collects and records metrics
       • Whisper – backend storage mechanism
       • Graphite-Web – HTTP frontend that displays graphs
     • Written in Python
  6. StatsD
     • A network daemon that aggregates statistics for backend services
     • Developed by Etsy
     • Written in Node.js
     • -anything-measure-everything/
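
As a rough illustration of what an application sends to StatsD, the sketch below encodes metrics in StatsD's plain-text line format (`<name>:<value>|<type>`) and sends one over UDP. The metric names, the helper names, and the localhost target are assumptions made for the example; 8125 is StatsD's default port.

```python
import socket


def format_statsd(name: str, value: int, metric_type: str) -> bytes:
    """Encode one metric as a StatsD line: <name>:<value>|<type>."""
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_statsd(payload: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget one datagram at a StatsD daemon (host is an assumption)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


# "c" marks a counter increment, "ms" a timing in milliseconds.
counter = format_statsd("applications.chart.users.login", 1, "c")
timer = format_statsd("applications.chart.render_time", 320, "ms")

# Usage: send_statsd(counter)
```

Because the transport is UDP, the sender does not learn whether the daemon received the packet.
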
  7. HA Receiver
     • Used to make StatsD highly available and scalable
     • Initially developed by Matthew Barlocker at Lucid Software Inc
     • Written in Node.js
  8. Nark
     • Nark is an alerting and dashboard frontend for Graphite
     • Under active development by Lucid Software
     • Written in Scala using the Play! Framework
     • MySQL backed
  9. Demo
  10. Data Flow Overview
  11. Data Flows IN
      • Applications report different types of metrics
      • StatsD aggregates metrics
      • Carbon-cache gathers and groups metrics
      • Whisper stores metrics to disk
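
For the hop from StatsD into carbon, a minimal sketch of Carbon's plaintext protocol may help: each data point is one line of the form `<metric path> <value> <timestamp>`, terminated by a newline. The function name and the sample value are assumptions for illustration only.

```python
import time
from typing import Optional


def carbon_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Format one data point for Carbon's plaintext listener."""
    # Default to the current time when the caller supplies no timestamp.
    ts = int(time.time()) if timestamp is None else timestamp
    return f"{path} {value} {ts}\n"


line = carbon_line("stats.production.applications.chart.users.login", 42, 1400000000)
```

A client would write such lines to carbon's plaintext port (2003 by default) over a TCP connection.
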
  12. Data Flows OUT
      • User initiates request over HTTP
      • Graphite-web requests information from carbon-cache
      • Carbon-cache reads data from disk using whisper
      • Graphite-web builds graph using data
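
The HTTP request that starts the outbound flow can be sketched as a call to graphite-web's render endpoint. The hostname below is a placeholder and the helper name is made up for this example; `target`, `from`, and `format` are parameters of the render API.

```python
from urllib.parse import urlencode


def render_url(base: str, target: str, since: str = "-1h") -> str:
    """Build a graphite-web render URL that returns JSON for one target."""
    query = urlencode({"target": target, "from": since, "format": "json"})
    return f"{base}/render?{query}"


# graphite.example.com is a placeholder host, not one named in the slides.
url = render_url(
    "http://graphite.example.com",
    "stats.production.applications.chart.users.login",
)
```

Dropping `format=json` asks graphite-web for a rendered image instead of raw data points.
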
  13. High Availability & Scaling
  14. StatsD - Options
      • We can put StatsD in 3 places:
        • On the reporting server
          • Scales as well as your reporting servers do
          • As available as the reporting servers are
          • Can’t get vital metrics like stats.production.applications.chart.users.login
        • On a central server
          • Doesn’t scale
          • Single point of failure
        • On a load-balanced set of servers
          • AWS ELB doesn’t listen on UDP
          • One stat will be aggregated in multiple places
  15. StatsD - Solution
      • StatsD with smart-repeater on reporting servers
        • Accepts UDP and sends TCP for reliability
        • Reduces chattiness over the wire
        • Allows aggregation to occur at a centralized location
        • As scalable and available as the application servers
  16. StatsD - Solution
      • AWS Elastic Load Balancer distributes traffic to ha-receivers
      • HA-receivers:
        • Duplicate and transform metrics
        • Deliver metrics to correct server for aggregation
        • Are stateless – they scale horizontally
        • Are highly available behind the ELB
  17. StatsD - Solution
      • HA-receivers pass the data to StatsD
      • StatsD does the final aggregation
      • Every metric has exactly one StatsD destination
      • Aggregated metrics are sent to carbon
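
The slides do not say how an ha-receiver decides which StatsD instance owns a metric. One way to get "exactly one StatsD destination" from stateless receivers is to hash the metric name, as in the sketch below. The host list, the function name, and the choice of SHA-256 with a modulo are all assumptions, not the ha-receiver's actual algorithm.

```python
import hashlib
from typing import Sequence

# Hypothetical StatsD instances sitting behind the ha-receivers.
STATSD_HOSTS = ["statsd-1:8125", "statsd-2:8125", "statsd-3:8125"]


def pick_statsd(metric_name: str, hosts: Sequence[str]) -> str:
    """Map a metric name to one host, identically on every receiver."""
    # A cryptographic digest is used only because it is stable across
    # processes, unlike Python's built-in hash() for strings.
    digest = hashlib.sha256(metric_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(hosts)
    return hosts[index]


destination = pick_statsd("applications.chart.users.login", STATSD_HOSTS)
```

Since the choice depends only on the metric name and the host list, any receiver behind the load balancer sends a given metric to the same aggregator. A plain modulo remaps most metrics when the host list changes, so a real deployment might prefer a consistent-hash ring.
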
  18. Carbon & Whisper
      • Carbon and whisper direct data to disk
      • The daemons are stateless except for buffers
      • Carbon consists of multiple daemons
        • Carbon-relay: Directs traffic to other carbon daemons
        • Carbon-aggregator: A mix between carbon-relay and StatsD
        • Carbon-cache: Gathers metrics in a buffer, and writes them to disk using whisper
      • Whisper is called from carbon-cache, and is short-lived
  19. Carbon & Whisper
      • We chose to use sharding
        • Every server holds 1/n metrics, where n = # shards
        • All servers in a shard hold the same data
        • Syncing data requires a single rsync
      • A binary tree of carbon-relays is used to pick a shard
      • Adding new shards is as easy as adding a new node in the binary tree of carbon-relays
      • Retrieving data can be done by checking one server from every shard
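
Since every server in a shard holds the same data, a read only needs one server per shard. A minimal sketch of that selection follows; the shard layout and server names are hypothetical.

```python
import random
from typing import Dict, List

# Hypothetical layout: two shards, each with two servers holding identical data.
SHARDS: Dict[str, List[str]] = {
    "shard-1": ["graphite-1a", "graphite-1b"],
    "shard-2": ["graphite-2a", "graphite-2b"],
}


def servers_to_query(shards: Dict[str, List[str]]) -> List[str]:
    """Pick one server from every shard; together they cover all metrics."""
    return [random.choice(replicas) for replicas in shards.values()]


chosen = servers_to_query(SHARDS)
```

If the chosen server in a shard is down, another server from the same shard can answer for the same metrics.
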
  20. Carbon & Whisper
      • StatsD sends metrics to the root carbon-relay on localhost
      • Carbon-relay is set up in a binary tree to pick a shard
      • Every metric goes to exactly one shard
      • Every carbon-relay goes to either 1 shard or 2 relays
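
The relay tree can be modeled as nodes that forward either to one shard or to two child relays. The sketch below walks such a tree using one hash bit per level. Choosing a branch from a SHA-256 bit is an assumption made for illustration; the slides do not describe how each carbon-relay picks its destination, and the class, function, and shard names are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Relay:
    """A relay that forwards to exactly one shard or to two child relays."""
    shard: Optional[str] = None
    children: Optional[Tuple["Relay", "Relay"]] = None


def route(root: Relay, metric_name: str) -> str:
    """Descend from the root relay until a shard is reached."""
    bits = int.from_bytes(hashlib.sha256(metric_name.encode("utf-8")).digest(), "big")
    node = root
    while node.shard is None:
        left, right = node.children
        node = right if bits & 1 else left  # consume one hash bit per level
        bits >>= 1
    return node.shard


# Three shards: the root splits once, and its right child splits again.
tree = Relay(children=(
    Relay(shard="shard-1"),
    Relay(children=(Relay(shard="shard-2"), Relay(shard="shard-3"))),
))
```

In this model, adding a shard means replacing a relay that points at one shard with a relay that has two children. With an uneven tree like this one, the shallower shard receives a larger share of the metrics.
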
  21. Carbon & Whisper
      • Carbon-cache receives the metrics from the final relay
      • Metrics are written to disk using whisper on localhost
      • Carbon-cache has a last-in-wins policy
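
A last-in-wins policy can be pictured as a store keyed by metric and time slot, where a later write for the same slot replaces the earlier one. The sketch below is a toy model, not carbon-cache's code; the 10-second step, the metric name, and the function name are assumptions.

```python
from typing import Dict, Tuple

STEP_SECONDS = 10  # assumed storage resolution for this example


def write_point(
    store: Dict[Tuple[str, int], float],
    metric: str,
    timestamp: int,
    value: float,
    step: int = STEP_SECONDS,
) -> None:
    """Store a value in its time slot, overwriting whatever was there."""
    slot = timestamp - (timestamp % step)  # align to the start of the interval
    store[(metric, slot)] = value


store: Dict[Tuple[str, int], float] = {}
write_point(store, "stats.users.login", 1400000003, 5.0)
write_point(store, "stats.users.login", 1400000007, 9.0)  # same slot, replaces 5.0
```

Both writes land in the slot starting at 1400000000, so only the second value (9.0) survives.
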
  22. graphite-web
      • Graphite-web is stateless
      • All state is contained within carbon-cache
      • Reading data out from a highly available, scalable graphite installation is the same as reading from a single server
      • Use the same ELB as the ha-receiver
  23. Nark
      • Nark is stateless
      • All state is contained in MySQL and Graphite
      • Nark will be no more highly available than your MySQL and Graphite installations
      • Use an ELB, an autoscale group, and a multi-AZ RDS instance
  24. Recap
  25. Questions? Feature Requests?
      Thanks For Your Time
  26. Join The Team
      • Building the next generation of collaborative web applications
      • VC funded
      • High growth rate
      • Profitable
      • Graduates from Harvard, MIT, Stanford
      • Former Google, Amazon, Microsoft employees