Graphite at CityGrid - LA DevOps April 2014

High-level description of CityGrid's use of Graphite for collecting/displaying metrics, along with some interesting use-cases.

Graphite at CityGrid
if you can’t measure it, you can’t fix it
Wil Heitritter
Director, Tech Ops
Los Angeles DevOps
2014/04/28
Magnum esse solem philosophus probabit, quantus sit mathematicus.
(The philosopher will prove that the sun is large; the mathematician, how large it is.)
-Seneca
Objectives
- Introduce Graphite to new users
- Show what we like and what we hate
- Present some interesting use-cases
- Generate discussion
Before Graphite
Ganglia
• Predictable interface
• Text “metrics” to store versions
• Slow
• Couldn’t pick and choose metrics to see
Why Ganglia sucked
- Clusters had to be pre-configured
- Multicast vs. Unicast
- Data Retention
- Static Web Interface (can’t pick and choose)
- Static Host List
What did we think we wanted?
Ease of adding metrics
Ease of sending metrics
Powerful metric display
Retain Ganglia-style cluster dashboards
Long-term configurable metric retention
Graphite!
What is Graphite?
a highly scalable real-time graphing system
that collects numeric time-series data,
managed by carbon and stored as whisper files,
then visualized through web interfaces or queried via the API
http://graphite.wikidot.com/
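Querying via the API usually means graphite-web's `/render` endpoint. When given a `format` parameter (`json`, `csv` or `raw`), it returns data instead of a PNG. Below is a minimal bash sketch, assuming `curl` is installed. The function name `graphite_fetch` and the example host are hypothetical.

```bash
# Fetch datapoints for one Graphite target as CSV rows: name,timestamp,value.
# Usage: graphite_fetch BASE_URL TARGET [FROM]
# BASE_URL is your graphite-web root (hypothetical: http://graphite.example.com).
graphite_fetch() {
  local base=$1 target=$2 from=${3:--1h}
  # -G turns the --data-urlencode pairs into a query string, so wildcards,
  # commas and parentheses in the target are escaped for us.
  curl -sSfG "$base/render" \
    --data-urlencode "target=$target" \
    --data-urlencode "from=$from" \
    --data-urlencode "format=csv"
}
```

For example, `graphite_fetch http://graphite.example.com 'servers.rvw.*.LW_api_reviews_QPS' -24h` would return a day of QPS data for every host in the `rvw` class.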
Graphite: what we like
Sending metrics is simple
Retrieving metrics is simple
Dashboard creation and sharing… is simple
Many functions()
120MM+ metric values received daily
Backfilling past metrics is simple
Expandable - different frontends
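Backfilling is simple because each plaintext-protocol line carries its own timestamp. An old point is stored as long as it falls within the whisper file's retention. Below is a bash sketch that turns a file of `timestamp value` pairs into `metric value timestamp` lines ready to pipe into `nc`. The input layout and the function name `graphite_backfill` are assumptions for illustration.

```bash
# Read "timestamp value" pairs on stdin and emit Graphite plaintext lines
# ("metric value timestamp") on stdout for the given metric name.
# Blank lines and lines starting with '#' are skipped.
graphite_backfill() {
  local metric=$1 ts value
  while read -r ts value; do
    case $ts in ''|'#'*) continue ;; esac
    printf '%s %s %s\n' "$metric" "$value" "$ts"
  done
}
```

Typical use: `graphite_backfill business.test.metric1 < old_values.txt | nc "$relay" 2003`.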
Graphite: what sucks
Dashboard ownership/promotion
No Ganglia-like standard dashboard
Data retention… is NOT as simple as we thought
CityGrid’s Graphite Implementation
Metric Naming
Business Metrics
- These are metrics that are not tied to a specific server
- Format: business.${hierarchical}.${path}.${here}.${metric}
- Example: business.ec2.testaccount.us-east-1a.OnDemand.running.m2.4xlarge
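Graphite treats every dot in a name as a tree level, so a business metric path is just its components joined with dots. Below is a minimal bash sketch. The helper name `business_metric` and the space-to-underscore rule are assumptions, not part of CityGrid's convention as stated.

```bash
# Build a business metric path: business.<node>.<node>...<metric>.
# Arguments are joined with dots; spaces become underscores so they cannot
# break the plaintext "name value timestamp" line.
business_metric() {
  local IFS=. path
  path="business.$*"
  printf '%s\n' "${path// /_}"
}
```

An argument that already contains a dot, like `m2.4xlarge` in the slide's example, still becomes two levels in the tree. `business_metric ec2 testaccount us-east-1a OnDemand running m2.4xlarge` reproduces the example path exactly.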
Metric Naming
Server Metrics
- These metrics are specific to a particular server (just like Ganglia)
- Format: servers.${class}.${f_q_d_n}.${metric}
- Example: servers.rvw.aws1prdrvw1_subdom_cityg_com.LW_api_reviews_QPS
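The `${f_q_d_n}` placeholder reads as the host's FQDN with its dots replaced by underscores. Without that, the hostname would be split into several tree levels. The example node looks as if it came from a host named `aws1prdrvw1.subdom.cityg.com`, though that is an inference rather than something the slide states. Below is a bash sketch of the mapping, using a hypothetical helper `server_metric`.

```bash
# Build a server metric path: servers.<class>.<fqdn with dots as underscores>.<metric>.
# Dots are Graphite's path separator, so the raw FQDN must be flattened.
server_metric() {
  local class=$1 fqdn=$2 metric=$3
  printf 'servers.%s.%s.%s\n' "$class" "${fqdn//./_}" "$metric"
}
```

On most Linux hosts a script could call `server_metric rvw "$(hostname -f)" LW_api_reviews_QPS`.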
Sending metrics
Sending directly from metric scripts
- /etc/graphite.conf
- May need to spread out sending at high volume
Collecting from gmond every minute
- Metrics are spread out to prevent spiking
- False data (gmond acts as a cache)
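One way to spread out sending is to give each host a fixed delay within the minute, derived from a checksum of its hostname. This keeps cron jobs that fire at :00 on many hosts from hitting the relay at the same instant. This is an assumed approach, not necessarily what CityGrid does, and `stagger_offset` is a hypothetical name.

```bash
# Print a stable per-host delay in seconds, 0 <= delay < window (default 60).
# cksum's CRC is deterministic, so a given host always gets the same slot.
stagger_offset() {
  local host=$1 window=${2:-60} crc
  crc=$(printf '%s' "$host" | cksum | cut -d' ' -f1)
  printf '%s\n' $(( crc % window ))
}
```

A cron-driven metric script could begin with `sleep "$(stagger_offset "$(hostname)")"` before sending.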
Impact of staggered sending
Sending is simply...
echo "$metric $value $timestamp" | nc "$relay" "$port"
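The same one-liner can be wrapped as two small bash functions:
- `graphite_line` formats one plaintext-protocol line and fills in the current time when no timestamp is given.
- `graphite_send` pipes stdin to the relay.

The function names and the variables `GRAPHITE_RELAY` and `GRAPHITE_PORT` are hypothetical. Port 2003 is carbon's default plaintext port, as used in the Quick Setup slide.

```bash
# Format one plaintext-protocol line: "<metric> <value> <unix timestamp>".
graphite_line() {
  local metric=$1 value=$2 ts=${3:-$(date +%s)}
  printf '%s %s %s\n' "$metric" "$value" "$ts"
}

# Send whatever arrives on stdin to the relay (the defaults are assumptions).
graphite_send() {
  nc "${GRAPHITE_RELAY:-localhost}" "${GRAPHITE_PORT:-2003}"
}
```

Usage: `graphite_line business.test.metric1 1 | graphite_send`. Depending on which netcat is installed, you may need an option such as `-q 0` (netcat-traditional) or `-N` (OpenBSD netcat) so that `nc` exits once stdin closes.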
Performance
carbon-cache/carbon-relay
SSD
Replication within minutes
Maintenance
Changing retention
- whisper-auto-resize.py
Filling holes
- whisper-fill $source $destination
Backups
- Dashboards
- Metrics
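`whisper-fill` takes a source and a destination file. Filling holes across a whole tree, for example from a replica or a restored backup into live storage, then becomes a loop over matching `.wsp` files. Below is a bash sketch in which `whisper-fill` is invoked exactly as on the slide. Adjust the command name if your install differs. The helper name `fill_tree` and the directory roots are hypothetical.

```bash
# For every .wsp file under SRC_ROOT that also exists under DST_ROOT,
# run "whisper-fill source destination" to fill the destination's gaps.
# Files present on only one side are skipped.
fill_tree() {
  local src_root=$1 dst_root=$2 src rel
  while IFS= read -r -d '' src; do
    rel=${src#"$src_root"/}
    if [ -f "$dst_root/$rel" ]; then
      whisper-fill "$src" "$dst_root/$rel"
    fi
  done < <(find "$src_root" -type f -name '*.wsp' -print0)
}
```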
Graphite Use-Cases
Single Metric
Combined Metrics
Key Metrics Dashboard
Examples of Key Metrics
- QPS
- Processing Time (Max/Mean/Distribution)
- Metrics about sub-requests
- Network usage
- CPU/load
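A key-metrics graph usually combines several targets, for example total and average QPS across a server class. The `/render` endpoint draws one line per `target=` parameter. Below is a bash sketch that saves such a graph as a PNG. `sumSeries` and `averageSeries` are standard Graphite functions. The helper name `graphite_png`, the 24-hour window and the image size are arbitrary assumptions.

```bash
# Render several targets onto one PNG graph via graphite-web's /render API.
# Usage: graphite_png BASE_URL OUTFILE TARGET [TARGET...]
graphite_png() {
  local base=$1 out=$2 t
  shift 2
  local args=(-sSfG "$base/render" -o "$out"
              --data-urlencode "from=-24h"
              --data-urlencode "width=800" --data-urlencode "height=400")
  for t in "$@"; do
    args+=(--data-urlencode "target=$t")   # one target= per line on the graph
  done
  curl "${args[@]}"
}
```

For example: `graphite_png http://graphite.example.com qps.png 'sumSeries(servers.rvw.*.LW_api_reviews_QPS)' 'averageSeries(servers.rvw.*.LW_api_reviews_QPS)'`.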
Key Metrics Dashboard
Nagios Integration
check_graphite_target!highestMax(
servers.mai.@HOSTNAME@.LW_map_return_code_5*_ratio,
1
)!5!10
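`check_graphite_target` appears to be a site-defined Nagios command. Its arguments seem to be a Graphite target, a warning threshold (5) and a critical threshold (10). Its implementation is not shown, so the function below is a hypothetical stand-in. It follows the usual Nagios plugin exit codes and compares the most recent non-empty datapoint against the thresholds.

```bash
# Hypothetical Nagios-style check: fetch the last 10 minutes of TARGET,
# take the most recent non-empty datapoint and compare it to WARN/CRIT.
# Exit codes follow the Nagios plugin convention:
# 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
# Usage: check_graphite BASE_URL TARGET WARN CRIT
check_graphite() {
  local base=$1 target=$2 warn=$3 crit=$4 value
  # CSV rows look like "name,YYYY-MM-DD HH:MM:SS,value"; the value is the
  # last field and is empty when there is no datapoint for that interval.
  value=$(curl -sSfG "$base/render" \
            --data-urlencode "target=$target" \
            --data-urlencode "from=-10min" \
            --data-urlencode "format=csv" |
          awk -F, '$NF != "" { last = $NF } END { print last }')
  if [ -z "$value" ]; then
    echo "UNKNOWN - no datapoints for $target"
    return 3
  fi
  # awk does the comparison because ratios are usually not integers.
  if awk -v v="$value" -v t="$crit" 'BEGIN { exit !(v >= t) }'; then
    echo "CRITICAL - $target = $value"
    return 2
  elif awk -v v="$value" -v t="$warn" 'BEGIN { exit !(v >= t) }'; then
    echo "WARNING - $target = $value"
    return 1
  fi
  echo "OK - $target = $value"
  return 0
}
```

Wiring this into a Nagios command definition in place of `check_graphite_target` is left out, since the slides do not show how that command is defined.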
How about Pie Charts?
Ad-Hoc Dashboards
Demo
What NOT to do
Trying it out for yourself
Quick Setup
Install & Start
# pip install https://github.com/graphite-project/ceres/tarball/master
# pip install whisper
# pip install carbon
# pip install graphite-web
start it up...
send it a metric:
echo business.test.metric1 1 $(date "+%s") | nc localhost 2003
OK, it’s almost that easy...
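To confirm the test metric arrived, it helps to know where carbon stores it. Each metric becomes a whisper file whose path mirrors the metric name, with dots turned into directories. Below is a bash sketch that assumes the default pip layout under `/opt/graphite`; check your `carbon.conf` if your storage directory differs. The helper name `metric_path` is hypothetical.

```bash
# Map a metric name to its whisper file, e.g. business.test.metric1 ->
# <root>/business/test/metric1.wsp. Root defaults to the usual pip prefix.
metric_path() {
  local metric=$1 root=${2:-/opt/graphite/storage/whisper}
  printf '%s/%s.wsp\n' "$root" "$(printf '%s' "$metric" | tr . /)"
}
```

After sending, `ls -l "$(metric_path business.test.metric1)"` should show the new file. The whisper package's `whisper-fetch.py` can then dump its datapoints.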
Discussion
