Graphite at CityGrid - LA DevOps April 2014

High-level description of CityGrid's use of Graphite for collecting/displaying metrics, along with some interesting use-cases.

Graphite at CityGrid
if you can’t measure it, you can’t fix it
Wil Heitritter
Director, Tech Ops
Los Angeles DevOps
2014/04/28

Magnum esse solem
philosophus probabit,
quantus sit mathematicus
(The philosopher will prove that the sun is large;
the mathematician, how large it is.)
-Seneca

Objectives
- Introduce Graphite to new users
- Show what we like, what we hate
- Present some interesting use-cases
- Generate discussion

Before Graphite
Ganglia
• Predictable interface
• Text “metrics” to store versions
• Slow
• Couldn’t pick and choose metrics to see

Why ganglia sucked
- Clusters had to be pre-configured
- Multicast vs. Unicast
- Data Retention
- Static Web Interface (can’t pick and choose)
- Static Host List

What did we think we wanted?
Ease of adding metrics
Ease of sending metrics
Powerful metric display
Retain ganglia-style cluster dashboards
Long-term configurable metric retention

What is Graphite?
a highly scalable real-time graphing system
which collects numeric time-series data
is managed by carbon
and stored as whisper files
and visualized through web interfaces
or queried via the API
http://graphite.wikidot.com/
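
As a rough illustration of "queried via the API", the sketch below builds a URL for graphite-web's render endpoint. The base URL and the helper name render_url are assumptions made for this example, not part of the talk; targets that contain parentheses or commas would also need URL-encoding, which this sketch does not do.

```shell
#!/usr/bin/env bash
# Hypothetical base URL; point it at your own graphite-web host.
GRAPHITE_URL="${GRAPHITE_URL:-http://localhost:8080}"

# Build a render-API URL for one target.
# Arguments: target, optional "from" (default -1h), optional format (default json).
render_url() {
  local target="$1" from="${2:--1h}" format="${3:-json}"
  printf '%s/render?target=%s&from=%s&format=%s\n' \
    "$GRAPHITE_URL" "$target" "$from" "$format"
}

# Illustrative use (needs curl and a running graphite-web):
# curl -s "$(render_url business.test.metric1)"
```

Keeping the URL construction in one small function makes it easy to reuse the same query from scripts, dashboards, or monitoring checks.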

Graphite: what we like
Sending metrics is simple
Retrieving metrics is simple
Dashboard creation and sharing… is simple
Many functions()
120MM+ metric values received daily
Backfilling past metrics is simple
Expandable - different frontends
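
One way backfilling can look in practice: carbon's plaintext protocol carries an explicit timestamp, so historical points can be sent the same way as live ones. The sketch below is an assumption-laden example, not CityGrid's tooling: backfill_lines and history.txt are hypothetical names, and the input is assumed to be "<epoch> <value>" pairs, one per line. Timestamps need to fall within the metric's configured retention to be stored.

```shell
#!/usr/bin/env bash
# Read "<epoch> <value>" pairs on stdin and emit one plaintext-protocol
# line ("<metric> <value> <epoch>") per pair for the given metric name.
backfill_lines() {
  local metric="$1" ts value
  while read -r ts value; do
    # Skip blank lines in the input.
    if [ -n "$ts" ]; then
      printf '%s %s %s\n' "$metric" "$value" "$ts"
    fi
  done
}

# Illustrative use (history.txt is a hypothetical file of past samples):
# backfill_lines business.test.metric1 < history.txt | nc localhost 2003
```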

Graphite: what sucks
Dashboard ownership/promotion
No ganglia-like standard dashboard
Data retention… is NOT as simple as we thought

Metric Naming
Business Metrics
- These are metrics that are not tied to any particular server
- Format:
business.${hierarchical}.${path}.${here}.${metric}
- Example:
business.ec2.testaccount.us-east-1a.OnDemand.running.m2.4xlarge

Metric Naming
Server Metrics
- These metrics are specific to a particular server (just like ganglia)
- Format:
servers.${class}.${f_q_d_n}.${metric}
- Example:
servers.rvw.aws1prdrvw1_subdom_cityg_com.LW_api_reviews_QPS
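
A minimal sketch of the server naming scheme. It assumes the `${f_q_d_n}` component is the host's FQDN with dots replaced by underscores (so the hostname does not create extra levels in Graphite's dot-separated hierarchy); that reading is inferred from the example above, and the helper name server_metric and the dotted hostname used below are hypothetical.

```shell
#!/usr/bin/env bash
# Build a server metric path:
#   servers.<class>.<fqdn with dots as underscores>.<metric>
server_metric() {
  local class="$1" fqdn="$2" metric="$3"
  printf 'servers.%s.%s.%s\n' \
    "$class" "$(printf '%s' "$fqdn" | tr '.' '_')" "$metric"
}

# Illustrative use:
# server_metric rvw aws1prdrvw1.subdom.cityg.com LW_api_reviews_QPS
```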

Sending metrics
Sending directly from metric scripts
- /etc/graphite.conf
- May need to stagger sends at high volume
Collecting from gmond every minute
- Metrics are spread out to prevent spiking
- False data is possible (gmond acts as a cache)
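
One possible way to spread sends across the minute, sketched under assumptions: each host derives a stable offset (0 to 59 seconds) from a checksum of its own hostname and sleeps that long before sending. The talk does not say how CityGrid staggered its senders; send_offset and the script path below are hypothetical.

```shell
#!/usr/bin/env bash
# Deterministic per-host offset in seconds (0-59), derived from a CRC
# of the hostname, so each host always sends at the same point in the minute.
send_offset() {
  local host="$1" sum
  sum=$(printf '%s' "$host" | cksum | cut -d ' ' -f 1)
  echo $(( sum % 60 ))
}

# Illustrative use from a per-minute cron job (script path is hypothetical):
# sleep "$(send_offset "$(hostname)")" && /usr/local/bin/send_metrics.sh
```

A hash-based offset needs no coordination between hosts, at the cost of an uneven spread when the fleet is small.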

Sending is simply...
echo $metric $value $timestamp | nc $relay $port
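
The one-liner above can be wrapped into small helpers, sketched here. The function names and the GRAPHITE_RELAY / GRAPHITE_PORT variables are assumptions for this example; port 2003 matches the quick-setup command later in the deck. Depending on the netcat variant, an extra flag such as -w 1 may be needed so nc exits once the line has been sent.

```shell
#!/usr/bin/env bash
# Format one plaintext-protocol line: "<metric> <value> <timestamp>".
# The timestamp defaults to the current epoch second when omitted or empty.
graphite_line() {
  local metric="$1" value="$2" ts="${3:-$(date +%s)}"
  printf '%s %s %s\n' "$metric" "$value" "$ts"
}

# Send one data point to a carbon relay or cache over TCP.
graphite_send() {
  graphite_line "$1" "$2" "$3" \
    | nc "${GRAPHITE_RELAY:-localhost}" "${GRAPHITE_PORT:-2003}"
}

# Illustrative use:
# graphite_send business.test.metric1 1
```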

Performance
carbon-cache/carbon-relay
SSD
replication within minutes

Maintenance
Changing retention
- whisper-auto-resize.py
Filling holes
- whisper-fill $source $destination
Backups
- Dashboards
- Metrics

Key Metrics Dashboard
Examples of Key Metrics
- QPS
- Processing Time (Max/Mean/Distribution)
- Metrics about sub-requests
- Network usage
- CPU/load

Nagios Integration
check_graphite_target!highestMax(
servers.mai.@HOSTNAME@.LW_map_return_code_5*_ratio,
1
)!5!10
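
The check above appears to pass a Graphite target followed by two thresholds. Assuming the trailing 5 and 10 are the warning and critical levels (the talk does not spell this out), the threshold logic of such a check could be sketched as follows. nagios_status is a hypothetical helper; the exit codes follow the usual Nagios plugin convention of 0 for OK, 1 for WARNING, and 2 for CRITICAL.

```shell
#!/usr/bin/env bash
# Map a metric value to a Nagios-style exit code:
#   0 = OK, 1 = WARNING (value >= warn), 2 = CRITICAL (value >= crit).
# awk is used so that fractional values compare correctly.
nagios_status() {
  local value="$1" warn="$2" crit="$3"
  awk -v v="$value" -v w="$warn" -v c="$crit" 'BEGIN {
    if (v + 0 >= c + 0) exit 2
    else if (v + 0 >= w + 0) exit 1
    else exit 0
  }'
}

# Illustrative use:
# nagios_status 7.5 5 10; echo "exit code: $?"
```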

Quick Setup
Install & Start
# pip install https://github.com/graphite-project/ceres/tarball/master
# pip install whisper
# pip install carbon
# pip install graphite-web
start it up...
send it a metric:
echo business.test.metric1 1 `date "+%s"` | nc localhost 2003
OK, it’s almost that easy...
