chronosphere.io
From Cardinal(ity) Sins to
Cost Efficient Metrics
Aggregation
Paige Cruz, retired SRE
open source observability advocate
CFO looking at the
o11y bill
Cloud Native Observability
bills are outrageous
Cloud Native Data Growth
- On-Premises (Data center): 1998 - 2008
- Cloud (IaaS, VM-based): 2008 - 2018
- Cloud Native (Microservices and Containers): 2018 - ?
[Chart labels: Business Increase in Scale; Observability Data Increase in Scale]
*Source: ESG Distributed Cloud Series: Observability, Feb 2022, Scott Sinclair and Rob Strechay
Most recently [vendor] was looked at to help monitor a small Kubernetes test cluster. 3
nodes.
Now the base rate of $18/mo is fine…except now they charge $1 per container per month
past 10 containers per host.
Since K8s (depending on how you install it) runs a bunch of little containers handling various
back end things, you might not deploy anything to the cluster and still be WAY over that 10
container limit.
In our case it came out to like $200/mo to monitor 3 nodes - that were nowhere fully
loaded.
- Hacker News thread
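The arithmetic behind a bill like this can be sketched in a few lines of Python. This is only an illustration: the thread does not say whether the $18/mo base rate is per host, so the sketch assumes it is, and the container counts are hypothetical values chosen to land near the quoted figure.

```python
def monthly_cost(containers_per_host, base_per_host=18, included=10, per_extra=1):
    """Estimate a monthly bill under an ASSUMED per-host pricing model.

    base_per_host: assumed flat fee per host (the quote only says "$18/mo").
    included:      containers per host covered by the base fee.
    per_extra:     fee for each container past the included amount.
    """
    total = 0
    for count in containers_per_host:
        extra = max(0, count - included)
        total += base_per_host + extra * per_extra
    return total


# Hypothetical container counts for a 3-node cluster, not taken from the thread.
idle_cluster = [58, 59, 59]
print(monthly_cost(idle_cluster))
```

Under these assumptions, most of the bill comes from system containers that exist before any workload is deployed.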
Data volume
Experiment:
- Hello World app on 4 node
Kubernetes cluster with
Tracing, End User Metrics
(EUM), Logs, Metrics
(containers / nodes)
- 30 days == +450 GB
Mighty Metrics
“1 in 10 metrics are actually directly queried”
- ServiceNow
Contributing Factors to the Metrics Bill
- # of containers and infra components: How many things you’re monitoring
- Metric Granularity: How often each metric is scraped
- Retention Window: How long you keep the data
- Cardinality: How many unique combos of dimensions on metrics
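As a rough illustration of how these four factors multiply together, here is a minimal Python sketch. The function name, the example numbers, and the simplification that every component emits the same number of series are assumptions for illustration; actual vendor billing models differ.

```python
SECONDS_PER_DAY = 86_400


def stored_samples(components, series_per_component, scrape_interval_s, retention_days):
    """Approximate the number of samples held in storage.

    components:           how many things are monitored
    series_per_component: cardinality contributed by each component (assumed uniform)
    scrape_interval_s:    metric granularity, in seconds between samples
    retention_days:       how long the data is kept
    """
    samples_per_series_per_day = SECONDS_PER_DAY // scrape_interval_s
    total_series = components * series_per_component
    return total_series * samples_per_series_per_day * retention_days


# Hypothetical fleet: 100 components, 500 series each, kept for 14 days.
at_10s = stored_samples(100, 500, 10, 14)
at_30s = stored_samples(100, 500, 30, 14)
print(at_10s, at_30s, at_10s // at_30s)
```

Because the factors multiply, trimming any one of them shrinks the total proportionally.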
# of containers and infra components
Cost of monitoring can be a factor in determining how quickly to deprecate or sunset features/services/environments
Metric Granularity
- Emission time = adjust scrape_interval (from 10s samples -> 30s samples)
- Ingest time = aggregate
- Over time post-storage = downsampling
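To make the downsampling step concrete, here is a minimal sketch that averages 10s samples into 30s buckets. The function and sample data are hypothetical. Averaging suits gauge-style values; a counter would more likely keep the last value in each bucket.

```python
from collections import defaultdict


def downsample(samples, bucket_s):
    """Average (timestamp_seconds, value) samples into fixed-width buckets."""
    buckets = defaultdict(list)
    for ts, value in samples:
        bucket_start = ts - ts % bucket_s  # align to the start of the bucket
        buckets[bucket_start].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


# Hypothetical gauge sampled every 10 seconds for one minute.
raw = [(0, 1.0), (10, 2.0), (20, 3.0), (30, 4.0), (40, 5.0), (50, 6.0)]
coarse = downsample(raw, 30)
print(coarse)  # six samples become two
```

The trade-off is resolution: short spikes inside a bucket are smoothed away.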
Aggregation: Roll Up
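One common reading of a roll-up is aggregating away a label so that many series collapse into one. The sketch below sums series that become identical once a label is removed. The data shapes, label names, and function name are assumptions for illustration, not any particular product's behavior.

```python
from collections import defaultdict


def roll_up(series, drop_labels):
    """Sum the values of series that become identical once drop_labels are removed."""
    totals = defaultdict(float)
    for labels, value in series:
        kept = tuple(sorted((k, v) for k, v in labels.items() if k not in drop_labels))
        totals[kept] += value
    return dict(totals)


# Hypothetical per-pod request counts.
series = [
    ({"service": "checkout", "pod": "checkout-1"}, 5.0),
    ({"service": "checkout", "pod": "checkout-2"}, 7.0),
    ({"service": "cart", "pod": "cart-1"}, 3.0),
]
rolled = roll_up(series, {"pod"})
print(rolled)  # three per-pod series become two per-service series
```

Summing is right for counts. Other metric types may call for a different aggregation such as max or average.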
Retention Window
For operational metrics, most (99.9%) of queries do not look back past 7 days, but average retention at original granularity ranges from 2-4 weeks
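A back-of-the-envelope comparison shows what that gap can cost per series. The sketch assumes a 10s original granularity and a hypothetical 5-minute downsampled tier after the first 7 days. Both numbers are illustrative, not recommendations.

```python
SECONDS_PER_DAY = 86_400


def samples_retained(tiers):
    """Sum samples kept per series across (days, resolution_seconds) tiers."""
    return sum(days * (SECONDS_PER_DAY // resolution_s) for days, resolution_s in tiers)


# 4 weeks at original 10s granularity.
flat = samples_retained([(28, 10)])
# 7 days at 10s, then 21 days at an assumed 5-minute resolution.
tiered = samples_retained([(7, 10), (21, 300)])
print(flat, tiered)
```

Under these assumptions, the tiered policy keeps roughly a quarter of the samples while leaving the 7-day window at full resolution.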
Dropping Data
Low value tags or entire metrics should be dropped as early as possible
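The idea can be sketched as a filter applied before samples reach storage. The metric names, label names, and function here are hypothetical. In practice this logic usually lives in collector or scrape configuration rather than application code.

```python
def drop_low_value(samples, drop_metrics, drop_labels):
    """Discard whole metrics by name and strip unwanted labels from the rest."""
    kept = []
    for name, labels, value in samples:
        if name in drop_metrics:
            continue  # the entire metric is low value
        trimmed = {k: v for k, v in labels.items() if k not in drop_labels}
        kept.append((name, trimmed, value))
    return kept


# Hypothetical samples as (metric name, labels, value).
samples = [
    ("http_requests_total", {"path": "/", "instance_type": "m5.large"}, 10),
    ("debug_cache_probe", {"path": "/"}, 1),
]
kept = drop_low_value(samples, {"debug_cache_probe"}, {"instance_type"})
print(kept)
```

Stripping a label can leave several series with identical label sets, so a real pipeline would need to aggregate those duplicates.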
Cardinality
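Since cardinality is the number of unique label combinations per metric, it can be counted directly, and its worst case is the product of each label's distinct values. Here is a minimal sketch with hypothetical metric and label names.

```python
import math
from collections import defaultdict


def observed_cardinality(series):
    """Count the unique label combinations actually seen for each metric."""
    combos = defaultdict(set)
    for name, labels in series:
        combos[name].add(frozenset(labels.items()))
    return {name: len(seen) for name, seen in combos.items()}


def worst_case_cardinality(values_per_label):
    """Upper bound: every label value combined with every other."""
    return math.prod(values_per_label.values())


series = [
    ("http_requests_total", {"method": "GET", "status": "200"}),
    ("http_requests_total", {"method": "GET", "status": "500"}),
    ("http_requests_total", {"method": "POST", "status": "200"}),
    ("http_requests_total", {"method": "GET", "status": "200"}),  # repeat combo
]
observed = observed_cardinality(series)
upper_bound = worst_case_cardinality({"method": 3, "status": 5, "pod": 200})
print(observed, upper_bound)
```

The upper bound shows why one high-churn label such as a pod name can dominate the series count.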
Auditing Your Metrics
“What is the value of this metric?”
- You, when auditing metrics
● Scope what your team is responsible for
○ filter queries with team:YOURS
● Identify easy wins: metrics that aren’t
○ In a monitor definition
○ Directly queried by end users
○ Powering charts for visited dashboards
● Identify labels that are unnecessary
○ e.g. the Prometheus instance label or instance_type
● Share your successes!
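The "easy wins" step amounts to a set difference: metrics you emit minus metrics referenced anywhere. Here is a minimal sketch, assuming you can export the four lists from your own tooling. All metric names below are hypothetical.

```python
def unused_metrics(emitted, in_monitors, queried, on_dashboards):
    """Return emitted metrics that no monitor, query, or dashboard references."""
    used = set(in_monitors) | set(queried) | set(on_dashboards)
    return sorted(set(emitted) - used)


# Hypothetical inventory for one team.
emitted = ["cpu_usage", "http_requests_total", "debug_cache_probe", "legacy_queue_depth"]
candidates = unused_metrics(
    emitted,
    in_monitors=["cpu_usage"],
    queried=["http_requests_total"],
    on_dashboards=["cpu_usage"],
)
print(candidates)
```

The result is a list of candidates to review with the owning team, not a list to delete automatically.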
CFO looking at the
cost efficiency of
metrics
Resources
- How Gloo uses the OTel Collector to drop metrics/labels and provide the Minimum Metrics Set
- How to drop and delete metrics in Prometheus
- How can recording and data roll-up rules help your metrics?
- Observability is Too Damn Expensive - DevOpsDays London
Catch up with me:
- Rescuing On-Call Engineers (send your manager)
- KubeCon OTel 101: Let’s Instrument! (tracing) workshop
- There’s No Place Like Production Conf42 Incident Management
paigerduty@
chronosphere.io
hachyderm.io
LinkedIn
Q&A
