
stackconf 2023 | How to reduce expenses on monitoring with VictoriaMetrics by Roman Khavronenko.pdf

NETWAYS

Given recent economic changes, cost efficiency has become a top priority for many businesses. This is especially important for monitoring, because the volume of telemetry data tends to grow exponentially. Many monitoring solutions are now shifting their focus to cost optimization. The talk covers open-source tools from the VictoriaMetrics ecosystem for improving monitoring cost efficiency:

Compression optimization. While ZSTD compression and time-series-specific techniques such as delta encoding are great, there is still room for improvement. I'll explain what else can be done to reduce the disk footprint of long-term storage.

Extra compression for data transferred between metrics collectors and the TSDB. I'll explain how the VictoriaMetrics collector reduces traffic volume by a factor of 4.

Pre-computing telemetry data on collectors, frequently referred to as edge computing. In VictoriaMetrics this is called streaming aggregation. It lets collectors pre-compute data, reducing its resolution and cardinality before it is pushed to the database. This is especially important for Prometheus-like systems, because streaming aggregation is compatible with the Prometheus RemoteWrite protocol and can be used with any system that supports it.

Cardinality explorer. An interface that provides useful insights into the data stored by the TSDB. It helps to identify the most expensive metrics or labels and to see how they have changed over time.

Query tracing. This feature provides details about every stage of query execution, including time spent on index lookups, disk reads, data transfer, and computation, as well as memory usage. It is similar to the SQL EXPLAIN feature and helps to improve the performance of read queries.

Compute efficiency. VictoriaMetrics components for collecting and storing telemetry data consume fewer resources than the corresponding components from the Prometheus ecosystem.

It may sound like competition or bragging, but this is a real reason why people migrate from Prometheus to VictoriaMetrics: to cut their infrastructure costs by a factor of 2-3. All the features listed above are open source and available for everyone to use. The talk concentrates mostly on typical monitoring use cases and elegant ways to make things more efficient.



stackconf 2023 | How to reduce expenses on monitoring with VictoriaMetrics by Roman Khavronenko.pdf

  • 1. How to reduce expenses on monitoring with VictoriaMetrics Roman Khavronenko | github.com/hagen1778
  • 2. Roman Khavronenko Co-founder of VictoriaMetrics Software engineer with experience in distributed systems, monitoring and high-performance services. https://github.com/hagen1778 https://twitter.com/hagen1778
  • 3. What this talk is about
        1. Best ways for storing and processing metrics
        2. Open source tools only
        3. For people familiar with Prometheus, Thanos, Mimir, VictoriaMetrics
  • 10. You can either have a faster car… …or be a smarter driver!
  • 11. What can you get from a simple replacement?
  • 15. Prometheus vs VictoriaMetrics benchmark
        # the number of nodeexporter instances to scrape
        targetsCount: 1000
        # how frequently to scrape nodeexporter targets
        scrapeInterval: 15s
        # rules evaluation interval
        # https://awesome-prometheus-alerts.grep.to/rules.html#host-and-hardware-1
        queryInterval: 30s
        # scrapeConfigUpdatePercent is a churn rate generated once
        # per scrapeConfigUpdateInterval
        scrapeConfigUpdatePercent: 5
        scrapeConfigUpdateInterval: 10m
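To get a feel for the load this benchmark config describes, the settings can be turned into a few derived numbers. This is a rough sketch only: it assumes the churn percentage applies to the target count and that the run lasts the 7 days mentioned in the benchmark summary. The helper `to_seconds` and the variable names are illustrative and not part of the benchmark repo.

```python
# Seconds per unit for the simple "<number><unit>" durations used in the config.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def to_seconds(duration: str) -> int:
    """Convert a duration such as '15s' or '10m' to seconds."""
    return int(duration[:-1]) * UNIT_SECONDS[duration[-1]]


# Values copied from the benchmark config on the slide.
config = {
    "targetsCount": 1000,
    "scrapeInterval": "15s",
    "scrapeConfigUpdatePercent": 5,
    "scrapeConfigUpdateInterval": "10m",
}
benchmark_duration = "7d"  # taken from the benchmark summary slide

# How many scrape requests the collector issues per second on average.
scrapes_per_second = config["targetsCount"] / to_seconds(config["scrapeInterval"])

# Assumption: the churn percentage is applied to the number of targets.
targets_changed_per_update = (
    config["targetsCount"] * config["scrapeConfigUpdatePercent"] // 100
)

# Number of churn events over the whole run, and targets replaced in total.
updates_total = to_seconds(benchmark_duration) // to_seconds(
    config["scrapeConfigUpdateInterval"]
)
targets_changed_total = targets_changed_per_update * updates_total

print(f"{scrapes_per_second:.2f} scrapes/s")
print(f"{targets_changed_per_update} targets changed every 10m")
print(f"{targets_changed_total} target changes over {benchmark_duration}")
```

The churn figure matters because every changed target introduces new time series, which stresses the index of the database under test.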
  • 24. Summary after 7d benchmark (1k nodeexporter targets)
        Prometheus:
          CPU avg used: 0.79 / 3 cores
          Mem max used: 8.12 GiB / 12 GiB
          Read latency avg: 50th - 70.5ms, 99th - 7s
        VictoriaMetrics:
          CPU avg used: 0.76 / 3 cores
          Mem max used: 4.5 GiB / 12 GiB
          Read latency avg: 50th - 4.3ms, 99th - 3.6s
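The summary numbers are easier to compare as ratios. The sketch below only restates the figures from the slide and divides them; the dictionary keys are illustrative names chosen here.

```python
# Figures copied from the benchmark summary slide (latencies in milliseconds).
prometheus = {"cpu_cores": 0.79, "mem_gib": 8.12, "p50_ms": 70.5, "p99_ms": 7000.0}
victoriametrics = {"cpu_cores": 0.76, "mem_gib": 4.5, "p50_ms": 4.3, "p99_ms": 3600.0}

# Ratio > 1 means Prometheus used more of that resource in this benchmark.
ratios = {key: prometheus[key] / victoriametrics[key] for key in prometheus}

for key, ratio in ratios.items():
    print(f"{key}: {ratio:.2f}x")
```

In this run CPU usage is nearly identical, while the largest differences are in peak memory and median read latency.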
  • 28. Improving network compression
        1. Increase compression level, trade CPU for network savings:
           a. -remoteWrite.vmProtoCompressLevel
        2. Increase batch size, trade latency for compression:
           a. -remoteWrite.maxBlockSize
           b. -remoteWrite.maxRowsPerBlock
           c. -remoteWrite.flushInterval
        3. Reduce entropy to improve compression:
           a. -remoteWrite.significantFigures
           b. -remoteWrite.roundDigits
  • 29. Keeping only significant figures
        instance:cpu_utilization:ratio_avg{instance="foo"} 0.05055757575781
        instance:cpu_utilization:ratio_avg{instance="bar"} 0.05058181818236
        rules:
          - record: instance:cpu_utilization:ratio_avg
            expr: avg_over_time(instance:node_cpu_utilization:ratio[5m])
  • 30. Keeping only significant figures
        Applying --vm-significant-figures=8 to recording rules
        0.05055757575781 → 0.050557576
        reduced the stored size from 1.2 bytes to 0.8 bytes per sample
        See more at https://medium.com/victoriametrics-how-to-migrate-data-from-prometheus
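The rounding on this slide can be reproduced with a few lines of Python. This is a minimal sketch, not VictoriaMetrics code: `round_significant` is a hypothetical helper, and the compression check uses zlib over decimal text as a stand-in for the real storage encoding, so only the direction of the effect carries over, not the bytes-per-sample figures.

```python
import math
import random
import zlib


def round_significant(value: float, figures: int) -> float:
    """Round value to the given number of significant decimal figures."""
    if value == 0 or not math.isfinite(value):
        return value
    exponent = math.floor(math.log10(abs(value)))
    return round(value, figures - 1 - exponent)


def compressed_size(values: list[float]) -> int:
    """Size in bytes of the zlib-compressed decimal text of the values."""
    text = "\n".join(repr(v) for v in values)
    return len(zlib.compress(text.encode("ascii"), 9))


# The two sample values from the recording rule on the previous slide.
rounded_foo = round_significant(0.05055757575781, 8)
rounded_bar = round_significant(0.05058181818236, 8)

# Hypothetical series: 1000 noisy ratios near 0.05, seeded for repeatability.
rng = random.Random(42)
raw_series = [0.05 + rng.random() * 0.001 for _ in range(1000)]
rounded_series = [round_significant(v, 8) for v in raw_series]

raw_bytes = compressed_size(raw_series)
rounded_bytes = compressed_size(rounded_series)

print(rounded_foo, rounded_bar)
print(f"raw: {raw_bytes} B, rounded: {rounded_bytes} B")
```

Dropping the trailing digits removes noise that no compressor can predict, which is why fewer significant figures translate into fewer bytes per sample.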
  • 31. How to be smarter about data
  • 32. Understanding the data - query tracing VictoriaMetrics supports query tracing for detecting bottlenecks during query processing. This is like EXPLAIN ANALYZE from PostgreSQL!
  • 34. If query tracing demo didn't work… Typical query takes 4s to execute… Why?
  • 35. If query tracing demo didn't work… Let's check the trace!
  • 36. If query tracing demo didn't work… Let's check the trace!
  • 37. If query tracing demo didn't work… 91% of the time was spent in vmselect aggregating 9.4k series (13 million data samples)!
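Reading a trace like the one on these slides comes down to finding the step with the most time of its own. The sketch below assumes the trace is a nested JSON object with `duration_msec`, `message` and `children` fields, and that it is requested by adding `trace=1` to the query; both should be checked against the VictoriaMetrics documentation. The sample trace is invented to match the 4s and 91% figures from the slides.

```python
from typing import Iterator


def walk(node: dict) -> Iterator[dict]:
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.get("children", []):
        yield from walk(child)


def self_time_msec(node: dict) -> float:
    """Time spent in the node itself, excluding time spent in its children."""
    children = node.get("children", [])
    return node["duration_msec"] - sum(c["duration_msec"] for c in children)


def slowest_step(trace: dict) -> dict:
    """Return the node with the largest self time."""
    return max(walk(trace), key=self_time_msec)


# Hypothetical trace shaped after the slides: a 4s query where aggregation
# dominates. Messages and durations are illustrative, not real output.
sample_trace = {
    "duration_msec": 4000.0,
    "message": "query_range",
    "children": [
        {"duration_msec": 360.0, "message": "fetch matching series"},
        {"duration_msec": 3640.0, "message": "aggregate 9.4k series"},
    ],
}

hot = slowest_step(sample_trace)
share = self_time_msec(hot) / sample_trace["duration_msec"]
print(f"{hot['message']}: {share:.0%} of total query time")
```

Using self time rather than total duration keeps a parent node from hiding the child that actually did the work.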
  • 38. How to improve query speed? 1. Add more resources to monitoring. 2. Or… be smarter about data!
  • 40. If cardinality explorer demo didn't work…
  • 41. If cardinality explorer demo didn't work…
  • 42. If cardinality explorer demo didn't work…
  • 43. Cardinality explorer: summary
        VictoriaMetrics allows exploring time series cardinality to identify:
        ● Metric names with the highest number of series
        ● Labels with the highest number of series
        ● Values with the highest number of series for the selected label
        ● label=name pairs with the highest number of series
        ● Labels with the highest number of unique values
        * Built into VictoriaMetrics components
        * Supports specifying a Prometheus URL
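The statistics listed in this summary are simple counts over the set of stored series. The sketch below computes three of them for a tiny, made-up set of label sets; it illustrates what the explorer reports, not how VictoriaMetrics computes it internally. The function names and the sample data are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical series, each described by its label set.
series = [
    {"__name__": "http_requests_total", "instance": "a", "path": "/"},
    {"__name__": "http_requests_total", "instance": "a", "path": "/login"},
    {"__name__": "http_requests_total", "instance": "b", "path": "/"},
    {"__name__": "up", "instance": "a"},
    {"__name__": "up", "instance": "b"},
]


def series_per_metric(all_series: list[dict]) -> Counter:
    """Number of series for each metric name."""
    return Counter(s["__name__"] for s in all_series)


def series_per_label_value(all_series: list[dict]) -> Counter:
    """Number of series containing each (label, value) pair."""
    return Counter(pair for s in all_series for pair in s.items())


def unique_values_per_label(all_series: list[dict]) -> dict:
    """Number of distinct values seen for each label."""
    values = defaultdict(set)
    for s in all_series:
        for label, value in s.items():
            values[label].add(value)
    return {label: len(seen) for label, seen in values.items()}


print(series_per_metric(series).most_common(1))
print(series_per_label_value(series)[("instance", "a")])
print(unique_values_per_label(series))
```

A label whose count of unique values keeps growing over time is usually the first candidate for dropping or aggregating away.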
  • 44. Streaming aggregation vs Recording rules The number of time series stored in TSDB is Data-in + Recording Rules results
  • 45. Streaming aggregation vs Recording rules The number of time series stored in TSDB is only what needs to be persisted
  • 46. How to use streaming aggregation
        - match: "grpc_server_handled_total"  # time series selector
          interval: "2m"                      # on 2m interval
          outputs: ["total"]                  # aggregate as counter
          without: ["grpc_method"]            # group without label
        Result: grpc_server_handled_total:2m_without_grpc_method_total
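The effect of this rule can be sketched in a few lines: series that differ only in `grpc_method` are merged into one output series, and the output name is built from the metric name, interval, dropped labels and output type. This is a simplified illustration, not the vmagent implementation: it handles a single interval window, takes per-series counter increases as given, and assumes multiple `without` labels are joined with underscores. The sample data and function names are hypothetical.

```python
from collections import defaultdict


def output_name(metric: str, interval: str, without: list[str], output: str) -> str:
    """Build the output series name in the format shown on the slide."""
    return f"{metric}:{interval}_without_{'_'.join(without)}_{output}"


def aggregate_without(increases: list[tuple[dict, float]], without: list[str]) -> dict:
    """Sum per-series increases after dropping the given labels."""
    totals = defaultdict(float)
    for labels, increase in increases:
        key = tuple(sorted((k, v) for k, v in labels.items() if k not in without))
        totals[key] += increase
    return dict(totals)


# Hypothetical counter increases observed within one 2m window.
window = [
    ({"grpc_service": "users", "grpc_method": "Get"}, 10.0),
    ({"grpc_service": "users", "grpc_method": "List"}, 5.0),
    ({"grpc_service": "orders", "grpc_method": "Get"}, 7.0),
]

name = output_name("grpc_server_handled_total", "2m", ["grpc_method"], "total")
aggregated = aggregate_without(window, ["grpc_method"])

print(name)
print(f"{len(window)} input series -> {len(aggregated)} output series")
```

Only the aggregated series need to be written to storage, which is where the reduction in stored series and samples comes from.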
  • 47. How to use streaming aggregation https://play.victoriametrics.com
  • 48. Streaming aggregation: summary
        1. Aggregate incoming samples in streaming mode before data is written to remote storage
        2. Aggregation is applied to all the metrics received via any supported data ingestion protocol and/or scraped from Prometheus-compatible targets
        3. Statsd alternative
        4. Recording rules alternative
        5. Reducing the number of stored samples
        6. Reducing the number of stored series
        7. Compatible with tools supporting Prometheus remote write protocol
  • 53. Complexity penalty
        ● Complex systems are harder to maintain
        ● Complex systems are harder to teach people about
        ● Complex systems are more expensive to scale
  • 54. Additional materials
        1. Snapshot of Grafana dashboard from the benchmark
        2. Benchmark repo for reproducing the test
        3. Save network costs with VictoriaMetrics remote write protocol
        4. VictoriaMetrics: achieving better compression than Gorilla for time series data
        5. Streaming aggregation
        6. VictoriaMetrics playground