Given recent economic changes, cost efficiency has become a top priority for many businesses. This is especially important for monitoring, because telemetry data tends to grow exponentially. Many monitoring solutions are now shifting their focus to cost optimization. The talk covers open-source instruments from the VictoriaMetrics ecosystem for improving monitoring cost-efficiency.
Compression optimization. While ZSTD compression and time-series-specific techniques like delta-encoding are great, there is still room for improvement. I’ll explain what else can be done to reduce the disk footprint of long-term storage.
Extra compression when transferring data between metrics collectors and the TSDB. I’ll explain how the VictoriaMetrics collector reduces traffic volume by a factor of 4.
Pre-computing telemetry data on collectors, frequently referred to as edge computing. In VictoriaMetrics this is called streaming aggregation: it allows collectors to pre-compute data, reducing its resolution and cardinality before it is pushed to the database. This is especially relevant for Prometheus-like systems, because streaming aggregation is compatible with the Prometheus remote write protocol and can be used with any system that supports it.
Cardinality explorer. An interface that provides useful insights into the data stored by the TSDB. It helps identify the most expensive metrics or labels and see how they have changed over time.
Query tracing. This feature provides details about all the stages of query execution, including time spent on index lookups, disk reads, data transfer, computation, and memory usage. It is similar to the SQL EXPLAIN feature and helps to improve the performance of read queries.
Compute efficiency. VictoriaMetrics components for collecting and storing telemetry data consume fewer resources than their counterparts from the Prometheus ecosystem. It may sound like competition or bragging, but this is a real reason why people migrate from Prometheus to VictoriaMetrics: to cut their infrastructure costs by 2-3 times.
All the features listed above are open source and available for everyone to use. The talk concentrates on typical use cases in monitoring and elegant ways to make things more efficient.
Reduce monitoring expenses with VictoriaMetrics' high performance and data reduction features
1. How to reduce expenses on monitoring
with VictoriaMetrics
Roman Khavronenko | github.com/hagen1778
2. Roman Khavronenko
Co-founder of VictoriaMetrics
Software engineer with experience in distributed systems,
monitoring and high-performance services.
https://github.com/hagen1778
https://twitter.com/hagen1778
3. What this talk is about
1. Best practices for storing and processing metrics
2. Open source tools only
3. For people familiar with Prometheus,
Thanos, Mimir, VictoriaMetrics
15. # the number of nodeexporter instances to scrape
targetsCount: 1000
# how frequently to scrape nodeexporter targets
scrapeInterval: 15s
# rules evaluation interval
# https://awesome-prometheus-alerts.grep.to/rules.html#host-and-hardware-1
queryInterval: 30s
# scrapeConfigUpdatePercent is a churn rate generated once
# per scrapeConfigUpdateInterval
scrapeConfigUpdatePercent: 5
scrapeConfigUpdateInterval: 10m
Prometheus vs VictoriaMetrics benchmark
28. Improving network compression
1. Increase compression level, trade CPU for network savings:
a. -remoteWrite.vmProtoCompressLevel
2. Increase batch size, trade latency for compression:
a. -remoteWrite.maxBlockSize
b. -remoteWrite.maxRowsPerBlock
c. -remoteWrite.flushInterval
3. Reduce entropy to improve compression:
a. -remoteWrite.significantFigures
b. -remoteWrite.roundDigits
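A sketch of how these flags might be combined on a vmagent instance. The URL and values below are illustrative, not recommendations: higher compression levels and longer flush intervals trade CPU and latency for network savings.

```shell
vmagent \
  -remoteWrite.url=http://victoriametrics:8428/api/v1/write \
  -remoteWrite.vmProtoCompressLevel=3 \
  -remoteWrite.maxRowsPerBlock=20000 \
  -remoteWrite.flushInterval=5s \
  -remoteWrite.roundDigits=2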
30. Keeping only significant figures
Applying --vm-significant-figures=8 to recording rules
0.05055757575781
0.050557576
reduced the disk footprint from 1.2 to 0.8 bytes per sample
See more at https://medium.com/victoriametrics-how-to-migrate-data-from-prometheus
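To make the idea concrete, here is a small Python sketch (not part of VictoriaMetrics) showing what rounding to significant figures does to a sample value, and why lower-entropy values compress better:

```python
import random
import zlib

def round_sig(x: float, figures: int) -> float:
    # Round x to the given number of significant figures, mimicking
    # what -remoteWrite.significantFigures does to sample values.
    return float(f"%.{figures}g" % x)

print(round_sig(0.05055757575781, 8))  # 0.050557576

# Rounded values carry less entropy, so generic compression (and the
# TSDB's own encoding) squeezes them much harder. A rough illustration:
random.seed(1)
samples = [0.05 + random.random() * 1e-6 for _ in range(1000)]
raw = ",".join(repr(v) for v in samples).encode()
rounded = ",".join(str(round_sig(v, 4)) for v in samples).encode()
print(len(zlib.compress(raw)), len(zlib.compress(rounded)))
```

The rounded stream compresses to a small fraction of the raw one, because the long tails of random digits are gone.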
32. Understanding the data - query tracing
VictoriaMetrics supports query tracing for detecting bottlenecks during query processing.
This is like EXPLAIN ANALYZE from PostgreSQL!
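A trace can be requested straight from the query API by adding the trace=1 query arg; a minimal sketch (host and query are illustrative):

```shell
curl 'http://victoriametrics:8428/api/v1/query?query=up&trace=1'
```

The response then includes a JSON trace with the duration of each query-processing stage.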
43. Cardinality explorer: summary
VictoriaMetrics allows exploring time series cardinality to identify:
● Metric names with the highest number of series
● Labels with the highest number of series
● Values with the highest number of series for the selected label
● label=value pairs with the highest number of series
● Labels with the highest number of unique values
* Built into VictoriaMetrics components
* Supports specifying Prometheus URL
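The explorer UI sits on top of the Prometheus-compatible TSDB stats API, so the same numbers can be fetched directly; a sketch (host is illustrative):

```shell
curl 'http://victoriametrics:8428/api/v1/status/tsdb?topN=10'
```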
44. Streaming aggregation vs Recording rules
The number of time series stored in TSDB
is Data-in + Recording Rules results
45. Streaming aggregation vs Recording rules
The number of time series stored in TSDB
is only what needs to be persisted
46. How to use streaming aggregation
- match: "grpc_server_handled_total" # timeseries selector
interval: "2m" # on 2m interval
outputs: ["total"] # aggregate as counter
without: ["grpc_method"] # group without label
Result:
grpc_server_handled_total:2m_without_grpc_method_total
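In practice, a rule like the one above would live in a YAML file passed to the collector; a minimal sketch, assuming vmagent and its -remoteWrite.streamAggr.config flag:

```yaml
# streamaggr.yaml: drop the high-cardinality grpc_method label and keep
# only a 2m-interval counter before samples reach the TSDB
- match: "grpc_server_handled_total"
  interval: "2m"
  outputs: ["total"]
  without: ["grpc_method"]
```

started as: vmagent -remoteWrite.url=... -remoteWrite.streamAggr.config=streamaggr.yaml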
47. How to use streaming aggregation
https://play.victoriametrics.com
48. Streaming aggregation: summary
1. Aggregate incoming samples in streaming mode before data is written to remote
storage
2. Aggregation is applied to all the metrics received via any supported data
ingestion protocol and/or scraped from Prometheus-compatible targets
3. Statsd alternative
4. Recording rules alternative
5. Reducing the number of stored samples
6. Reducing the number of stored series
7. Compatible with tools supporting Prometheus remote write protocol
53. Complexity penalty
● Complex systems are harder to maintain
● Complex systems are harder to teach and learn
● Complex systems are more expensive to scale
54. Additional materials
1. Snapshot of Grafana dashboard from the benchmark
2. Benchmark repo for reproducing the test
3. Save network costs with VictoriaMetrics remote write protocol
4. VictoriaMetrics: achieving better compression than Gorilla for time series data
5. Streaming aggregation
6. VictoriaMetrics playground