ДЕНИС КЛЕПIКОВ «Long Term storage for Prometheus» Lviv DevOps Conference 2019

Long-Term Storage for Prometheus

Long ago, in all datacentres...
bigger averaging window = less detail

CONTENT
- Introduction
- Long-Term Storage Overview
- Thanos Architecture and Resources Usage
- VictoriaMetrics Architecture and Resources Usage
- Price comparison

INTRODUCTION
Thanos is a set of components that can be composed into a highly available metric system with
unlimited storage capacity, which can be added seamlessly on top of existing Prometheus
deployments.
The current release, 0.5.0, is designed to move old metrics (those that have reached the
retention period on the Prometheus nodes) to S3-like object storage for long-term keeping.
Collected metrics can be reviewed via Grafana. The Prometheus query dashboard shows only
the data stored on the Prometheus instances.
VictoriaMetrics is a fast, cost-effective and scalable time-series database. It can be used as
long-term remote storage for Prometheus. It uses its own data compression, which allows more
data to be stored on the same disk size.
Cortex provides horizontally scalable, highly available, multi-tenant, long term storage for
Prometheus.

[Diagram: a Prometheus node with its own Storage scrapes Monitored services 1 to N and sends
Alerts to Alertmanager. Data is stored in Long-Term Storage after retention is reached. Grafana
reads from both through two data sources (DataSource 1, DataSource 2).]

LONG-TERM STORAGE OVERVIEW
Why do we need long-term storage:
- To store historical data about your workloads
- To review incidents
- To plan scaling based on seasonal load
- To find bottlenecks in the infrastructure under continuous run/load
What solutions can be used for storing long-term historical time series:
- Cortex, InfluxDB, Kafka, Graphite, …, Thanos, VictoriaMetrics *
* https://prometheus.io/docs/operating/integrations/#remote-endpoints-and-storage

THANOS ARCHITECTURE
[Diagram: pods and their containers]
- Prometheus POD: Prometheus, Prometheus config reloader, Configmap reloader, Thanos sidecar
- Thanos query POD: Thanos query
- Thanos compact: Thanos compact
- Thanos Store Gateway POD: Thanos store gateway

* https://thanos.io/getting-started.md/
Storages for Thanos:
(stable)
- Google Cloud Storage
- AWS S3
- Azure Storage Account
(beta)
- OpenStack Swift
- Tencent COS

[Diagram: Grafana or Thanos UI queries the Thanos query POD (Thanos query), which reads from
Prometheus 1, Prometheus 2 and, through the Thanos Store Gateway POD (Thanos store gateway),
from the Bucket.]

ADVANTAGES AND DISADVANTAGES
- Infinite retention without reconfiguring storage
- Collected data stays available even if the infrastructure is recreated (the data is in the bucket)
- Global query view over data collected from multiple Prometheus instances and the bucket
- Horizontal scalability
- Metrics compaction
- Full monitoring stack
- Complicated infrastructure

HOW IT WAS TESTED
- 500 nodes (NODE_0 … NODE_499)
- 1000 metrics per node (METRIC_0 … METRIC_999)
- Each node reports every 15 seconds
- Test duration: 24 hours
[Chart: Scroll Bar (500 reporters)]
500 nodes × 1000 metrics × 4 times per minute × 24 hours = 2 880 000 000 points
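
The point count above can be re-derived with a few lines of Python. This is only a sanity check of the slide's arithmetic; the variable names are illustrative and not part of the test setup.

```python
# Test setup as described on the slide.
NODES = 500
METRICS_PER_NODE = 1000
REPORT_INTERVAL_S = 15
TEST_DURATION_H = 24

reports_per_minute = 60 // REPORT_INTERVAL_S             # 4 reports per minute
points_per_report = NODES * METRICS_PER_NODE             # 500 000 series
points_per_minute = points_per_report * reports_per_minute
points_per_day = points_per_minute * 60 * TEST_DURATION_H

print(f"{points_per_minute:,} points per minute")
print(f"{points_per_day:,} points per day")
```

The per-minute figure (2 000 000) matches the "Metrics per minute" row of the comparison table later in the deck.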

QUERIES VIA THANOS FROM BUCKET

MEMORY USAGE STABILIZATION ON CLUSTER NODES
SCRAPE DURATION
GKE CLUSTER DETAILS
Pay attention to the allocation.
The interval between scrapes is 30 seconds, which covers two 15-second reporting intervals,
so Prometheus needs 4.37 seconds to scrape 1 000 000 metrics.

VICTORIAMETRICS ARCHITECTURE
[Diagram: three tiers with three instances each]
- READ OPERATIONS: LB/ClusterIP -> VM-select_1, VM-select_2, VM-select_3 (STATELESS)
- VM-storage_1, VM-storage_2, VM-storage_3 (STATEFUL)
- WRITE OPERATIONS: LB/ClusterIP -> VM-insert_1, VM-insert_2, VM-insert_3 (STATELESS)

ADVANTAGES AND DISADVANTAGES
- Infinite retention with reconfiguring storage
- Global query view over data collected from storage
- Horizontal scalability
- Metrics compaction (multiple times better; floating-point values are converted to integers)
- Simple infrastructure
- No integration with Alertmanager
- Cloud storages are not supported yet: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/129
- More load on hosts

PRICE COMPARISON

THANOS
- 12-15 GiB of metrics per day (2.88 billion points)
- 16 GiB memory used on nodes
- 2.1-2.4 CPU cores used on nodes
Storage price (Cloud Storage*):
15*365=5475, ~5500 GiB
Storage total: $126.50 per month; ~$1500 in 1 year
* Based on retention, data can be moved to the Coldline storage class

VICTORIAMETRICS
- 2.8-3 GiB of metrics per day on each storage (2.88 billion points)
- 16 GiB memory used on nodes
- 2.8-4 CPU cores used on nodes
Storage price (Persistent Disk Standard):
3*365=1095, ~1100 GiB
$52.80 per month * Number_of_Storages
Storage total: 52.8*3=$158.40 per month
* https://github.com/VictoriaMetrics/VictoriaMetrics/issues/134
If one of the storages is lost, part of the data becomes unavailable
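
The storage arithmetic on this slide can be reproduced with a short sketch. The per-GiB monthly prices below are not stated in the deck; they are back-calculated from its own totals ($126.50 / 5500 GiB and $52.80 / 1100 GiB) and should be treated as assumptions, not as current cloud pricing. The function name is illustrative.

```python
RETENTION_DAYS = 365

# Assumed prices per GiB per month, inferred from the slide's totals.
CLOUD_STORAGE_PRICE = 0.023      # 126.50 / 5500
PERSISTENT_DISK_PRICE = 0.048    # 52.80 / 1100


def monthly_cost(stored_gib, price_per_gib_month, replicas=1):
    """Monthly price of keeping stored_gib on each of `replicas` storages."""
    return round(stored_gib * price_per_gib_month * replicas, 2)


# Exact yearly volumes; the slide rounds them up to 5500 and 1100 GiB.
thanos_gib = 15 * RETENTION_DAYS           # 5475
vm_gib_per_storage = 3 * RETENTION_DAYS    # 1095

thanos_monthly = monthly_cost(5500, CLOUD_STORAGE_PRICE)
vm_monthly = monthly_cost(1100, PERSISTENT_DISK_PRICE, replicas=3)

print(f"Thanos: ${thanos_monthly:.2f} per month, ${thanos_monthly * 12:.2f} per year")
print(f"VictoriaMetrics: ${vm_monthly:.2f} per month")
```

Both figures describe a full year of data already in storage; during the first year the stored volume, and therefore the monthly bill, grows gradually.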

Instance price: 3 * N-standard-4 (4 vCPU, 15 GB memory), $97.49 monthly estimate each, 3*97.49=$292
Standard Provisioned Space: 1,500 GB - $60

                               Thanos            VictoriaMetrics
CPU usage                      50%               65%
Memory usage                   16GB              16GB
Metrics per day                15GB              9GB
Metrics per minute             2 000 000         2 000 000
Metrics per one day            2 880 000 000     2 880 000 000
Scrape duration (1M metrics)   4.373 s           4.553 s
Historical data access         303-525 ms        179-492 ms
(500 time series)

To produce downsampled data, the Compactor continuously aggregates series down to five
minute and one hour resolutions. For each raw chunk, encoded with TSDB’s XOR
compression, it stores different types of aggregations, e.g. min, max, or sum in a single block.
This allows Querier to automatically choose the aggregate that is appropriate for a given
PromQL query.
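
The idea of storing several aggregates per downsampled window can be illustrated with a minimal Python sketch. It is not the Thanos Compactor's implementation: it ignores TSDB chunks and XOR encoding, and the `downsample` function, its parameters and the extra `count` field are assumptions made for illustration.

```python
from collections import defaultdict


def downsample(samples, window_s=300):
    """Group (timestamp_seconds, value) samples into fixed windows.

    Returns {window_start: {"count", "sum", "min", "max"}} so that a query
    can later pick the aggregate it needs (e.g. max, or sum/count for avg).
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        # Align each sample to the start of its window.
        buckets[ts - ts % window_s].append(value)
    return {
        start: {
            "count": len(values),
            "sum": sum(values),
            "min": min(values),
            "max": max(values),
        }
        for start, values in sorted(buckets.items())
    }


raw = [(0, 1.0), (15, 3.0), (299, 2.0), (300, 10.0)]
five_minute = downsample(raw, window_s=300)
print(five_minute)
```

Calling the same function with `window_s=3600` would give the one-hour resolution mentioned above.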

VM Gorilla compression analysis
The only problem is that the result of converting floating-point values to integers may exceed
64 bits, the default integer size used in modern computers. How to deal with it? Normalize the
integers by dividing them by 10^M, where M is the minimum value that allows fitting all the time
series values into 64 bits and removing common trailing decimal zeros.
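
A rough Python sketch of this normalization, under stated assumptions: it is not VictoriaMetrics code, the names `to_scaled_ints` and `div10` are hypothetical, and because Python integers are unbounded the 64-bit limit is only simulated with `INT64_MAX`. It handles finite values only.

```python
from decimal import Decimal

INT64_MAX = 2**63 - 1  # largest signed 64-bit integer


def div10(i):
    """Divide by 10, truncating toward zero (Python's // floors instead)."""
    return i // 10 if i >= 0 else -(-i // 10)


def to_scaled_ints(values):
    """Return (ints, exponent) so that value ~= int * 10**exponent."""
    decs = [Decimal(repr(v)) for v in values]
    # Smallest decimal exponent among non-zero values: scaling by it turns
    # every value into a whole number.
    exponent = min((d.as_tuple().exponent for d in decs if d != 0), default=0)
    ints = [int(d.scaleb(-exponent)) for d in decs]
    # Divide by 10 while any value overflows 64 bits, or while all values
    # share a trailing decimal zero. The loop count corresponds to M.
    while any(abs(i) > INT64_MAX for i in ints) or (
        any(ints) and all(i % 10 == 0 for i in ints)
    ):
        ints = [div10(i) for i in ints]
        exponent += 1
    return ints, exponent


print(to_scaled_ints([1.5, 0.25, 100.0]))
print(to_scaled_ints([1000.0, 2000.0]))
```

For `[1.5, 0.25, 100.0]` the sketch returns `([150, 25, 10000], -2)`; for `[1000.0, 2000.0]` it returns `([1, 2], 3)`. When a series mixes very large and very small values, dividing to fit 64 bits discards the low-order digits of the small ones.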
