Long-Term Storage for Prometheus
Long ago... in all datacentres
bigger average = less detail
CONTENT
- Introduction
- Long-Term Storage Overview
- Thanos Architecture and Resources Usage
- VictoriaMetrics Architecture and Resources Usage
- Price comparison
INTRODUCTION
Thanos is a set of components that can be composed into a highly available metric system with
unlimited storage capacity, which can be added seamlessly on top of existing Prometheus
deployments.
The current release (0.5.0) is designed to store old metrics (those that have reached the retention
period on Prometheus nodes) in S3-like object storage for the long term.
Collected metrics can be reviewed via Grafana. The Prometheus query dashboard shows only
data stored on Prometheus instances.
VictoriaMetrics is a fast, cost-effective and scalable time-series database. It can be used as
long-term remote storage for Prometheus. It uses its own data compression, which allows storing
more data on the same disk size.
Cortex provides horizontally scalable, highly available, multi-tenant, long term storage for
Prometheus.
[Diagram: the Prometheus node scrapes monitored services 1 … N, keeps data in its own Storage, and sends Alerts to Alertmanager; data is stored in Long-Term Storage after retention is reached; Grafana reads both as DataSource 1 and DataSource 2]
Why do we need long-term storage:
- To store historical data about your workloads
- To review incidents
- To plan scaling based on seasonal load
- To find bottlenecks in the infrastructure during continuous run/load
What solutions can be used for storing long-term historical time series:
- Cortex, InfluxDB, Kafka, Graphite, …, Thanos, VictoriaMetrics *
* https://prometheus.io/docs/operating/integrations/#remote-endpoints-and-storage
LONG-TERM STORAGE OVERVIEW
THANOS ARCHITECTURE
- Prometheus POD: Prometheus, Prometheus config reloader, Configmap reloader, Thanos sidecar
- Thanos query POD: Thanos query
- Thanos compact POD: Thanos compact
- Thanos Store Gateway POD: Thanos store gateway
* https://thanos.io/getting-started.md/
Storage backends for Thanos:
(stable)
- Google Cloud Storage
- AWS S3
- Azure Storage Account
(beta)
- OpenStack Swift
- Tencent COS
[Diagram: Grafana or Thanos UI queries the Thanos query POD (Thanos query), which reads from Prometheus 1, Prometheus 2 and the Thanos Store Gateway POD (Thanos store gateway); the store gateway reads from the Bucket]
ADVANTAGES AND DISADVANTAGES
Advantages:
- Infinite retention without reconfiguring storage
- Collected data is available even if the infrastructure is recreated (data is in the bucket)
- Global query view over data collected from multiple Prometheus instances and the bucket
- Horizontal scalability
- Metrics compaction
- Full monitoring stack
Disadvantages:
- Complicated infrastructure
HOW IT WAS TESTED
- 500 nodes (NODE_0 … NODE_499)
- 1000 metrics per node (METRIC_0 … METRIC_999)
- Each metric is scraped every 15 seconds
- Test duration: 24 hours
[Screenshot: scroll bar (500 reporters)]
500 nodes × 1000 metrics × 4 scrapes per minute × 60 minutes × 24 hours = 2 880 000 000 points
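The arithmetic behind the test load can be checked with a short sketch. The constants come from the slide; the function and variable names are only illustrative.

```python
# Test parameters taken from the slide; all names here are illustrative.
NODES = 500
METRICS_PER_NODE = 1000
SCRAPE_INTERVAL_SECONDS = 15
TEST_DURATION_HOURS = 24


def samples_per_minute(nodes, metrics_per_node, interval_s):
    """Samples collected per minute across all nodes."""
    scrapes_per_minute = 60 // interval_s
    return nodes * metrics_per_node * scrapes_per_minute


def total_samples(nodes, metrics_per_node, interval_s, hours):
    """Samples collected over the whole test run."""
    return samples_per_minute(nodes, metrics_per_node, interval_s) * 60 * hours


per_minute = samples_per_minute(NODES, METRICS_PER_NODE, SCRAPE_INTERVAL_SECONDS)
total = total_samples(
    NODES, METRICS_PER_NODE, SCRAPE_INTERVAL_SECONDS, TEST_DURATION_HOURS
)
print(f"{per_minute:,} samples per minute, {total:,} samples in {TEST_DURATION_HOURS} hours")
```

The same two figures appear later in the comparison as "metrics per minute" and "metrics per day".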
LOAD ON CLUSTER NODES
QUERIES VIA THANOS FROM BUCKET
MEMORY USAGE STABILIZATION ON CLUSTER NODES
SCRAPE DURATION
GKE CLUSTER DETAILS
Pay attention to the allocation.
There are 30 seconds between measurements, and this period covers two 15-second scrape intervals,
so Prometheus needs 4.37 s to scrape 1 000 000 metrics (2 × 500 000).
VICTORIAMETRICS ARCHITECTURE
- READ OPERATIONS: LB/ClusterIP → VM-select_1, VM-select_2, VM-select_3 (STATELESS)
- VM-storage_1, VM-storage_2, VM-storage_3 (STATEFUL)
- WRITE OPERATIONS: LB/ClusterIP → VM-insert_1, VM-insert_2, VM-insert_3 (STATELESS)
ADVANTAGES AND DISADVANTAGES
Advantages:
- Infinite retention, with reconfiguring storage
- Global query view over data collected from storage
- Horizontal scalability
- Metrics compaction (multiple times better; floating-point to integer conversion)
- Simple infrastructure
Disadvantages:
- No integration with Alertmanager
- Cloud storage is not supported yet
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/129
- More load on hosts
Same test load over 24 hours: 500 nodes × 1000 metrics × 4 scrapes per minute × 60 minutes × 24 hours = 2 880 000 000 points
THANOS
- 12-15 GiB of metrics per day (2.88 billion points)
- 16 GiB memory used on nodes
- 2.1-2.4 CPU cores used on nodes
Storage price (Cloud Storage*):
15 GiB × 365 = 5475 GiB ≈ 5500 GiB
Storage total: $126.50 per month; ~$1500 per year
* Based on retention, data can be moved to the Coldline storage class

VICTORIAMETRICS
- 2.8-3 GiB of metrics per day on each storage (2.88 billion points)
- 16 GiB memory used on nodes
- 2.8-4 CPU cores used on nodes
Storage price (Persistent Disk Standard):
3 GiB × 365 = 1095 GiB ≈ 1100 GiB
$52.80 per month × number of storages
Storage total: 52.8 × 3 = $158.40 per month
* https://github.com/VictoriaMetrics/VictoriaMetrics/issues/134
If one of the storages is lost, part of the data becomes unavailable
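The storage estimate above can be reproduced with a small sketch. It uses only the numbers quoted on the slide; the per-GiB rates are derived from those numbers rather than taken from a price list, and all names are illustrative.

```python
DAYS_PER_YEAR = 365
VM_STORAGE_NODES = 3  # three VM-storage instances, as on the slide


def yearly_volume_gib(gib_per_day):
    """Volume accumulated after one year of retention."""
    return gib_per_day * DAYS_PER_YEAR


def implied_rate(monthly_price, volume_gib):
    """Monthly price per GiB implied by the slide's totals (not an official rate)."""
    return monthly_price / volume_gib


thanos_volume = yearly_volume_gib(15)         # bucket, upper daily estimate
vm_volume_per_storage = yearly_volume_gib(3)  # per VM-storage disk

thanos_monthly = 126.50          # quoted for ~5500 GiB in Cloud Storage
vm_monthly_per_storage = 52.80   # quoted for ~1100 GiB of Persistent Disk Standard
vm_monthly_total = vm_monthly_per_storage * VM_STORAGE_NODES
thanos_yearly = thanos_monthly * 12

print(f"Thanos: {thanos_volume} GiB, ${thanos_monthly:.2f}/month, ${thanos_yearly:.2f}/year")
print(f"VictoriaMetrics: {vm_volume_per_storage} GiB per storage, ${vm_monthly_total:.2f}/month")
print(f"Implied bucket rate: ${implied_rate(thanos_monthly, 5500):.3f} per GiB per month")
print(f"Implied disk rate: ${implied_rate(vm_monthly_per_storage, 1100):.3f} per GiB per month")
```

Note that the monthly totals are for a full year of accumulated data; during the first year the actual bill grows with the stored volume.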
PRICE COMPARISON
Instance price: 3 × n1-standard-4 (4 vCPU, 15 GB memory), $97.49 monthly estimate each; 3 × 97.49 = $292
Standard Provisioned Space: 1,500 GB - $60

                                Thanos            VictoriaMetrics
CPU usage                       50%               65%
Memory usage                    16 GB             16 GB
Metrics stored per day          15 GB             9 GB
Metrics per minute              2 000 000         2 000 000
Metrics per day                 2 880 000 000     2 880 000 000
Scrape duration (1M metrics)    4.373 s           4.553 s
Historical data access          303-525 ms        179-492 ms
(500 time series)
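To put the two scrape timings side by side, here is a minimal sketch that converts them into throughput. The timings are the measured values from the comparison; the names are illustrative.

```python
SCRAPE_BATCH = 1_000_000  # metrics covered by each timing

# Measured time to scrape the batch, in seconds, from the comparison.
scrape_seconds = {"Thanos": 4.373, "VictoriaMetrics": 4.553}


def throughput(metrics, seconds):
    """Metrics scraped per second."""
    return metrics / seconds


rates = {name: throughput(SCRAPE_BATCH, s) for name, s in scrape_seconds.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:,.0f} metrics per second")

# Relative difference in scrape time between the two setups.
slowdown = scrape_seconds["VictoriaMetrics"] / scrape_seconds["Thanos"] - 1
print(f"Scrape with VictoriaMetrics took {slowdown:.1%} longer in this test")
```

The difference is a few percent, so scrape time alone does not separate the two setups in this test.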
STORAGE PRICING
Q & A
To produce downsampled data, the Compactor continuously aggregates series down to five
minute and one hour resolutions. For each raw chunk, encoded with TSDB’s XOR
compression, it stores different types of aggregations, e.g. min, max, or sum in a single block.
This allows Querier to automatically choose the aggregate that is appropriate for a given
PromQL query.
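The idea of downsampling into several aggregates per window can be sketched as follows. This is only an illustration of the principle, not the Thanos Compactor's code: the function name, the input format and the extra `count` aggregate (useful for computing averages) are assumptions.

```python
from collections import defaultdict


def downsample(samples, window_seconds=300):
    """Aggregate (unix_timestamp, value) pairs into fixed windows.

    Returns {window_start: {"count", "sum", "min", "max"}} so that a query
    can pick whichever aggregate suits it, e.g. max for max_over_time.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        # Align each sample to the start of its window.
        buckets[ts - ts % window_seconds].append(value)
    return {
        start: {
            "count": len(values),
            "sum": sum(values),
            "min": min(values),
            "max": max(values),
        }
        for start, values in sorted(buckets.items())
    }


raw = [(0, 1.0), (15, 3.0), (299, 2.0), (300, 10.0)]
five_minute = downsample(raw, window_seconds=300)
one_hour = downsample(raw, window_seconds=3600)
print(five_minute)
print(one_hour)
```

Keeping several aggregates per window costs extra space per block, but it lets long-range queries read far fewer points than the raw 15-second data.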
VM Gorilla compression analysis
The only problem is the result may exceed 64 bits — default integer size used in modern computers.
How to deal with it? Normalize the integer by dividing by 10^M where M is the minimum value that
allows fitting all the time series values into 64 bits and removing common trailing decimal zeros.
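A minimal sketch of this float-to-integer conversion is shown below. It is not VictoriaMetrics' actual implementation: the function name is hypothetical, values are passed as decimal strings to avoid binary floating-point noise, and the input is assumed to be non-empty and finite.

```python
from decimal import Decimal

INT64_MAX = 2**63 - 1


def _div10(i):
    """Divide by 10, truncating toward zero (Python's // floors negatives)."""
    return -(-i // 10) if i < 0 else i // 10


def to_scaled_ints(values):
    """Return (ints, exponent) such that value ≈ int * 10**exponent."""
    decimals = [Decimal(v) for v in values]
    # Use the smallest decimal exponent so every value becomes an integer.
    exponent = min(d.as_tuple().exponent for d in decimals)
    ints = [int(d.scaleb(-exponent)) for d in decimals]
    # Lossless step: remove trailing decimal zeros common to all values.
    while any(ints) and all(i % 10 == 0 for i in ints):
        ints = [i // 10 for i in ints]
        exponent += 1
    # Lossy step: drop low digits until everything fits into 64 bits.
    while max(abs(i) for i in ints) > INT64_MAX:
        ints = [_div10(i) for i in ints]
        exponent += 1
    return ints, exponent


print(to_scaled_ints(["1.5", "2.25", "10"]))
print(to_scaled_ints(["100", "2500", "30"]))
print(to_scaled_ints(["123456789012345678901"]))
```

Once the series is a list of integers with one shared exponent, integer-oriented encodings such as delta coding can be applied to it.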

ДЕНИС КЛЕПIКОВ «Long Term storage for Prometheus», Lviv DevOps Conference 2019

In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
Juraj Vysvader
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
rickgrimesss22
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
Globus
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
AMB-Review
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
Globus
 

Recently uploaded (20)

Graphic Design Crash Course for beginners
Graphic Design Crash Course for beginnersGraphic Design Crash Course for beginners
Graphic Design Crash Course for beginners
 
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
 
A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
 
Into the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdfInto the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdf
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
 
Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdf
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
 

ДЕНИС КЛЕПIКОВ «Long Term storage for Prometheus» Lviv DevOps Conference 2019

  • 2. Long, long ago... in all datacentres: bigger average = less detail
  • 3. CONTENT: Introduction; Long-Term Storage Overview; Thanos Architecture and Resources Usage; VictoriaMetrics Architecture and Resources Usage; Price comparison
  • 4. INTRODUCTION: Thanos is a set of components that can be composed into a highly available metric system with unlimited storage capacity, which can be added seamlessly on top of existing Prometheus deployments. The current release, 0.5.0, is designed to store old metrics (those that have reached the retention period on Prometheus nodes) in S3-like storage for the long term. Collected metrics can be reviewed via Grafana. The Prometheus query dashboard shows only data stored on Prometheus instances. VictoriaMetrics is a fast, cost-effective and scalable time-series database. It can be used as long-term remote storage for Prometheus. It uses its own data compression, which allows more data to be stored on the same disk size. Cortex provides horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus.
  • 5. Diagram labels: Prometheus node; Monitored service 1, 2, ..., N; Storage; Grafana; DataSource 1; DataSource 2; Alerts; Alertmanager; Long-Term Storage (store data after retention is reached)
  • 6. LONG-TERM STORAGE OVERVIEW. Why do we need long-term storage: to store historical data about your workloads; to review incidents; to plan scaling based on seasonal load; to find bottlenecks in the infrastructure during continuous run/load. What solutions can be used for storing long-term historical time series: Cortex, InfluxDB, Kafka, Graphite, …, Thanos, VictoriaMetrics * * https://prometheus.io/docs/operating/integrations/#remote-endpoints-and-storage
  • 7. THANOS ARCHITECTURE. Prometheus POD: Prometheus, Prometheus config reloader, Configmap reloader, Thanos sidecar. Thanos query POD: Thanos query. Thanos compact: Thanos compact. Thanos Store Gateway POD: Thanos store gateway.
  • 8. Storages for Thanos *: stable: Google Cloud Storage, AWS S3, Azure Storage Account; beta: OpenStack Swift, Tencent COS. * https://thanos.io/getting-started.md/
  • 9. Diagram labels: Grafana or Thanos UI; Thanos query POD (Thanos query); Thanos Store Gateway POD (Thanos store gateway); Prometheus 1; Prometheus 2; Bucket
  • 10. ADVANTAGES AND DISADVANTAGES - Infinite retention without reconfiguring storage - Collected data remains available even if the infrastructure is recreated (data is in the bucket) - Global query view over data collected from multiple Prometheus instances and the bucket - Horizontal scalability - Metrics compaction - Full monitoring stack - Complicated infrastructure
  • 11. HOW IT WAS TESTED: 500 nodes (NODE_0 ... NODE_499), 1000 metrics per node (METRIC_0 ... METRIC_999), reported every 15 seconds
  • 12. 24 Hours Scroll Bar (500 reporters): 500 nodes × 1000 metrics per node, 4 times per minute, 24 hours = 2 880 000 000 points
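
The total on this slide can be reproduced with a few lines of arithmetic. The sketch below assumes the test setup from slide 11 (500 nodes, 1000 metrics per node, one report every 15 seconds); the function name `points_per_day` is only illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def points_per_day(nodes: int, metrics_per_node: int, interval_seconds: int) -> int:
    """Return how many data points are produced in 24 hours."""
    # Number of reports each metric produces per day (15 s -> 5760 reports).
    reports_per_day = SECONDS_PER_DAY // interval_seconds
    return nodes * metrics_per_node * reports_per_day


# Test setup from the slides: 500 nodes, 1000 metrics each, every 15 seconds.
total = points_per_day(nodes=500, metrics_per_node=1000, interval_seconds=15)
print(f"{total:,} points per day")
```

The same formula gives 2 000 000 points per minute when only 60 seconds are counted instead of a full day.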
  • 15. QUERIES VIA THANOS FROM BUCKET
  • 17. Panels: MEMORY USAGE STABILIZATION ON CLUSTER NODES; SCRAPE DURATION; GKE CLUSTER DETAILS (pay attention to the allocation). The scrape interval is 30 seconds, which covers two 15-second reporting intervals, so Prometheus needs 4.37 seconds to scrape 1 000 000 metrics.
  • 18. VICTORIAMETRICS ARCHITECTURE. READ OPERATIONS: LB/ClusterIP, then VM-select_1, VM-select_2, VM-select_3 (STATELESS). Storage: VM-storage_1, VM-storage_2, VM-storage_3 (STATEFUL). WRITE OPERATIONS: LB/ClusterIP, then VM-insert_1, VM-insert_2, VM-insert_3 (STATELESS).
  • 21. ADVANTAGES AND DISADVANTAGES - Infinite retention, with storage reconfiguration - Global query view over data collected from storage - Horizontal scalability - Metrics compaction (multiple times better; floating-point to integer conversion) - Simple infrastructure - No integration with Alertmanager - Cloud storages are not supported yet https://github.com/VictoriaMetrics/VictoriaMetrics/issues/129 - More load on hosts
  • 23. 24 Hours Scroll Bar (500 reporters): 500 nodes × 1000 metrics per node, 4 times per minute, 24 hours = 2 880 000 000 points
  • 26. PRICE COMPARISON
      THANOS: 12-15 GiB of metrics per day (2.88 billion points); 16 GiB of memory used on nodes; 2.1-2.4 CPU cores used on nodes. Storage price (Cloud Storage *): 15 × 365 = 5475, ~5500 GiB. Storage total: $126.50 per month; ~$1500 per year. * Based on retention, data can be moved to the Coldline storage class.
      VICTORIAMETRICS: 2.8-3 GiB of metrics per day on each storage (2.88 billion points); 16 GiB of memory used on nodes; 2.8-4 CPU cores used on nodes. Storage price (Persistent Disk Standard): 3 × 365 = 1095, ~1100 GiB; $52.80 per month × number of storages. Storage total: 52.8 × 3 = $158.40 per month. * https://github.com/VictoriaMetrics/VictoriaMetrics/issues/134 If one of the storages is lost, part of the data becomes unavailable.
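
The storage totals above can be checked with a small calculation. This is only a sketch: the per-GiB monthly prices (0.023 and 0.048 USD) are back-calculated from the slide's own totals (126.50 / 5500 and 52.80 / 1100), not quoted from a price list, and the function name is hypothetical.

```python
def monthly_storage_cost(volume_gib: float, usd_per_gib_month: float,
                         storages: int = 1) -> float:
    """Monthly price of keeping `volume_gib` on each of `storages` volumes."""
    return volume_gib * usd_per_gib_month * storages


# Thanos: ~15 GiB/day for a year, rounded to 5500 GiB in one bucket.
# ASSUMPTION: 0.023 USD per GiB-month is derived from the slide's $126.50 total.
thanos = monthly_storage_cost(5500, 0.023)

# VictoriaMetrics: ~3 GiB/day per storage node, rounded to 1100 GiB, on 3 nodes.
# ASSUMPTION: 0.048 USD per GiB-month is derived from the slide's $52.80 figure.
victoria = monthly_storage_cost(1100, 0.048, storages=3)

print(f"Thanos:          ${thanos:.2f} per month, ${thanos * 12:.2f} per year")
print(f"VictoriaMetrics: ${victoria:.2f} per month")
```

With these inputs the yearly Thanos figure comes out slightly above the slide's rounded "~$1500".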
  • 27. Thanos vs VictoriaMetrics
      Instance price: 3 × n1-standard-4 (4 vCPU, 15 GB memory), $97.49 monthly estimate each; 3 × 97.49 = $292
      Standard provisioned space: 1,500 GB - $60
      CPU usage: Thanos 50%; VictoriaMetrics 65%
      Memory usage: Thanos 16 GB; VictoriaMetrics 16 GB
      Metrics per day (size): Thanos 15 GB; VictoriaMetrics 9 GB
      Metrics per minute: Thanos 2 000 000; VictoriaMetrics 2 000 000
      Metrics per one day: Thanos 2 880 000 000; VictoriaMetrics 2 880 000 000
      Scrape duration (1M metrics): Thanos 4.373 s; VictoriaMetrics 4.553 s
      Historical data access (500 time series): Thanos 303-525 ms; VictoriaMetrics 179-492 ms
  • 29. Q & A
  • 31. To produce downsampled data, the Compactor continuously aggregates series down to five minute and one hour resolutions. For each raw chunk, encoded with TSDB’s XOR compression, it stores different types of aggregations, e.g. min, max, or sum in a single block. This allows Querier to automatically choose the aggregate that is appropriate for a given PromQL query.
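
To make the idea of downsampling concrete, here is a minimal sketch in Python. It is not the Thanos Compactor's implementation: it simply groups raw samples into fixed windows (300 seconds for the five-minute resolution) and keeps min, max and sum per window, as the slide describes. The `count` field and all names here are assumptions added for illustration.

```python
from dataclasses import dataclass


@dataclass
class Aggregate:
    start: int      # window start, seconds
    count: int      # number of raw samples in the window (illustrative extra)
    total: float    # sum of values
    minimum: float
    maximum: float


def downsample(samples, window_seconds):
    """Aggregate (timestamp_seconds, value) pairs into fixed-size windows."""
    windows = {}
    for ts, value in samples:
        start = ts - ts % window_seconds  # align to the window boundary
        agg = windows.get(start)
        if agg is None:
            windows[start] = Aggregate(start, 1, value, value, value)
        else:
            agg.count += 1
            agg.total += value
            agg.minimum = min(agg.minimum, value)
            agg.maximum = max(agg.maximum, value)
    return [windows[start] for start in sorted(windows)]


# Ten minutes of raw samples at a 15-second interval.
raw = [(i * 15, float(i)) for i in range(40)]
for agg in downsample(raw, window_seconds=300):
    print(agg)
```

A query such as `max_over_time` could then read only the `maximum` field of each window instead of every raw sample, which is the kind of choice the slide attributes to the Querier.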
  • 34. VM Gorilla compression analysis: The only problem is that the result of the floating-point to integer conversion may exceed 64 bits, the default integer size used in modern computers. How to deal with it? Normalize the integer by dividing by 10^M, where M is the minimum value that allows fitting all the time series values into 64 bits and removing common trailing decimal zeros.
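
The normalization step can be sketched as follows. This is an illustrative reading of the slide, not VictoriaMetrics code: values are turned into integer mantissas with one shared decimal exponent, the mantissas are divided by 10 until they all fit into a signed 64-bit integer (which may lose low-order digits), and common trailing zeros are stripped. The function name and the use of Python's `decimal` module are assumptions.

```python
from decimal import Decimal

INT64_MAX = 2**63 - 1


def _div10(m: int) -> int:
    """Divide by 10, truncating toward zero for negative numbers too."""
    return m // 10 if m >= 0 else -((-m) // 10)


def to_scaled_ints(values):
    """Return (mantissas, exponent) with value ~= mantissa * 10**exponent.

    Assumes all values are finite floats.
    """
    parts = [Decimal(str(v)).as_tuple() for v in values]
    exponent = min(p.exponent for p in parts)  # shared decimal exponent
    mantissas = []
    for sign, digits, exp in parts:
        m = int("".join(map(str, digits))) * 10 ** (exp - exponent)
        mantissas.append(-m if sign else m)

    # Shrink until every mantissa fits into 64 bits (lossy for small values).
    while any(abs(m) > INT64_MAX for m in mantissas):
        mantissas = [_div10(m) for m in mantissas]
        exponent += 1

    # Remove trailing decimal zeros shared by all mantissas.
    while any(mantissas) and all(m % 10 == 0 for m in mantissas):
        mantissas = [_div10(m) for m in mantissas]
        exponent += 1

    return mantissas, exponent


print(to_scaled_ints([1.5, 2.25, 100.0]))
print(to_scaled_ints([10.0, 20.0]))
```

Integers produced this way tend to have small deltas between neighbouring samples, which is what makes the subsequent delta-style encoding effective.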