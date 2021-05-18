
Strategy: Scalable Fast Data Architecture: Cassandra, Spark, Kafka Engineering: Node, Python, JVM,CLR Operations: Cloud, C...
Data & Analytics
May. 18, 2021

Data Engineer's Lunch #23: Thanos/Cortex

  1. Version 1.0. Prometheus at Scale: Thanos / Cortex / etc. An Anant Corporation Story. How Prometheus scales in global business platforms.
  2. Prometheus (recap)
     ● Multidimensional data model over time via metric name and key/value pairs
     ● PromQL, now a standard query language
     ● Time series collection via pull, or via push (through a gateway)
     ● Dynamic service discovery or static configuration
     ● Separation of concerns in graphing / dashboarding
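The data model in the recap above (a metric name plus key/value label pairs identifying each series) can be illustrated with a small sketch. The helper names `escape_label_value` and `format_sample` are hypothetical and not part of any Prometheus client library; the output follows the general shape of the Prometheus text exposition format.

```python
def escape_label_value(value):
    """Escape backslashes, double quotes and newlines inside a label value."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def format_sample(name, labels, value):
    """Render one sample as `name{label="value",...} value`.

    Hypothetical helper for illustration only. A series is identified by
    its metric name together with its full set of label pairs.
    """
    if not labels:
        return f"{name} {value}"
    # Sort labels so the same label set always renders the same way.
    pairs = ",".join(
        f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
    )
    return f"{name}{{{pairs}}} {value}"


if __name__ == "__main__":
    print(format_sample("http_requests_total", {"method": "GET", "code": "200"}, 1027))
```

Changing any label value (for example `code="500"`) produces a different series, which is what makes the model multidimensional.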
  3. Prometheus (on Kubernetes)
  4. Prometheus at Scale
     ● Cortex
     ● Thanos
     ● M3DB (from Uber)
     ● VictoriaMetrics
     ● Vulcan (from DigitalOcean)
     https://sysdig.com/blog/challenges-scale-prometheus/
  5. Prometheus at Scale Needs
     ● Global View - Queries over multiple Prometheus servers
     ● Multi-Replica / High Availability - No downtime, no data loss
     ● Long Term Storage - Store data in cold storage for future use
     ● Global Scale - Millions of containers / pods / VMs
     ● Community Support - Many people using it
     ● Community Knowledge Online - Many people documenting it
  6. Cortex
     ● Global View - Centralized data
     ● Multi-Replica / High Availability - Dedupe at write
     ● Long Term Storage - NoSQL Index + Chunks
       ○ Index (Cassandra / DynamoDB / Bigtable)
       ○ Chunk (Cassandra / DynamoDB / Bigtable / S3 / GCS / Azure)
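"Dedupe at write" can be pictured as follows: when several Prometheus replicas scrape the same targets and all push to the central store, only one replica per cluster is accepted at a time, and another replica takes over if the accepted one goes quiet. The sketch below is a simplified in-memory model of that idea, not Cortex's actual implementation; the class name `HATracker`, its method, and the 30-second timeout are assumptions made for illustration.

```python
class HATracker:
    """Simplified model of write-time deduplication across HA replicas.

    Assumption: every write carries a cluster name and a replica name.
    Only the currently elected replica of a cluster is accepted; a
    different replica is elected once the elected one has been silent
    for longer than `failover_timeout` seconds.
    """

    def __init__(self, failover_timeout=30.0):
        self.failover_timeout = failover_timeout
        self._elected = {}  # cluster -> (replica, time of last accepted write)

    def accept(self, cluster, replica, now):
        """Return True if a write from `replica` should be stored."""
        current = self._elected.get(cluster)
        if current is None:
            # First replica seen for this cluster becomes the elected one.
            self._elected[cluster] = (replica, now)
            return True
        elected, last_seen = current
        if replica == elected:
            self._elected[cluster] = (replica, now)
            return True
        if now - last_seen > self.failover_timeout:
            # Elected replica has gone quiet: fail over to this one.
            self._elected[cluster] = (replica, now)
            return True
        return False  # Duplicate data from a non-elected replica: drop it.


if __name__ == "__main__":
    tracker = HATracker(failover_timeout=30.0)
    print(tracker.accept("prod", "replica-a", now=0.0))
    print(tracker.accept("prod", "replica-b", now=1.0))
```

Because duplicates are dropped before storage, queries against the central store never see two copies of the same series.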
  7. Cortex
  8. Cortex
  9. Cortex
  10. Thanos
     ● Global View - Federated data / fan-out queries
     ● Multi-Replica / High Availability - Query-time dedupe
     ● Long Term Storage - TSDB blocks in object store
       ○ GCS
       ○ S3-compatible (Ceph / MinIO)
       ○ Azure Blob Storage
       ○ …
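Query-time dedupe, in contrast to dedupe at write, keeps every replica's data and merges it when a query fans out to several sources. The sketch below is a deliberately naive model: it drops an assumed `replica` label, groups series that become identical, and keeps the first value seen for each timestamp. Thanos's real deduplication logic is more involved, and the function name `dedupe` and the data layout are hypothetical.

```python
def dedupe(series_list, replica_label="replica"):
    """Merge series that differ only by the replica label.

    `series_list` is a list of (labels, samples) pairs gathered from
    several sources, where `labels` is a dict and `samples` is a list of
    (timestamp, value) tuples. Naive rule: the first value seen for a
    given timestamp wins.
    """
    merged = {}
    for labels, samples in series_list:
        # Series identity without the replica label.
        key = tuple(sorted((k, v) for k, v in labels.items() if k != replica_label))
        bucket = merged.setdefault(key, {})
        for timestamp, value in samples:
            bucket.setdefault(timestamp, value)
    # Return samples in timestamp order for each merged series.
    return {key: sorted(bucket.items()) for key, bucket in merged.items()}


if __name__ == "__main__":
    replica_a = ({"__name__": "up", "job": "api", "replica": "a"}, [(10, 1.0), (20, 1.0)])
    replica_b = ({"__name__": "up", "job": "api", "replica": "b"}, [(20, 1.0), (30, 1.0)])
    for key, samples in dedupe([replica_a, replica_b]).items():
        print(dict(key), samples)
```

In this example replica `a` is missing the sample at timestamp 30 and replica `b` is missing the one at 10; the merged series covers all three timestamps, which is how a gap in one replica is hidden from the query result.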
  11. Thanos
  12. Thanos - Basic Architecture
  13. Thanos Architecture
  14. Thanos
  15. Thanos
  16. Thanos / Cortex Together
  17. Resources
     ● Thanos - Scalable Prometheus (https://www.infoq.com/news/2018/06/thanos-scalable-prometheus)
     ● Cortex Architecture (https://cortexmetrics.io/docs/architecture/)
     ● Thanos (https://thanos.io/)
     ● Challenges of Prometheus at Scale (https://sysdig.com/blog/challenges-scale-prometheus/)
     ● Tutorial: Prometheus at Scale (https://epsagon.com/tools/thanos-tutorial-prometheus-at-scale/)
     ● GitHub / Cortex (https://github.com/cortexproject/cortex)