Prometheus and Thanos

  1. Ticketmaster confidential. Do not distribute. Prometheus + Thanos: Thanos long-term storage for Prometheus
  2. About me...
     • Rocking it at Ticketmaster
     • Student in the Master of Business Administration (MBA) program, strategic project management
     • Bachelor of Applied Science (B.A.Sc.) in Computer Science
     • DevOps, cloud, Kubernetes, Kafka, etc.
     • Sports lover (hockey, dek hockey, ultimate frisbee, football…)
  3. Where Prometheus & Thanos are in the CNCF landscape...
  4. What is Prometheus? Prometheus was the second project to graduate from the Cloud Native Computing Foundation.
  5. Prometheus Architecture
  6. How are we using Prometheus at Ticketmaster?
     • We use the Prometheus Operator made by CoreOS (now part of Red Hat, itself part of IBM)
     • We created a Helm chart that sets up common scrape jobs, settings, and exporters
     • Scrape targets: EC2 instances (selected by tags) and Kubernetes
     • Exporters such as cloudwatch-metrics, kafka, blackbox
     • Ingresses
     • Thanos
     • Federation
  7. Why move away from federation?
     • Calculating the right ingestion rate for our scrape jobs is not easy
     • When the disk is full:
       • Prometheus stops working
       • We had to delete the PVC and recreate it with a larger size
     • Long-term storage is costly (SSD == $$$$)
     • Single point of failure
     • Availability
     • Operator error
     • Hardware failure
     • Rollout
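The sizing problem on this slide can be made concrete with the rule of thumb from the Prometheus storage documentation: needed disk space is roughly retention time (seconds) × ingested samples per second × bytes per sample, with about 1 to 2 bytes per sample after compression. The following is a minimal sketch of that arithmetic. The series count, scrape interval, and retention used in the example are illustrative assumptions, not Ticketmaster figures, and the function names are hypothetical.

```python
def ingested_samples_per_second(active_series: int, scrape_interval_s: float) -> float:
    """Approximate ingestion rate: every active series yields one sample per scrape."""
    return active_series / scrape_interval_s


def needed_disk_bytes(retention_s: float, samples_per_s: float,
                      bytes_per_sample: float = 2.0) -> float:
    """Rule of thumb: retention * ingestion rate * bytes per sample.

    bytes_per_sample defaults to 2.0, the pessimistic end of the
    1-2 bytes range quoted in the Prometheus documentation.
    """
    return retention_s * samples_per_s * bytes_per_sample


# Illustrative (assumed) workload: 1M active series, 30s scrapes, 15 days kept.
example_rate = ingested_samples_per_second(1_000_000, 30)
example_size = needed_disk_bytes(15 * 24 * 3600, example_rate)
print(f"~{example_size / 1e9:.1f} GB of local disk")
```

Because the ingestion rate moves whenever targets or label cardinality change, a volume sized this way can still fill up, which is the failure mode described in the bullets above.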
  8. Solution
  9. Thanos: Goals
     • Easy deployment model
     • Minimal number of dependencies
     • Minimal baseline cost
     • Have a global view
     • Seamless integration with Prometheus
     • Increase retention
     • Have HA in place
  10. Thanos: Global view
  11. Thanos: Global view + HA
  12. Thanos: Increase retention (persist data)
  13. Thanos: Increase retention (querying)
      • A series is made up of one or more “chunks”
      • Each chunk contains ~120 samples
      • Chunks can be retrieved through HTTP byte-range queries
      Example:
      • 1000 series @ 30s scrape interval
      • Query 1 year
      • 8.7 million chunks/range queries
      • Chunks of the same series are aligned
      • Similar series are aligned due to the same metric name
      This reduces the request count by 4-6 orders of magnitude: 8.7 million requests turn into O(20) requests.
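The figures in this example can be checked with a few lines of arithmetic. The sketch below assumes a 365-day year and exactly 120 samples per chunk (the slide says "~120"); the function name is hypothetical. It gives about 8.76 million chunks, which the slide quotes as 8.7 million, and going from that to roughly 20 requests is a reduction of between 5 and 6 orders of magnitude.

```python
import math

SAMPLES_PER_CHUNK = 120  # the slide's approximate figure, taken as exact here


def chunk_count(series: int, scrape_interval_s: int, range_s: int,
                samples_per_chunk: int = SAMPLES_PER_CHUNK) -> int:
    """Number of chunks touched when querying `range_s` seconds of `series` series."""
    samples_per_series = range_s // scrape_interval_s
    chunks_per_series = math.ceil(samples_per_series / samples_per_chunk)
    return series * chunks_per_series


ONE_YEAR_S = 365 * 24 * 3600  # assumption: 365-day year

naive_requests = chunk_count(series=1000, scrape_interval_s=30, range_s=ONE_YEAR_S)
batched_requests = 20  # the slide's O(20) figure

orders_of_magnitude = math.log10(naive_requests / batched_requests)
print(naive_requests, round(orders_of_magnitude, 2))
```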
  14. Thanos: Increase retention (querying): group chunks of the same series; group series by the same metric name
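The grouping idea on this slide can be sketched as coalescing byte ranges: if the chunks a query needs sit next to each other in the stored object, their individual ranges can be merged into one larger HTTP range request. The code below is only an illustration of that idea under assumed inputs. The `coalesce_ranges` helper, the `max_gap` parameter, and the byte offsets are hypothetical, and this is not a description of how Thanos implements it.

```python
from typing import List, Tuple

ByteRange = Tuple[int, int]  # (start offset, end offset), end exclusive


def coalesce_ranges(ranges: List[ByteRange], max_gap: int = 0) -> List[ByteRange]:
    """Merge byte ranges that touch, overlap, or are at most `max_gap` bytes apart.

    Each range in the result stands for a single range request.
    """
    merged: List[ByteRange] = []
    for start, end in sorted(ranges):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous range: extend it instead of
            # issuing a separate request.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


# Three adjacent chunks of one series collapse into a single request.
print(coalesce_ranges([(200, 300), (0, 100), (100, 200)]))
```

Allowing a small `max_gap` trades a few wasted bytes for fewer requests, which is usually the better deal against object storage where each request has a fixed cost.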
  15. Thanos: Compaction
  16. Thanos: Full architecture
  17. Thanos: Deployment model (example)
      • Federation through the Store API
  18. Thanos: Cost
      • The cost of the Store, Query, and Compaction nodes roughly equals the savings on the Prometheus side (+/- 0)
      • Less SSD space on the Prometheus side (savings)
      • Basically, we only pay for our data stored in S3/GCS/etc. + requests
  19. Alternatives: Cortex
  20. Alternatives: Cortex, reasons we didn’t choose it
      • Documentation is not very good
      • Need to maintain another data store (NoSQL DB)
      • Interferes with the data path
  21. Questions?
