Thanos: Global, durable Prometheus monitoring

Prometheus’s simple and reliable operational model is one of its major selling points. Beyond a certain scale, however, we have identified a few shortcomings it imposes. We are proud to present Thanos, an open-source project by Improbable that bundles a set of components which seamlessly transform existing Prometheus deployments into a unified, global-scale monitoring system.

Authors: Fabian Reinartz, Bartlomiej Plotka

Slides from the London Prometheus Meetup, January 2018.
Thanos: https://github.com/improbable-eng/thanos


  1. 1. Bartek Plotka Bwplotka Bplotka Fabian Reinartz fabxc Global, durable Prometheus monitoring
  2. 2. Prometheus 2.0 ● Reliable operational model ● Powerful query language ● Scraping capabilities beyond casual usage ● Local metric storage
  3. 3. Prometheus at Scale A dream
  4. 4. Improbable case ● Multiple isolated Kubernetes clusters (1, 2, …, n, n+1)
  5. 5. Improbable case ● Multiple isolated Kubernetes clusters ● Single Prometheus server per cluster
  6. 6. Improbable case ● Multiple isolated Kubernetes clusters ● Single Prometheus server per cluster ● Dashboards (Grafana) & Alertmanager in a separate cluster
  7. 7. Improbable case ● Multiple isolated Kubernetes clusters ● Single Prometheus server per cluster ● Dashboards (Grafana) & Alertmanager in a separate cluster — What is missing?
  8. 8. Global View See everything from a single place!
  9. 9. Global View (Prometheus in clusters 1…n+1, Grafana, Alertmanager) — “Alert when 66% of clusters in a region are down”
  10. 10. Global View ● How to aggregate data from different clusters?
  11. 11. Global View ● How to aggregate data from different clusters? ○ Use hierarchical federation?
  12. 12. Global View ● How to aggregate data from different clusters? ○ Use hierarchical federation? → Single point of failure; maintenance; which data should be federated? (an extra, global Prometheus)
  13. 13. Availability Where is my sample?
  14. 14. Availability (Prometheus per cluster, Grafana, Alertmanager) — operator error, hardware failure, rollout
  15. 15. Availability ● Can we ensure no loss of metric data?
  16. 16. Availability ● Can we ensure no loss of metric data? ○ Add HA replicas? (a second Prometheus next to each existing one)
  17. 17. Availability ● Can we ensure no loss of metric data? ○ Add HA replicas? → Which replica should we query? Cost? Where to put rules and alerts?
  18. 18. Historical Metrics What exactly happened X weeks ago?
  19. 19. Metric retention (timeline: T-3 years … T-2 years … T-12 months … T) — “Go back to what happened 6 months ago...”
  20. 20. Metric retention — “Good progress! Memory allocations look better than 1.5 years ago!”
  21. 21. Metric retention — “Let’s see user traffic across years!”
  22. 22. Metric retention — but the infrastructure retention is 9 days
  23. 23. Metric retention ● Can we have longer retention?
  24. 24. Metric retention ● Can we have longer retention? ○ Upgrade to Prometheus 2.0
  25. 25. Metric retention ● Can we have longer retention? ○ Upgrade to Prometheus 2.0 ○ Scale SSD Vertically? SSD Prometheus
  26. 26. Metric retention ● Can we have longer retention? ○ Upgrade to Prometheus 2.0 ○ Scale SSD Vertically? SSD Prometheus
  27. 27. Metric retention ● Can we have longer retention? ○ Upgrade to Prometheus 2.0 ○ Scale SSD Vertically? SSD Prometheus
  28. 28. Metric retention ● Can we have longer retention? ○ Upgrade to Prometheus 2.0 ○ Scale SSD Vertically? (keep adding SSDs to each Prometheus)
  29. 29. Metric retention ● Can we have longer retention? ○ Upgrade to Prometheus 2.0 ○ Scale SSD Vertically? → Backup? Maintenance? Cost?
  30. 30. Recap (Prometheus per cluster, Grafana, Alertmanager, plus a federating Prometheus) — It is just hard to… ● Have a global view ● Have HA in place ● Increase retention
  31. 31. Thanos It is just hard to… ● Have a global view ● Have HA in place ● Increase retention
  32. 32. ● Seamless integration with Prometheus ● Easy deployment model ● Minimal number of dependencies ● Minimal baseline cost Additional Goals
  33. 33. Global View See everything from a single place!
  34. 34. Prometheus scrapes targets and writes metrics to local SSD
  35. 35. A Sidecar is deployed next to each Prometheus
  36. 36. The Sidecar exposes the data over gRPC (Store API)
  37. 37. Store API
      service Store {
        rpc Series(SeriesRequest) returns (stream SeriesResponse);
        rpc LabelNames(LabelNamesRequest) returns (LabelNamesResponse);
        rpc LabelValues(LabelValuesRequest) returns (LabelValuesResponse);
      }
      message SeriesRequest {
        int64 min_time = 1;
        int64 max_time = 2;
        repeated LabelMatcher matchers = 3;
      }
      The Sidecar serves the Store API by translating requests into Prometheus remote read calls.
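A minimal sketch of how a client might stream series from such a Store API endpoint, assuming Python stubs generated from the proto above. The module names store_pb2/store_pb2_grpc, the LabelMatcher fields, and the address are illustrative assumptions, not the exact Thanos definitions:

```python
# Hypothetical client for the Store API sketched above.
# Assumes stubs were generated from the proto (e.g. via grpc_tools.protoc).
import grpc

import store_pb2        # assumed name of the generated messages module
import store_pb2_grpc   # assumed name of the generated service module


def fetch_series(address: str, start_ms: int, end_ms: int):
    """Stream all series matching __name__="up" from one Store API endpoint."""
    channel = grpc.insecure_channel(address)
    stub = store_pb2_grpc.StoreStub(channel)

    request = store_pb2.SeriesRequest(
        min_time=start_ms,
        max_time=end_ms,
        # The LabelMatcher fields (type/name/value) are assumed here.
        matchers=[store_pb2.LabelMatcher(type=store_pb2.LabelMatcher.EQ,
                                         name="__name__", value="up")],
    )
    # Series() is a server-streaming RPC: it yields SeriesResponse messages.
    for response in stub.Series(request):
        yield response


if __name__ == "__main__":
    for resp in fetch_series("localhost:10901", 0, 2**31):
        print(resp)
```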
  38. 38. Querier: a Querier component talks to the Sidecar over the Store API and exposes an HTTP Query API on top
  39. 39. Global View: one Querier fans out to the Sidecars of all Prometheus servers over the Store API and merges the results
  40. 40. Global View + Availability: Prometheus HA replicas are labelled “replica”:”1” and “replica”:”2”; the Querier merges and deduplicates their series
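As a rough illustration of that deduplication step (a toy sketch, not the actual Thanos implementation): series coming back from two replicas are merged by dropping the replica label and keeping one sample per timestamp, so a gap in one replica is filled by the other.

```python
# Toy replica deduplication: series are (labels, samples) pairs where labels
# is a dict and samples is a list of (timestamp_ms, value) tuples.
# The "replica" label name is an assumption for this sketch.
from collections import defaultdict

REPLICA_LABEL = "replica"  # external label distinguishing HA replicas


def deduplicate(series_list):
    """Merge series that only differ in their replica label."""
    merged = defaultdict(dict)  # identity labels -> {timestamp: value}
    for labels, samples in series_list:
        identity = tuple(sorted((k, v) for k, v in labels.items()
                                if k != REPLICA_LABEL))
        for ts, value in samples:
            # First replica to report a timestamp wins; gaps in one replica
            # are filled by samples from the other.
            merged[identity].setdefault(ts, value)
    return [(dict(identity), sorted(points.items()))
            for identity, points in merged.items()]


replica_1 = ({"job": "node", "replica": "1"}, [(1000, 1.0), (2000, 2.0)])
replica_2 = ({"job": "node", "replica": "2"}, [(2000, 2.0), (3000, 3.0)])
print(deduplicate([replica_1, replica_2]))
# -> [({'job': 'node'}, [(1000, 1.0), (2000, 2.0), (3000, 3.0)])]
```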
  41. 41. Thanos It is just hard to… ● Have a global view ● Have HA in place ● Increase retention
  42. 42. Historical Metrics What exactly happened X weeks ago?
  43. 43. TSDB Layout: data is stored in consecutive blocks (Block 1 … Block 4) covering adjacent time ranges up to now (T-300 … T)
  44. 44. TSDB Layout: each block contains chunk files and an index
  45. 45. Data saving: the Sidecar uploads completed TSDB blocks from local SSD to object storage
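A simplified sketch of that idea (not the Sidecar's actual shipper code): watch the TSDB data directory and push any completed block directory that is not yet in the bucket. DATA_DIR and the two helper functions are placeholders for whichever object-store client you use, not a real API.

```python
# Toy block shipper: upload completed TSDB blocks to object storage.
import os
import time

DATA_DIR = "/prometheus/data"          # assumed local TSDB directory


def list_uploaded(bucket: str) -> set:
    """Placeholder: return the block IDs already present in the bucket."""
    return set()


def upload_dir(bucket: str, local_path: str, remote_prefix: str) -> None:
    """Placeholder: recursively upload a block directory to the bucket."""
    print(f"uploading {local_path} -> {bucket}/{remote_prefix}")


def ship_blocks(bucket: str) -> None:
    """Upload every completed block directory that is not in the bucket yet."""
    uploaded = list_uploaded(bucket)
    for entry in sorted(os.listdir(DATA_DIR)):
        block_path = os.path.join(DATA_DIR, entry)
        # A finished block directory contains a meta.json; the WAL does not.
        if not os.path.isfile(os.path.join(block_path, "meta.json")):
            continue
        if entry not in uploaded:
            upload_dir(bucket, block_path, remote_prefix=entry)


if __name__ == "__main__":
    while True:
        ship_blocks("my-metrics-bucket")
        time.sleep(30)                 # re-scan for new blocks periodically
```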
  46. 46. Store: a Store node exposes the blocks in object storage to the Querier via the Store API, keeping a local cache
  47. 47. Store: at query time, the needed block data is fetched from the bucket through the Store node
  48. 48. Store ● A series is made up of one or more “chunks” ● A chunk contains ~120 samples each ● Chunks can be retrieved through HTTP byte range queries
  49. 49. Store ● A series is made up of one or more “chunks” ● A chunk contains ~120 samples each ● Chunks can be retrieved through HTTP byte range queries Example: ● 1000 series @ 30s scrape interval
  50. 50. Store ● A series is made up of one or more “chunks” ● A chunk contains ~120 samples each ● Chunks can be retrieved through HTTP byte range queries Example: ● 1000 series @ 30s scrape interval ● Query 1 year → 8.7 million chunks / range queries
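The arithmetic behind that number, as a back-of-the-envelope check using the ~120 samples per chunk stated above:

```python
# Back-of-the-envelope: how many chunks does one year of data touch?
series = 1_000
scrape_interval_s = 30
samples_per_chunk = 120           # ~120 samples per TSDB chunk

samples_per_series_per_year = 365 * 24 * 3600 // scrape_interval_s   # 1,051,200
chunks_per_series = samples_per_series_per_year / samples_per_chunk  # ~8,760
total_chunks = series * chunks_per_series

print(f"{total_chunks:,.0f} chunks")  # ~8,760,000 -> "8.7 million range queries"
```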
  51. 51. Store Leverage Prometheus’ TSDB file layout
  52. 52. Store Leverage Prometheus’ TSDB file layout ● Chunks of the same series are aligned sequentially
  53. 53. Store Leverage Prometheus’ TSDB file layout ● Chunks of the same series are aligned ● Similar series are aligned, e.g. due to same metric name
  54. 54. Store Leverage Prometheus’ TSDB file layout ● Chunks of the same series are aligned ● Similar series are aligned, e.g. due to same metric name Consolidating ranges in close proximity reduces request count by 4-6 orders of magnitude. 8.7 million requests turned into O(20) requests.
  55. 55. Store Leverage Prometheus’ TSDB file layout ● Chunks of the same series are aligned ● Similar series are aligned, e.g. due to same metric name Index lookups profit from a similar approach.
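A toy version of that consolidation idea (not Thanos's code): given sorted byte ranges for the chunks a query needs, merge ranges whose gap is small enough that fetching a few unused bytes is cheaper than issuing another request. The 16 KiB threshold below is an arbitrary illustrative value.

```python
# Merge nearby byte ranges into a handful of larger range requests.
def consolidate(ranges, max_gap=16 * 1024):
    """ranges: sorted list of (start, end) byte offsets within one file."""
    if not ranges:
        return []
    merged = [list(ranges[0])]
    for start, end in ranges[1:]:
        if start - merged[-1][1] <= max_gap:
            merged[-1][1] = max(merged[-1][1], end)  # extend previous range
        else:
            merged.append([start, end])
    return [tuple(r) for r in merged]


# Chunks of the same (and of similar) series sit next to each other on disk,
# so millions of tiny ranges collapse into a few large range requests.
chunk_ranges = [(0, 200), (210, 400), (450, 700), (1_000_000, 1_000_300)]
print(consolidate(chunk_ranges))  # [(0, 700), (1000000, 1000300)]
```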
  56. 56. Compaction Density matters
  57. 57. Compaction: a Compactor works against the object storage bucket, using local disk as scratch space
  58. 58. Compaction: it downloads groups of blocks from the bucket
  59. 59. Compaction: and compacts them into a single larger block
  60. 60. Compaction: the compacted block is uploaded back to object storage
  61. 61. Thanos It is just hard to… ● Have a global view ● Have HA in place ● Increase retention
  62. 62. Downsampling Let’s just step back a little
  63. 63. Downsampling Raw: 16 bytes/sample Compressed: 1.07 bytes/sample
  64. 64. Downsampling BUT…
  65. 65. Downsampling Decompressing one sample takes 10-40 nanoseconds ● Times 1000 series @ 30s scrape interval ● Times 1 year
  66. 66. Downsampling Decompressing one sample takes 10-40 nanoseconds ● Times 1000 series @ 30s scrape interval ● Times 1 year ● Over 1 billion samples, i.e. 10-40s – for decoding alone ● Plus your actual computation over all those samples, e.g. rate()
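As a rough check of those numbers, using the 1.07 bytes/sample and 10-40 ns/sample figures from the previous slides:

```python
# One year of one thousand series at a 30s scrape interval.
series = 1_000
samples_per_series = 365 * 24 * 3600 // 30       # 1,051,200
total_samples = series * samples_per_series      # ~1.05 billion

bytes_compressed = total_samples * 1.07          # ~1.1 GB of chunk data
decode_seconds_low = total_samples * 10e-9       # ~10.5 s just to decode
decode_seconds_high = total_samples * 40e-9      # ~42 s at the slow end

print(f"{total_samples:,} samples, "
      f"{bytes_compressed / 1e9:.2f} GB compressed, "
      f"{decode_seconds_low:.0f}-{decode_seconds_high:.0f} s to decode")
```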
  67. 67. Downsampling: a raw block is downsampled into a block at 5m resolution (~10x reduction) and further into a block at 1h resolution (~12x more)
  68. 68. Downsampling: each raw chunk is turned into five aggregate chunks — count, sum, min, max, and counter
  69. 69. Downsampling: queries are then answered from whichever aggregate they need:
  70. 70. Downsampling: count_over_time(requests_total[1h]) → count aggregate
  71. 71. Downsampling: sum_over_time(requests_total[1h]) → sum aggregate
  72. 72. Downsampling: min(requests_total), min_over_time(requests_total[1h]) → min aggregate
  73. 73. Downsampling: max(requests_total), max_over_time(requests_total[1h]) → max aggregate
  74. 74. Downsampling: rate(requests_total[1h]), increase(requests_total[1h]) → counter aggregate
  75. 75. Downsampling: requests_total, avg(requests_total), … → sum and count aggregates (avg = sum / count)
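To make those aggregates concrete, here is a toy downsampler (a sketch of the idea only, not the Thanos implementation): it folds raw (timestamp, value) samples into fixed 5-minute windows and keeps count, sum, min and max per window, from which avg is derived as sum / count. The counter aggregate used by rate()/increase() is omitted for brevity.

```python
# Toy downsampling of raw samples into 5-minute aggregate windows.
from collections import defaultdict

WINDOW_MS = 5 * 60 * 1000   # 5m resolution


def downsample(samples):
    """samples: iterable of (timestamp_ms, value) -> {window_start: aggregates}."""
    windows = defaultdict(lambda: {"count": 0, "sum": 0.0,
                                   "min": float("inf"), "max": float("-inf")})
    for ts, value in samples:
        w = windows[ts - ts % WINDOW_MS]
        w["count"] += 1
        w["sum"] += value
        w["min"] = min(w["min"], value)
        w["max"] = max(w["max"], value)
    return dict(windows)


raw = [(i * 30_000, float(i)) for i in range(20)]   # 10 minutes @ 30s interval
for start, agg in sorted(downsample(raw).items()):
    avg = agg["sum"] / agg["count"]                 # avg is derived, not stored
    print(start, agg, f"avg={avg:.1f}")
```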
  76. 76. Full Architecture: Queriers in front of the Prometheus + Sidecar (+ local SSD) pairs, plus a Store, a Compactor and the object storage Bucket
  77. 77. Full Architecture — all components are subcommands of a single binary:
      $ thanos sidecar …
      $ thanos query …
      $ thanos store …
      $ thanos compact …
  78. 78. Deployment Models: one Thanos stack per cluster — each cluster (A, B, C) has its own Prometheus (P) + Sidecar (S), Queriers, Store and Bucket
  79. 79. Deployment Models: federation — the per-cluster Queriers are connected to each other through the Store API
  80. 80. Deployment Models: a single global-scale Thanos cluster — one Querier/Store layer spanning the Sidecars and Buckets of all clusters
  81. 81. Cost ● Store + Querier nodes ≈ savings on the Prometheus side (roughly +/- 0) ● Less SSD space needed on the Prometheus side (savings) ● Basically: just your data stored in S3/GCS/HDFS + requests
  82. 82. Cost Example: ● 20 Prometheus servers each ingesting 100k samples/sec, 500GB of local disk ● 20 x 250GB of new data per month + ~20% overhead for downsampling ● $1440/month for storage after 1 year (72TB of queryable data) ● $100/month for sustained 100 query/sec against object storage Thanos Cost: $1540
  83. 83. Cost Example: ● 20 Prometheus servers each ingesting 100k samples/sec, 500GB of local disk ● 20 x 250GB of new data per month + ~20% overhead for downsampling ● $1440/month for storage after 1 year (72TB of queryable data) ● $100/month for sustained 100 query/sec against object storage ● $1530/month savings in local SSDs Thanos Cost: $1540 Prometheus Savings: $1530
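A quick reconstruction of the storage figure, assuming roughly $0.02 per GB-month for object storage (which is what the $1440 number implies; actual GCS/S3 pricing varies by storage class and region):

```python
# Reproduce the back-of-the-envelope storage cost from the slide.
prometheus_servers = 20
new_data_gb_per_month = 250            # per server
downsampling_overhead = 0.20           # ~20% extra for the 5m/1h aggregates
months = 12

total_gb = (prometheus_servers * new_data_gb_per_month * months
            * (1 + downsampling_overhead))
price_per_gb_month = 0.02              # assumed object-storage price

print(f"{total_gb / 1000:.0f} TB queryable")            # ~72 TB
print(f"${total_gb * price_per_gb_month:,.0f}/month")   # ~$1,440/month
```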
  84. 84. Demo - retention
  85. 85. Demo - deduplication
  86. 86. Demo - deduplication
  87. 87. Any questions? github.com/improbable-eng/thanos Fabian Reinartz fabxc Bartek Plotka bwplotka Bplotka
