Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Storing 16 Bytes at Scale

2,177 views

Published on

An overview of the new time series storage engine powering Prometheus 2.0

Published in: Technology
  • Be the first to comment

Storing 16 Bytes at Scale

  1. 1. Storing 16 Bytes at Scale Fabian Reinartz Software Engineer @fabxc
  2. 2. Time Series time
  3. 3. Time Series time series
  4. 4. requests_total{path="/status", method="GET", instance="10.0.0.1:80"} requests_total{path="/status", method="POST", instance="10.0.0.3:80"} requests_total{path="/", method="GET", instance="10.0.0.2:80"} ... Time Series
  5. 5. requests_total{path="/status", method="GET", instance="10.0.0.1:80"} requests_total{path="/status", method="POST", instance="10.0.0.3:80"} requests_total{path="/", method="GET", instance="10.0.0.2:80"} Select: requests_total Time Series
  6. 6. requests_total{path="/status", method="GET", instance="10.0.0.1:80"} requests_total{path="/status", method="POST", instance="10.0.0.3:80"} requests_total{path="/", method="GET", instance="10.0.0.2:80"} Select: requests_total{method="GET"} Time Series
  7. 7. Time Series time series
  8. 8. Time Series time series
  9. 9. Time Series time series
  10. 10. Time Series time series
  11. 11. Order of Scale 5 million active time series 30 second scrape interval 1 month of retention 166,000 samples/second 432 billion samples 2,000 - 15,000 targets
  12. 12. Order of Scale 5 million active time series 30 second scrape interval 1 month of retention 166,000 samples/second 432 billion samples 8 byte timestamp + 8 byte value ⇒ 7 TB on disk 2,000 - 15,000 targets seems OK
  13. 13. Order of Scale 5 million active time series 30 second scrape interval 6 month of retention 166,000 samples/second 2592 billion samples 8 byte timestamp + 8 byte value ⇒ 42 TB on disk 2,000 - 15,000 targets
  14. 14. Sample Compression – Timestamps 1496163646 1496163676 1496163706 1496163735 1496163765
  15. 15. Sample Compression – Timestamps 1496163646 1496163676 1496163706 1496163735 1496163765 1496163646 +30 +30 +29 +30Δ
  16. 16. Sample Compression – Timestamps 1496163646 1496163676 1496163706 1496163735 1496163765 1496163646 +30 +30 +29 +30 1496163646 +30 +0 -1 +1 Δ ΔΔ
  17. 17. Sample Compression – Values Source: http://www.vldb.org/pvldb/vol8/p1816-teller.pdf
  18. 18. Sample Compression Raw: 16 bytes/sample Compressed: 1.37 bytes/sample @beorn7 @dgryski
  19. 19. Order of Scale 5 million active time series 30 second scrape interval 1 month of retention 166,000 samples/second 432 billion samples 8 byte timestamp + 8 byte value ⇒ 7 TB on disk 0.8 TB 2,000 - 15,000 targets
  20. 20. Storing 16 Bytes at Scale Fabian Reinartz Software Engineer, CoreOS @fabxc 1.37
  21. 21. Time Series – Chunks time series 1 file/series filled with chunks
  22. 22. Time Series – Chunks time series 1 file/series filled with chunks
  23. 23. Churn time series
  24. 24. Churn time series
  25. 25. Order of Scale – with Churn 5 million active time series 150 million total time series 30 second scrape interval 1 month of retention 166,000 samples/second 432 billion samples
  26. 26. Churn time series
  27. 27. Blocks time series
  28. 28. Blocks time series
  29. 29. Blocks time series
  30. 30. Blocks data ├── 01BX6V6H5JQN8RWVXM5ABYXY9N │ ├── chunks │ │ ├── 000001 │ │ ├── 000002 │ │ └── 000003 │ ├── index │ ├── meta.json │ └── tombstones ├── 01BX6V6TY06G5MFQ0GPH7EMXRH │ ├── chunks │ │ └── 000001 │ ├── index │ ├── meta.json │ └── tombstones └── wal ├── 000004 ├── 000005 └── 000006
  31. 31. Compaction time series
  32. 32. Compaction time series
  33. 33. Retention time series
  34. 34. Retention time series
  35. 35. Tree Block 1 chunk 1 chunk 2 chunk 3 time Block 2 Block 3 Block 4 Block 5
  36. 36. Tree Block 1 chunk 1 chunk 2 chunk 3 chunk 4 chunk 5 time Block 1 Block 2 Block 2 Block 3
  37. 37. Index { __name__=”requests_total”, pod=”nginx-34534242-abc723 job=”nginx”, path=”/api/v1/status”, status=”200”, method=”GET”, }
  38. 38. Index { __name__=”requests_total”, pod=”nginx-34534242-abc723 job=”nginx”, path=”/api/v1/status”, status=”200”, method=”GET”, } ● In each block, a series has a unique ID ID: 5
  39. 39. Index { __name__=”requests_total”, pod=”nginx-34534242-abc723 job=”nginx”, path=”/api/v1/status”, status=”200”, method=”GET”, } ● In each block, a series has a unique ID ● Maintain sorted lists from label pair to IDs ID: 5 ... status=”200”: 1 2 5 99 1000 1001 1500 1502 500000 method=”GET”: 2 3 4 5 6 9 10 1502 999999 ... ...
  40. 40. Index { __name__=”requests_total”, pod=”nginx-34534242-abc723 job=”nginx”, path=”/api/v1/status”, status=”200”, method=”GET”, } status=”200”: 1 2 5 99 1000 1001 1500 1502 500000 method=”GET”: 2 3 4 5 6 9 10 1502 999999 ... ● In each block, a series has a unique ID ● Maintain sorted lists from label pair to IDs ● Efficient k-way set operations (merge, intersect)
  41. 41. Index { __name__=”requests_total”, pod=”nginx-34534242-abc723 job=”nginx”, path=”/api/v1/status”, status=”200”, method=”GET”, } status=”200”: 1 2 5 99 1000 1001 1500 1502 500000 method=”GET”: 2 3 4 5 6 9 10 1502 999999 ... Intersect: ● In each block, a series has a unique ID ● Maintain sorted lists from label pair to IDs ● Efficient k-way set operations (merge, intersect)
  42. 42. Index { __name__=”requests_total”, pod=”nginx-34534242-abc723 job=”nginx”, path=”/api/v1/status”, status=”200”, method=”GET”, } status=”200”: 1 2 5 99 1000 1001 1500 1502 500000 method=”GET”: 2 3 4 5 6 9 10 1502 999999 ... Intersect: ● In each block, a series has a unique ID ● Maintain sorted lists from label pair to IDs ● Efficient k-way set operations (merge, intersect)
  43. 43. Index { __name__=”requests_total”, pod=”nginx-34534242-abc723 job=”nginx”, path=”/api/v1/status”, status=”200”, method=”GET”, } status=”200”: 1 2 5 99 1000 1001 1500 1502 500000 method=”GET”: 2 3 4 5 6 9 10 1502 999999 ... Intersect: 2 ● In each block, a series has a unique ID ● Maintain sorted lists from label pair to IDs ● Efficient k-way set operations (merge, intersect)
  44. 44. Index { __name__=”requests_total”, pod=”nginx-34534242-abc723 job=”nginx”, path=”/api/v1/status”, status=”200”, method=”GET”, } status=”200”: 1 2 5 99 1000 1001 1500 1502 500000 method=”GET”: 2 3 4 5 6 9 10 1502 999999 ... Intersect: 2 ● In each block, a series has a unique ID ● Maintain sorted lists from label pair to IDs ● Efficient k-way set operations (merge, intersect)
  45. 45. Index { __name__=”requests_total”, pod=”nginx-34534242-abc723 job=”nginx”, path=”/api/v1/status”, status=”200”, method=”GET”, } status=”200”: 1 2 5 99 1000 1001 1500 1502 500000 method=”GET”: 2 3 4 5 6 9 10 1502 999999 ... Intersect: 2 ● In each block, a series has a unique ID ● Maintain sorted lists from label pair to IDs ● Efficient k-way set operations (merge, intersect)
  46. 46. Index { __name__=”requests_total”, pod=”nginx-34534242-abc723 job=”nginx”, path=”/api/v1/status”, status=”200”, method=”GET”, } status=”200”: 1 2 5 99 1000 1001 1500 1502 500000 method=”GET”: 2 3 4 5 6 9 10 1502 999999 ... Intersect: 2 5 ● In each block, a series has a unique ID ● Maintain sorted lists from label pair to IDs ● Efficient k-way set operations (merge, intersect)
  47. 47. Index { __name__=”requests_total”, pod=”nginx-34534242-abc723 job=”nginx”, path=”/api/v1/status”, status=”200”, method=”GET”, } status=”200”: 1 2 5 99 1000 1001 1500 1502 500000 method=”GET”: 2 3 4 5 6 9 10 1502 999999 ... Intersect: 2 5 ● In each block, a series has a unique ID ● Maintain sorted lists from label pair to IDs ● Efficient k-way set operations (merge, intersect)
  48. 48. Index { __name__=”requests_total”, pod=”nginx-34534242-abc723 job=”nginx”, path=”/api/v1/status”, status=”200”, method=”GET”, } status=”200”: 1 2 5 99 1000 1001 1500 1502 500000 method=”GET”: 2 3 4 5 6 9 10 1502 999999 ... Intersect: 2 5 ● In each block, a series has a unique ID ● Maintain sorted lists from label pair to IDs ● Efficient k-way set operations (merge, intersect)
  49. 49. Index { __name__=”requests_total”, pod=”nginx-34534242-abc723 job=”nginx”, path=”/api/v1/status”, status=”200”, method=”GET”, } status=”200”: 1 2 5 99 1000 1001 1500 1502 500000 method=”GET”: 2 3 4 5 6 9 10 1502 999999 ... Intersect: 2 5 ● In each block, a series has a unique ID ● Maintain sorted lists from label pair to IDs ● Efficient k-way set operations (merge, intersect)
  50. 50. Index { __name__=”requests_total”, pod=”nginx-34534242-abc723 job=”nginx”, path=”/api/v1/status”, status=”200”, method=”GET”, } status=”200”: 1 2 5 99 1000 1001 1500 1502 500000 method=”GET”: 2 3 4 5 6 9 10 1502 999999 ... Intersect: 2 5 1502 ● In each block, a series has a unique ID ● Maintain sorted lists from label pair to IDs ● Efficient k-way set operations (merge, intersect)
  51. 51. Index { __name__=”requests_total”, pod=”nginx-34534242-abc723 job=”nginx”, path=”/api/v1/status”, status=”200”, method=”GET”, } status=”200”: 1 2 5 99 1000 1001 1500 1502 500000 method=”GET”: 2 3 4 5 6 9 10 1502 999999 ... Intersect: 2 5 1502 ● In each block, a series has a unique ID ● Maintain sorted lists from label pair to IDs ● Efficient k-way set operations (merge, intersect)
  52. 52. Benchmarks
  53. 53. Benchmarks ● Kubernetes cluster + dedicated Prometheus nodes ● 800 microservice instances + Kubernetes components ● 120,000 samples/second ● 300,000 active time series ● Swap out 50% of pods every 10 minutes
  54. 54. Benchmarks (memory)
  55. 55. Benchmarks (CPU)
  56. 56. Benchmarks (disk writes)
  57. 57. Benchmarks (on-disk size)
  58. 58. Benchmarks (query latency)
  59. 59. Try it out! We’re hiring: coreos.com/careers
  60. 60. ● https://fabxc.org/blog/2017-04-10-writing-a-tsdb/ ● https://github.com/prometheus/tsdb More

×