Data Structures for High Resolution, Real-time Telemetry at Scale

The challenge with telemetry in real-time systems is that you need as many sources of telemetry as possible (throughput, latency, errors, CPU, and many more...), but you can't pay extra overhead when your users expect sub-millisecond operations that scale to millions of transactions per second.
In this talk, we'll describe how we're using and improving several OSS data structures to incorporate telemetry features at scale, and showcase why they matter in scenarios involving performance, security, and ops issues.

  1. Brought to you by
     Data Structures for High Resolution, Real-time Telemetry at Scale
     Filipe Oliveira, Performance Engineer at Redis
  2. <whoami> Filipe Oliveira
     ■ Portuguese
     ■ Performance Engineer @ Redis Inc
     ■ Improve/develop OSS performance/observability tools
       ● proud maintainer of HdrHistogram/hdrhistogram-go and RedisBloom/t-digest-c
     ■ Check my blog at: https://codeperf.io/
  3. > today
     ■ takeaway
     ■ why we need approximate histograms
     ■ benefits of sketch processing of big data
     ■ the Redis wishlist
  4. #1 telemetry: a single number is not enough
  5. #2 if approximate results are acceptable, use them…
  6. taking a step back
  7. > why we need approximate histograms
     1. primarily used in monitoring (observability) and data-exploration software
     2. data as a distribution vs. simpler statistics such as a mean or extreme values
     3. classic example: system response latency
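A minimal sketch of point 2 in Go, using the hdrhistogram-go library the speaker maintains (the bimodal workload here is invented for illustration): the mean lands between the two latency modes and describes neither, while high percentiles expose the slow path.

```go
package main

import (
	"fmt"

	hdrhistogram "github.com/HdrHistogram/hdrhistogram-go"
)

func main() {
	// Track values between 1us and 10s, with 2 significant digits.
	h := hdrhistogram.New(1, 10_000_000, 2)

	// Hypothetical bimodal latency distribution: 99% of requests
	// take ~1ms, 1% hit a slow path and take ~200ms.
	for i := 0; i < 9_900; i++ {
		_ = h.RecordValue(1_000) // 1ms, in microseconds
	}
	for i := 0; i < 100; i++ {
		_ = h.RecordValue(200_000) // 200ms
	}

	// The mean (~2,990us) sits between the modes and describes neither.
	fmt.Printf("mean:  %.0f us\n", h.Mean())
	fmt.Printf("p50:   %d us\n", h.ValueAtQuantile(50))   // fast mode
	fmt.Printf("p99:   %d us\n", h.ValueAtQuantile(99))   // still the fast mode
	fmt.Printf("p99.9: %d us\n", h.ValueAtQuantile(99.9)) // the slow mode
}
```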
  8. > the good, the bad, and the ugly average?
     [figures: 1) Node.js HTTP server response time (latency) from 50 random production servers, showing around 10,000 responses each; 2) MySQL command latency from 50 random production servers, showing around 10,000 commands for each distribution]
     [1] https://www.brendangregg.com/FrequencyTrails/modes.html
     [2] https://www.youtube.com/watch?v=mjHam20dmW8
  9. > different colors and flavours…
     ● t-digest, hdrhistogram, d-digest, "raw", a lot of different sketches…
     > precision, space, speed, grouping?…
     ● general purpose
       ■ t-digest
     ● prior info about the distribution of data?
       ■ hdrhistogram
  10. > precision vs implementation
      ● general purpose:
        ■ t-digest
      ● prior info about the distribution of data?
        ■ hdrhistogram
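A concrete sketch of that decision, using hdrhistogram-go plus (as an assumption) the caio/go-tdigest package for the t-digest side: HdrHistogram needs the trackable range and precision up front, while a t-digest only needs a compression parameter.

```go
package main

import (
	"fmt"

	hdrhistogram "github.com/HdrHistogram/hdrhistogram-go"
	tdigest "github.com/caio/go-tdigest/v4"
)

func main() {
	// Prior knowledge of the data (latencies between 1us and 1 minute,
	// tracked with 2 significant digits) -> HdrHistogram.
	h := hdrhistogram.New(1, 60_000_000, 2)
	_ = h.RecordValue(1_500)

	// No prior knowledge of the range -> general-purpose t-digest,
	// parameterized only by compression (100, as on the slides).
	td, err := tdigest.New(tdigest.Compression(100))
	if err != nil {
		panic(err)
	}
	_ = td.Add(1_500.0)

	fmt.Println(h.ValueAtQuantile(50), td.Quantile(0.5))
}
```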
  11. > big win #1: space
      1 day of observations with ms precision:

      Type          Details              In-Memory Usage
      Exact         array size = 86.4M   330 KB
      HdrHistogram  2 digits precision   31 KB
      T-Digest      compression 100      20 KB

      [1] https://github.com/filipecosta90/quantile-and-histogram-benchmarks
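A rough way to check the HdrHistogram row yourself, assuming hdrhistogram-go's ByteSize method (which reports the size of the histogram's counts array); the exact-array and t-digest figures come from the benchmark repo linked above.

```go
package main

import (
	"fmt"

	hdrhistogram "github.com/HdrHistogram/hdrhistogram-go"
)

func main() {
	// One day of millisecond-precision observations: values between
	// 1ms and 86,400,000ms, tracked with 2 significant digits.
	h := hdrhistogram.New(1, 86_400_000, 2)

	// The counts array is sized at construction time from the range
	// and precision, so the footprint does not grow with ingestion.
	fmt.Printf("HdrHistogram footprint: %d bytes\n", h.ByteSize())
}
```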
  12. > big win #2: real-time data ingestion
      Ingestion speed by histogram type:

      Type          Details              ns per op   GFLOPs/sec per core
      HdrHistogram  2 digits precision   5.38 ns     11.62
      T-Digest      compression 100      67.49 ns    5.29

      [diagram: continuous stream of data, SYNC vs ASYNC ingestion]
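A sketch of how per-op ingestion numbers like these are typically gathered in Go (standard testing.B benchmarks; package paths as above, t-digest API assumed from caio/go-tdigest). Inputs are pre-generated so the loop body measures only ingestion; run with `go test -bench=.`.

```go
package telemetry_test

import (
	"math/rand"
	"testing"

	hdrhistogram "github.com/HdrHistogram/hdrhistogram-go"
	tdigest "github.com/caio/go-tdigest/v4"
)

// Pre-generated inputs (power-of-two length, so indexing can use a mask).
var values = func() []int64 {
	v := make([]int64, 1<<16)
	for i := range v {
		v[i] = 1 + rand.Int63n(60_000_000)
	}
	return v
}()

func BenchmarkHdrHistogramRecord(b *testing.B) {
	h := hdrhistogram.New(1, 60_000_000, 2)
	for i := 0; i < b.N; i++ {
		_ = h.RecordValue(values[i&(len(values)-1)])
	}
}

func BenchmarkTDigestAdd(b *testing.B) {
	td, _ := tdigest.New(tdigest.Compression(100))
	for i := 0; i < b.N; i++ {
		_ = td.Add(float64(values[i&(len(values)-1)]))
	}
}
```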
  13. > big win #3: query speed
      Quantile calculation speed by histogram type:

      Type          Details              ns per op   GFLOPs/sec per core
      HdrHistogram  2 digits precision   16073 ns    10.53
      T-Digest      compression 100      54.80 ns    9.4
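Querying is a single call in either structure; one thing worth noting in a sketch (same assumed packages as above) is that the two APIs use different scales for the same question.

```go
package main

import (
	"fmt"

	hdrhistogram "github.com/HdrHistogram/hdrhistogram-go"
	tdigest "github.com/caio/go-tdigest/v4"
)

func main() {
	h := hdrhistogram.New(1, 60_000_000, 2)
	td, _ := tdigest.New(tdigest.Compression(100))
	for v := int64(1); v <= 1_000; v++ {
		_ = h.RecordValue(v)
		_ = td.Add(float64(v))
	}

	// HdrHistogram takes a percentile in [0, 100];
	// this t-digest API takes a quantile in [0, 1].
	fmt.Println("hdr p99:", h.ValueAtQuantile(99.0))
	fmt.Println("td  p99:", td.Quantile(0.99))
}
```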
  14. > big win #4: real-time mergeability
      Merge speed by histogram type:

      Type          Details              ns per op   GFLOPs/sec per core
      HdrHistogram  2 digits precision   60.6 ns     11.84
      T-Digest      compression 100      2200 ns     10.83
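Mergeability is what makes per-shard (or per-worker) recording practical: each worker keeps its own histogram and a reader folds them together on demand. A minimal sketch with hdrhistogram-go; t-digest merging looks similar (caio/go-tdigest exposes a Merge method, assumed here but not shown).

```go
package main

import (
	"fmt"

	hdrhistogram "github.com/HdrHistogram/hdrhistogram-go"
)

func main() {
	// One histogram per shard, recorded independently.
	shards := make([]*hdrhistogram.Histogram, 4)
	for i := range shards {
		shards[i] = hdrhistogram.New(1, 60_000_000, 2)
		for v := int64(1); v <= 1_000; v++ {
			_ = shards[i].RecordValue(v * int64(i+1))
		}
	}

	// Fold the per-shard histograms into a global view; Merge
	// returns the count of values dropped as out of range.
	global := hdrhistogram.New(1, 60_000_000, 2)
	for _, s := range shards {
		global.Merge(s)
	}
	fmt.Println("global p99:", global.ValueAtQuantile(99.0))
}
```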
  15. towards a standard exposition format
  16. towards a standard exposition format
      referendum on histogram format at #opentelemetry-specification/issues/1776
      ■ base-2 exponential histogram protocol support
      ■ curious about what OpenTelemetry is? check out their website for an explanation!
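The idea behind the base-2 exponential histogram: bucket boundaries are powers of base = 2^(2^-scale), so a value's bucket index is computable directly from the value, with no boundary list to store or search. A simplified sketch of the index mapping described in the OpenTelemetry specification (ignoring the exact-power and subnormal corner cases the spec discusses):

```go
package main

import (
	"fmt"
	"math"
)

// bucketIndex maps a positive value to its exponential-histogram
// bucket, where bucket i covers (base^i, base^(i+1)] and
// base = 2^(2^-scale). Larger scales mean finer buckets.
func bucketIndex(value float64, scale int) int {
	// scaleFactor = 2^scale / ln(2), so that
	// log(value) * scaleFactor = log2(value) * 2^scale.
	scaleFactor := math.Ldexp(math.Log2E, scale)
	return int(math.Ceil(math.Log(value)*scaleFactor)) - 1
}

func main() {
	// At scale 3, base = 2^(1/8) ~ 1.09: ~9% relative bucket width.
	for _, v := range []float64{1.0, 1.5, 2.0, 1000.0} {
		fmt.Printf("value=%v -> bucket %d\n", v, bucketIndex(v, 3))
	}
}
```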
  17. the Redis wishlist
  18. applying in the Redis ecosystem
      ■ db level:
        ● RedisBloom T-Digest MR (see the sketch after this list)
        ● share feedback on the Redis Discord
      ■ client level:
        ● redis-benchmark
        ● memtier_benchmark
        ● redis-benchmark-go, tsbs, ftsb, aibench…
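A hedged sketch of what the db-level t-digest looks like from Go, driving RedisBloom's TDIGEST.CREATE / TDIGEST.ADD / TDIGEST.QUANTILE commands through go-redis's generic Do interface. The slide describes this as an in-progress merge request, so the exact command syntax may differ in the RedisBloom version you run.

```go
package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

	// Server-side t-digest: create with a compression of 100,
	// ingest a few observations, then query a quantile.
	// Requires the RedisBloom module to be loaded on the server.
	if err := rdb.Do(ctx, "TDIGEST.CREATE", "latency", "COMPRESSION", "100").Err(); err != nil {
		panic(err)
	}
	if err := rdb.Do(ctx, "TDIGEST.ADD", "latency", "1.2", "3.4", "250.0").Err(); err != nil {
		panic(err)
	}
	p99, err := rdb.Do(ctx, "TDIGEST.QUANTILE", "latency", "0.99").Result()
	if err != nil {
		panic(err)
	}
	fmt.Println("p99:", p99)
}
```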
  19. the Redis wishlist
      ■ high-speed ingestion
        ● microsecond level
      ■ low memory footprint
      ■ fully mergeable
      ■ high-speed querying
        ● microsecond level
      ■ text format protocol
  20. let's discuss it… references…
      ● How Not to Measure Latency - Gil Tene
      ● Frequency Trails: Modes and Modality - Brendan Gregg
      ● Metrics, Metrics, Everywhere - Coda Hale
      ● Lies, Damn Lies, and Metrics - André Arko
      ● Most Page Loads Will Experience the 99%'lie Server Response - Gil Tene
      ● If You Are Not Measuring and/or Plotting Max, What Are You Hiding (From)? - Gil Tene
      ● Latency Heat Maps - Brendan Gregg
      ● Visualizing System Latency - Brendan Gregg
      ● t-digest - Ted Dunning
      ● HdrHistogram: A High Dynamic Range Histogram
      ● Prometheus Histograms: Past, Present, and Future
  21. Brought to you by
      Filipe Oliveira
      filipe@redis.com / performance@redis.com
      @fcosta_oliveira
