The challenge with telemetry in real-time systems is that you want as many sources of telemetry as possible (throughput, latency, errors, CPU, and many more), but you can't afford the extra overhead when your users expect sub-millisecond operations that scale to millions of transactions per second.
In this talk, we'll describe how we use and improve several OSS data structures to incorporate telemetry features at scale, and showcase why those features matter in scenarios involving performance, security, or operations issues.