Distributed Tracing with OpenTracing, ZipKin and Kubernetes
1. container-solutions.com | @containersoluti | info@container-solutions.com
Distributed Tracing
with ZipKin &
Kubernetes
Maximilian Schöfmann
@schoefmann
Container Solutions AG
@containersoluti
2. container-solutions.com | @containersoluti | info@container-solutions.com
Microservices...
In short, the microservice architectural style is an approach to developing a single application
as a suite of small services, each running in its own process and communicating with
lightweight mechanisms, often an HTTP resource API. These services are built around
business capabilities and independently deployable by fully automated deployment
machinery. There is a bare minimum of centralized management of these services,
which may be written in different programming languages and use different data storage
technologies.
-- James Lewis and Martin Fowler
11. Distributed Tracing | container-solutions.com
Why distributed tracing?
“Per-process logging and metric monitoring have their place, but
neither can reconstruct the elaborate journeys that transactions
take as they propagate across a distributed system.
Distributed traces are these journeys.”
-- Chris Aniszczyk, Cloud Native Computing Foundation
12. Distributed Tracing | container-solutions.com
Fundamental requirements to make it work
● Ubiquitous deployment
● Continuous monitoring
See also: “Dapper, a Large-Scale Distributed Systems Tracing Infrastructure”
http://research.google.com/pubs/pub36356.html (2010)
13. Distributed Tracing | container-solutions.com
Requirements to make it useful
● Low overhead
● Application-level transparency
● Scalability
● (Timely) data availability
14. Distributed Tracing | container-solutions.com
A distributed trace...
“A tracing infrastructure for distributed
services needs to record information
about all the work done in a system, on
behalf of a given initiator”
15. Distributed Tracing | container-solutions.com
Data aggregation
Message record:
Record = Message identifier + timestamped event
Data aggregation classes:
● Black box
● Annotation-based
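As a rough illustration of the annotation-based class, the sketch below assumes every record carries an explicit identifier set by the application, plus a timestamped event. The names `Record`, `message_id` and `aggregate` are made up for this example and do not come from any tracing library.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    message_id: str   # identifier shared by all events of one request (assumed name)
    timestamp: float  # when the event happened
    event: str        # free-form description of the event


def aggregate(records):
    """Group records by message identifier and order each group by time."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.message_id].append(record)
    return {
        message_id: sorted(group, key=lambda r: r.timestamp)
        for message_id, group in grouped.items()
    }


# Records arrive out of order and interleaved across requests.
records = [
    Record("req-2", 10.5, "frontend: request received"),
    Record("req-1", 10.2, "backend: query started"),
    Record("req-1", 10.0, "frontend: request received"),
    Record("req-1", 10.4, "backend: query finished"),
]
timelines = aggregate(records)
for message_id, timeline in timelines.items():
    print(message_id, [r.event for r in timeline])
```

A black-box scheme would have to infer the grouping statistically, because no such identifier is present in its records.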
16. Distributed Tracing | container-solutions.com
Trace data model
● Trace as a tree of nested calls
● Trace trees and spans
17. www.container-solutions.com | info@container-solutions.com
Span
Data logged in a typical span
● Span name
● Span start time
● Span end time
● Trace id
● Span id
● Span parent id
● Any timing information recorded by the instrumentation library (RPC, HTTP)
● Additional custom labels (“foo”)
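The fields above map naturally onto a small data structure. The following is a minimal sketch, not the data model of any particular tracer: the `Span` class, `build_tree` and `render` are hypothetical names, and times are plain numbers in milliseconds. It shows how the parent id is enough to rebuild the trace tree from a flat list of spans.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Span:
    name: str
    start: float                      # span start time (ms)
    end: float                        # span end time (ms)
    trace_id: str
    span_id: str
    parent_id: Optional[str] = None   # None marks the root span
    tags: Dict[str, str] = field(default_factory=dict)   # custom labels
    children: List["Span"] = field(default_factory=list)

    @property
    def duration(self) -> float:
        return self.end - self.start


def build_tree(spans: List[Span]) -> Span:
    """Link each span to its parent and return the root span."""
    by_id = {s.span_id: s for s in spans}
    root = None
    for span in spans:
        if span.parent_id is None:
            root = span
        else:
            by_id[span.parent_id].children.append(span)
    if root is None:
        raise ValueError("trace has no root span")
    return root


def render(span: Span, depth: int = 0) -> List[str]:
    """Return an indented, time-ordered text view of the trace tree."""
    lines = [f"{'  ' * depth}{span.name} ({span.duration:.0f} ms)"]
    for child in sorted(span.children, key=lambda s: s.start):
        lines.extend(render(child, depth + 1))
    return lines


# Spans of one trace, in arbitrary arrival order.
spans = [
    Span("db.query", 20, 60, "t1", "c", parent_id="b"),
    Span("GET /checkout", 0, 100, "t1", "a"),
    Span("auth.check", 5, 15, "t1", "d", parent_id="a"),
    Span("cart.load", 18, 70, "t1", "b", parent_id="a", tags={"foo": "bar"}),
]
root = build_tree(spans)
print("\n".join(render(root)))
```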
18. www.container-solutions.com | info@container-solutions.com
OpenTracing & ZipKin
Common libraries for several programming languages
➔ Libraries attach a trace context to thread-local storage
➔ RPC friendly (especially when using gRPC)
➔ The data is language-independent
opentracing.io zipkin.io
26. www.container-solutions.com | info@container-solutions.com
Sampling
➔ 2-stage sampling:
a. Client: Don’t send every trace instrumented
● limits client-side CPU and bandwidth overhead
● adjustable per service, hard to change in one go
b. Server: Don’t persist every trace received
● limits server-side IO and data volume overhead
● adjustable centrally with simple config change
➔ Adaptive sampling to trade off overhead against missing relevant traces
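The two stages can be sketched as follows. This is an illustration under stated assumptions, not ZipKin's actual sampler: the rates, service names and function names are invented, and the decision is derived from a hash of the trace id so that every span of a trace gets the same answer. A separate salt per stage keeps the client and server decisions independent of each other.

```python
import hashlib


def keep(trace_id: str, rate: float, stage: str) -> bool:
    """Deterministic decision: the same trace id and stage always agree."""
    digest = hashlib.sha256(f"{stage}:{trace_id}".encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "big")  # roughly uniform in [0, 2**64)
    return value < int(rate * 2**64)


# Stage a: set per service, so changing all of them means touching every service.
CLIENT_RATES = {"frontend": 0.1, "checkout": 1.0}  # hypothetical values

# Stage b: a single central setting on the collector side.
SERVER_RATE = 0.5  # hypothetical value


def client_should_send(service: str, trace_id: str) -> bool:
    return keep(trace_id, CLIENT_RATES.get(service, 1.0), "client")


def server_should_persist(trace_id: str) -> bool:
    return keep(trace_id, SERVER_RATE, "server")


trace_ids = [f"trace-{i}" for i in range(10_000)]
sent = [t for t in trace_ids if client_should_send("frontend", t)]
persisted = [t for t in sent if server_should_persist(t)]
print(len(trace_ids), len(sent), len(persisted))
```

An adaptive sampler would adjust these rates at runtime, for example lowering them for high-traffic services, instead of keeping them fixed.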
30. www.container-solutions.com | info@container-solutions.com
Some of the questions answered...
...by a distributed tracing system:
● Which parts of my system are slow?
● Which call pattern can be optimized with parallelization?
● Which calls are redundant?
● Which routes are affected by this failing part?
● Under which circumstances is it failing?
● How often is it failing?
● Which queries are issued to read/write masters
instead of read-only replicas?
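Two of these questions can be approximated with very little code once span timings are available. The sketch below is only a heuristic: `Call`, `sequential_pairs` and `repeated_calls` are hypothetical names, and the sample data is invented. It looks at the child calls of a single span and flags back-to-back calls (candidates for parallelization, provided they do not depend on each other) and calls that occur more than once (possibly redundant).

```python
from collections import Counter
from typing import NamedTuple


class Call(NamedTuple):
    name: str
    start: float  # ms since the parent span started
    end: float


def sequential_pairs(children):
    """Neighbouring calls where one finishes before the next one starts."""
    ordered = sorted(children, key=lambda c: c.start)
    return [
        (a.name, b.name)
        for a, b in zip(ordered, ordered[1:])
        if a.end <= b.start
    ]


def repeated_calls(children):
    """Calls that show up more than once under the same parent."""
    counts = Counter(c.name for c in children)
    return {name: n for name, n in counts.items() if n > 1}


children = [
    Call("GET /user/42", 0, 30),
    Call("GET /prices", 30, 70),
    Call("GET /user/42", 72, 100),
]
print(sequential_pairs(children))
print(repeated_calls(children))
```

Whether a flagged pair can really run in parallel still needs a human check, since the trace shows timing and not data dependencies.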
31. www.container-solutions.com | info@container-solutions.com
A word of caution about distributed tracing
● Documentation is still rather poor
● Yet another moving part
● Can accumulate huge amounts of data
● Metrics need to be interpreted
● Commercial APM solutions might be an easier route for your use case...
35. www.container-solutions.com | info@container-solutions.com
Questions? Want to learn more?
● Come to our 2-day workshop: tinyurl.com/microservice-workshop
(November 8 + 9, or at your company on request)
● Follow us on Twitter: @containersoluti
● Read more on our blog: container-solutions.com/blog
● Or just get in touch: info@container-solutions.com