These slides walk through Google Dapper, OpenTracing, Jaeger, and OpenTelemetry. By studying the history of Dapper, we can learn from the experience and design principles behind a large-scale distributed tracing system and see how it influenced later solutions such as OpenTracing and Jaeger.
We also discuss the difference between OpenTracing and Jaeger, and demonstrate how Jaeger works and what it looks like.
Finally, we talk about the future of OpenTracing: the new project called OpenTelemetry, what its goals are, and how it plans to achieve them.
4. Why Dapper
Almost all open source solutions are implemented
based on Dapper
Zipkin (Twitter)
Appdash (Sourcegraph)
Jaeger (Uber)
5. Dapper
Dapper, a Large-Scale Distributed Systems Tracing
Infrastructure
Published by Google (2010)
Used in production for over two years (before 2010)
Link: https://research.google/pubs/pub36356/
6. Preface
Modern Internet services are implemented as distributed
systems.
Complex, large-scale
May be developed by different teams in different languages.
Span many thousands of machines across multiple physical facilities.
7. Preface
Google built Dapper to provide Google's developers
with more information about the behavior of complex
distributed systems.
Understanding system behavior requires observing
related activities across many different programs and
machines.
8. Design Goals
Low overhead
Application-level transparency
Programmers should not need to be aware of the tracing system.
Scalability
Handle the size of Google's services and clusters for at
least the next few years.
13. Overhead
Trace generation is the most critical source of overhead.
In the Dapper libraries, the main costs are creating/destroying spans and
annotations, and logging them to local disk.
Average creation time (2.2 GHz x86 server)
Span: 176 ns
Root Span: 204 ns
14. Overhead
CPU: never uses more than 0.3%
of one core of a production
machine.
Network:
Less than 0.01% of the
network traffic in Google's
production environment.
15. Web search cluster
Sampling is indeed necessary
There is still an adequate amount of
trace data for high-volume
services when the sampling rate is as
low as 1/1024
Using lower sampling frequency
has the added benefit of
allowing data to persist longer.
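The sampling idea above can be sketched in a few lines. This is a minimal illustration, not Dapper's implementation: it assumes 64-bit integer trace IDs and a uniform decision made once per trace from the ID itself, so every process handling the same trace reaches the same answer. The names `should_sample` and `estimate_sampled` are hypothetical.

```python
import random

SAMPLING_RATE = 1 / 1024  # the rate quoted on the slide


def should_sample(trace_id: int, rate: float = SAMPLING_RATE) -> bool:
    """Keep a trace if its ID falls in the lowest `rate` fraction of the ID space.

    Assumption: trace IDs are uniformly distributed 64-bit integers.
    """
    threshold = int(rate * 2**64)
    return (trace_id % 2**64) < threshold


def estimate_sampled(num_traces: int, rate: float = SAMPLING_RATE, seed: int = 0) -> int:
    """Count how many of `num_traces` random trace IDs would be kept."""
    rng = random.Random(seed)
    return sum(should_sample(rng.getrandbits(64), rate) for _ in range(num_traces))


kept = estimate_sampled(100_000)
print(f"kept {kept} of 100000 traces")  # roughly 100000 / 1024 on average
```

Because fewer traces are written, the same storage budget holds a longer time window, which is the benefit the slide mentions.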
16. Experience
Google AdWords
The teams used Dapper iteratively from the first system prototype through launch.
Performance
Optimize performance
Identify unnecessary serial requests along the critical path
Correctness
Database (read-only/read-write)
Understanding
Testing
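As an illustration of the "unnecessary serial requests" point, here is a rough sketch of how sibling spans under one parent could be checked for back-to-back execution. It is not how Dapper's tools work; the `ChildSpan` type, the function names, and the sample timings are all hypothetical. A trace only shows timing, so whether a serial pair is truly unnecessary still depends on data dependencies a human has to check.

```python
from typing import List, NamedTuple, Tuple


class ChildSpan(NamedTuple):
    name: str
    start_ms: float
    end_ms: float


def serial_pairs(children: List[ChildSpan]) -> List[Tuple[str, str]]:
    """Return consecutive siblings where the second starts only after the first ends."""
    ordered = sorted(children, key=lambda s: s.start_ms)
    return [
        (a.name, b.name)
        for a, b in zip(ordered, ordered[1:])
        if b.start_ms >= a.end_ms
    ]


def potential_saving_ms(children: List[ChildSpan]) -> float:
    """Upper bound on time saved if all children ran fully in parallel."""
    if not children:
        return 0.0
    wall = max(s.end_ms for s in children) - min(s.start_ms for s in children)
    longest = max(s.end_ms - s.start_ms for s in children)
    return wall - longest


# Hypothetical child spans of one request (times in milliseconds).
children = [
    ChildSpan("auth", 0, 20),
    ChildSpan("ads-db", 20, 70),
    ChildSpan("spellcheck", 25, 40),
]
pairs = serial_pairs(children)
saving = potential_saving_ms(children)
```

Here `auth` and `ads-db` run strictly one after the other, so they are the pair worth inspecting.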
18. OpenTracing
Microservices provide a powerful architecture, but not without their
own challenges.
Debugging and observing transactions across services.
Transactions are no longer in-memory calls or stack traces.
Distributed tracing aims to solve this problem.
Provides a way to describe and analyze cross-process
transactions.
19. OpenTracing
OpenTracing is composed of an API specification,
frameworks and libraries that have implemented the
specification, and documentation for the project.
Allows software engineers to add instrumentation to their application code
using APIs
Supported languages
Go, Java, JavaScript, Python, C++, C#, etc.
22. OpenTracing Units
Trace
The description of a transaction as it moves through a distributed system.
Represents calls made between the downstream services
Span
A named, timed operation representing a piece of the workflow
Includes
Operation name
Start and end timestamps
Key:Value attributes
Parent span's identifier
Span Context
Carries data across process boundaries
SpanID, TraceID
Key: Value items.
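The units above map naturally onto a small data model. The following is a hand-rolled sketch for illustration only, not the OpenTracing library itself: the class layout, the `start_span` helper, and the use of `uuid` for identifiers are assumptions.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class SpanContext:
    """The part of a span that crosses process boundaries."""
    trace_id: str
    span_id: str
    baggage: Dict[str, str] = field(default_factory=dict)


@dataclass
class Span:
    """A named, timed operation within a trace."""
    operation_name: str
    context: SpanContext
    parent_id: Optional[str] = None  # None marks the root span
    tags: Dict[str, str] = field(default_factory=dict)
    start_time: float = field(default_factory=time.time)
    end_time: Optional[float] = None

    def finish(self) -> None:
        self.end_time = time.time()


def start_span(operation_name: str, parent: Optional[Span] = None) -> Span:
    """Start a root span, or a child that shares its parent's trace ID."""
    if parent is None:
        context = SpanContext(trace_id=uuid.uuid4().hex, span_id=uuid.uuid4().hex[:16])
        return Span(operation_name, context)
    context = SpanContext(
        trace_id=parent.context.trace_id,
        span_id=uuid.uuid4().hex[:16],
        baggage=dict(parent.context.baggage),  # baggage flows to descendants
    )
    return Span(operation_name, context, parent_id=parent.context.span_id)


root = start_span("http_request")
child = start_span("db_query", parent=root)
child.finish()
root.finish()
```

A trace is then simply the set of spans that share one `trace_id`, linked into a tree through `parent_id`.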
23. OpenTracing Units
Tags
Key:Value pairs that enable user-defined annotation of spans in
order to query and filter.
Logs
Key:Value pairs that are useful for capturing timed log messages.
Baggage Items
Carried by Span Context
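The difference between tags, logs, and baggage is easiest to see when a span context is sent to another process. The sketch below is a simplified stand-in, not the OpenTracing library: the method names mirror those of the OpenTracing Python API (`set_tag`, `log_kv`, `set_baggage_item`), but the `Span` class, the `inject`/`extract` helpers, and the `x-...` header names are hypothetical.

```python
import time
from typing import Dict, List, Optional, Tuple

TRACE_ID_HEADER = "x-trace-id"   # hypothetical header names
SPAN_ID_HEADER = "x-span-id"
BAGGAGE_PREFIX = "x-baggage-"


class Span:
    def __init__(self, trace_id: str, span_id: str,
                 baggage: Optional[Dict[str, str]] = None) -> None:
        self.trace_id = trace_id
        self.span_id = span_id
        self.tags: Dict[str, object] = {}              # apply to the whole span
        self.logs: List[Tuple[float, dict]] = []       # timestamped events
        self.baggage: Dict[str, str] = dict(baggage or {})  # travels with the context

    def set_tag(self, key: str, value: object) -> None:
        self.tags[key] = value

    def log_kv(self, fields: dict) -> None:
        self.logs.append((time.time(), dict(fields)))

    def set_baggage_item(self, key: str, value: str) -> None:
        self.baggage[key] = value


def inject(span: Span, headers: Dict[str, str]) -> None:
    """Write the span context (IDs and baggage only) into outgoing headers."""
    headers[TRACE_ID_HEADER] = span.trace_id
    headers[SPAN_ID_HEADER] = span.span_id
    for key, value in span.baggage.items():
        headers[BAGGAGE_PREFIX + key] = value


def extract(headers: Dict[str, str]) -> Tuple[str, str, Dict[str, str]]:
    """Read trace ID, parent span ID, and baggage back out of incoming headers."""
    baggage = {
        key[len(BAGGAGE_PREFIX):]: value
        for key, value in headers.items()
        if key.startswith(BAGGAGE_PREFIX)
    }
    return headers[TRACE_ID_HEADER], headers[SPAN_ID_HEADER], baggage


span = Span("t1", "s1")
span.set_tag("http.status_code", 200)
span.log_kv({"event": "cache_miss"})
span.set_baggage_item("user_id", "42")

headers: Dict[str, str] = {}
inject(span, headers)
```

Tags and logs stay with the span that recorded them; only the IDs and baggage are written to the headers, which is why baggage should be kept small.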
26. Jaeger
Distributed tracing system released as open source by Uber.
Cloud Native Computing Foundation graduated project.
OpenTracing compatible data model and instrumentation libraries.
Go/Node.js/Java/Python/C++
Storage backends
Cassandra, Elasticsearch, memory
30. Demo Project
Project: hotrod
Demo application that consists of several microservices
Run everything via docker-compose
https://github.com/jaegertracing/jaeger/tree/master/examples/hotrod
38. OpenTelemetry
OpenCensus and OpenTracing merged to form OpenTelemetry
Telemetry (Tracing, Metrics, Logs)
OpenTelemetry provides libraries, agents, and other
components
Captures metrics, traces, metadata, and logs, then sends
them to your backends (such as Prometheus, Jaeger, Zipkin)