Distributed Tracing in Practice

Ivo Mägi, CEO & product manager @ Plumbr
September 2019

What are we going to cover today?
Understanding the need for distributed traces and the general concepts
Examples of how distributed traces help you locate the root cause
Advanced examples of how distributed traces map root causes to real user impact
Different ways to add distributed tracing to your production services
Plumbr - sign up for your free trial at https://www.plumbr.io

How did we get to distributed services?
Software is eating the world
More and more major businesses and industries are being run
on software and delivered as online services.
----- Marc Andreessen, 2011

Software is eating the world faster
Large companies are forced to take plays from start-ups’
playbooks to stay competitive. Enterprises are under pressure
to innovate faster in order to stay in business.
----- McKinsey, 2019

Implications for the IT teams
Moving from monoliths to microservices to enable innovation in individual teams.
Adopting DevOps practices within IT to support faster innovation.

Distributed tracing – why bother?

Support tickets like this:
From: John
To: support@example.com
Subject: Cannot complete checkout
I just tried to complete the order
#32828, but was unable to finish the
checkout. Your app stalled for 20
seconds and then gave me an error.

… turning into this in two weeks:
From: John
Subject: Re:Re:Re:Re:Re: Cannot complete the checkout
Finally managed to capture the HAR file from my browser using the
instructions you altered. However, it is too big to be sent as an email
attachment. Please advise.

The power of distributed tracing

What would such a trace look like?

Cornerstone of any distributed trace: UUID
Universally Unique Identifier (UUID)
• 128-bit number; in the common random variant (version 4), 122 of those bits are random
• Requires no central coordinator
• For practical purposes, unique
• You are 460,000,000 times more likely to die from a meteorite impact than to clash on UUIDs
68a9ab9d-f457-4dc8-98b0-645ef476fda6
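
As a minimal sketch of how such an identifier can be minted without any coordinator, the snippet below uses Python's standard `uuid` module. The helper name `new_trace_id` is an assumption made for illustration; it is not part of any tracing product mentioned in this deck.

```python
import uuid


def new_trace_id() -> str:
    """Mint a trace ID as a random (version 4) UUID; no coordinator is consulted."""
    return str(uuid.uuid4())


trace_id = new_trace_id()
parsed = uuid.UUID(trace_id)

print(trace_id)               # a different value on every run
print(parsed.version)         # 4
print(len(parsed.bytes) * 8)  # 128
```

The example UUID shown on the slide has the same shape: the `4` opening its third group marks it as a version 4 (random) UUID.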

Passing the UUID: HTTP-headers
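
The propagation idea can be sketched as follows, assuming plain dictionaries stand in for HTTP headers. The header name `X-Trace-Id` and both helper functions are hypothetical and chosen only for illustration; real tracers define their own header names (Zipkin's B3 propagation, for instance, uses `X-B3-TraceId`).

```python
import uuid
from typing import Dict, Optional

TRACE_HEADER = "X-Trace-Id"  # hypothetical header name, for illustration only


def inject_trace_id(headers: Dict[str, str], trace_id: Optional[str] = None) -> Dict[str, str]:
    """Return a copy of the outgoing headers that carries the trace ID.

    A new ID is minted only when the caller has none, i.e. where the trace starts.
    """
    outgoing = dict(headers)
    outgoing[TRACE_HEADER] = trace_id or str(uuid.uuid4())
    return outgoing


def extract_trace_id(headers: Dict[str, str]) -> Optional[str]:
    """Read the trace ID from incoming headers (HTTP header names are case-insensitive)."""
    for name, value in headers.items():
        if name.lower() == TRACE_HEADER.lower():
            return value
    return None


# Service A starts a trace and calls service B.
request_to_b = inject_trace_id({"Accept": "application/json"})

# Service B reads the ID and forwards it unchanged to service C.
trace_id = extract_trace_id(request_to_b)
request_to_c = inject_trace_id({"Accept": "application/json"}, trace_id)

print(extract_trace_id(request_to_c) == trace_id)  # True
```

The point of the design is that only the first service mints an ID; every service after it copies the value it received, so all work done for one user request shares one identifier.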

Outcome: distributed trace
• Consisting of spans
• Registering the duration and outcome of the trace
• Enriched with additional metadata at span/trace level:
  • User ID
  • Cluster the span belongs to
  • Node ID of the span
  • …
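
One way to picture the resulting data is the sketch below. The class and field names are assumptions made for illustration and do not mirror the data model of any particular tracer; treating a trace as failed when any of its spans failed is likewise a simplification.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Span:
    trace_id: str   # shared by every span in the same trace
    name: str       # the operation, e.g. "POST /checkout"
    start_ms: int
    end_ms: int
    ok: bool        # outcome of the operation
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    parent_id: Optional[str] = None
    tags: Dict[str, str] = field(default_factory=dict)  # user ID, cluster, node ID, ...

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


@dataclass
class Trace:
    trace_id: str
    spans: List[Span] = field(default_factory=list)

    @property
    def duration_ms(self) -> int:
        # Wall-clock time from the earliest start to the latest end.
        return max(s.end_ms for s in self.spans) - min(s.start_ms for s in self.spans)

    @property
    def ok(self) -> bool:
        # Simplification: any failed span marks the whole trace as failed.
        return all(s.ok for s in self.spans)


tid = str(uuid.uuid4())
root = Span(tid, "POST /checkout", 0, 20_000, ok=False,
            tags={"user_id": "john", "node": "web-1"})
child = Span(tid, "charge card", 50, 19_950, ok=False,
             parent_id=root.span_id, tags={"cluster": "payments"})
trace = Trace(tid, [root, child])

print(trace.duration_ms, trace.ok)  # 20000 False
```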

Summary: three building blocks for distributed tracing

Put distributed traces to good use
Removing the need to manually reproduce and gather evidence when responding to support tickets
Fully understanding the impact of user-facing issues
Prioritizing improvements based on the impact on end users
Proactively responding to issues via alerting based on the tracing information

Hypothetical support case landing on your desk
From: John
Subject: Cannot complete checkout
I just tried to complete the order
#32828, but was unable to finish the
checkout. Your app stalled for 20
seconds and then gave me an error.

… two weeks later:
From: John
Subject: Re:Re:Re:Re:Re: Cannot complete the checkout
Finally managed to capture the HAR file from my browser using the
instructions you altered. However, it is too big to be sent as an email
attachment. Please advise.

What happened during the two weeks?

Could it have been different?
Yes. Let's walk through examples of how distributed tracing helps you by:
• Verifying the claim
• Prioritizing the response
• Understanding the true impact
• Proactively handling such problems

Example #1: verifying the complaint

Example #1: complaint verified
Metadata added to the trace allowed us to search for the evidence
Spans linked to the trace allowed us to verify the failure had indeed occurred
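
The two steps above, searching by metadata and then inspecting the spans, might look like the sketch below. The trace records, the error name, and the helper functions are all hypothetical; a real tracing backend would return such records from its own query interface.

```python
from typing import Dict, List

# Hypothetical, hand-written trace records used only for illustration.
TRACES: List[Dict] = [
    {"trace_id": "t-1", "user_id": "john", "order": "32828", "duration_ms": 20_000,
     "spans": [{"name": "POST /checkout", "ok": False, "error": "GatewayTimeout"},
               {"name": "charge card", "ok": False, "error": "GatewayTimeout"}]},
    {"trace_id": "t-2", "user_id": "mary", "order": "32829", "duration_ms": 480,
     "spans": [{"name": "POST /checkout", "ok": True, "error": None}]},
]


def find_traces(traces: List[Dict], **metadata: str) -> List[Dict]:
    """Return the traces whose metadata matches every given key/value pair."""
    return [t for t in traces if all(t.get(k) == v for k, v in metadata.items())]


def failed_spans(trace: Dict) -> List[Dict]:
    """Return the spans of a trace that did not complete successfully."""
    return [s for s in trace["spans"] if not s["ok"]]


# Step 1: use the metadata from the support ticket to find the evidence.
matches = find_traces(TRACES, user_id="john", order="32828")

# Step 2: use the spans to confirm that the failure really happened.
for t in matches:
    print(t["trace_id"], t["duration_ms"], [s["name"] for s in failed_spans(t)])
# t-1 20000 ['POST /checkout', 'charge card']
```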

Example #2: prioritizing the response

Example #2: priorities assigned based on the impact
Unique identification of an error, coupled with distributed tracing, allows you to objectively quantify the priority of a particular error.
In this specific situation, a high-priority response is likely not justified.
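
A rough sketch of such quantification: group failed spans by their error and count the distinct users behind each one. The records and error names below are made up for illustration, and counting distinct users is only one possible measure of impact.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical failed spans, each tagged with the user it belonged to.
FAILED_SPANS: List[Dict[str, str]] = [
    {"error": "GatewayTimeout", "user_id": "john"},
    {"error": "NullPointerException", "user_id": "mary"},
    {"error": "NullPointerException", "user_id": "ann"},
    {"error": "NullPointerException", "user_id": "mary"},
]


def impact_by_error(spans: List[Dict[str, str]]) -> List[Tuple[str, int]]:
    """Rank errors by the number of distinct users they affected."""
    users = defaultdict(set)
    for span in spans:
        users[span["error"]].add(span["user_id"])
    ranked = [(error, len(affected)) for error, affected in users.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


ranking = impact_by_error(FAILED_SPANS)
print(ranking)  # [('NullPointerException', 2), ('GatewayTimeout', 1)]
```

In this made-up data the error behind John's ticket affected a single user, which is the kind of evidence that argues against treating it as the top priority.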

Example #3: zooming out to see what real users experience


Example #3: true impact only reveals itself if traces go all the way to the real user
Distributed tracing can and should leave the server room
End-to-end traces are the way to expose both the impact and the root cause correctly

Example #4: becoming proactive

Example #4: do not rely upon end users. Harness the true power of distributed traces
Trigger alerts based on the impact
Send the alerts to the channels already in use
Respond to incidents using the root causes
Adopting distributed tracing: different solutions available

Open-source distributed tracing solutions

Capturing a trace with Zipkin: example
/* Request is assumed to be Symfony's HttpFoundation class */
use Symfony\Component\HttpFoundation\Request;
use Zipkin\Propagation\Map;
use Zipkin\Timestamp;

/* create_tracing() is assumed to be an application-level helper that builds the Tracing object */
$tracing = create_tracing('php-frontend', '127.0.0.1');
$tracer = $tracing->getTracer();
$request = Request::createFromGlobals();

/* Extract the context from HTTP headers */
$carrier = array_map(function ($header) {
    return $header[0];
}, $request->headers->all());
$extractor = $tracing->getPropagation()->getExtractor(new Map());
$extractedContext = $extractor($carrier);

/* Create a span and set its attributes */
$span = $tracer->newChild($extractedContext);
$span->start(Timestamp\now());
$span->setName('parse_request');
$span->setKind(Zipkin\Kind\SERVER);

Open-source solutions: flexible but obtrusive
• You can tailor the metadata and model to match your specific needs
• As a result, your application code now depends on the tracing framework
• In addition, there is the human factor: if you forget to instrument a particular endpoint, it will be missing from traces
• Usability-wise, there are limited ways to query and visualize the data

Commercial distributed tracing solutions

Capturing a trace with Plumbr: example
$ java -javaagent:/path/to/plumbr.jar com.example.YourExecutable


Commercial solutions: cost attached, but they do the heavy lifting for you
• Installation is easy
• No dependencies at source code level
• Fewer nuances to deal with

Tying it together
You now understand how distributed tracing works
You got a sneak peek into how different open-source and commercial solutions can help you capture distributed traces
You are equipped with examples of how distributed tracing gives simple answers to hard questions

And of course, when you start your journey with distributed tracing …

… Plumbr will be the solution to consider

We integrate with your existing ecosystem

And all the information exposed is based on the distributed traces

Thank you!
Ivo Mägi, CEO & product manager @ Plumbr
