What is Distributed Tracing (DT), and why it may be useful for you.
The design of DT, and how OpenTracing and OpenCensus work for Elixir/Erlang projects (libraries, problems, my experience).
Making the big data ecosystem work together with Python & Apache Arrow, Apach... (Holden Karau)
Slides from PyData London exploring how the big data ecosystem (currently) works together as well as how different parts of the ecosystem work with Python. Proof-of-concept examples are provided using nltk & spacy with Spark. Then we look to the future and how we can improve.
Accelerating Big Data beyond the JVM - Fosdem 2018 (Holden Karau)
Many popular big data technologies (such as Apache Spark, BEAM, Flink, and Kafka) are built in the JVM, and many interesting tools are built in other languages (ranging from Python to CUDA). For simple operations the cost of copying the data can quickly dominate, and in complex cases can limit our ability to take advantage of specialty hardware. This talk explores how improved formats are being integrated to reduce these hurdles to co-operation.
Many popular big data technologies (such as Apache Spark, BEAM, and Flink) are built in the JVM, and many interesting AI tools are built in other languages, some requiring copying to the GPU. As many folks have experienced, while we may wish we could spend all of our time playing with cool algorithms, we often need to spend more of our time working on data prep. Having to copy our data slowly between the JVM and the target language of computation can remove much of the benefit of being able to access our specialized tooling. Thankfully, as illustrated in the soon-to-be-released Spark 2.3, Apache Arrow and related tools offer the ability to reduce this overhead. This talk will explore how Arrow is being integrated into Spark and how it can be integrated into other systems, but also limitations and places where Apache Arrow will not magically save us.
Link: https://fosdem.org/2018/schedule/event/big_data_outside_jvm/
Big Data Beyond the JVM - Strata San Jose 2018 (Holden Karau)
Many of the recent big data systems, like Hadoop, Spark, and Kafka, are written primarily in JVM languages. At the same time, there is a wealth of tools for data science and data analytics that exist outside of the JVM. Holden Karau and Rachel Warren explore the state of the current big data ecosystem and explain how to best work with it in non-JVM languages. While much of the focus will be on Python + Spark, the talk will also include interesting anecdotes about how these lessons apply to other systems (including Kafka).
Holden and Rachel detail how to bridge the gap using PySpark and discuss other solutions like Kafka Streams as well. They also outline the challenges of pure Python solutions like dask. Holden and Rachel start with the current architecture of PySpark and its evolution. They then turn to the future, covering Arrow-accelerated interchange for Python functions, how to expose Python machine learning models into Spark, and how to use systems like Spark to accelerate training of traditional Python models. They also dive into what other similar systems are doing as well as what the options are for (almost) completely ignoring the JVM in the big data space.
Python users will learn how to more effectively use systems like Spark and understand how the design is changing. JVM developers will gain an understanding of how to work with Python code from data scientists and Python developers while avoiding the traditional trap of needing to rewrite everything.
Testing and validating distributed systems with Apache Spark and Apache Beam ... (Holden Karau)
As distributed data parallel systems, like Spark, are used for more mission-critical tasks, it is important to have effective tools for testing and validation. This talk explores the general considerations and challenges of testing systems like Spark through spark-testing-base and other related libraries.
With over 40% of folks automatically deploying the results of their Spark jobs to production, testing is especially important. Many of the tools for working with big data systems (like notebooks) are great for exploratory work, and can give a false sense of security (as well as additional excuses not to test). This talk explores why testing these systems is hard, special considerations for simulating "bad" partitioning, figuring out when your stream tests are stopped, and solutions to these challenges.
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ... (Jimmy Lai)
Big data analysis relies on exploiting various handy tools to gain insight from data easily. In this talk, the speaker demonstrates a data mining flow for text classification using many Python tools. The flow consists of feature extraction/selection, model training/tuning, and evaluation. Various tools are used in the flow, including: Pandas for feature processing, scikit-learn for classification, IPython Notebook for fast sketching, and matplotlib for visualization.
How to Write the Fastest JSON Parser/Writer in the World (Milo Yip)
How RapidJSON was developed to achieve the highest performance among 20 C/C++ JSON libraries. Benchmarks, some C++ design, algorithm, and low-level optimizations are covered.
Apache Spark is one of the most popular big data projects, offering greatly improved performance over traditional MapReduce models. Much of Apache Spark’s power comes from lazy evaluation along with intelligent pipelining, which can make debugging more challenging. This talk will examine how to debug Apache Spark applications, the different options for logging in PySpark, as well as some common errors and how to detect them.
Spark’s own internal logging can often be quite verbose, and this talk will examine how to effectively search logs from Apache Spark to spot common problems. In addition to the internal logging, this talk will look at options for logging from within our program itself.
Spark’s accumulators have gotten a bad rap because of how they interact in the event of cache misses or partial recomputes, but this talk will look at how to effectively use Spark’s current accumulators for debugging, as well as a look to the future for data property accumulators, which may be coming to Spark in a future version.
In addition to reading logs, and instrumenting our program with accumulators, Spark’s UI can be of great help for quickly detecting certain types of problems.
Debuggers are a wonderful tool, however when you have 100 computers the “wonder” can be a bit more like “pain”. This talk will look at how to connect remote debuggers, but also remind you that it’s probably not the easiest path forward.
How do we go from your Java code to the CPU assembly that actually runs it? Using high-level constructs has made us forget what happens behind the scenes, which is, however, key to writing efficient code.
Starting from a few lines of Java, we explore the different layers that contribute to running your code: the JRE, bytecode, the structure of the OpenJDK virtual machine, HotSpot, intrinsic methods, and benchmarking.
An introductory presentation to these low-level concerns, based on the practical use case of optimizing 6 lines of code, so that hopefully you will want to explore further!
Presentation given at the Toulouse (FR) Java User Group.
Video (in French) at https://www.youtube.com/watch?v=rB0ElXf05nU
Slideshow with animations at https://docs.google.com/presentation/d/1eIcROfLpdTU2_Z_IKiMG-AwqZGZgbN1Bs2E0nGShpbk/pub?start=true&loop=false&delayms=60000
Talk presented by Aarón Fas & Andrés Viedma at the JBcnConf 2015.
'Microservices' is one of the most popular buzzwords in the industry now, but are they really a step forward? Or might they be more of a problem than a solution? When are they really helpful? How should they be approached? What challenges will we face if we decide to implement a microservices-based architecture?
One year ago, Tuenti moved from a monolithic PHP backend to a Java + PHP microservices architecture. In this talk, we'll share our experiences so far: how we addressed the change, how we implemented it, why we think it's been valuable for us (and how is that related to the company culture), why it might not be a good idea for your company / application and, mostly, what lessons we have learned from this experience.
This module has been created to answer all the questions on how IPFS can be used for dynamic real-time applications. In this module, you will learn:
- how to reason about dynamic data on IPFS,
- IPNS, the simplest construction for naming in IPFS,
- how PubSub can offer subsecond speeds for interactive applications,
- how CRDTs are a fundamental building block for distributed applications,
- what is available in the ecosystem.
Engineering software is widely employed for its powerful abstraction of scientific and technical knowledge. It enables productive applications, e.g., analysis, prototyping, and manufacturing. Making engineering software requires a profound understanding of the problem domain, as well as the art of engineering it.
Software engineering differs substantially from conventional engineering. To professionally build software, mathematicians, scientists, and engineers need skills including system administration, automated builds, automated testing, and version control, to name but a few. Computer science knowledge like algorithms and data structures is also indispensable. It is a joyful, interdisciplinary, and world-changing enterprise worth sharing with all future engineering practitioners.
Time to say goodbye to your Nagios-based setup. Discover all the cool new tools out there to do more efficient monitoring. A talk given at OSMC 2014.
https://www.youtube.com/watch?v=_BAWi9Zhmic
OSMC 2014: Time to say goodbye to your Nagios setup | Oliver Jan (NETWAYS)
Three years ago at OSMC I presented a decade's worth of new trends and possibilities in open source monitoring. Now it's time to see whether we can do monitoring without Nagios. We'll then explore:
- Collectd, Diamond, Packetbeat, StatsD and Logstash for collecting metrics and events
- Graphite, InfluxDB for storing time series data
- Elasticsearch for storing events
- Kibana and Grafana for displaying and searching metrics and events.
- Seyren and Cabot for notifications
What is possible with such a solution? How does it work compared to Nagios? Is it iso-functional, maybe even better than a Nagios-based solution? Is there any migration path from Nagios?
We'll try to answer all these questions and maybe more!
Hail hydrate! from stream to lake using open source (Timothy Spann)
(VIRTUAL) Hail Hydrate! From Stream to Lake Using Open Source - Timothy J Spann, StreamNative
https://osselc21.sched.com/event/lAPi?iframe=no
A cloud data lake that is empty is not useful to anyone. How can you quickly, scalably, and reliably fill your cloud data lake with the diverse sources of data you already have and new ones you never imagined you needed? Utilizing open source tools from Apache, the FLiP stack enables any data engineer, programmer, or analyst to build reusable modules with low or no code. FLiP utilizes Apache NiFi, Apache Pulsar, Apache Flink, and MiNiFi agents to load CDC, logs, REST, XML, images, PDFs, documents, text, semistructured data, unstructured data, structured data, and a hundred data sources you could never dream of streaming before. I will teach you how to fish in the deep end of the lake and return a data engineering hero. Let's hope everyone is ready to go from 0 to Petabyte hero.
https://osselc21.sched.com/event/lAPi/virtual-hail-hydrate-from-stream-to-lake-using-open-source-timothy-j-spann-streamnative
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK (Timothy Spann)
Building Real-Time Pipelines With FLaNK
Timothy Spann, Principal Developer Advocate, Streaming - Cloudera Future of Data meetup, startup grind, AI Camp
The combination of Apache Flink, Apache NiFi, and Apache Kafka for building real-time data processing pipelines is extremely powerful, as demonstrated by this case study using the FLaNK-MTA project. The project leverages these technologies to process and analyze real-time data from the New York City Metropolitan Transportation Authority (MTA). FLaNK-MTA demonstrates how to efficiently collect, transform, and analyze high-volume data streams, enabling timely insights and decision-making.
Apache NiFi
Apache Kafka
Apache Flink
Apache Iceberg
LLM
Generative AI
Slack
PostgreSQL
Go fits perfectly inside containers: you can ship apps as tiny images on k8s and distribute them across the globe. Gianluca will show how InfluxData debugs containers running on Kubernetes, allowing sysadmins and developers to troubleshoot and replicate issues using core dumps, debuggers, and logs.
Go applications are perfect to run inside a container. You can build a single binary and a tiny Docker image, and ship them to your Kubernetes cluster. A successful production environment requires stability and simplicity; it needs to be easy to troubleshoot, and operators need to be able to get all the information developers will need to fix a bug. During this talk, Gianluca will share what InfluxData is doing to allow developers and system administrators to work together, understanding problems running live at scale on Kubernetes and how to escalate them down to software engineers using logs, delve, gdb, core dumps, and traces to replicate and fix issues.
Apidays Paris 2023 - Forget TypeScript, Choose Rust to build Robust, Fast and... (apidays)
Apidays Paris 2023 - Software and APIs for Smart, Sustainable and Sovereign Societies
December 6, 7 & 8, 2023
Forget TypeScript, Choose Rust to build Robust, Fast and Cheap APIs
Zacaria Chtatar, Backend Software Engineer at HaveSomeCode
Self-Service Data Ingestion Using NiFi, StreamSets & Kafka (Guido Schmutz)
Many Big Data and IoT use cases are based on combining data from multiple data sources and making it available on a Big Data platform for analysis. The data sources are often very heterogeneous, from simple files and databases to high-volume event streams from sensors (IoT devices). It's important to retrieve this data in a secure and reliable manner and integrate it with the Big Data platform so that it is available for analysis in real time (stream processing) as well as in batch (typical big data processing). In the past few years, some new tools have emerged which are especially capable of handling the process of integrating data from outside, often called data ingestion. From an outside perspective, they are very similar to traditional Enterprise Service Bus infrastructures, which larger organizations often use to handle message-driven and service-oriented systems. But there are also important differences: they are typically easier to scale horizontally, offer a more distributed setup, are capable of handling high volumes of data/messages, provide very detailed monitoring at the message level, and integrate very well with the Hadoop ecosystem. This session will present and compare Apache Flume, Apache NiFi, StreamSets, and the Kafka ecosystem, and show how they handle data ingestion in a Big Data solution architecture.
Kubernetes is not needed by 90 percent of companies (Ivan Glushkov)
The euphoria and hype around Kubernetes keep companies from taking a sober look at the complexity, problems, and risks of migrating to Kubernetes.
I will try to cool the ardor of the boldest and most naive, and show that this path is very thorny and dangerous:
I will go through a list of the standard problems,
show what to pay attention to when planning a migration,
and advise you to "not touch anything while it works".
Standard and banal, but it may save you a lot of nerves and money.
NewSQL overview:
- History of RDBMSs
- The reasons why NoSQL concept appeared
- Why NoSQL was not enough, the necessity of NewSQL
- Characteristics of NewSQL
- 7 DBs that belong to NewSQL
- Overview Table with main properties
OpenMetadata Community Meeting - 5th June 2024 (OpenMetadata)
The OpenMetadata Community Meeting was held on June 5th, 2024. In this meeting, we discussed the data quality capabilities that are integrated with the Incident Manager, providing a complete solution to handle your data observability needs. Watch the end-to-end demo of the data quality features.
* How to run your own data quality framework
* What is the performance impact of running data quality frameworks
* How to run the test cases in your own ETL pipelines
* How the Incident Manager is integrated
* Get notified with alerts when test cases fail
Watch the meeting recording here - https://www.youtube.com/watch?v=UbNOje0kf6E
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus... (Globus)
Large Language Models (LLMs) are currently the center of attention in the tech world, particularly for their potential to advance research. In this presentation, we'll explore a straightforward and effective method for quickly initiating inference runs on supercomputers using the vLLM tool with Globus Compute, specifically on the Polaris system at ALCF. We'll begin by briefly discussing the popularity and applications of LLMs in various fields. Following this, we will introduce the vLLM tool, and explain how it integrates with Globus Compute to efficiently manage LLM operations on Polaris. Attendees will learn the practical aspects of setting up and remotely triggering LLMs from local machines, focusing on ease of use and efficiency. This talk is ideal for researchers and practitioners looking to leverage the power of LLMs in their work, offering a clear guide to harnessing supercomputing resources for quick and effective LLM inference.
May Marketo Masterclass, London MUG May 22 2024.pdf (Adele Miller)
Can't make Adobe Summit in Vegas? No sweat because the EMEA Marketo Engage Champions are coming to London to share their Summit sessions, insights and more!
This is a MUG with a twist you don't want to miss.
GraphSummit Paris - The art of the possible with Graph Technology (Neo4j)
Sudhir Hasbe, Chief Product Officer, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
First Steps with Globus Compute Multi-User Endpoints (Globus)
In this presentation we will share our experiences around getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that can offload computationally expensive steps in the researcher's workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Some of the challenges we have encountered were that each researcher had to set up and manage their own single-user globus compute endpoint and that the workloads had varying resource requirements (CPUs, memory and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges and share an update on our progress here.
An Enterprise Resource Planning system includes various modules that reduce any business's workload. Additionally, it organizes workflows, which drives enhanced productivity. Here is a detailed explanation of the ERP modules. Going through the points will help you understand how the software is changing work dynamics.
To know more details here: https://blogs.nyggs.com/nyggs/enterprise-resource-planning-erp-system-modules/
We describe the deployment and use of Globus Compute for remote computation. This content is aimed at researchers who wish to compute on remote resources using a unified programming interface, as well as system administrators who will deploy and operate Globus Compute services on their research computing infrastructure.
Workshop - Innovating with Generative AI and Knowledge Graphs (Neo4j)
Go beyond the media hype around AI and discover practical techniques for using AI responsibly with your organization's data. Explore how to use knowledge graphs to increase accuracy, transparency, and explainability in generative AI systems. You will leave with hands-on experience combining data relationships and LLMs to bring domain-specific context and improve reasoning.
Bring your laptop and we will guide you through setting up your own generative AI stack, providing practical, coded examples to get started in minutes.
Globus Compute with IRI Workflows - GlobusWorld 2024 (Globus)
As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. As part of the work the team is investigating ways to speedup the time to solution for many different parts of the DIII-D workflow including how they run jobs on HPC systems. One of these routes is looking at Globus Compute as a way to replace the current method for managing tasks and we describe a brief proof of concept showing how Globus Compute could help to schedule jobs and be a tool to connect compute at different facilities.
Software Engineering, Software Consulting, Tech Lead, Spring Boot, Spring Cloud, Spring Core, Spring JDBC, Spring Transaction, Spring MVC, OpenShift Cloud Platform, Kafka, REST, SOAP, LLD & HLD.
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv... (Shahin Sheidaei)
Games are powerful teaching tools, fostering hands-on engagement and fun. But they require careful consideration to succeed. Join me to explore factors in running and selecting games, ensuring they serve as effective teaching tools. Learn to maintain focus on learning objectives while playing, and how to measure the ROI of gaming in education. Discover strategies for pitching gaming to leadership. This session offers insights, tips, and examples for coaches, team leads, and enterprise leaders seeking to teach from simple to complex concepts.
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart... (Globus)
The Earth System Grid Federation (ESGF) is a global network of data servers that archives and distributes the planet’s largest collection of Earth system model output for thousands of climate and environmental scientists worldwide. Many of these petabyte-scale data archives are located in proximity to large high-performance computing (HPC) or cloud computing resources, but the primary workflow for data users consists of transferring data, and applying computations on a different system. As a part of the ESGF 2.0 US project (funded by the United States Department of Energy Office of Science), we developed pre-defined data workflows, which can be run on-demand, capable of applying many data reduction and data analysis to the large ESGF data archives, transferring only the resultant analysis (ex. visualizations, smaller data files). In this talk, we will showcase a few of these workflows, highlighting how Globus Flows can be used for petabyte-scale climate analysis.
Developing Distributed High-performance Computing Capabilities of an Open Sci... (Globus)
COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and its broad response from the scientific community has forged new relationships among public health practitioners, mathematical modelers, and scientific computing specialists, while revealing critical gaps in exploiting advanced computing systems to support urgent decision making. Informed by our team’s work in applying high-performance computing in support of public health decision makers during the COVID-19 pandemic, we present how Globus technologies are enabling the development of an open science platform for robust epidemic analysis, with the goal of collaborative, secure, distributed, on-demand, and fast time-to-solution analyses to support public health.
Essentials of Automations: The Art of Triggers and Actions in FME (Safe Software)
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam (takuyayamamoto1800)
In these slides, we show a simulation example and how to compile the solver.
The Helmholtz equation can be solved with helmholtzFoam; the Helmholtz equation with uniformly dispersed bubbles can be simulated with helmholtzBubbleFoam.
16. Design DT: Use Cases
❖ Log one request through all the services
❖ Gather all operations information (result, time)
❖ Build Dependency Graph
❖ Analytics (“Dapper” paper)
❖ Tags, Logs, Artifacts for each operation
❖ Lines of Business analytics
❖ QoS, Traffic Control
24. Design DT: Idea
❖ User Request ID -> to pass to every subsystem:
❖ HTTP: headers
❖ gRPC: additional field / auto wrapping
❖ Event Bus: additional field / auto wrapping
❖ Subsystem to have sub-request ID
❖ Relation to the previous subsystem (parent/child, sequence, …)
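A minimal sketch of the HTTP-header variant of this idea (not from the deck; it assumes the Plug library, and the x-request-id header name is my choice):

defmodule MyApp.RequestId do
  # Reuse the caller's request ID, or mint a fresh one at the system edge.
  def fetch_or_create(conn) do
    case Plug.Conn.get_req_header(conn, "x-request-id") do
      [id | _] -> id
      [] -> 16 |> :crypto.strong_rand_bytes() |> Base.encode16(case: :lower)
    end
  end

  # Attach it to outgoing calls so the next subsystem can continue the chain.
  def propagate(headers, req_id), do: [{"x-request-id", req_id} | headers]
end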
29. Design DT: Problems
❖ Too many traces -> OOM or CPU is 100%
❖ Too few traces -> miss problems
❖ Deciding “on the fly” is difficult
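The usual answer to this trade-off is sampling. A library-free sketch of head-based probabilistic sampling (module name and rate are hypothetical):

defmodule Sampler do
  @rate 0.001  # keep roughly 1 in 1,000 traces

  # The decision is made once, when the trace starts, and every child span
  # inherits it; that is exactly why tuning the rate "on the fly" is hard.
  def sample? do
    :rand.uniform() < @rate
  end
end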
30. OpenTracing
❖ Cloud Native Computing Foundation (cncf.io) incubating project
❖ Uber, Apple, Pinterest, Couchbase
❖ API specification, libraries
31. OpenTracing: Concepts
❖ Trace
❖ Span: name, start time, end time
❖ Span: kv tags, kv logs, baggage items
❖ SpanContext
❖ Scopes + Threading + ActiveSpan
❖ Tracers: API + ready solutions
❖ Carriers: API to inject/extract SpanContext
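A carrier is just an inject/extract pair over some transport. An illustrative sketch using Zipkin's B3 header names (not any real tracer's API):

defmodule Carrier do
  # Inject the SpanContext into outgoing HTTP headers.
  def inject(%{trace_id: trace_id, span_id: span_id}, headers) do
    [{"x-b3-traceid", trace_id}, {"x-b3-spanid", span_id} | headers]
  end

  # Extract it on the receiving side, or report that there is no parent.
  def extract(headers) do
    with {_, trace_id} <- List.keyfind(headers, "x-b3-traceid", 0, :missing),
         {_, span_id} <- List.keyfind(headers, "x-b3-spanid", 0, :missing) do
      {:ok, %{trace_id: trace_id, span_id: span_id}}
    else
      :missing -> :no_parent
    end
  end
end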
32. OpenTracing: Flow
1. get SpanContext or start Trace => span.start(SpanContext)
2. span.store(tags/metrics/logs/baggage)
3. continue with the SpanContext: run another function, send an async message, or make an HTTP request with the SpanContext in the headers
4. span.finish()
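A minimal sketch of these four steps using the :otter library (the same one the Ex_Ray example below wraps); I am treating otter's functional span API names as assumptions:

# 1. start a trace/span (or continue from a received SpanContext)
span = :otter.start(:checkout)
# 2. store tags and logs on the span
span = :otter.tag(span, :user_id, 42)
span = :otter.log(span, "calling payment service")
# 3. hand the trace/span ids to the next hop (function, message, or HTTP headers)
# 4. finish the span, which ships it to the collector
:otter.finish(span)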
45. OpenTracing: Ex_Ray (Elixir)
defmodule Nested do
  use ExRay, pre: :before_fun, post: :after_fun
  …
  @trace kind: :critical
  def fred(a, b), do: blee(a, b)
  …
  defp before_fun(ctx) do
    Span.open(ctx.target, @req_id)
    |> :otter.tag(:kind, ctx.meta[:kind])
    |> :otter.log(">>> #{ctx.target} with #{ctx.args |> inspect}")
  end
end
46. OpenTracing: Ex_Ray (Elixir): Summary
❖ Less code needed
❖ Low-quality code
❖ Memory leaks
❖ Exceptions are not re-raised in wrappers
❖ No default agreements
47. OpenCensus
❖ Started in Google
❖ Large community (Microsoft, Datadog, Prometheus, …)
❖ Automatic Context Propagation
❖ Reference implementation of the official W3C HTTP tracing header
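For reference, the W3C trace context header packs version, trace-id, parent-id, and flags into a single traceparent value; the canonical example from the spec:

traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01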
48. OpenCensus: Concepts
❖ Trace, Span - similar to OpenTracing
❖ Link between spans: child/parent/unknown
❖ Sampling: Always / Never / Probabilistic (1 in 10000) / RateLimiting (10 per sec)
❖ Automatic Context Propagation
❖ Stats/Metrics
52. OpenCensus Erlang
❖ Public GitHub repo for all Elixir/Erlang libs
❖ Libs for web servers (Elli, Cowboy, Phoenix, …)
❖ Integrate with minimum effort
53. OpenCensus Erlang
❖ ETS table for Span data + GC for abandoned Spans
❖ Track SpanContext: process dict / variable
❖ Parse transform or manual context tracking
❖ Logger can receive SpanContext
❖ Metrics
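A toy version of the process-dictionary option, to show the shape of what the library automates (this is not opencensus-erlang's actual API):

defmodule Ctx do
  # Run fun with ctx as the current SpanContext, restoring the old one after.
  def with_span_ctx(ctx, fun) do
    prev = Process.put(:span_ctx, ctx)

    try do
      fun.()
    after
      if prev, do: Process.put(:span_ctx, prev), else: Process.delete(:span_ctx)
    end
  end

  def current, do: Process.get(:span_ctx)
end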
57. OpenCensus Elixir
❖ Uses opencensus-erlang (e.g. prepares headers with SpanContext)
❖ Implements a macro:
with_child_span "span1" do
  …
end
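In context, usage looks roughly like the sketch below (the Opencensus.Trace module path is my assumption, and the module and function names are placeholders):

defmodule Shop do
  import Opencensus.Trace  # module path is an assumption

  def checkout(cart) do
    with_child_span "checkout" do
      charge(cart)  # runs inside the child span; it closes when the block exits
    end
  end

  defp charge(_cart), do: :ok
end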
58. OpenCensus Elixir: Phoenix
❖ Uses “Phoenix Instrumenter”
❖ Creates a Span for any Controller or View
❖ Integration (config.exs):
instrumenters: [OpencensusPhoenix.Instrumenter]
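In a Phoenix app that option sits in the endpoint configuration; a sketch with placeholder app and endpoint names:

# config/config.exs (app and endpoint names are placeholders)
config :my_app, MyAppWeb.Endpoint,
  instrumenters: [OpencensusPhoenix.Instrumenter]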
59. OpenCensus Elixir: Plug
❖ Integrates into any pipeline with “Plug”
❖ Gets the parent Span from headers
❖ Creates a child Span with new attributes (calls a function to get them)
❖ Integration:
defmodule MyApp.TracePlug do
  # some custom configuration
end

plug MyApp.TracePlug
60. OpenCensus BEAM: Summary
❖ A lot of libraries ready to be used
❖ Seamless integration with other languages
❖ You need to understand the concept
62. Summary
❖ A lot of advantages: Introspection, Analytics, LoB, QoS
❖ Think about sending metrics with OpenCensus
❖ Easy to integrate even with Erlang/Elixir
63. Breaking News
❖ Update: May 21st
❖ OpenTracing + OpenCensus => OpenTelemetry
❖ Backward compatibility for both projects
❖ Nov 2019: read-only mode for OpenTracing, OpenCensus