Angelo Simone Scotto
Solution Architect
angelo.scotto@computer.org
Keep Calm and Distributed Tracing
Logging vs. Tracing vs. Monitoring

Technical pattern (all three): Event Stream of Append-Only Timestamped Records (a.k.a. Logs)

                      Logging             Tracing             Monitoring
Trigger               A specific event    A specific event    None (periodic)
Payload               Business Events     Technical Events    System Status
Historical approach   LogLevel >= Info    LogLevel = Trace    Performance Counters
Log File
Time: above -> before, below -> after.
• A log on a single machine/core is simple to inspect.
[Figure: Log File 1, Log File 2, Log File 3, ?]
• Async code (not just distributed code) is far more complex.
• You need to aggregate data from different places.
• You need to correlate traces on one machine with traces on others.
• This is why debugging async code is hard.
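The aggregation and correlation steps above can be sketched in a few lines. This is only an illustration, not the approach of any specific tracing product: the `LogRecord` type, the `correlate` function and the sample data are hypothetical, and the sketch assumes every record already carries a shared `trace_id` and that the machines' clocks are roughly synchronized.

```python
from collections import defaultdict
from typing import NamedTuple


class LogRecord(NamedTuple):
    """One timestamped entry in an append-only log (hypothetical shape)."""
    timestamp: float  # seconds; assumes roughly synchronized clocks
    source: str       # which machine/service wrote the record
    trace_id: str     # correlation id shared by all records of one request
    message: str


def correlate(*log_files: list[LogRecord]) -> dict[str, list[LogRecord]]:
    """Aggregate records from every log file and group them by trace_id."""
    by_trace = defaultdict(list)
    for records in log_files:
        for record in records:
            by_trace[record.trace_id].append(record)
    # Within one trace, restore the "above -> before" ordering by timestamp.
    return {
        trace_id: sorted(records, key=lambda r: r.timestamp)
        for trace_id, records in by_trace.items()
    }


# Three separate log files, each in its own local order (made-up data).
log_file_1 = [
    LogRecord(1.0, "web", "t-1", "request received"),
    LogRecord(1.2, "web", "t-2", "request received"),
    LogRecord(1.9, "web", "t-1", "response sent"),
]
log_file_2 = [
    LogRecord(1.3, "orders", "t-1", "order validated"),
    LogRecord(1.5, "orders", "t-2", "order rejected"),
]
log_file_3 = [
    LogRecord(1.6, "db", "t-1", "row inserted"),
]

traces = correlate(log_file_1, log_file_2, log_file_3)
for record in traces["t-1"]:
    print(f"{record.timestamp:.1f} [{record.source}] {record.message}")
```

Without the shared `trace_id`, the only way to relate the three files would be guessing from timestamps, which is exactly what makes async and distributed debugging hard.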
• Dapper: the seminal paper on Distributed Tracing.
• Defines the main terminology.
• Other key takeaways:
• Application Transparency: programmers should not (necessarily) need to be aware of the tracing system.
• Sampling: not all traces are persisted; a probabilistic approach is used (e.g. 1 sample every 1024 traces).
• Span: the primary building block of a distributed trace, representing an individual unit of work done in a distributed system.
• Trace: a tree of Spans, visualizing the life of a request as it moves through a distributed system.
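The terminology above can be made concrete with a small sketch. Everything here is hypothetical and for illustration only: the `Span` fields, `build_trace`, `render` and `should_sample` are not taken from the paper or from any tracing library. In particular, the sampler simply keeps trace ids divisible by 1024, one assumed way to get roughly 1 in 1024 for uniformly distributed ids while letting every service reach the same decision for the same trace.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """An individual unit of work (hypothetical, simplified fields)."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span of the trace
    name: str
    start_ms: int
    end_ms: int
    children: list["Span"] = field(default_factory=list)


def build_trace(spans: list[Span]) -> Span:
    """Link spans into a tree via parent_id and return the root span."""
    by_id = {span.span_id: span for span in spans}
    root = None
    for span in spans:
        if span.parent_id is None:
            root = span
        else:
            by_id[span.parent_id].children.append(span)
    if root is None:
        raise ValueError("trace has no root span")
    return root


def render(span: Span, depth: int = 0) -> list[str]:
    """Return one indented line per span, depth-first."""
    lines = [f"{'  ' * depth}{span.name} ({span.end_ms - span.start_ms} ms)"]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


def should_sample(trace_id: str, rate: int = 1024) -> bool:
    """Keep a trace when its hex id is divisible by `rate` (assumed scheme)."""
    return int(trace_id, 16) % rate == 0


spans = [
    Span("00000400", "a", None, "GET /checkout", 0, 900),
    Span("00000400", "b", "a", "orders.validate", 100, 400),
    Span("00000400", "c", "b", "db.insert", 200, 300),
    Span("00000400", "d", "a", "payment.charge", 500, 800),
]

root = build_trace(spans)
if should_sample(root.trace_id):
    print("\n".join(render(root)))
```

Deciding from the trace id rather than from a fresh random number per service is a design choice of this sketch: it avoids persisting only half of a trace.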
• 2010 – Google publishes the Dapper article (without source code).
• 2012 – Twitter publishes Zipkin (and source code).
• 2015 – Uber develops Jaeger (not published).
• 2016 – OpenTracing standard is published and joins CNCF.
• 2017 – Uber publishes Jaeger (and source code) and donates it to CNCF.
• 2018 – Google publishes OpenCensus (and source code).
• 2018 – W3C initiates a Distributed Tracing Working Group.
• 2018 – OpenMetrics is published and adopted by CNCF.
• 2019 – OpenTelemetry is published and is adopted by CNCF.
• OpenTelemetry in .NET is a thin wrapper over the existing Activity and ActivitySource classes from System.Diagnostics.
• This ensures high performance and stability.
• It is future-proof.
• OpenTelemetry is still in pre-release (1.0.0-rc1.1).
• OpenTelemetry.Instrumentation NuGet packages (AspNet, SqlClient, Redis, …)
• OpenTelemetry.Exporter NuGet packages (Jaeger, Zipkin, OTEL, …)
Code available https://bit.ly/netconfit-disttracing
• Successful Distributed Systems require Distributed Tracing.
• Auto-instrumenting your applications requires only a few lines of code.
• Do yourself a favour and start instrumenting.
• OpenTracing or OpenTelemetry?
• OpenTelemetry is the future and is now in RC (not yet in GA).
• OpenTracing is more mature but superseded by OpenTelemetry.
• We just scratched the surface:
• No in-depth discussion of Metrics and Logs.
• No in-depth discussion of async comms (e.g. queues instead of HTTP).
• No in-depth analysis of the OTEL architecture (e.g. standalone collector).
• No in-depth analysis of trace enrichment/filtering/modification.
• No in-depth analysis of baggage/context serialization.
Slides and materials at
https://aspit.co/netconfit-20