Date: 2020-08-18
Austin Parker, Principal Developer Advocate Why Distributed Tracing is Essential for Performance and Reliability
Who Am I? Austin Parker, Principal Developer Advocate. @austinlparker, austin@lightstep.com
What Changed?
More autonomy… but less visibility!
Observe kustomize ? ?? Control Team-by-team ok Must be org-wide! Distributed tracing!
What you can control vs. what you are responsible for. Stress (n): responsibility without control
Closing the gap between control and responsibility Responsibility for delivering performance and reliability Like many pro...
1. (n) the ability to navigate from effect to cause 2. (adj) related to supporting that ability (such as a tool or process...
Getting Actual Business Value From Distributed Tracing: Developer velocity; Software performance; Managing costs. Fundamenta...
Distributed Tracing
Traces are a form of telemetry based on spans with structure - Span = timed event describing work done by a single service...
Relationships matter: Traces encode causal relationships between callers and callees (calls, returns)
Traces are the raw material, not the finished product Distributed traces – basically just structs Distributed tracing – th...
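To make "basically just structs" concrete, here is a minimal sketch of a span as a plain record. The field names (`trace_id`, `parent_id`, `start_ms`, and so on) and the sample services are assumptions chosen for illustration; they are not taken from the slides or from any particular tracing library.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Span:
    """A timed event describing work done by a single service (illustrative fields)."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span of the trace
    service: str
    operation: str
    start_ms: int
    end_ms: int
    tags: Dict[str, str] = field(default_factory=dict)

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def children_of(spans: List[Span], parent: Span) -> List[Span]:
    """Return the callee spans caused directly by `parent`."""
    return [s for s in spans if s.parent_id == parent.span_id]


# A hypothetical three-span trace: one caller and two callees.
trace = [
    Span("t1", "a", None, "frontend", "GET /checkout", 0, 120),
    Span("t1", "b", "a", "cart", "load_cart", 10, 50),
    Span("t1", "c", "a", "payments", "authorize", 55, 115),
]
root = next(s for s in trace if s.parent_id is None)
callees = [s.service for s in children_of(trace, root)]
print(root.duration_ms, callees)
```

The `parent_id` link is what encodes the caller/callee relationship; everything else a tracing tool shows is derived from records like these.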
Developer Velocity
Increasing developer velocity - Make (common) tasks faster - Reduce interruptions - Improve communication - Prioritize hig...
Accelerate root cause analysis
More actionable alerts. “Are We All on the Same Page? Let’s Fix That”, Luis Mineiro, SREcon EMEA 2019. Search for “same pa...
Understanding dependencies… without tracing [Diagram: services A, B, C, D, E, annotated with: 8% error rate; avg. response size up 31%; request r...
Understanding dependencies Without tracing... - Each connection in isolation - “A talks to B” - No way to narrow scope - N...
Use traces and service dependencies - Enhance training for new team members - Facilitate operational review meetings - Inf...
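As a rough sketch of how traces can narrow scope beyond "A talks to B", the snippet below derives service-to-service edges from parent/child span links and then lists everything downstream of one service. The tuple layout and the sample data are assumptions made for this example only.

```python
from collections import Counter
from typing import List, Optional, Set, Tuple

# (span_id, parent_id, service): a deliberately tiny, hypothetical span record
SpanRow = Tuple[str, Optional[str], str]


def dependency_edges(spans: List[SpanRow]) -> Counter:
    """Count caller -> callee service edges implied by parent/child spans."""
    service_of = {span_id: service for span_id, _, service in spans}
    edges: Counter = Counter()
    for _, parent_id, service in spans:
        if parent_id is None or parent_id not in service_of:
            continue  # root span, or parent missing from this sample
        caller = service_of[parent_id]
        if caller != service:  # ignore calls that stay inside one service
            edges[(caller, service)] += 1
    return edges


def downstream(edges: Counter, start: str) -> Set[str]:
    """Return every service reachable from `start` by following call edges."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        current = stack.pop()
        for caller, callee in edges:
            if caller == current and callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return seen


spans = [
    ("1", None, "A"),
    ("2", "1", "B"),
    ("3", "2", "D"),
    ("4", "1", "C"),
    ("5", "4", "E"),
    ("6", "2", "D"),
]
edges = dependency_edges(spans)
print(sorted(edges.items()))
print(sorted(downstream(edges, "B")))
```

Because the edges come from real requests, the resulting map stays current as services change, which is harder to guarantee with a hand-maintained diagram.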
Software Performance
Improving software performance Performance means “performance as experienced by end users” Tracing can help by… - Better d...
Defining the critical path: waiting for blue… A (part of a) span is on the critical path if: - reducing its duration spe...
Rebalancing fan-out
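One common way to approximate the critical path is to walk a trace backwards from the end of the root span, always following the child that finished last. The sketch below assumes synchronous, properly nested spans (children start and end inside their parent); the `Span` tuple and the service names are hypothetical, and this is not necessarily the method used by any particular tracing product.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional


class Span(NamedTuple):
    span_id: str
    parent_id: Optional[str]
    start_ms: int
    end_ms: int


def critical_path(spans: List[Span]) -> Dict[str, int]:
    """Return {span_id: milliseconds of that span's own time on the critical path}."""
    children = defaultdict(list)
    root = None
    for span in spans:
        if span.parent_id is None:
            root = span
        else:
            children[span.parent_id].append(span)
    if root is None:
        raise ValueError("trace has no root span")

    on_path: Dict[str, int] = defaultdict(int)

    def walk(span: Span, cursor: int) -> None:
        # Walk backwards in time from `cursor` to the start of `span`.
        for child in sorted(children[span.span_id], key=lambda c: c.end_ms, reverse=True):
            if child.end_ms > cursor:
                continue  # finished while a later sibling was still running
            on_path[span.span_id] += cursor - child.end_ms  # parent's own work
            walk(child, child.end_ms)
            cursor = child.start_ms
        on_path[span.span_id] += cursor - span.start_ms

    walk(root, root.end_ms)
    return dict(on_path)


# Fan-out: "api" calls "auth" and "search" in parallel; "search" calls "db".
trace = [
    Span("api", None, 0, 100),
    Span("auth", "api", 10, 40),
    Span("search", "api", 10, 90),
    Span("db", "search", 20, 80),
]
path = critical_path(trace)
print(path)
```

In this sample, `auth` never appears in the result: it finishes while `search` is still running, so making it faster would not shorten the overall request.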
Given a choice between speeding up A and B… 1. A 50% improvement in B is better than a 50% improvement in A 2. No improveme...
Managing Costs
Types of costs Operational costs - Developer time (failed deployments, oncall, meeting overhead) Revenue and reputational ...
Calculating logging costs Initial Factors ‐ Aggregating and indexing logs per service: ‐ Storage ‐ Compute ‐ Network ‐ Pea...
$716: Cloud spend @ 50GB/logs (monthly)
$3,386: Total after setup, maintenance (monthly)
Reducing logging spend with tracing: Annotate spans with logs! It’s as easy as: span.addEvent("illegal base64 data at input...
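The idea of attaching log-style events to a span, instead of shipping a separate log line, can be sketched with the standard library alone. `SketchSpan` is a hypothetical stand-in written for this example, not a real SDK class; the slide's `span.addEvent(...)` is the camel-case spelling, and the OpenTelemetry Python API spells the equivalent method `add_event`. The event name and attribute keys below are also assumptions.

```python
import base64
import binascii
import time
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SketchSpan:
    """Hypothetical stand-in for a tracing SDK span; not a real library class."""
    name: str
    events: List[Dict[str, Any]] = field(default_factory=list)

    def add_event(self, name: str, attributes: Optional[Dict[str, Any]] = None) -> None:
        # Store the event on the span so it travels with the trace.
        self.events.append({
            "name": name,
            "time_ns": time.time_ns(),
            "attributes": dict(attributes or {}),
        })


def decode_payload(span: SketchSpan, payload: str) -> bytes:
    """Decode base64 input, recording a span event instead of a log line on failure."""
    try:
        return base64.b64decode(payload, validate=True)
    except binascii.Error as exc:
        span.add_event("decode_failed", {"error": str(exc), "payload_len": len(payload)})
        return b""


span = SketchSpan("handle_upload")
good = decode_payload(span, "aGVsbG8=")
bad = decode_payload(span, "not base64!")
print(good, bad, [e["name"] for e in span.events])
```

The error now sits next to the timing and causal context of the request that produced it, so it does not have to be aggregated and indexed separately.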
Deploying Tracing
On your tracing migration Tracing is not an all-or-nothing endeavour - How to deliver incremental value for the org - How ...
Step 1 Start w/ customer-critical experiences Look at the edge and build an MVP - As close as you can (reasonably) get to...
Step 2 Playbook for service owners Establish conventions for tags, etc. - What matters to your business? - What would exp...
Step 3 Integrate with existing workflows Where do engineers work today? - IDEs, testing frameworks, CI/CD - Dashboards - ...
Building observable services Use open standards like OpenTelemetry for instrumenting service code. OpenTelemetry provides ...
In summary, distributed tracing provides... Faster RCA Better alerts Up-to-date dependency maps Improved compute fan-out T...
Q&A

