SlideShare a Scribd company logo
Austin Parker, Principal Developer Advocate
Why Distributed
Tracing is Essential
for Performance and
Reliability
or, How to Get Actual Business
Value From Distributed Tracing!
Who Am I?
Austin Parker
Principal Developer Advocate
2
@austinlparker
austin@lightstep.com✉
What Changed?
3
4
More autonomy…
but less visibility!
Observe
kustomize
? ??
Control Team-by-
team ok
Must be
org-wide!
Distributed tracing!
What you can control
What you are
responsible for
Stress (n): responsibility without control
6
Closing the gap between control and responsibility
Responsibility for delivering performance and reliability
Like many problems, the solution requires:
- Having the right data
- Setting the right goals
- Giving teams ownership
Distributed tracing is essential to closing this gap
7
1. (n) the ability to navigate from effect to cause
2. (adj) related to supporting that ability (such as a tool or process)
"used an observability tool to understand what caused the change”
For example, being able to navigate from…
Spike in errors → misconfiguration
Increased latency → new customer behavior
User complaints → upstream service deployed
Observability əb-ˈzər-və-bi-lə-tē
8
Getting Actual Business Value From Distributed tracing
9
Developer velocity
Software performance
Managing costs
Fundamentals
Deploying tracing
Distributed Tracing
10
Traces are a form of telemetry based on spans with structure
- Span = timed event describing work done by a single service
Tracing is a diagnostic tool that reveals…
… how a set of services coordinate to handle individual user requests
… from mobile or browser to backends to databases (end-to-end)
… including metadata like events (logs) and annotations (tags)
Provides a request-centric view of application performance
Distributed tracing, defined
12
Relationships matter
13
Traces encode causal relationships between callers and callees
calls
returns
Traces are the raw material, not the finished product
Distributed traces – basically just structs
Distributed tracing – the art and science of deriving value from traces
14
Developer Velocity
15
Increasing developer velocity
- Make (common) tasks faster
- Reduce interruptions
- Improve communication
- Prioritize high impact work
16
Verify deployments
Root cause analysis
Better alerts
Understand dependencies
Define and track SLOs
Accelerate root cause analysis
17
More actionable alerts
18
“Are We All on the Same Page?
Let’s Fix That”Luis Mineiro, SREcon EMEA 2019
Search for “same page usenix”)
Understanding dependencies… without tracing
19
A
B
C
E
D
C
B
B
D
A
B
E
D 8% error
rate
avg. response
size up 31%
request rate up
4%
Understanding dependencies
Without tracing...
- Each connection in isolation
- “A talks to B”
- No way to narrow scope
- No way to meaningfully tie in
other metrics
20
With tracing...
- End-to-end context
- Request graph
- Can refine based on any
property of the request
- Metrics linked to current scope
Use traces and service dependencies
- Enhance training for new team members
- Facilitate operational review meetings
- Inform architectural design decisions
- Set SLOs for internal services
21
Use SLOs to…- Measure reliability- Set error budgets- Hold teams accountable
Software
Performance
22
Improving software performance
Performance means “performance as
experienced by end users”
Tracing can help by…
- Better distribution of computation
- Focusing optimization where it
matters
23
Defining the critical path
24
waiting for blue…
A (part of a) span is on the critical path if:
- reducing its duration speeds up overall request
therefore, blue is on
the critical path here
Rebalancing fan-out
25
Given a choice between speeding up A and B…
1. 50% improvement in B is better than an 50% improvement in A
2. No improvement in A will ever improve overall performance by >15%
Obvious… once you have the data :)
A B
A B A B
Amdahl’s Law
26
OR
Managing Costs
27
Types of costs
Operational costs
- Developer time (failed deployments, oncall, meeting overhead)
Revenue and reputational costs
- Missed SLOs, failed conversions, unhappy users
Infrastructure costs
- Compute, network, storage, API usage
Monitoring costs
28
Take aggregated logs as an example
Calculating logging costs
Initial Factors
‐ Aggregating and indexing logs per
service:
‐ Storage
‐ Compute
‐ Network
‐ Peak instance count
‐ Retention period
‐ Services involved in a request
29
Initial Values
Assuming 50GB of log data a day, 14
day retention, high availability (no cold
storage)
1 Primary (L Compute Optimized)
 $89
2 Data (XL Memory Optimized)
 $426
3 SSDs (General Purpose)
 $201
$716
Cloud spend @ 50GB/logs (monthly)
30
$3,386
Total after setup, maintenance (monthly)
31
Reducing logging spend with tracing
Annotate spans with logs! It’s as easy as:
span.addEvent(“illegal base64 data at input byte 7”)
Leverage traces to determine which logs to store
32
Monthly logging spend
$3,571 $712
Logging data is more valuable in context!
Deploying Tracing
33
On your tracing migration
Tracing is not an all-or-nothing endeavour
- How to deliver incremental value for the org
- How to use that value to inform next steps of the journey
Value to developers should be your (meta-)metric of success
journey
34
Step 1 Start w/ customer-critical experiences
Look at the edge and build an MVP
- As close as you can (reasonably) get to users
- Often an API gateway or proxy
Map incoming operations → dependencies
- Identify next steps
- Build a case for others to adopt tracing
35
Step 2 Playbook for service owners
Establish conventions for tags, etc.
- What matters to your business?
- What would explain failures?
Instrument frameworks, libraries, shared services
- Accelerate adoption by reusing code
- Enforce conventions programmatically
36
Step 3 Integrate with existing workflows
Where do engineers work today?
- IDEs, testing frameworks, CI/CD
- Dashboards
- Notification and alerting
- …
37
Building observable services
Use open standards like OpenTelemetry
for instrumenting service code.
OpenTelemetry provides a single set of
APIs, SDKs, and tools for generating
distributed traces and metrics from your
services.
38
In summary, distributed tracing provides...
Faster RCA
Better alerts
Up-to-date dependency maps
Improved compute fan-out
Targeted optimization
Integrated telemetry
39
Improved developer velocity
Faster software performance
Better cost management
Distributed tracing puts application behavior in context
to help answer the primary question of observability:
“What caused that change?”
Get Started Today
go.lightstep.com/request-a-demo
Q&A

More Related Content

What's hot

Tracing Micro Services with OpenTracing
Tracing Micro Services with OpenTracingTracing Micro Services with OpenTracing
Tracing Micro Services with OpenTracing
Hemant Kumar
 
Distributed Tracing
Distributed TracingDistributed Tracing
Distributed Tracing
Kevin Ingelman
 
[WSO2Con EU 2018] Tooling for Observability
[WSO2Con EU 2018] Tooling for Observability[WSO2Con EU 2018] Tooling for Observability
[WSO2Con EU 2018] Tooling for Observability
WSO2
 
Distributed Tracing Velocity2016
Distributed Tracing Velocity2016Distributed Tracing Velocity2016
Distributed Tracing Velocity2016
Reshmi Krishna
 
CQRS and Event Sourcing: A DevOps perspective
CQRS and Event Sourcing: A DevOps perspectiveCQRS and Event Sourcing: A DevOps perspective
CQRS and Event Sourcing: A DevOps perspective
Maria Gomez
 
Distributed tracing with OpenTracing and Jaeger @ getstream.io
Distributed tracing with OpenTracing and Jaeger @ getstream.ioDistributed tracing with OpenTracing and Jaeger @ getstream.io
Distributed tracing with OpenTracing and Jaeger @ getstream.io
Max Klyga
 
THE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.io
THE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.ioTHE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.io
THE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.io
DevOpsDays Tel Aviv
 
Software cracking and patching
Software cracking and patchingSoftware cracking and patching
Software cracking and patching
Mayank Gavri
 
OpenTelemetry For Developers
OpenTelemetry For DevelopersOpenTelemetry For Developers
OpenTelemetry For Developers
Kevin Brockhoff
 
How to Streamline Incident Response with InfluxDB, PagerDuty and Rundeck
How to Streamline Incident Response with InfluxDB, PagerDuty and RundeckHow to Streamline Incident Response with InfluxDB, PagerDuty and Rundeck
How to Streamline Incident Response with InfluxDB, PagerDuty and Rundeck
InfluxData
 
Adopting Open Telemetry as Distributed Tracer on your Microservices at Kubern...
Adopting Open Telemetry as Distributed Tracer on your Microservices at Kubern...Adopting Open Telemetry as Distributed Tracer on your Microservices at Kubern...
Adopting Open Telemetry as Distributed Tracer on your Microservices at Kubern...
Tonny Adhi Sabastian
 
Making Runtime Data Useful for Incident Diagnosis: An Experience Report
Making Runtime Data Useful for Incident Diagnosis: An Experience ReportMaking Runtime Data Useful for Incident Diagnosis: An Experience Report
Making Runtime Data Useful for Incident Diagnosis: An Experience Report
QAware GmbH
 
Observability – the good, the bad, and the ugly
Observability – the good, the bad, and the uglyObservability – the good, the bad, and the ugly
Observability – the good, the bad, and the ugly
Timetrix
 
Monitoring to the Nth tier: The state of distributed tracing in 2016
Monitoring to the Nth tier: The state of distributed tracing in 2016Monitoring to the Nth tier: The state of distributed tracing in 2016
Monitoring to the Nth tier: The state of distributed tracing in 2016
AppNeta
 
Opentracing jaeger
Opentracing jaegerOpentracing jaeger
Opentracing jaeger
Oracle Korea
 
Distributed tracing 101
Distributed tracing 101Distributed tracing 101
Distributed tracing 101
Itiel Shwartz
 
Distributed tracing using open tracing & jaeger 2
Distributed tracing using open tracing & jaeger 2Distributed tracing using open tracing & jaeger 2
Distributed tracing using open tracing & jaeger 2
Chandresh Pancholi
 
Managing Big Data projects in a constantly changing environment - Rafał Zalew...
Managing Big Data projects in a constantly changing environment - Rafał Zalew...Managing Big Data projects in a constantly changing environment - Rafał Zalew...
Managing Big Data projects in a constantly changing environment - Rafał Zalew...
GetInData
 
2017 Microservices Practitioner Virtual Summit: Ancestry's Journey towards Mi...
2017 Microservices Practitioner Virtual Summit: Ancestry's Journey towards Mi...2017 Microservices Practitioner Virtual Summit: Ancestry's Journey towards Mi...
2017 Microservices Practitioner Virtual Summit: Ancestry's Journey towards Mi...
Ambassador Labs
 
Time series-analysis-using-an-event-streaming-platform -_v3_final
Time series-analysis-using-an-event-streaming-platform -_v3_finalTime series-analysis-using-an-event-streaming-platform -_v3_final
Time series-analysis-using-an-event-streaming-platform -_v3_final
confluent
 

What's hot (20)

Tracing Micro Services with OpenTracing
Tracing Micro Services with OpenTracingTracing Micro Services with OpenTracing
Tracing Micro Services with OpenTracing
 
Distributed Tracing
Distributed TracingDistributed Tracing
Distributed Tracing
 
[WSO2Con EU 2018] Tooling for Observability
[WSO2Con EU 2018] Tooling for Observability[WSO2Con EU 2018] Tooling for Observability
[WSO2Con EU 2018] Tooling for Observability
 
Distributed Tracing Velocity2016
Distributed Tracing Velocity2016Distributed Tracing Velocity2016
Distributed Tracing Velocity2016
 
CQRS and Event Sourcing: A DevOps perspective
CQRS and Event Sourcing: A DevOps perspectiveCQRS and Event Sourcing: A DevOps perspective
CQRS and Event Sourcing: A DevOps perspective
 
Distributed tracing with OpenTracing and Jaeger @ getstream.io
Distributed tracing with OpenTracing and Jaeger @ getstream.ioDistributed tracing with OpenTracing and Jaeger @ getstream.io
Distributed tracing with OpenTracing and Jaeger @ getstream.io
 
THE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.io
THE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.ioTHE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.io
THE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.io
 
Software cracking and patching
Software cracking and patchingSoftware cracking and patching
Software cracking and patching
 
OpenTelemetry For Developers
OpenTelemetry For DevelopersOpenTelemetry For Developers
OpenTelemetry For Developers
 
How to Streamline Incident Response with InfluxDB, PagerDuty and Rundeck
How to Streamline Incident Response with InfluxDB, PagerDuty and RundeckHow to Streamline Incident Response with InfluxDB, PagerDuty and Rundeck
How to Streamline Incident Response with InfluxDB, PagerDuty and Rundeck
 
Adopting Open Telemetry as Distributed Tracer on your Microservices at Kubern...
Adopting Open Telemetry as Distributed Tracer on your Microservices at Kubern...Adopting Open Telemetry as Distributed Tracer on your Microservices at Kubern...
Adopting Open Telemetry as Distributed Tracer on your Microservices at Kubern...
 
Making Runtime Data Useful for Incident Diagnosis: An Experience Report
Making Runtime Data Useful for Incident Diagnosis: An Experience ReportMaking Runtime Data Useful for Incident Diagnosis: An Experience Report
Making Runtime Data Useful for Incident Diagnosis: An Experience Report
 
Observability – the good, the bad, and the ugly
Observability – the good, the bad, and the uglyObservability – the good, the bad, and the ugly
Observability – the good, the bad, and the ugly
 
Monitoring to the Nth tier: The state of distributed tracing in 2016
Monitoring to the Nth tier: The state of distributed tracing in 2016Monitoring to the Nth tier: The state of distributed tracing in 2016
Monitoring to the Nth tier: The state of distributed tracing in 2016
 
Opentracing jaeger
Opentracing jaegerOpentracing jaeger
Opentracing jaeger
 
Distributed tracing 101
Distributed tracing 101Distributed tracing 101
Distributed tracing 101
 
Distributed tracing using open tracing & jaeger 2
Distributed tracing using open tracing & jaeger 2Distributed tracing using open tracing & jaeger 2
Distributed tracing using open tracing & jaeger 2
 
Managing Big Data projects in a constantly changing environment - Rafał Zalew...
Managing Big Data projects in a constantly changing environment - Rafał Zalew...Managing Big Data projects in a constantly changing environment - Rafał Zalew...
Managing Big Data projects in a constantly changing environment - Rafał Zalew...
 
2017 Microservices Practitioner Virtual Summit: Ancestry's Journey towards Mi...
2017 Microservices Practitioner Virtual Summit: Ancestry's Journey towards Mi...2017 Microservices Practitioner Virtual Summit: Ancestry's Journey towards Mi...
2017 Microservices Practitioner Virtual Summit: Ancestry's Journey towards Mi...
 
Time series-analysis-using-an-event-streaming-platform -_v3_final
Time series-analysis-using-an-event-streaming-platform -_v3_finalTime series-analysis-using-an-event-streaming-platform -_v3_final
Time series-analysis-using-an-event-streaming-platform -_v3_final
 

Similar to Why Distributed Tracing is Essential for Performance and Reliability

Driving Service Ownership with Distributed Tracing
Driving Service Ownership with Distributed TracingDriving Service Ownership with Distributed Tracing
Driving Service Ownership with Distributed Tracing
DevOps.com
 
Optimizing Observability Spend: Metrics
Optimizing Observability Spend: MetricsOptimizing Observability Spend: Metrics
Optimizing Observability Spend: Metrics
Eric D. Schabell
 
BAS 150 Lesson 1 Lecture
BAS 150 Lesson 1 LectureBAS 150 Lesson 1 Lecture
BAS 150 Lesson 1 Lecture
Wake Tech BAS
 
Run more experiments with fewer resources
Run more experiments with fewer resourcesRun more experiments with fewer resources
Run more experiments with fewer resources
VWO
 
VWO - Mark de Winter - Run more experiments with fewer resources.pdf
VWO - Mark de Winter - Run more experiments with fewer resources.pdfVWO - Mark de Winter - Run more experiments with fewer resources.pdf
VWO - Mark de Winter - Run more experiments with fewer resources.pdf
VWO
 
Kanban Overview
Kanban OverviewKanban Overview
Kanban Overview
Vít Kotačka
 
Basic-Project-Estimation-1999
Basic-Project-Estimation-1999Basic-Project-Estimation-1999
Basic-Project-Estimation-1999Michael Wigley
 
Human Sensor Conference Opening- Future of Real Estate Tech
Human Sensor Conference Opening- Future of Real Estate TechHuman Sensor Conference Opening- Future of Real Estate Tech
Human Sensor Conference Opening- Future of Real Estate Tech
Kevin Kononenko
 
TB8568_8568_Presentation
TB8568_8568_PresentationTB8568_8568_Presentation
TB8568_8568_PresentationRonnie Falgout
 
Real time analytics in Big Data
Real time analytics in Big DataReal time analytics in Big Data
Real time analytics in Big Data
BharathiRaja Chandrasekaran
 
Building The Agile Database
Building The Agile DatabaseBuilding The Agile Database
Building The Agile Databaseelliando dias
 
How to add security in dataops and devops
How to add security in dataops and devopsHow to add security in dataops and devops
How to add security in dataops and devops
Ulf Mattsson
 
SaaS Vs On Premise BI
SaaS Vs On Premise BISaaS Vs On Premise BI
SaaS Vs On Premise BI
LCWynne
 
IP final project
IP final project IP final project
IP final project
SantySS
 
sigir16
sigir16sigir16
The Newgistics Digital Transformation Journey
The Newgistics Digital Transformation JourneyThe Newgistics Digital Transformation Journey
The Newgistics Digital Transformation Journey
Zenoss
 
integrating-cognitive-services-into-your-devops-strategy
integrating-cognitive-services-into-your-devops-strategyintegrating-cognitive-services-into-your-devops-strategy
integrating-cognitive-services-into-your-devops-strategyKarthik Jaganathan
 
Integrating cognitive services in to your devops strategy
Integrating cognitive services in to your devops strategyIntegrating cognitive services in to your devops strategy
Integrating cognitive services in to your devops strategy
Aspire Systems
 
Inventory System
Inventory System Inventory System
Inventory System
Nasir152222
 

Similar to Why Distributed Tracing is Essential for Performance and Reliability (20)

Flow-ABriefExplanation
Flow-ABriefExplanationFlow-ABriefExplanation
Flow-ABriefExplanation
 
Driving Service Ownership with Distributed Tracing
Driving Service Ownership with Distributed TracingDriving Service Ownership with Distributed Tracing
Driving Service Ownership with Distributed Tracing
 
Optimizing Observability Spend: Metrics
Optimizing Observability Spend: MetricsOptimizing Observability Spend: Metrics
Optimizing Observability Spend: Metrics
 
BAS 150 Lesson 1 Lecture
BAS 150 Lesson 1 LectureBAS 150 Lesson 1 Lecture
BAS 150 Lesson 1 Lecture
 
Run more experiments with fewer resources
Run more experiments with fewer resourcesRun more experiments with fewer resources
Run more experiments with fewer resources
 
VWO - Mark de Winter - Run more experiments with fewer resources.pdf
VWO - Mark de Winter - Run more experiments with fewer resources.pdfVWO - Mark de Winter - Run more experiments with fewer resources.pdf
VWO - Mark de Winter - Run more experiments with fewer resources.pdf
 
Kanban Overview
Kanban OverviewKanban Overview
Kanban Overview
 
Basic-Project-Estimation-1999
Basic-Project-Estimation-1999Basic-Project-Estimation-1999
Basic-Project-Estimation-1999
 
Human Sensor Conference Opening- Future of Real Estate Tech
Human Sensor Conference Opening- Future of Real Estate TechHuman Sensor Conference Opening- Future of Real Estate Tech
Human Sensor Conference Opening- Future of Real Estate Tech
 
TB8568_8568_Presentation
TB8568_8568_PresentationTB8568_8568_Presentation
TB8568_8568_Presentation
 
Real time analytics in Big Data
Real time analytics in Big DataReal time analytics in Big Data
Real time analytics in Big Data
 
Building The Agile Database
Building The Agile DatabaseBuilding The Agile Database
Building The Agile Database
 
How to add security in dataops and devops
How to add security in dataops and devopsHow to add security in dataops and devops
How to add security in dataops and devops
 
SaaS Vs On Premise BI
SaaS Vs On Premise BISaaS Vs On Premise BI
SaaS Vs On Premise BI
 
IP final project
IP final project IP final project
IP final project
 
sigir16
sigir16sigir16
sigir16
 
The Newgistics Digital Transformation Journey
The Newgistics Digital Transformation JourneyThe Newgistics Digital Transformation Journey
The Newgistics Digital Transformation Journey
 
integrating-cognitive-services-into-your-devops-strategy
integrating-cognitive-services-into-your-devops-strategyintegrating-cognitive-services-into-your-devops-strategy
integrating-cognitive-services-into-your-devops-strategy
 
Integrating cognitive services in to your devops strategy
Integrating cognitive services in to your devops strategyIntegrating cognitive services in to your devops strategy
Integrating cognitive services in to your devops strategy
 
Inventory System
Inventory System Inventory System
Inventory System
 

More from DevOps.com

Modernizing on IBM Z Made Easier With Open Source Software
Modernizing on IBM Z Made Easier With Open Source SoftwareModernizing on IBM Z Made Easier With Open Source Software
Modernizing on IBM Z Made Easier With Open Source Software
DevOps.com
 
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
DevOps.com
 
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
DevOps.com
 
Next Generation Vulnerability Assessment Using Datadog and Snyk
Next Generation Vulnerability Assessment Using Datadog and SnykNext Generation Vulnerability Assessment Using Datadog and Snyk
Next Generation Vulnerability Assessment Using Datadog and Snyk
DevOps.com
 
Vulnerability Discovery in the Cloud
Vulnerability Discovery in the CloudVulnerability Discovery in the Cloud
Vulnerability Discovery in the Cloud
DevOps.com
 
2021 Open Source Governance: Top Ten Trends and Predictions
2021 Open Source Governance: Top Ten Trends and Predictions2021 Open Source Governance: Top Ten Trends and Predictions
2021 Open Source Governance: Top Ten Trends and Predictions
DevOps.com
 
A New Year’s Ransomware Resolution
A New Year’s Ransomware ResolutionA New Year’s Ransomware Resolution
A New Year’s Ransomware Resolution
DevOps.com
 
Getting Started with Runtime Security on Azure Kubernetes Service (AKS)
Getting Started with Runtime Security on Azure Kubernetes Service (AKS)Getting Started with Runtime Security on Azure Kubernetes Service (AKS)
Getting Started with Runtime Security on Azure Kubernetes Service (AKS)
DevOps.com
 
Don't Panic! Effective Incident Response
Don't Panic! Effective Incident ResponseDon't Panic! Effective Incident Response
Don't Panic! Effective Incident Response
DevOps.com
 
Creating a Culture of Chaos: Chaos Engineering Is Not Just Tools, It's Culture
Creating a Culture of Chaos: Chaos Engineering Is Not Just Tools, It's CultureCreating a Culture of Chaos: Chaos Engineering Is Not Just Tools, It's Culture
Creating a Culture of Chaos: Chaos Engineering Is Not Just Tools, It's Culture
DevOps.com
 
Role Based Access Controls (RBAC) for SSH and Kubernetes Access with Teleport
Role Based Access Controls (RBAC) for SSH and Kubernetes Access with TeleportRole Based Access Controls (RBAC) for SSH and Kubernetes Access with Teleport
Role Based Access Controls (RBAC) for SSH and Kubernetes Access with Teleport
DevOps.com
 
Monitoring Serverless Applications with Datadog
Monitoring Serverless Applications with DatadogMonitoring Serverless Applications with Datadog
Monitoring Serverless Applications with Datadog
DevOps.com
 
Deliver your App Anywhere … Publicly or Privately
Deliver your App Anywhere … Publicly or PrivatelyDeliver your App Anywhere … Publicly or Privately
Deliver your App Anywhere … Publicly or Privately
DevOps.com
 
Securing medical apps in the age of covid final
Securing medical apps in the age of covid finalSecuring medical apps in the age of covid final
Securing medical apps in the age of covid final
DevOps.com
 
How to Build a Healthy On-Call Culture
How to Build a Healthy On-Call CultureHow to Build a Healthy On-Call Culture
How to Build a Healthy On-Call Culture
DevOps.com
 
The Evolving Role of the Developer in 2021
The Evolving Role of the Developer in 2021The Evolving Role of the Developer in 2021
The Evolving Role of the Developer in 2021
DevOps.com
 
Service Mesh: Two Big Words But Do You Need It?
Service Mesh: Two Big Words But Do You Need It?Service Mesh: Two Big Words But Do You Need It?
Service Mesh: Two Big Words But Do You Need It?
DevOps.com
 
Secure Data Sharing in OpenShift Environments
Secure Data Sharing in OpenShift EnvironmentsSecure Data Sharing in OpenShift Environments
Secure Data Sharing in OpenShift Environments
DevOps.com
 
How to Govern Identities and Access in Cloud Infrastructure: AppsFlyer Case S...
How to Govern Identities and Access in Cloud Infrastructure: AppsFlyer Case S...How to Govern Identities and Access in Cloud Infrastructure: AppsFlyer Case S...
How to Govern Identities and Access in Cloud Infrastructure: AppsFlyer Case S...
DevOps.com
 
Elevate Your Enterprise Python and R AI, ML Software Strategy with Anaconda T...
Elevate Your Enterprise Python and R AI, ML Software Strategy with Anaconda T...Elevate Your Enterprise Python and R AI, ML Software Strategy with Anaconda T...
Elevate Your Enterprise Python and R AI, ML Software Strategy with Anaconda T...
DevOps.com
 

More from DevOps.com (20)

Modernizing on IBM Z Made Easier With Open Source Software
Modernizing on IBM Z Made Easier With Open Source SoftwareModernizing on IBM Z Made Easier With Open Source Software
Modernizing on IBM Z Made Easier With Open Source Software
 
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
 
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
 
Next Generation Vulnerability Assessment Using Datadog and Snyk
Next Generation Vulnerability Assessment Using Datadog and SnykNext Generation Vulnerability Assessment Using Datadog and Snyk
Next Generation Vulnerability Assessment Using Datadog and Snyk
 
Vulnerability Discovery in the Cloud
Vulnerability Discovery in the CloudVulnerability Discovery in the Cloud
Vulnerability Discovery in the Cloud
 
2021 Open Source Governance: Top Ten Trends and Predictions
2021 Open Source Governance: Top Ten Trends and Predictions2021 Open Source Governance: Top Ten Trends and Predictions
2021 Open Source Governance: Top Ten Trends and Predictions
 
A New Year’s Ransomware Resolution
A New Year’s Ransomware ResolutionA New Year’s Ransomware Resolution
A New Year’s Ransomware Resolution
 
Getting Started with Runtime Security on Azure Kubernetes Service (AKS)
Getting Started with Runtime Security on Azure Kubernetes Service (AKS)Getting Started with Runtime Security on Azure Kubernetes Service (AKS)
Getting Started with Runtime Security on Azure Kubernetes Service (AKS)
 
Don't Panic! Effective Incident Response
Don't Panic! Effective Incident ResponseDon't Panic! Effective Incident Response
Don't Panic! Effective Incident Response
 
Creating a Culture of Chaos: Chaos Engineering Is Not Just Tools, It's Culture
Creating a Culture of Chaos: Chaos Engineering Is Not Just Tools, It's CultureCreating a Culture of Chaos: Chaos Engineering Is Not Just Tools, It's Culture
Creating a Culture of Chaos: Chaos Engineering Is Not Just Tools, It's Culture
 
Role Based Access Controls (RBAC) for SSH and Kubernetes Access with Teleport
Role Based Access Controls (RBAC) for SSH and Kubernetes Access with TeleportRole Based Access Controls (RBAC) for SSH and Kubernetes Access with Teleport
Role Based Access Controls (RBAC) for SSH and Kubernetes Access with Teleport
 
Monitoring Serverless Applications with Datadog
Monitoring Serverless Applications with DatadogMonitoring Serverless Applications with Datadog
Monitoring Serverless Applications with Datadog
 
Deliver your App Anywhere … Publicly or Privately
Deliver your App Anywhere … Publicly or PrivatelyDeliver your App Anywhere … Publicly or Privately
Deliver your App Anywhere … Publicly or Privately
 
Securing medical apps in the age of covid final
Securing medical apps in the age of covid finalSecuring medical apps in the age of covid final
Securing medical apps in the age of covid final
 
How to Build a Healthy On-Call Culture
How to Build a Healthy On-Call CultureHow to Build a Healthy On-Call Culture
How to Build a Healthy On-Call Culture
 
The Evolving Role of the Developer in 2021
The Evolving Role of the Developer in 2021The Evolving Role of the Developer in 2021
The Evolving Role of the Developer in 2021
 
Service Mesh: Two Big Words But Do You Need It?
Service Mesh: Two Big Words But Do You Need It?Service Mesh: Two Big Words But Do You Need It?
Service Mesh: Two Big Words But Do You Need It?
 
Secure Data Sharing in OpenShift Environments
Secure Data Sharing in OpenShift EnvironmentsSecure Data Sharing in OpenShift Environments
Secure Data Sharing in OpenShift Environments
 
How to Govern Identities and Access in Cloud Infrastructure: AppsFlyer Case S...
How to Govern Identities and Access in Cloud Infrastructure: AppsFlyer Case S...How to Govern Identities and Access in Cloud Infrastructure: AppsFlyer Case S...
How to Govern Identities and Access in Cloud Infrastructure: AppsFlyer Case S...
 
Elevate Your Enterprise Python and R AI, ML Software Strategy with Anaconda T...
Elevate Your Enterprise Python and R AI, ML Software Strategy with Anaconda T...Elevate Your Enterprise Python and R AI, ML Software Strategy with Anaconda T...
Elevate Your Enterprise Python and R AI, ML Software Strategy with Anaconda T...
 

Recently uploaded

PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
Peter Spielvogel
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
Jen Stirrup
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
UiPathCommunity
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIs
Vlad Stirbu
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Enhancing Performance with Globus and the Science DMZ
Enhancing Performance with Globus and the Science DMZEnhancing Performance with Globus and the Science DMZ
Enhancing Performance with Globus and the Science DMZ
Globus
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 

Recently uploaded (20)

PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIs
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Enhancing Performance with Globus and the Science DMZ
Enhancing Performance with Globus and the Science DMZEnhancing Performance with Globus and the Science DMZ
Enhancing Performance with Globus and the Science DMZ
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 

Why Distributed Tracing is Essential for Performance and Reliability

  • 1. Austin Parker, Principal Developer Advocate Why Distributed Tracing is Essential for Performance and Reliability or, How to Get Actual Business Value From Distributed Tracing!
  • 2. Who Am I? Austin Parker Principal Developer Advocate 2 @austinlparker austin@lightstep.com✉
  • 5. Observe kustomize ? ?? Control Team-by- team ok Must be org-wide! Distributed tracing!
  • 6. What you can control What you are responsible for Stress (n): responsibility without control 6
  • 7. Closing the gap between control and responsibility Responsibility for delivering performance and reliability Like many problems, the solution requires: - Having the right data - Setting the right goals - Giving teams ownership Distributed tracing is essential to closing this gap 7
  • 8. 1. (n) the ability to navigate from effect to cause 2. (adj) related to supporting that ability (such as a tool or process) "used an observability tool to understand what caused the change” For example, being able to navigate from… Spike in errors → misconfiguration Increased latency → new customer behavior User complaints → upstream service deployed Observability əb-ˈzər-və-bi-lə-tē 8
  • 9. Getting Actual Business Value From Distributed tracing 9 Developer velocity Software performance Managing costs Fundamentals Deploying tracing
  • 11.
  • 12. Traces are a form of telemetry based on spans with structure - Span = timed event describing work done by a single service Tracing is a diagnostic tool that reveals… … how a set of services coordinate to handle individual user requests … from mobile or browser to backends to databases (end-to-end) … including metadata like events (logs) and annotations (tags) Provides a request-centric view of application performance Distributed tracing, defined 12
  • 13. Relationships matter 13 Traces encode causal relationships between callers and callees calls returns
  • 14. Traces are the raw material, not the finished product Distributed traces – basically just structs Distributed tracing – the art and science of deriving value from traces 14
  • 16. Increasing developer velocity - Make (common) tasks faster - Reduce interruptions - Improve communication - Prioritize high impact work 16 Verify deployments Root cause analysis Better alerts Understand dependencies Define and track SLOs
  • 17. Accelerate root cause analysis 17
  • 18. More actionable alerts 18 “Are We All on the Same Page? Let’s Fix That”Luis Mineiro, SREcon EMEA 2019 Search for “same page usenix”)
  • 19. Understanding dependencies… without tracing 19 A B C E D C B B D A B E D 8% error rate avg. response size up 31% request rate up 4%
  • 20. Understanding dependencies Without tracing... - Each connection in isolation - “A talks to B” - No way to narrow scope - No way to meaningfully tie in other metrics 20 With tracing... - End-to-end context - Request graph - Can refine based on any property of the request - Metrics linked to current scope
  • 21. Use traces and service dependencies - Enhance training for new team members - Facilitate operational review meetings - Inform architectural design decisions - Set SLOs for internal services 21 Use SLOs to…- Measure reliability- Set error budgets- Hold teams accountable
  • 23. Improving software performance Performance means “performance as experienced by end users” Tracing can help by… - Better distribution of computation - Focusing optimization where it matters 23
  • 24. Defining the critical path 24 waiting for blue… A (part of a) span is on the critical path if: - reducing its duration speeds up overall request therefore, blue is on the critical path here
  • 26. Given a choice between speeding up A and B… 1. 50% improvement in B is better than an 50% improvement in A 2. No improvement in A will ever improve overall performance by >15% Obvious… once you have the data :) A B A B A B Amdahl’s Law 26 OR
  • 28. Types of costs Operational costs - Developer time (failed deployments, oncall, meeting overhead) Revenue and reputational costs - Missed SLOs, failed conversions, unhappy users Infrastructure costs - Compute, network, storage, API usage Monitoring costs 28 Take aggregated logs as an example
  • 29. Calculating logging costs Initial Factors ‐ Aggregating and indexing logs per service: ‐ Storage ‐ Compute ‐ Network ‐ Peak instance count ‐ Retention period ‐ Services involved in a request 29 Initial Values Assuming 50GB of log data a day, 14 day retention, high availability (no cold storage) 1 Primary (L Compute Optimized)  $89 2 Data (XL Memory Optimized)  $426 3 SSDs (General Purpose)  $201
  • 30. $716 Cloud spend @ 50GB/logs (monthly) 30
  • 31. $3,386 Total after setup, maintenance (monthly) 31
  • 32. Reducing logging spend with tracing Annotate spans with logs! It’s as easy as: span.addEvent(“illegal base64 data at input byte 7”) Leverage traces to determine which logs to store 32 Monthly logging spend $3,571 $712 Logging data is more valuable in context!
  • 34. On your tracing migration Tracing is not an all-or-nothing endeavour - How to deliver incremental value for the org - How to use that value to inform next steps of the journey Value to developers should be your (meta-)metric of success journey 34
  • 35. Step 1 Start w/ customer-critical experiences Look at the edge and build an MVP - As close as you can (reasonably) get to users - Often an API gateway or proxy Map incoming operations → dependencies - Identify next steps - Build a case for others to adopt tracing 35
  • 36. Step 2 Playbook for service owners Establish conventions for tags, etc. - What matters to your business? - What would explain failures? Instrument frameworks, libraries, shared services - Accelerate adoption by reusing code - Enforce conventions programmatically 36
  • 37. Step 3 Integrate with existing workflows Where do engineers work today? - IDEs, testing frameworks, CI/CD - Dashboards - Notification and alerting - … 37
  • 38. Building observable services Use open standards like OpenTelemetry for instrumenting service code. OpenTelemetry provides a single set of APIs, SDKs, and tools for generating distributed traces and metrics from your services. 38
  • 39. In summary, distributed tracing provides... Faster RCA Better alerts Up-to-date dependency maps Improved compute fan-out Targeted optimization Integrated telemetry 39 Improved developer velocity Faster software performance Better cost management Distributed tracing puts application behavior in context to help answer the primary question of observability: “What caused that change?”
  • 41. Q&A