SlideShare a Scribd company logo
1 of 39
Download to read offline
Daniel “Spoons” Spoonhower, CTO and Co-founder
Driving Service
Ownership with
Distributed Tracing
Spoons (aka Daniel Spoonhower)
CTO and Co-founder, Lightstep
2
@save_spoons
spoons@lightstep.com✉
Who am I?
What Changed?
3
4
Service ownership, defined
App dev teams are responsible for delivery of software and service
Includes activities such as:
- Writing code 😁
- Fixing bugs 😞
- Incident response
- Cost management
- Capacity planning
- …
5
Increased independence
⟹ Higher developer velocity
Autonomous decisions
⟹ Better outcomes
Observe
kustomize
? ??
Control Team-by-
team ok
Must be
org-wide!
Distributed tracing!
Obstacles to successful service ownership
You can’t have independence without clear responsibilities and goals.
You can’t scale autonomy without consistent ways of measuring and
reporting on progress and outcomes.
7
8
Ownership means Accountability and Agency
Effective ownership requires Distributed Tracing
Importance of Documentation, Oncall, and SLOs
Distributed Tracing
9
Traces are a form of telemetry based on spans with structure
- Span = timed event describing work done by a single service
Tracing is a diagnostic tool that reveals…
… how a set of services coordinate to handle individual user requests
… from mobile or browser to backends to databases (end-to-end)
… including metadata like events (logs) and annotations (tags)
Provides a request-centric view of application performance
Distributed tracing, defined
11
Relationships matter
12
Traces encode causal relationships between callers and callees
calls
returns
Instrumentation
Instrumentation = code changes to emit spans
- Agents don’t work for microservices
- Agents incur too much overhead
- Agents lock you in to one vendor
Instrumentation captures app-specific metadata
- Service versions, customer segments, ...
Good news: many frameworks are already
instrumented!
13
Traces are the raw material, not the finished product
Distributed traces – basically just structs
Distributed tracing – the art and science of deriving value from traces
14
Driving Toward
Ownership
15
16
Docs
Oncall
SLOs
Start with expertise... then ownership
Make it easy to find related:
- Telemetry and dashboards
- Alert definitions
- Playbooks
Use a template!
Track last-modified dates
- Require periodic audits & updates
Centralized documentation
17
Make documentation machine-readable
Use it to generate:
- Dashboard config
- Escalation policy config
- Deployment pipeline config
- …
Make documentation necessary for
day-to-day work
Centralized documentation
18
Machine-readable
It’s hard to keep service dependencies
up to date manually…
So don’t!
- Use telemetry from the application
- Traces, in aggregate, reveal
service dependencies
Centralized documentation
19
Machine-readable
& Dynamic!
Record who is accountable
Automate many mundane tasks
Train new team members
Build confidence
Why is documentation important?
20
Oncall is (often) responsible for…
- Incident response
- Communicating status internally &
externally
- Production change management
- Deploying new code
- Pushing infrastructure changes
- Monitoring dashboards
- Low-urgency alert triage
- Customer requests
- And other interrupt driven work
- Shift handoffs
- Writing postmortems
Oncall
21
How to improve incident response:
- Reduce response times
- Deliver alerts to the right teams
Handling alerts
22
More context ⟶ understanding and mitigating faster
23
Send alerts directly
to the teams that
are responsible for
taking action!
Dynamic alert delivery
24
Check it out on
99percentdevops.com
How to improve incident response:
- Reduce response times
- Deliver alerts to the right teams
Handling alerts
25
- Delete unnecessary alerts
“How will we do better next time?”
- Ensure underlying issues are fixed
- Improve responses for novel issues
Make sure postmortems are blameless
- By establishing what happened
objectively using traces
Improving postmortems
26
Direct impact on customer experience (revenue, reputation, etc.)
Time spent handing pages, writing postmortems, handling interrupts is...
time not spent building new features, proactive optimization
Stress of oncall has major impact on job satisfaction
Why is improving oncall important?
27
Service Level Objectives
- Customer expectations for the
service your provide
- Both external and internal customers
- Stated in a way that can be
measured on short time scales
- Days, hours, or even minutes
SLOs
28
p99 latency < 5s over the last 5 minutes
threshold
service level indicator (SLI) measurement window
Common indicators
- Latency (p50, p99, etc.)
- Error rate
But specific to the endpoints,
operations, and flows of your application
29
“instantaneous” latency
latency over 5 minute window
Ask:
- What do your customers expect?
- What can you provide today?
- How do you expect that to change?
Determining SLOs
30
Derive internal SLOs using tracing
31
A
B
C
A p99 latency < 5s
B p99 latency < 5s
C: p99 latency < 2.5s
D
~p99.5 latency < 2.5s
They measure success in delivering service
Teams use them as a guide to prioritize work
Consistency and transparency across your org
- Hold teams accountable in a uniform way
Why are SLOs important?
32
3-piece puzzle review
Documentation
- Establishes ownership: who you will hold accountable
- Don’t try to manually manage dependency lists, etc.
Oncall
- More than just incident response
- Use tracing as part of investigation, automation, communication
SLOs
- How you measure success!
- Define objectives for internal services using tracing
33
Docs
Oncall
SLOs
Next steps…
34
Making changes
Rolling out new processes/tools in a DevOps org is hard
1. Process/tools must provide value to app dev teams
2. Ideally, they are necessary parts of their day-to-day work
To establish and maintain service ownership
- Use a combination of docs, oncall process, and SLOs
- Manufacture a need for those process/tools where necessary
- Give teams a budget for improving docs, alerts & reliability
35
Ownership = Accountability + Agency
Accountability
- Set deliverables and goals for service owners
- Judge their performance based on those deliverables and goals
Agency
- Offer the information, confidence, and budget to improve
36
Tracing provides key information to drive both!
Get Started Today
lightstep.com
Sign up for ourCommunity plan Start a free trial
Talk with an expert
Daniel “Spoons” Spoonhower, CTO and Co-founder
Q&A
@save_spoons
Daniel “Spoons” Spoonhower, CTO and Co-founder
Thank you
@save_spoons

More Related Content

What's hot

Enable DevSecOps using Jira Software
 Enable DevSecOps using Jira Software Enable DevSecOps using Jira Software
Enable DevSecOps using Jira SoftwareAtlassian
 
Service Virtualization: Delivering Complex Test Environments on Demand
Service Virtualization: Delivering Complex Test Environments on DemandService Virtualization: Delivering Complex Test Environments on Demand
Service Virtualization: Delivering Complex Test Environments on DemandErika Barron
 
DevSecOps - It can change your life (cycle)
DevSecOps - It can change your life (cycle)DevSecOps - It can change your life (cycle)
DevSecOps - It can change your life (cycle)Qualitest
 
Leverage DevOps & Agile Development to Transform Your Application Testing Pro...
Leverage DevOps & Agile Development to Transform Your Application Testing Pro...Leverage DevOps & Agile Development to Transform Your Application Testing Pro...
Leverage DevOps & Agile Development to Transform Your Application Testing Pro...Deborah Schalm
 
Security at the Speed of Software Development
Security at the Speed of Software DevelopmentSecurity at the Speed of Software Development
Security at the Speed of Software DevelopmentDevOps.com
 
Monitoring at the Speed of DevOps
Monitoring at the Speed of DevOpsMonitoring at the Speed of DevOps
Monitoring at the Speed of DevOpsDevOps.com
 
DOES16 San Francisco - Scott Prugh & Erica Morrison - When Ops Swallows Dev
DOES16 San Francisco - Scott Prugh & Erica Morrison - When Ops Swallows DevDOES16 San Francisco - Scott Prugh & Erica Morrison - When Ops Swallows Dev
DOES16 San Francisco - Scott Prugh & Erica Morrison - When Ops Swallows DevGene Kim
 
DevSecOps - Building Rugged Software
DevSecOps - Building Rugged SoftwareDevSecOps - Building Rugged Software
DevSecOps - Building Rugged SoftwareSeniorStoryteller
 
The DevOps Journey in an Enterprise - DOES 2021
The DevOps Journey in an Enterprise - DOES 2021The DevOps Journey in an Enterprise - DOES 2021
The DevOps Journey in an Enterprise - DOES 2021Anders Lundsgård
 
Working on DevSecOps culture - a team centric view
Working on DevSecOps culture - a team centric viewWorking on DevSecOps culture - a team centric view
Working on DevSecOps culture - a team centric viewPatrick Debois
 
Succeeding with DevOps Transformation - Rafal Gancarz
Succeeding with DevOps Transformation - Rafal GancarzSucceeding with DevOps Transformation - Rafal Gancarz
Succeeding with DevOps Transformation - Rafal GancarzOpenCredo
 
DOES14 - Stephen Elliot - IDC - Delivering DevOps Business Metrics that Matter
DOES14 - Stephen Elliot - IDC - Delivering DevOps Business Metrics that MatterDOES14 - Stephen Elliot - IDC - Delivering DevOps Business Metrics that Matter
DOES14 - Stephen Elliot - IDC - Delivering DevOps Business Metrics that MatterGene Kim
 
CIS14: NSTIC - Identity and Access Management Collaborative Approaches to Nov...
CIS14: NSTIC - Identity and Access Management Collaborative Approaches to Nov...CIS14: NSTIC - Identity and Access Management Collaborative Approaches to Nov...
CIS14: NSTIC - Identity and Access Management Collaborative Approaches to Nov...CloudIDSummit
 
Efficient Performance Test Automation - Opitmizing the Jenkins Pipeline
Efficient Performance Test Automation - Opitmizing the Jenkins PipelineEfficient Performance Test Automation - Opitmizing the Jenkins Pipeline
Efficient Performance Test Automation - Opitmizing the Jenkins PipelineJules Pierre-Louis
 
Starting Your DevOps Journey – Practical Tips for Ops
Starting Your DevOps Journey – Practical Tips for OpsStarting Your DevOps Journey – Practical Tips for Ops
Starting Your DevOps Journey – Practical Tips for OpsDynatrace
 
To Scale Test Automation for DevOps, Avoid These Anti-Patterns
To Scale Test Automation for DevOps, Avoid These Anti-PatternsTo Scale Test Automation for DevOps, Avoid These Anti-Patterns
To Scale Test Automation for DevOps, Avoid These Anti-PatternsDevOps.com
 
DevSecOps - Building continuous security into it and app infrastructures
DevSecOps - Building continuous security into it and app infrastructuresDevSecOps - Building continuous security into it and app infrastructures
DevSecOps - Building continuous security into it and app infrastructuresPriyanka Aash
 
Derek Roos (Mendix CEO) Keynote
Derek Roos (Mendix CEO) KeynoteDerek Roos (Mendix CEO) Keynote
Derek Roos (Mendix CEO) Keynotemendixrolf
 
Strengthen and Scale Security Using DevSecOps - OWASP Indonesia
Strengthen and Scale Security Using DevSecOps - OWASP IndonesiaStrengthen and Scale Security Using DevSecOps - OWASP Indonesia
Strengthen and Scale Security Using DevSecOps - OWASP IndonesiaMohammed A. Imran
 

What's hot (20)

Enable DevSecOps using Jira Software
 Enable DevSecOps using Jira Software Enable DevSecOps using Jira Software
Enable DevSecOps using Jira Software
 
Service Virtualization: Delivering Complex Test Environments on Demand
Service Virtualization: Delivering Complex Test Environments on DemandService Virtualization: Delivering Complex Test Environments on Demand
Service Virtualization: Delivering Complex Test Environments on Demand
 
DevSecOps - It can change your life (cycle)
DevSecOps - It can change your life (cycle)DevSecOps - It can change your life (cycle)
DevSecOps - It can change your life (cycle)
 
Securing DevOps Lifecycle
Securing DevOps LifecycleSecuring DevOps Lifecycle
Securing DevOps Lifecycle
 
Leverage DevOps & Agile Development to Transform Your Application Testing Pro...
Leverage DevOps & Agile Development to Transform Your Application Testing Pro...Leverage DevOps & Agile Development to Transform Your Application Testing Pro...
Leverage DevOps & Agile Development to Transform Your Application Testing Pro...
 
Security at the Speed of Software Development
Security at the Speed of Software DevelopmentSecurity at the Speed of Software Development
Security at the Speed of Software Development
 
Monitoring at the Speed of DevOps
Monitoring at the Speed of DevOpsMonitoring at the Speed of DevOps
Monitoring at the Speed of DevOps
 
DOES16 San Francisco - Scott Prugh & Erica Morrison - When Ops Swallows Dev
DOES16 San Francisco - Scott Prugh & Erica Morrison - When Ops Swallows DevDOES16 San Francisco - Scott Prugh & Erica Morrison - When Ops Swallows Dev
DOES16 San Francisco - Scott Prugh & Erica Morrison - When Ops Swallows Dev
 
DevSecOps - Building Rugged Software
DevSecOps - Building Rugged SoftwareDevSecOps - Building Rugged Software
DevSecOps - Building Rugged Software
 
The DevOps Journey in an Enterprise - DOES 2021
The DevOps Journey in an Enterprise - DOES 2021The DevOps Journey in an Enterprise - DOES 2021
The DevOps Journey in an Enterprise - DOES 2021
 
Working on DevSecOps culture - a team centric view
Working on DevSecOps culture - a team centric viewWorking on DevSecOps culture - a team centric view
Working on DevSecOps culture - a team centric view
 
Succeeding with DevOps Transformation - Rafal Gancarz
Succeeding with DevOps Transformation - Rafal GancarzSucceeding with DevOps Transformation - Rafal Gancarz
Succeeding with DevOps Transformation - Rafal Gancarz
 
DOES14 - Stephen Elliot - IDC - Delivering DevOps Business Metrics that Matter
DOES14 - Stephen Elliot - IDC - Delivering DevOps Business Metrics that MatterDOES14 - Stephen Elliot - IDC - Delivering DevOps Business Metrics that Matter
DOES14 - Stephen Elliot - IDC - Delivering DevOps Business Metrics that Matter
 
CIS14: NSTIC - Identity and Access Management Collaborative Approaches to Nov...
CIS14: NSTIC - Identity and Access Management Collaborative Approaches to Nov...CIS14: NSTIC - Identity and Access Management Collaborative Approaches to Nov...
CIS14: NSTIC - Identity and Access Management Collaborative Approaches to Nov...
 
Efficient Performance Test Automation - Opitmizing the Jenkins Pipeline
Efficient Performance Test Automation - Opitmizing the Jenkins PipelineEfficient Performance Test Automation - Opitmizing the Jenkins Pipeline
Efficient Performance Test Automation - Opitmizing the Jenkins Pipeline
 
Starting Your DevOps Journey – Practical Tips for Ops
Starting Your DevOps Journey – Practical Tips for OpsStarting Your DevOps Journey – Practical Tips for Ops
Starting Your DevOps Journey – Practical Tips for Ops
 
To Scale Test Automation for DevOps, Avoid These Anti-Patterns
To Scale Test Automation for DevOps, Avoid These Anti-PatternsTo Scale Test Automation for DevOps, Avoid These Anti-Patterns
To Scale Test Automation for DevOps, Avoid These Anti-Patterns
 
DevSecOps - Building continuous security into it and app infrastructures
DevSecOps - Building continuous security into it and app infrastructuresDevSecOps - Building continuous security into it and app infrastructures
DevSecOps - Building continuous security into it and app infrastructures
 
Derek Roos (Mendix CEO) Keynote
Derek Roos (Mendix CEO) KeynoteDerek Roos (Mendix CEO) Keynote
Derek Roos (Mendix CEO) Keynote
 
Strengthen and Scale Security Using DevSecOps - OWASP Indonesia
Strengthen and Scale Security Using DevSecOps - OWASP IndonesiaStrengthen and Scale Security Using DevSecOps - OWASP Indonesia
Strengthen and Scale Security Using DevSecOps - OWASP Indonesia
 

Similar to Driving Service Ownership with Distributed Tracing

Introducing "ReNDevoZ"--Work Flow Management software for Renewable projects
Introducing "ReNDevoZ"--Work Flow Management software for Renewable projectsIntroducing "ReNDevoZ"--Work Flow Management software for Renewable projects
Introducing "ReNDevoZ"--Work Flow Management software for Renewable projectsM V Radhakrishna
 
Why Distributed Tracing is Essential for Performance and Reliability
Why Distributed Tracing is Essential for Performance and ReliabilityWhy Distributed Tracing is Essential for Performance and Reliability
Why Distributed Tracing is Essential for Performance and ReliabilityAggregage
 
APM Center of Excellence Drives Improved Business Results at Itau Unibanco
APM Center of Excellence Drives Improved Business Results at Itau UnibancoAPM Center of Excellence Drives Improved Business Results at Itau Unibanco
APM Center of Excellence Drives Improved Business Results at Itau UnibancoCA Technologies
 
Building a Software Chain of Custody: A Guide for CTOs, CIOs, and Enterprise ...
Building a Software Chain of Custody: A Guide for CTOs, CIOs, and Enterprise ...Building a Software Chain of Custody: A Guide for CTOs, CIOs, and Enterprise ...
Building a Software Chain of Custody: A Guide for CTOs, CIOs, and Enterprise ...XebiaLabs
 
SOC 2 Compliance Made Easy with Process Street amp Drata
SOC 2 Compliance Made Easy with Process Street amp DrataSOC 2 Compliance Made Easy with Process Street amp Drata
SOC 2 Compliance Made Easy with Process Street amp DrataKashish Trivedi
 
Analyst Keynote: Continuous Delivery: Making DevOps Awesome
Analyst Keynote: Continuous Delivery: Making DevOps AwesomeAnalyst Keynote: Continuous Delivery: Making DevOps Awesome
Analyst Keynote: Continuous Delivery: Making DevOps AwesomeCA Technologies
 
Asset Finance Systems: Project Initiation "101"
Asset Finance Systems: Project Initiation "101"Asset Finance Systems: Project Initiation "101"
Asset Finance Systems: Project Initiation "101"David Pedreno
 
Observability in highly distributed systems
Observability in highly distributed systemsObservability in highly distributed systems
Observability in highly distributed systemsDevOps Indonesia
 
10 ESSENTIALS FOR A SUCCESSFUL OFFSHORE TRANSITION
10 ESSENTIALS FOR A SUCCESSFUL OFFSHORE TRANSITION10 ESSENTIALS FOR A SUCCESSFUL OFFSHORE TRANSITION
10 ESSENTIALS FOR A SUCCESSFUL OFFSHORE TRANSITIONBhavanthSoni
 
White Paper: "Keys to a Successful Call Center Transition" (31West Knowledge ...
White Paper: "Keys to a Successful Call Center Transition" (31West Knowledge ...White Paper: "Keys to a Successful Call Center Transition" (31West Knowledge ...
White Paper: "Keys to a Successful Call Center Transition" (31West Knowledge ...31West Global Services
 
Running Head PROJECT PLAN-BUSINESS REQUIREMENT DOCUMENT .docx
Running Head PROJECT PLAN-BUSINESS REQUIREMENT DOCUMENT      .docxRunning Head PROJECT PLAN-BUSINESS REQUIREMENT DOCUMENT      .docx
Running Head PROJECT PLAN-BUSINESS REQUIREMENT DOCUMENT .docxjeanettehully
 
Daffodil Software-Sharepoint Capability Document
Daffodil Software-Sharepoint Capability DocumentDaffodil Software-Sharepoint Capability Document
Daffodil Software-Sharepoint Capability DocumentAshok Surendran
 
Fyipe - One complete DevOps and IT Ops platform.
Fyipe - One complete DevOps and IT Ops platform. Fyipe - One complete DevOps and IT Ops platform.
Fyipe - One complete DevOps and IT Ops platform. Nawaz Dhandala
 
Measuring Performance: See the Science of DevOps Measurement in Action
Measuring Performance: See the Science of DevOps Measurement in ActionMeasuring Performance: See the Science of DevOps Measurement in Action
Measuring Performance: See the Science of DevOps Measurement in ActionXebiaLabs
 
DevOps as a Service - our own true story with a happy ending (JuCParis 2018)
DevOps as a Service - our own true story with a happy ending (JuCParis 2018)DevOps as a Service - our own true story with a happy ending (JuCParis 2018)
DevOps as a Service - our own true story with a happy ending (JuCParis 2018)Philippe Ensarguet
 
Asset Finance Systems: Project Initiation "101"
Asset Finance Systems: Project Initiation "101"Asset Finance Systems: Project Initiation "101"
Asset Finance Systems: Project Initiation "101"David Pedreno
 
Software Operation Knowledge
Software Operation KnowledgeSoftware Operation Knowledge
Software Operation KnowledgeDevnology
 
10 Ways to Better Application-Centric Service Management
10 Ways to Better Application-Centric Service Management10 Ways to Better Application-Centric Service Management
10 Ways to Better Application-Centric Service ManagementLinh Nguyen
 

Similar to Driving Service Ownership with Distributed Tracing (20)

Introducing "ReNDevoZ"--Work Flow Management software for Renewable projects
Introducing "ReNDevoZ"--Work Flow Management software for Renewable projectsIntroducing "ReNDevoZ"--Work Flow Management software for Renewable projects
Introducing "ReNDevoZ"--Work Flow Management software for Renewable projects
 
Why Distributed Tracing is Essential for Performance and Reliability
Why Distributed Tracing is Essential for Performance and ReliabilityWhy Distributed Tracing is Essential for Performance and Reliability
Why Distributed Tracing is Essential for Performance and Reliability
 
APM Center of Excellence Drives Improved Business Results at Itau Unibanco
APM Center of Excellence Drives Improved Business Results at Itau UnibancoAPM Center of Excellence Drives Improved Business Results at Itau Unibanco
APM Center of Excellence Drives Improved Business Results at Itau Unibanco
 
Building a Software Chain of Custody: A Guide for CTOs, CIOs, and Enterprise ...
Building a Software Chain of Custody: A Guide for CTOs, CIOs, and Enterprise ...Building a Software Chain of Custody: A Guide for CTOs, CIOs, and Enterprise ...
Building a Software Chain of Custody: A Guide for CTOs, CIOs, and Enterprise ...
 
SOC 2 Compliance Made Easy with Process Street amp Drata
SOC 2 Compliance Made Easy with Process Street amp DrataSOC 2 Compliance Made Easy with Process Street amp Drata
SOC 2 Compliance Made Easy with Process Street amp Drata
 
Analyst Keynote: Continuous Delivery: Making DevOps Awesome
Analyst Keynote: Continuous Delivery: Making DevOps AwesomeAnalyst Keynote: Continuous Delivery: Making DevOps Awesome
Analyst Keynote: Continuous Delivery: Making DevOps Awesome
 
DevOps
DevOpsDevOps
DevOps
 
Asset Finance Systems: Project Initiation "101"
Asset Finance Systems: Project Initiation "101"Asset Finance Systems: Project Initiation "101"
Asset Finance Systems: Project Initiation "101"
 
Observability in highly distributed systems
Observability in highly distributed systemsObservability in highly distributed systems
Observability in highly distributed systems
 
10 ESSENTIALS FOR A SUCCESSFUL OFFSHORE TRANSITION
10 ESSENTIALS FOR A SUCCESSFUL OFFSHORE TRANSITION10 ESSENTIALS FOR A SUCCESSFUL OFFSHORE TRANSITION
10 ESSENTIALS FOR A SUCCESSFUL OFFSHORE TRANSITION
 
White Paper: "Keys to a Successful Call Center Transition" (31West Knowledge ...
White Paper: "Keys to a Successful Call Center Transition" (31West Knowledge ...White Paper: "Keys to a Successful Call Center Transition" (31West Knowledge ...
White Paper: "Keys to a Successful Call Center Transition" (31West Knowledge ...
 
Running Head PROJECT PLAN-BUSINESS REQUIREMENT DOCUMENT .docx
Running Head PROJECT PLAN-BUSINESS REQUIREMENT DOCUMENT      .docxRunning Head PROJECT PLAN-BUSINESS REQUIREMENT DOCUMENT      .docx
Running Head PROJECT PLAN-BUSINESS REQUIREMENT DOCUMENT .docx
 
Daffodil Software-Sharepoint Capability Document
Daffodil Software-Sharepoint Capability DocumentDaffodil Software-Sharepoint Capability Document
Daffodil Software-Sharepoint Capability Document
 
Fyipe - One complete DevOps and IT Ops platform.
Fyipe - One complete DevOps and IT Ops platform. Fyipe - One complete DevOps and IT Ops platform.
Fyipe - One complete DevOps and IT Ops platform.
 
Measuring Performance: See the Science of DevOps Measurement in Action
Measuring Performance: See the Science of DevOps Measurement in ActionMeasuring Performance: See the Science of DevOps Measurement in Action
Measuring Performance: See the Science of DevOps Measurement in Action
 
DevOps as a Service - our own true story with a happy ending (JuCParis 2018)
DevOps as a Service - our own true story with a happy ending (JuCParis 2018)DevOps as a Service - our own true story with a happy ending (JuCParis 2018)
DevOps as a Service - our own true story with a happy ending (JuCParis 2018)
 
Asset Finance Systems: Project Initiation "101"
Asset Finance Systems: Project Initiation "101"Asset Finance Systems: Project Initiation "101"
Asset Finance Systems: Project Initiation "101"
 
Software Operation Knowledge
Software Operation KnowledgeSoftware Operation Knowledge
Software Operation Knowledge
 
10 Ways to Better Application-Centric Service Management
10 Ways to Better Application-Centric Service Management10 Ways to Better Application-Centric Service Management
10 Ways to Better Application-Centric Service Management
 
E&E CV
E&E CVE&E CV
E&E CV
 

More from DevOps.com

Modernizing on IBM Z Made Easier With Open Source Software
Modernizing on IBM Z Made Easier With Open Source SoftwareModernizing on IBM Z Made Easier With Open Source Software
Modernizing on IBM Z Made Easier With Open Source SoftwareDevOps.com
 
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...DevOps.com
 
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...DevOps.com
 
Next Generation Vulnerability Assessment Using Datadog and Snyk
Next Generation Vulnerability Assessment Using Datadog and SnykNext Generation Vulnerability Assessment Using Datadog and Snyk
Next Generation Vulnerability Assessment Using Datadog and SnykDevOps.com
 
Vulnerability Discovery in the Cloud
Vulnerability Discovery in the CloudVulnerability Discovery in the Cloud
Vulnerability Discovery in the CloudDevOps.com
 
2021 Open Source Governance: Top Ten Trends and Predictions
2021 Open Source Governance: Top Ten Trends and Predictions2021 Open Source Governance: Top Ten Trends and Predictions
2021 Open Source Governance: Top Ten Trends and PredictionsDevOps.com
 
A New Year’s Ransomware Resolution
A New Year’s Ransomware ResolutionA New Year’s Ransomware Resolution
A New Year’s Ransomware ResolutionDevOps.com
 
Getting Started with Runtime Security on Azure Kubernetes Service (AKS)
Getting Started with Runtime Security on Azure Kubernetes Service (AKS)Getting Started with Runtime Security on Azure Kubernetes Service (AKS)
Getting Started with Runtime Security on Azure Kubernetes Service (AKS)DevOps.com
 
Don't Panic! Effective Incident Response
Don't Panic! Effective Incident ResponseDon't Panic! Effective Incident Response
Don't Panic! Effective Incident ResponseDevOps.com
 
Creating a Culture of Chaos: Chaos Engineering Is Not Just Tools, It's Culture
Creating a Culture of Chaos: Chaos Engineering Is Not Just Tools, It's CultureCreating a Culture of Chaos: Chaos Engineering Is Not Just Tools, It's Culture
Creating a Culture of Chaos: Chaos Engineering Is Not Just Tools, It's CultureDevOps.com
 
Role Based Access Controls (RBAC) for SSH and Kubernetes Access with Teleport
Role Based Access Controls (RBAC) for SSH and Kubernetes Access with TeleportRole Based Access Controls (RBAC) for SSH and Kubernetes Access with Teleport
Role Based Access Controls (RBAC) for SSH and Kubernetes Access with TeleportDevOps.com
 
Monitoring Serverless Applications with Datadog
Monitoring Serverless Applications with DatadogMonitoring Serverless Applications with Datadog
Monitoring Serverless Applications with DatadogDevOps.com
 
Deliver your App Anywhere … Publicly or Privately
Deliver your App Anywhere … Publicly or PrivatelyDeliver your App Anywhere … Publicly or Privately
Deliver your App Anywhere … Publicly or PrivatelyDevOps.com
 
Securing medical apps in the age of covid final
Securing medical apps in the age of covid finalSecuring medical apps in the age of covid final
Securing medical apps in the age of covid finalDevOps.com
 
How to Build a Healthy On-Call Culture
How to Build a Healthy On-Call CultureHow to Build a Healthy On-Call Culture
How to Build a Healthy On-Call CultureDevOps.com
 
The Evolving Role of the Developer in 2021
The Evolving Role of the Developer in 2021The Evolving Role of the Developer in 2021
The Evolving Role of the Developer in 2021DevOps.com
 
Service Mesh: Two Big Words But Do You Need It?
Service Mesh: Two Big Words But Do You Need It?Service Mesh: Two Big Words But Do You Need It?
Service Mesh: Two Big Words But Do You Need It?DevOps.com
 
Secure Data Sharing in OpenShift Environments
Secure Data Sharing in OpenShift EnvironmentsSecure Data Sharing in OpenShift Environments
Secure Data Sharing in OpenShift EnvironmentsDevOps.com
 
How to Govern Identities and Access in Cloud Infrastructure: AppsFlyer Case S...
How to Govern Identities and Access in Cloud Infrastructure: AppsFlyer Case S...How to Govern Identities and Access in Cloud Infrastructure: AppsFlyer Case S...
How to Govern Identities and Access in Cloud Infrastructure: AppsFlyer Case S...DevOps.com
 
Elevate Your Enterprise Python and R AI, ML Software Strategy with Anaconda T...
Elevate Your Enterprise Python and R AI, ML Software Strategy with Anaconda T...Elevate Your Enterprise Python and R AI, ML Software Strategy with Anaconda T...
Elevate Your Enterprise Python and R AI, ML Software Strategy with Anaconda T...DevOps.com
 

More from DevOps.com (20)

Modernizing on IBM Z Made Easier With Open Source Software
Modernizing on IBM Z Made Easier With Open Source SoftwareModernizing on IBM Z Made Easier With Open Source Software
Modernizing on IBM Z Made Easier With Open Source Software
 
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
 
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
 
Next Generation Vulnerability Assessment Using Datadog and Snyk
Next Generation Vulnerability Assessment Using Datadog and SnykNext Generation Vulnerability Assessment Using Datadog and Snyk
Next Generation Vulnerability Assessment Using Datadog and Snyk
 
Vulnerability Discovery in the Cloud
Vulnerability Discovery in the CloudVulnerability Discovery in the Cloud
Vulnerability Discovery in the Cloud
 
2021 Open Source Governance: Top Ten Trends and Predictions
2021 Open Source Governance: Top Ten Trends and Predictions2021 Open Source Governance: Top Ten Trends and Predictions
2021 Open Source Governance: Top Ten Trends and Predictions
 
A New Year’s Ransomware Resolution
A New Year’s Ransomware ResolutionA New Year’s Ransomware Resolution
A New Year’s Ransomware Resolution
 
Getting Started with Runtime Security on Azure Kubernetes Service (AKS)
Getting Started with Runtime Security on Azure Kubernetes Service (AKS)Getting Started with Runtime Security on Azure Kubernetes Service (AKS)
Getting Started with Runtime Security on Azure Kubernetes Service (AKS)
 
Don't Panic! Effective Incident Response
Don't Panic! Effective Incident ResponseDon't Panic! Effective Incident Response
Don't Panic! Effective Incident Response
 
Creating a Culture of Chaos: Chaos Engineering Is Not Just Tools, It's Culture
Creating a Culture of Chaos: Chaos Engineering Is Not Just Tools, It's CultureCreating a Culture of Chaos: Chaos Engineering Is Not Just Tools, It's Culture
Creating a Culture of Chaos: Chaos Engineering Is Not Just Tools, It's Culture
 
Role Based Access Controls (RBAC) for SSH and Kubernetes Access with Teleport
Role Based Access Controls (RBAC) for SSH and Kubernetes Access with TeleportRole Based Access Controls (RBAC) for SSH and Kubernetes Access with Teleport
Role Based Access Controls (RBAC) for SSH and Kubernetes Access with Teleport
 
Monitoring Serverless Applications with Datadog
Monitoring Serverless Applications with DatadogMonitoring Serverless Applications with Datadog
Monitoring Serverless Applications with Datadog
 
Deliver your App Anywhere … Publicly or Privately
Deliver your App Anywhere … Publicly or PrivatelyDeliver your App Anywhere … Publicly or Privately
Deliver your App Anywhere … Publicly or Privately
 
Securing medical apps in the age of covid final
Securing medical apps in the age of covid finalSecuring medical apps in the age of covid final
Securing medical apps in the age of covid final
 
How to Build a Healthy On-Call Culture
How to Build a Healthy On-Call CultureHow to Build a Healthy On-Call Culture
How to Build a Healthy On-Call Culture
 
The Evolving Role of the Developer in 2021
The Evolving Role of the Developer in 2021The Evolving Role of the Developer in 2021
The Evolving Role of the Developer in 2021
 
Service Mesh: Two Big Words But Do You Need It?
Service Mesh: Two Big Words But Do You Need It?Service Mesh: Two Big Words But Do You Need It?
Service Mesh: Two Big Words But Do You Need It?
 
Secure Data Sharing in OpenShift Environments
Secure Data Sharing in OpenShift EnvironmentsSecure Data Sharing in OpenShift Environments
Secure Data Sharing in OpenShift Environments
 
How to Govern Identities and Access in Cloud Infrastructure: AppsFlyer Case S...
How to Govern Identities and Access in Cloud Infrastructure: AppsFlyer Case S...How to Govern Identities and Access in Cloud Infrastructure: AppsFlyer Case S...
How to Govern Identities and Access in Cloud Infrastructure: AppsFlyer Case S...
 
Elevate Your Enterprise Python and R AI, ML Software Strategy with Anaconda T...
Elevate Your Enterprise Python and R AI, ML Software Strategy with Anaconda T...Elevate Your Enterprise Python and R AI, ML Software Strategy with Anaconda T...
Elevate Your Enterprise Python and R AI, ML Software Strategy with Anaconda T...
 

Recently uploaded

Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 

Recently uploaded (20)

Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 

Driving Service Ownership with Distributed Tracing

  • 1. Daniel “Spoons” Spoonhower, CTO and Co-founder Driving Service Ownership with Distributed Tracing
  • 2. Spoons (aka Daniel Spoonhower) CTO and Co-founder, Lightstep 2 @save_spoons spoons@lightstep.com✉ Who am I?
  • 4. 4
  • 5. Service ownership, defined App dev teams are responsible for delivery of software and service Includes activities such as: - Writing code 😁 - Fixing bugs 😞 - Incident response - Cost management - Capacity planning - … 5 Increased independence ⟹ Higher developer velocity Autonomous decisions ⟹ Better outcomes
  • 6. Observe kustomize ? ?? Control Team-by- team ok Must be org-wide! Distributed tracing!
  • 7. Obstacles to successful service ownership You can’t have independence without clear responsibilities and goals. You can’t scale autonomy without consistent ways of measuring and reporting on progress and outcomes. 7
  • 8. 8 Ownership means Accountability and Agency Effective ownership requires Distributed Tracing Importance of Documentation, Oncall, and SLOs
  • 10.
  • 11. Traces are a form of telemetry based on spans with structure - Span = timed event describing work done by a single service Tracing is a diagnostic tool that reveals… … how a set of services coordinate to handle individual user requests … from mobile or browser to backends to databases (end-to-end) … including metadata like events (logs) and annotations (tags) Provides a request-centric view of application performance Distributed tracing, defined 11
  • 12. Relationships matter 12 Traces encode causal relationships between callers and callees calls returns
  • 13. Instrumentation Instrumentation = code changes to emit spans - Agents don’t work for microservices - Agents incur too much overhead - Agents lock you in to one vendor Instrumentation captures app-specific metadata - Service versions, customer segments, ... Good news: many frameworks are already instrumented! 13
  • 14. Traces are the raw material, not the finished product Distributed traces – basically just structs Distributed tracing – the art and science of deriving value from traces 14
  • 17. Start with expertise... then ownership Make it easy to find related: - Telemetry and dashboards - Alert definitions - Playbooks Use a template! Track last-modified dates - Require periodic audits & updates Centralized documentation 17
  • 18. Make documentation machine-readable Use it to generate: - Dashboard config - Escalation policy config - Deployment pipeline config - … Make documentation necessary for day-to-day work Centralized documentation 18 Machine-readable
  • 19. It’s hard to keep service dependencies up to date manually… So don’t! - Use telemetry from the application - Traces, in aggregate, reveal service dependencies Centralized documentation 19 Machine-readable & Dynamic!
  • 20. Record who is accountable Automate many mundane tasks Train new team members Build confidence Why is documentation important? 20
  • 21. Oncall is (often) responsible for… - Incident response - Communicating status internally & externally - Production change management - Deploying new code - Pushing infrastructure changes - Monitoring dashboards - Low-urgency alert triage - Customer requests - And other interrupt driven work - Shift handoffs - Writing postmortems Oncall 21
  • 22. How to improve incident response: - Reduce response times - Deliver alerts to the right teams Handling alerts 22
  • 23. More context ⟶ understanding and mitigating faster 23
  • 24. Send alerts directly to the teams that are responsible for taking action! Dynamic alert delivery 24 Check it out on 99percentdevops.com
  • 25. How to improve incident response: - Reduce response times - Deliver alerts to the right teams Handling alerts 25 - Delete unnecessary alerts
  • 26. “How will we do better next time?” - Ensure underlying issues are fixed - Improve responses for novel issues Make sure postmortems are blameless - By establishing what happened objectively using traces Improving postmortems 26
  • 27. Direct impact on customer experience (revenue, reputation, etc.) Time spent handing pages, writing postmortems, handling interrupts is... time not spent building new features, proactive optimization Stress of oncall has major impact on job satisfaction Why is improving oncall important? 27
  • 28. Service Level Objectives - Customer expectations for the service your provide - Both external and internal customers - Stated in a way that can be measured on short time scales - Days, hours, or even minutes SLOs 28 p99 latency < 5s over the last 5 minutes threshold service level indicator (SLI) measurement window Common indicators - Latency (p50, p99, etc.) - Error rate But specific to the endpoints, operations, and flows of your application
  • 30. Ask: - What do your customers expect? - What can you provide today? - How do you expect that to change? Determining SLOs 30
  • 31. Derive internal SLOs using tracing 31 A B C A p99 latency < 5s B p99 latency < 5s C: p99 latency < 2.5s D ~p99.5 latency < 2.5s
  • 32. They measure success in delivering service Teams use them as a guide to prioritize work Consistency and transparency across your org - Hold teams accountable in a uniform way Why are SLOs important? 32
  • 33. 3-piece puzzle review Documentation - Establishes ownership: who you will hold accountable - Don’t try to manually manage dependency lists, etc. Oncall - More than just incident response - Use tracing as part of investigation, automation, communication SLOs - How you measure success! - Define objectives for internal services using tracing 33 Docs Oncall SLOs
  • 35. Making changes Rolling out new processes/tools in a DevOps org is hard 1. Process/tools must provide value to app dev teams 2. Ideally, they are necessary parts of their day-to-day work To establish and maintain service ownership - Use a combination of docs, oncall process, and SLOs - Manufacture a need for those process/tools where necessary - Give teams a budget for improving docs, alerts & reliability 35
  • 36. Ownership = Accountability + Agency Accountability - Set deliverables and goals for service owners - Judge their performance based on those deliverables and goals Agency - Offer the information, confidence, and budget to improve 36 Tracing provides key information to drive both!
  • 37. Get Started Today lightstep.com Sign up for ourCommunity plan Start a free trial Talk with an expert
  • 38. Daniel “Spoons” Spoonhower, CTO and Co-founder Q&A @save_spoons
  • 39. Daniel “Spoons” Spoonhower, CTO and Co-founder Thank you @save_spoons