Changing the
Game:
How Game
Theory can break
down silos
Kevin Crawley – Developer Relations // Instana
Principal SRE Architect & Co-Owner // Single
Twitter: @notsureifkevin
▫ Docker Captain
▫ Gitlab Hero
▫ DevOpsDays Nashville Organizer
▫ 20 years in software development
▫ 5+ years DevOps/SRE experience
About Me
Discussion Points
▪ How does Game Theory tear down Silos
▪ Characteristics of High Performance
Organizations
▪ DevOps and Site Reliability Engineers
▪ What SREs need to be effective
Let’s talk about
Game Theory
(Disclaimer: I’m bad at math)
source: Nirmal Mehta (Docker Captain)
What is Bad Equilibrium?
It’s a strategy that all players in the game can adopt and converge on, but it
won’t produce a desirable outcome for anyone.
https://pdfs.semanticscholar.org/30d1/a03db196384a17fed3247407fb5859f7c76b.pdf
Transformation: Focusing on Automation
https://devops-research.com/
Where do silos come from?
Silos can be defined as the contention which exists
between functional units within an organization.
This contention usually manifests between teams
where change management policy requirements
and risks are high.
Nash Equilibrium
(Prisoner’s Dilemma)
A concept of game theory where the optimal
outcome of a game is one where no player has an
incentive to deviate from her chosen strategy after
considering an opponent’s choice.
https://en.wikipedia.org/wiki/Nash_equilibrium
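As a minimal sketch of the definition above, the following enumerates the pure-strategy Nash equilibria of a two-player Prisoner's Dilemma. The payoff numbers (negated years in prison) and the names `PAYOFFS` and `pure_nash_equilibria` are illustrative assumptions, not taken from the slides.

```python
from itertools import product

STRATEGIES = ("cooperate", "defect")

# Hypothetical payoffs as (row player, column player); higher is better.
PAYOFFS = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"): (-3, 0),
    ("defect", "cooperate"): (0, -3),
    ("defect", "defect"): (-2, -2),
}


def pure_nash_equilibria(payoffs, strategies):
    """Return the profiles where neither player gains by deviating alone."""
    equilibria = []
    for a, b in product(strategies, repeat=2):
        pa, pb = payoffs[(a, b)]
        a_can_improve = any(payoffs[(alt, b)][0] > pa for alt in strategies)
        b_can_improve = any(payoffs[(a, alt)][1] > pb for alt in strategies)
        if not a_can_improve and not b_can_improve:
            equilibria.append((a, b))
    return equilibria


print(pure_nash_equilibria(PAYOFFS, STRATEGIES))  # [('defect', 'defect')]
```

With these payoffs both players defecting is the only equilibrium, even though both would be better off cooperating.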
Split / Steal – Example 1
▪ Video has been removed to save bandwidth, you may view it
here on YouTube
https://www.youtube.com/watch?v=p3Uos2fzIJ0
Nash Equilibrium outcomes
Pareto Efficiency
Is a state of allocation of resources in which it is
impossible to make any one individual better off
without making at least one individual worse off.
… loosely speaking, any further gain is someone else’s loss (related to, but not the same as, ZERO SUM)
https://en.wikipedia.org/wiki/Pareto_efficiency
Pareto Inefficiency
A situation is inefficient if someone can be made better off even after
compensating those made worse off.
Pareto Inefficient Nash Equilibrium
… is a Bad Equilibrium
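To make the idea of a bad equilibrium concrete, here is a small sketch that models a Split / Steal game and looks for outcomes that are Nash equilibria yet Pareto inefficient. The payoff rules (split the prize if both split, the stealer takes everything, nobody wins if both steal) and the prize of 100 are assumptions for illustration; the slides only link to the videos.

```python
from itertools import product

MOVES = ("split", "steal")
JACKPOT = 100  # hypothetical prize

# Assumed rules: (player A payoff, player B payoff).
PAYOFFS = {
    ("split", "split"): (JACKPOT // 2, JACKPOT // 2),
    ("split", "steal"): (0, JACKPOT),
    ("steal", "split"): (JACKPOT, 0),
    ("steal", "steal"): (0, 0),
}


def is_nash(profile):
    """No player can strictly improve by changing only their own move."""
    a, b = profile
    pa, pb = PAYOFFS[profile]
    return (all(PAYOFFS[(alt, b)][0] <= pa for alt in MOVES)
            and all(PAYOFFS[(a, alt)][1] <= pb for alt in MOVES))


def is_pareto_efficient(profile):
    """No other outcome helps someone without hurting someone else."""
    pa, pb = PAYOFFS[profile]
    for qa, qb in PAYOFFS.values():
        if qa >= pa and qb >= pb and (qa > pa or qb > pb):
            return False
    return True


bad_equilibria = [p for p in product(MOVES, repeat=2)
                  if is_nash(p) and not is_pareto_efficient(p)]
print(bad_equilibria)  # [('steal', 'steal')]
```

Under these assumed rules, both players stealing is stable (nobody gains by switching alone) but leaves everyone worse off than both splitting, which is the situation the slide calls a Bad Equilibrium.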
Split / Steal Example 2
▪ Video has been removed to save bandwidth, you may view it
here on YouTube
https://www.youtube.com/watch?v=S0qjK3TWZE8
Don’t like the game?
Change the Game
New Nash Equilibrium
Pareto Inefficient Nash Equilibrium
Gives you permission and proof to change the
game
Change the Game
Percentage of Work Done Manually
                          ELITE        HIGH         LOW
                          PERFORMERS   PERFORMERS   PERFORMERS
Configuration Management  5%           10%          30%
Testing                   10%          20%          30%
Deployments               5%           10%          30%
Change approval process   10%          30%          40%
https://devops-research.com/
High Performance vs Low Performance
Organizations
High Performers
▪ Deployments:
> 1 hour and < 1 day
▪ Lead Time for
Changes:
> 1 day and < 1 week
▪ MTTR:
< 1 day
▪ Change Failure Rate:
0-15%
Low Performers
▪ Deployments:
Once per week/month
▪ Lead Time for Changes:
> 1 month and < 6 months
▪ MTTR:
> 1 week and < 1 month
▪ Change Failure Rate:
https://devops-research.com/
What happens when we tear down the silos
and become a DevOps organization?
▪ We ship more software more often,
complexity increases and reliability starts to
decline
▪ We naturally shift our focus to solve the
scalability and reliability issues (alternatively
we give up and readopt the monolith)
▪ Rise of the Site Reliability Engineers
Transformation: Focusing on Information
https://devops-research.com/
What are some tools / processes that
organizations can put in place to change
our equilibrium and communicate?
▪ Communication & Collaboration Tools
▫ Slack, Git, Pagerduty, OpsGenie
▪ Observability (SRE) Tooling
▫ Custom Dashboards / Metrics /
Alerting
▫ Log Analytics
▫ Distributed Tracing
What do SREs care about?
▪ Reliability (this one is obvious)
▪ Performance (is the customer happy?)
▪ Costs (is the business happy?)
SREs are in the business of measurement and
define objectives through SLOs by measuring SLIs.
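As a rough sketch of how an SLI feeds an SLO, the snippet below computes an availability SLI over a window of requests and the share of the error budget that remains. The `Window` type, the 99.9% target, and the request counts are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Window:
    total_requests: int
    good_requests: int  # requests that met the success criterion


def availability_sli(window: Window) -> float:
    """SLI: fraction of requests in the window that were good."""
    if window.total_requests == 0:
        return 1.0
    return window.good_requests / window.total_requests


def error_budget_remaining(window: Window, slo_target: float) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if window.total_requests == 0:
        return 1.0  # no traffic, so nothing has been spent
    allowed_failures = (1.0 - slo_target) * window.total_requests
    actual_failures = window.total_requests - window.good_requests
    return 1.0 - actual_failures / allowed_failures


window = Window(total_requests=100_000, good_requests=99_950)
sli = availability_sli(window)
budget = error_budget_remaining(window, slo_target=0.999)
print(f"SLI={sli:.4%}, error budget remaining={budget:.0%}")
```

Here 50 failed requests against an allowance of roughly 100 leaves about half the budget unspent.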
What do SREs typically measure
▪ Error Rates
▪ Latency
▪ Throughput
▪ Saturation
“The Four Golden Signals” - https://landing.google.com/sre/sre-book/chapters/monitoring-distributed-systems/
What is Observability?
Kalman, 1961 paper
On the general theory of control systems
▪ A system is observable if the behavior of the entire system
can be determined by only looking at its inputs and outputs.
▪ Lesson: control theory is a well-documented approach which
people can learn from rather than trying to reinvent it
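For linear time-invariant systems, control theory turns this definition into a concrete check: a system with state matrix A and output matrix C is observable when the stacked matrix [C; CA; ...; CA^(n-1)] has rank n. The sketch below implements that rank test in plain Python; the two-state position/velocity example is an assumption chosen for illustration.

```python
from fractions import Fraction


def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def rank(M):
    """Matrix rank via Gaussian elimination with exact fractions."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def is_observable(A, C):
    """Rank test on the observability matrix [C; CA; ...; CA^(n-1)]."""
    n = len(A)
    blocks, current = [], C
    for _ in range(n):
        blocks.extend(current)
        current = matmul(current, A)
    return rank(blocks) == n


# Hypothetical system: state = (position, velocity), position += velocity each step.
A = [[1, 1], [0, 1]]
print(is_observable(A, [[1, 0]]))  # measuring position: True
print(is_observable(A, [[0, 1]]))  # measuring velocity only: False
```

Measuring position lets you infer velocity from successive readings, while measuring only velocity never reveals the position, so the second system has hidden state.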
Can we get some pillars?
The 4 pillars of Observability were originally described in a blog
article from Twitter:
▪ Monitoring
▪ Log Aggregation / Analytics
▪ Distributed systems tracing infrastructure
▪ Alerting / Visualization
https://blog.twitter.com/engineering/en_us/a/2016/observability-at-twitter-technical-overview-part-i.html
More than just
pillars…
“While plainly having access to logs, metrics, and traces
doesn’t necessarily make systems more observable, these
are powerful tools that, if understood well, can unlock the
ability to build better systems.”
- Cindy Sridharan
https://www.oreilly.com/library/view/distributed-systems-observability/9781492033431/
Observability gives us the means to
understand all of the behavior in our
systems
▪ Not just tooling, it’s how
we model and analyze
data
▪ Similar to how DevOps is
a mindset / culture
▪ No longer treating
services like Schrödinger's
cat
▪ (A lot) more context
around events and
transactions
https://peter.bourgon.org/blog/2017/02/21/metrics-tracing-and-logging.html
Why does my
organization need any
of this?
This sounds like a lot of work …
How many of you are running
staging environments?
How many of you actually trust
your staging environments?
In order to observe a system, we must emit
signals and analyze the aggregates.
Those aggregates can answer the
following questions (and more):
▪ Number of Reqs / Retries / Backoffs
(throughput)
▪ Request parameters / Query Statements
(details)
▪ Latency / Outliers (performance)
▪ Top-Level Exceptions / Log Messages (error
analysis)
How can we collect this data?
Distributed Tracing
▪ Also known as Distributed Structured
Logging
▪ Larger Payloads
▪ Rich Contextual Data
https://w3c.github.io/trace-context
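The W3C Trace Context specification linked above propagates trace identity between services in a `traceparent` HTTP header of the form `version-traceid-parentid-flags`. The following is a minimal sketch of creating, parsing, and forwarding that header for version `00` only; the function names are hypothetical and this is not a substitute for a real tracing library.

```python
import re
import secrets

# Version 00 layout: 2 hex version, 32 hex trace id, 16 hex parent id, 2 hex flags.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace with random trace and span ids."""
    flags = "01" if sampled else "00"
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-{flags}"


def parse_traceparent(header: str):
    """Return the header fields as a dict, or None if the header is invalid."""
    match = TRACEPARENT_RE.match(header.strip())
    if not match:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero ids are defined as invalid.
    if set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 1),
    }


def child_traceparent(incoming: str) -> str:
    """Keep the trace id, mint a new span id for the outgoing call."""
    parsed = parse_traceparent(incoming)
    if parsed is None:
        return new_traceparent()
    flags = "01" if parsed["sampled"] else "00"
    return f"00-{parsed['trace_id']}-{secrets.token_hex(8)}-{flags}"


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
print(parse_traceparent(incoming))
print(child_traceparent(incoming))
```

Because every hop keeps the same trace id, the spans emitted by different services can be stitched back into one transaction.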
Sampling vs. No Sample
▪ Sampling traces may cause important
outliers (P95/P99) to be missed
▪ Extremely high volume systems must
sample due to massive overhead
▪ Start without sampling, adopt as needed,
incorporate solutions which sample
adaptively
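The trade-off above can be sketched by comparing two decision rules. Head sampling decides from the trace id alone, so slow traces survive only by luck; the tail-based rule shown here always keeps errors and slow traces and samples the rest. This is a simplified stand-in for adaptive sampling, and the thresholds, rates, and synthetic traces are assumptions for illustration.

```python
import hashlib


def head_sample(trace_id: str, rate: float) -> bool:
    """Decide up front from the trace id alone; outliers are kept only by luck."""
    digest = hashlib.sha256(trace_id.encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # stable value in [0, 1]
    return bucket < rate


def tail_sample(trace: dict, rate: float, slow_ms: float) -> bool:
    """Decide after the trace completes: always keep errors and slow traces."""
    if trace["error"] or trace["duration_ms"] >= slow_ms:
        return True
    return head_sample(trace["trace_id"], rate)


# Synthetic traffic: every 100th request is a 2-second outlier.
traces = [
    {"trace_id": f"t{i}", "duration_ms": 2000.0 if i % 100 == 0 else 20.0, "error": False}
    for i in range(1000)
]

kept = [t for t in traces if tail_sample(t, rate=0.01, slow_ms=1000.0)]
slow_kept = sum(1 for t in kept if t["duration_ms"] >= 1000.0)
print(slow_kept, "of 10 slow traces kept;", len(kept), "traces stored in total")
```

The tail-based rule retains all ten outliers while storing only a small fraction of the ordinary traffic, at the cost of buffering each trace until it completes.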
How has Observability helped enable a
DevOps culture?
Let’s take a look at a production
microservice application which has been
instrumented by a distributed tracing
solution
▪ Operated by 3 engineers (1 FE/1 BE/1 SRE)
▪ Over 20k transactions / hour, 20+ integrations, 150k LOC, with less
than 15% test coverage
▪ Launched in 2018 with 15 microservices on Docker Swarm – has since
expanded to over 35 microservices with zero additional engineering
personnel
▪ One-touch deployment and provisioning for new and existing services
Visualizing
Large and
Complex
Systems
Analyzing
Distributed Trace
Aggregates
What happens if we aggregate timing, error rate, and # of
reqs for each endpoint on a service?
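The per-endpoint aggregation described above can be sketched as follows. The span fields (`endpoint`, `duration_ms`, `error`) and the endpoint names are hypothetical; a tracing product would derive the same kind of summary from its own span format.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile; values must be non-empty."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def aggregate_by_endpoint(spans):
    """Summarize request count, error rate, and latency for each endpoint."""
    grouped = defaultdict(list)
    for span in spans:
        grouped[span["endpoint"]].append(span)

    summary = {}
    for endpoint, items in grouped.items():
        durations = [s["duration_ms"] for s in items]
        errors = sum(1 for s in items if s["error"])
        summary[endpoint] = {
            "requests": len(items),
            "error_rate": errors / len(items),
            "p50_ms": percentile(durations, 50),
            "p99_ms": percentile(durations, 99),
        }
    return summary


spans = [
    {"endpoint": "/checkout", "duration_ms": 120, "error": False},
    {"endpoint": "/checkout", "duration_ms": 950, "error": True},
    {"endpoint": "/checkout", "duration_ms": 130, "error": False},
    {"endpoint": "/catalog", "duration_ms": 40, "error": False},
]
for endpoint, stats in aggregate_by_endpoint(spans).items():
    print(endpoint, stats)
```

Grouping by endpoint is what makes a single slow or failing route stand out instead of being averaged away across the whole service.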
What problems
have Distributed
Tracing helped
solve?
Database Optimizations, Caching, and Concurrency
Exponential
Backoff
Slow Death
of a Service
Rise in Latency + Processing Time
▪ DBO (Hibernate Query) causing O(n log n) rise in latency and
processing time
▪ Application Dashboard indicated an issue with overall latency
increasing
▪ Fix deployed and improvement was observed immediately
Issue Resolved
Caching Solved one problem
… but caused another
▪ We implemented Redis for caching, and processing time went
down
▪ However, we didn’t account for token policies changing and
they suddenly began to expire after 30 seconds
▪ Alerting around error rates for this endpoint raised our
awareness around this issue
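One way to avoid serving expired tokens from a cache is to tie each entry's lifetime to the expiry reported by the token issuer, minus a safety margin. The sketch below uses an in-memory dictionary as a stand-in for Redis; the class name, the 5-second margin, and the injectable clock are assumptions for illustration, not the setup used in the talk.

```python
import time


class TokenCache:
    """In-memory stand-in for a cache such as Redis; entries expire with the token."""

    def __init__(self, clock=time.monotonic, safety_margin_s=5.0):
        self._clock = clock
        self._margin = safety_margin_s
        self._entries = {}

    def put(self, key, token, expires_in_s):
        # Cache for slightly less than the token lifetime reported by the issuer.
        ttl = max(0.0, expires_in_s - self._margin)
        self._entries[key] = (token, self._clock() + ttl)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        token, deadline = entry
        if self._clock() >= deadline:
            # Treat as a miss so the caller fetches a fresh token.
            del self._entries[key]
            return None
        return token


cache = TokenCache()
cache.put("partner-api", "example-token", expires_in_s=30)
print(cache.get("partner-api"))
```

If the issuer shortens the token lifetime, the cache entry shortens with it, rather than relying on a fixed TTL chosen when the cache was introduced.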
Context is critical
Metrics are not standalone, they have relationships
Custom
Dashboards
We utilize a mix of Instana, Logz.io and Grafana to manage
our systems
Focusing on Observability
▪ Enables your organization to understand the behavior
of your system
▪ Empowers your engineers to find and fix problems
▪ Enables you to build more reliable systems and ship
software faster
▪ Promotes empathy through understanding,
transparency, and communication.
Want to learn more about monitoring
production microservice apps?
▪ Follow me on twitter for upcoming workshops
@notsureifkevin & @InstanaHQ
▪ Get a free trial of Instana @ https://instana.com

Maximizing Board Effectiveness 2024 Webinar.pptx
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

How Game Theory and Observability Can Tear Down Silos and Improve DevOps

  • 1. Changing the Game: How Game Theory can break down silos Kevin Crawley – Developer Relations // Instana Principal SRE Architect & Co-Owner // Single Twitter: @notsureifkevin
  • 2. ▫ Docker Captain ▫ Gitlab Hero ▫ DevOpsDays Nashville Organizer ▫ 20 years in software development ▫ 5+ years DevOps/SRE experience About Me
  • 3. Discussion Points ▪ How does Game Theory tear down Silos ▪ Characteristics of High Performance Organizations ▪ DevOps and Site Reliability Engineers ▪ What SREs need to be effective
  • 4. Let’s talk about Game Theory (Disclaimer: I’m bad at math) source: Nirmal Mehta (Docker Captain)
  • 5. What is Bad Equilibrium? It’s a strategy that all players in the game can adopt and converge on, but it won’t produce a desirable outcome for anyone. https://pdfs.semanticscholar.org/30d1/a03db196384a17fed3247407fb5859f7c76b.pdf
  • 6. Transformation: Focusing on Automation https://devops-research.com/
  • 7. Where do silos come from? Silos can be defined as the contention which exists between functional units within an organization. This contention usually manifests between teams where change management policy requirements and risks are high.
  • 8. Nash Equilibrium (Prisoner’s Dilemma) A concept of game theory where the optimal outcome of a game is one where no player has an incentive to deviate from her chosen strategy after considering an opponent’s choice. https://en.wikipedia.org/wiki/Nash_equilibrium
  • 9. Split / Steal – Example 1 ▪ Video has been removed to save bandwidth, you may view it here on YouTube https://www.youtube.com/watch?v=p3Uos2fzIJ0
  • 11. Pareto Efficiency Is a state of allocation of resources in which it is impossible to make any one individual better off without making at least one individual worse off. … aka ZERO SUM https://en.wikipedia.org/wiki/Pareto_efficiency
  • 12.
  • 13. Pareto Inefficiency A situation is inefficient if someone can be made better off even after compensating those made worse off.
  • 14. Pareto Inefficient Nash Equilibrium … is a Bad Equilibrium
  • 15. Split / Steal Example 2 ▪ Video has been removed to save bandwidth, you may view it here on YouTube https://www.youtube.com/watch?v=S0qjK3TWZE8
  • 19. Pareto Inefficient Nash Equilibrium Gives you permission and proof to change the game
  • 21. Percentage of Work Done Manually (Elite / High / Low Performers) ▪ Configuration Management: 5% / 10% / 30% ▪ Testing: 10% / 20% / 30% ▪ Deployments: 5% / 10% / 30% ▪ Change approval process: 10% / 30% / 40% https://devops-research.com/
  • 22. High Performance vs Low Performance Organizations High Performers ▪ Deployments: > 1 hour and < 1 day ▪ Lead Time for Changes: > 1 day and < 1 week ▪ MTTR: < 1 day ▪ Change Failure Rate: 0-15% Low Performers ▪ Deployments: Once per week/month ▪ Lead Time for Changes: > 1 month and < 6 months ▪ MTTR: > 1 week and < 1 month ▪ Change Failure Rate: https://devops-research.com/
  • 23. What happens when we tear down the silos and become a DevOps organization? ▪ We ship more software more often, complexity increases and reliability starts to decline ▪ We naturally shift our focus to solve the scalability and reliability issues (alternatively we give up and readopt the monolith) ▪ Rise of the Site Reliability Engineers
  • 24. Transformation: Focusing on Information https://devops-research.com/
  • 25. What are some tools / processes that organizations can put in place to change our equilibrium and communicate? ▪ Communication & Collaboration Tools ▫ Slack, Git, PagerDuty, OpsGenie ▪ Observability (SRE) Tooling ▫ Custom Dashboards / Metrics / Alerting ▫ Log Analytics ▫ Distributed Tracing
  • 26. What do SREs care about? ▪ Reliability (this one is obvious) ▪ Performance (is the customer happy?) ▪ Costs (is the business happy?) SREs are in the business of measurement and define objectives through SLOs by measuring SLIs.
  • 27. What do SREs typically measure? ▪ Error Rates ▪ Latency ▪ Throughput ▪ Saturation “The Four Golden Signals” - https://landing.google.com/sre/sre-book/chapters/monitoring-distributed-systems/
  • 28. What is Observability? Kalman, 1961 paper On the general theory of control systems ▪ A system is observable if the behavior of the entire system can be determined by only looking at its inputs and outputs. ▪ Lesson: control theory is a well-documented approach which people can learn from vs trying to reinvent
  • 29. Can we get some pillars? The 4 pillars of Observability were originally described in a blog article from Twitter: ▪ Monitoring ▪ Log Aggregation / Analytics ▪ Distributed systems tracing infrastructure ▪ Alerting / Visualization https://blog.twitter.com/engineering/en_us/a/2016/observability-at-twitter-technical-overview-part-i.html
  • 30. More than just pillars… “While plainly having access to logs, metrics, and traces doesn’t necessarily make systems more observable, these are powerful tools that, if understood well, can unlock the ability to build better systems.” - Cindy Sridharan https://www.oreilly.com/library/view/distributed-systems-observability/9781492033431/
  • 31. Observability gives us the means to understand all of the behavior in our systems ▪ Not just tooling, it’s how we model and analyze data ▪ Similar to how DevOps is a mindset / culture ▪ No longer treating services like Schrödinger's cat ▪ (A lot) more context around events and transactions https://peter.bourgon.org/blog/2017/02/21/metrics-tracing-and-logging.html
  • 32. Why does my organization need any of this? This sounds like a lot of work …
  • 33. How many of you are running staging environments?
  • 34. How many of you actually trust your staging environments?
  • 35.
  • 36. In order to observe a system, we must emit signals and analyze the aggregates. Those aggregates can answer the following questions (and more): ▪ Number of Reqs / Retries / Backoffs (throughput) ▪ Request parameters / Query Statements (details) ▪ Latency / Outliers (performance) ▪ Top-Level Exceptions / Log Messages (error analysis)
  • 37. How can we collect this data? Distributed Tracing ▪ Also known as Distributed Structured Logging ▪ Larger Payloads ▪ Rich Contextual Data https://w3c.github.io/trace-context
  • 38. Sampling vs. No Sampling ▪ Sampling traces may cause important outliers (P95/P99) to be missed ▪ Extremely high volume systems must sample due to massive overhead ▪ Start without sampling, adopt it as needed, and incorporate solutions which sample adaptively
  • 39. How has Observability helped enable a DevOps culture? Let’s take a look at a production microservice application which has been instrumented by a distributed tracing solution
  • 40. ▪ Operated by 3 engineers (1 FE/1 BE/1 SRE) ▪ Over 20k transactions / hour, 20+ integrations, 150k LOC, with less than 15% test coverage ▪ Launched in 2018 with 15 microservices on Docker Swarm – has since expanded to over 35 microservices with zero additional engineering personnel ▪ One-touch deployment and provisioning for new and existing services
  • 42.
  • 43.
  • 44. Analyzing Distributed Trace Aggregates What happens if we aggregate timing, error rate, and # of reqs for each endpoint on a service
  • 45.
  • 46.
  • 47. What problems have Distributed Tracing helped solve? Database Optimizations, Caching, and Concurrency
  • 49. Slow Death of a Service
  • 50. Rise in Latency + Processing Time ▪ DBO (Hibernate Query) causing O(n log n) rise in latency and processing time ▪ Application Dashboard indicated an issue with overall latency increasing ▪ Fix deployed and improvement was observed immediately
  • 52.
  • 53. Caching Solved one problem … but caused another ▪ We implemented Redis for caching, and processing time went down ▪ However, we didn’t account for token policies changing and they suddenly began to expire after 30 seconds ▪ Alerting around error rates for this endpoint raised our awareness around this issue
  • 54.
  • 55.
  • 56.
  • 57. Context is critical Metrics are not standalone, they have relationships
  • 58.
  • 59.
  • 60.
  • 61. Custom Dashboards We utilize a mix of Instana, Logz.io and Grafana to manage our systems
  • 62.
  • 63.
  • 64. Focusing on Observability ▪ Enables your organization to understand the behavior of your system ▪ Empowers your engineers to find and fix problems ▪ Enables you to build more reliable systems and ship software faster ▪ Promotes empathy through understanding, transparency, and communication.
  • 65. Want to learn more about monitoring production microservice apps? ▪ Follow me on Twitter for upcoming workshops @notsureifkevin & @InstanaHQ ▪ Get a free trial of Instana @ https://instana.com
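
The game theory slides (8 to 19) define a Bad Equilibrium as a Nash equilibrium that is also Pareto inefficient. The slides do not give a payoff table, so the sketch below assumes an illustrative Split / Steal game with a jackpot of 100: both split and each gets 50, one steals and takes everything, both steal and nobody gets anything. The payoff values and function names are hypothetical and exist only to make the two definitions concrete.

```python
from itertools import product

STRATEGIES = ("split", "steal")

# Hypothetical payoffs (player_a, player_b) for an assumed jackpot of 100.
PAYOFFS = {
    ("split", "split"): (50, 50),
    ("split", "steal"): (0, 100),
    ("steal", "split"): (100, 0),
    ("steal", "steal"): (0, 0),
}


def is_nash(profile):
    """True if neither player can strictly gain by deviating alone."""
    a, b = profile
    pay_a, pay_b = PAYOFFS[profile]
    a_cannot_improve = all(PAYOFFS[(alt, b)][0] <= pay_a for alt in STRATEGIES)
    b_cannot_improve = all(PAYOFFS[(a, alt)][1] <= pay_b for alt in STRATEGIES)
    return a_cannot_improve and b_cannot_improve


def is_pareto_efficient(profile):
    """True if no other outcome helps someone without hurting anyone."""
    current = PAYOFFS[profile]
    for other in PAYOFFS.values():
        nobody_worse = all(o >= c for o, c in zip(other, current))
        somebody_better = any(o > c for o, c in zip(other, current))
        if nobody_worse and somebody_better:
            return False
    return True


def bad_equilibria():
    """Nash equilibria that are Pareto inefficient."""
    return [
        profile
        for profile in product(STRATEGIES, repeat=2)
        if is_nash(profile) and not is_pareto_efficient(profile)
    ]


print(bad_equilibria())
```

Under these assumed payoffs, the only outcome that is both stable and worse for everyone than an available alternative is mutual stealing, while mutual splitting is better for both players but is not stable on its own. That gap is the argument the deck makes for changing the rules of the game rather than playing it harder.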

Editor's Notes

  1. My name is Kevin. I’ve been using Docker and maintaining distributed application systems in production since 2014. I help organize events in my local area and speak on topics such as devops, automation, culture, and observability.
  2. This is what happens when orgs try to: Speed up delivery Reduce MTTR Reduce lead times
  3. We all understand the game, but we don’t know how to change the rules to gain an advantage
  4. This is what happens when orgs try to: Speed up delivery Reduce MTTR Reduce lead times
  5. Time-sharing computers Computer guided missiles Air Defense Network goes online
  6. 2. Computational complexity and bandwidth requirements of distributed tracing (Lyft, Netflix, Google, etc.) 3. These solutions work around inefficient consumers and processing systems (they’re typically not stream based) 4. Unless of course you’re trying to do this yourself, in which case the complexity of running these systems is extremely high; the other condition is that you truly are a behemoth, in which case you probably know most of this stuff already
  7. Over 150 containers Spread across multiple hosts / azs Two separate environments
  8. High level overview of all the services in production
  9. Single Music has over 30 services in production; we can’t possibly monitor 30 dashboards at a time … or can we?
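
Slide 44 asks what happens if timing, error rate, and number of requests are aggregated for each endpoint on a service, and slide 38 warns that sampling can hide P95/P99 outliers. The following is a minimal sketch of that aggregation, not the tooling used in the talk: the span fields (`endpoint`, `duration_ms`, `error`), the endpoint names, and the nearest-rank percentile are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def aggregate(spans):
    """Summarize request count, error rate, and latency per endpoint."""
    by_endpoint = defaultdict(list)
    for span in spans:
        by_endpoint[span["endpoint"]].append(span)

    summary = {}
    for endpoint, group in by_endpoint.items():
        durations = [s["duration_ms"] for s in group]
        errors = sum(1 for s in group if s["error"])
        summary[endpoint] = {
            "requests": len(group),
            "error_rate": errors / len(group),
            "p50_ms": percentile(durations, 50),
            "p99_ms": percentile(durations, 99),
        }
    return summary


# Hypothetical spans; one slow /checkout request is the outlier of interest.
spans = [
    {"endpoint": "/checkout", "duration_ms": 40, "error": False},
    {"endpoint": "/checkout", "duration_ms": 45, "error": False},
    {"endpoint": "/checkout", "duration_ms": 50, "error": True},
    {"endpoint": "/checkout", "duration_ms": 900, "error": False},
    {"endpoint": "/login", "duration_ms": 20, "error": False},
]

full = aggregate(spans)          # every span kept
sampled = aggregate(spans[::2])  # naive sampling: keep every other span

print(full["/checkout"])
print(sampled["/checkout"])
```

With every span kept, the 900 ms request dominates the `/checkout` P99. With the naive every-other-span sample, that request is dropped and the endpoint looks healthy, which illustrates why the deck suggests starting without sampling and adopting adaptive sampling only when volume demands it.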