The Virtuous Cycle
Getting Good Things Out of Bad Failures
Joy Scharmen
I’m here to talk about failure.
Why talk about failure?
Failure is amazing.
Failure is our best teacher.
How do we learn from failure?
Have you ever run out of integers in an auto-incrementing primary key column in a database?
Looking at you, ActiveRecord.
Frameworks that default to INT
Assumptions about the size of
your database.
(before it hits production)
Just not thinking about it
I’m just happy that it works at all.
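To make the "running out of integers" problem concrete, here is a minimal sketch of the arithmetic, assuming the column is a signed 32-bit INT and that inserts arrive at a constant rate. The function name and the example numbers are hypothetical and chosen only for illustration; they are not from the talk.

```python
# Largest value a signed 32-bit INT column can hold.
INT32_MAX = 2**31 - 1  # 2,147,483,647


def days_until_exhaustion(current_max_id, inserts_per_day, limit=INT32_MAX):
    """Estimate days of headroom left in an auto-incrementing key.

    Assumes a constant insert rate, which is a simplification:
    real growth is usually uneven and tends to accelerate.
    """
    if inserts_per_day <= 0:
        raise ValueError("inserts_per_day must be positive")
    remaining = max(limit - current_max_id, 0)
    return remaining / inserts_per_day


# Hypothetical table: 2 billion ids used, 1 million new rows per day.
headroom = days_until_exhaustion(2_000_000_000, 1_000_000)
print(f"about {headroom:.0f} days of ids left")
```

A table that looks comfortably "mostly empty" in absolute terms can still be only months away from the limit once the insert rate is high.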
Oops, I did it again.
We had this happen to us twice.
...then it happened a third time.
And we’re an operations
company.
┬─┬ノ( ゜-゜ノ)
First, consider it more deeply.
Σ(-᷅_-᷄๑)
We fixed one occurrence. It was
simple.
It happened again. Same fix.
Then it happened again.
Obviously what we are doing here isn’t working.
Who here is familiar with
retrospectives?
ret·ro·spec·tive
ˌretrəˈspektiv/
adjective
1. looking back on or dealing with past events or situations.
Who has been to a boring
retrospective?
I have. I’ve run them. Sorry.
A retrospective is the pivot point
between failure and learning.
If it’s boring, no one is learning.
How do we have non-boring
retrospectives?
Create engagement. Prepare!
Don’t force people to watch the sausage being
made.
Before the retrospective:
Choose a facilitator
They should know who was involved and
why.
Before the retrospective:
Build a timeline
Gather your facts.
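One way to gather facts into a timeline is to merge events from several sources and sort them by time. The sketch below assumes each source can be exported as a list of records with an ISO 8601 timestamp; the field names (`at`, `source`, `note`) and the sample events are invented for illustration and are not from the talk.

```python
from datetime import datetime


def build_timeline(*sources):
    """Merge event lists from several sources into chronological order."""
    events = [event for source in sources for event in source]
    return sorted(events, key=lambda e: datetime.fromisoformat(e["at"]))


def format_timeline(events):
    """Render one line per event, ready to paste into an agenda."""
    return [f"{e['at']} [{e['source']}] {e['note']}" for e in events]


# Invented sample data standing in for chat logs and alert history.
chat = [
    {"at": "2024-01-01T10:07:00", "source": "chat", "note": "on-call acknowledges page"},
    {"at": "2024-01-01T10:31:00", "source": "chat", "note": "fix deployed"},
]
alerts = [
    {"at": "2024-01-01T10:02:00", "source": "alerts", "note": "insert errors begin"},
]

timeline = build_timeline(chat, alerts)
for line in format_timeline(timeline):
    print(line)
```

Keeping the source on each line lets people in the retrospective see where a fact came from and challenge it if needed.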
Use your tools wisely
“We become what we behold. We shape our
tools, and thereafter our tools shape us.”
― Marshall McLuhan
My Tools For Incident Management
ChatOps
Bot Tools
SitReps
My Tools For Retrospective Prep
Time
Outreach
Organization
My personal incident management tool belt:
ChatOps
My personal incident management tool belt:
Bot Tools
My personal incident management tool belt:
SitReps
My personal retrospective toolbelt:
Time
Block out time.
My personal retrospective toolbelt:
Outreach
Have roles defined.
My personal retrospective toolbelt:
Organization
Send out the agenda, including the timeline, the
day before the retrospective.
People should show up to a
retrospective with context to
begin a discussion.
Everyone is in the
retrospective. The timeline
is done. How do we start?
Have the most involved
engineer give a brief
summary of what
happened.
Make sure everyone is engaged.
Read the room.
Be compassionate for your customers.
Talk about customer
impact.
Take note.
Pick the point you want to
start from and dive in.
If you ever get to “human
error”, keep digging.
No, really.
If you ever get to “human
error”, keep digging.
Most Important:
Always Assume Good
Intent
Defensiveness kills
retrospection.
One way you can tell a
retrospective is good:
you have a ridiculous list of
remediation items.
Remediations can be anything from:
“fix typo on line 5”
to
“re-architect the whole platform”
to
“make the speed of light go faster”.
Don’t do every remediation.
and
Don’t discount big projects!
What do you do with all of
these remediations?
Bring them to product as well as engineering!
Product can be your best
friend.
Do you have a need? Your customers do
too.
Product is great at getting needs in front of
customers.
Heroku Pipelines
Pipelines is a product that came out of an
engineering need.
Is your fix a small thing you
can add to existing customer
tools?
Engineering should be able to do this with
minimal product sign off.
You can improve your
customers’ experience.
Your customers, your fellow engineers, and
your community can benefit from your own
needs and hard won experience.
Back to the story.
Done: Tooling
Done: Process
Next: Automation
Next: Fix inputs
* https://github.com/rails/rails/pull/24962
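The slides do not say what the automation looked like. As one hypothetical form it could take, the sketch below flags key columns that have used more than a chosen fraction of the signed 32-bit range. The input is assumed to be a mapping from column name to its current maximum id, gathered however your database allows; the names, threshold, and sample values are illustrative assumptions.

```python
# Largest value a signed 32-bit INT column can hold.
INT32_MAX = 2**31 - 1


def columns_at_risk(max_ids, threshold=0.75, limit=INT32_MAX):
    """Return (column, fraction_used) pairs at or past the threshold, worst first.

    max_ids maps a column name to the highest id currently in use.
    """
    at_risk = [
        (name, max_id / limit)
        for name, max_id in max_ids.items()
        if max_id / limit >= threshold
    ]
    return sorted(at_risk, key=lambda item: item[1], reverse=True)


# Hypothetical snapshot of current maximum ids.
snapshot = {
    "events.id": 2_000_000_000,
    "users.id": 50_000,
    "audit_log.id": 1_700_000_000,
}

for name, used in columns_at_risk(snapshot):
    print(f"{name}: {used:.0%} of INT range used")
```

Run on a schedule, a check like this turns a surprise outage into a routine ticket with plenty of lead time.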
Every failure is a
chance to learn.
Make those chances count.
Thank you.
Joy Scharmen / @peculiaire / joy@heroku.com
Retrospective Resource Wiki:
http://retrospectivewiki.org
Infinite Hows:
https://www.oreilly.com/ideas/the-infinite-hows
Heroku Dev Center:
https://devcenter.heroku.com
Retrospective Template:
https://github.com/peculiaire/incident-lifecycle/blob/master/retrotemplate.md

Editor's Notes

  1. Hi, I’m Joy and I’m the SRE director at Heroku. For those of you who aren’t familiar with Heroku, we’re a Platform as a Service. This means we handle a lot of the operations work for the customers who run on our platform. My job is to keep our platforms maximally stable so our customers can sleep easy at night.
  2. I'm here to talk about failure and why I love it, or at least don’t hate it.
  3. Why would I want to talk about failure? Failure is amazing: it can be our best teacher. As an SRE, I find failure utterly crucial to doing my job. Complex systems often fail, and we learn so much more from their failure than from their success. A lot of us have probably had this realization. If we didn’t have failure, we’d be out of a job.
  4. So the question today is how do we learn from that failure? How do we learn from that failure in a way that doesn’t make us feel like failures? Let's start with an SRE war story — everyone loves a good war story.
  5. How many of us have ever run out of integers in an auto-incrementing primary key column in a database? The whole database halts because it just ran out of numbers. And it’s usually a critical database. I've seen this failure mode pretty much everywhere I've ever worked as an SRE.
  6. It's pretty embarrassing because seriously -- you just ran out of numbers. It seems really easy to fix but it just keeps cropping up. So what are some of the reasons that this keeps happening?
  7. Commonly used frameworks have defaults that can come back and bite you later.
  8. Assumptions about the size of your database before it hits production. It’s a good problem to have when you’re successful enough that you outgrew your original assumptions. Two billion (a signed 32-bit INT tops out at 2,147,483,647) is a lot of numbers.
  9. Or just not thinking about it at all! That's probably the most common reason.
  10. So we had this happen to us twice in two months. That was pretty bad. Then it happened a third time almost a year later. For me, as the head of SRE, seeing this again was pretty painful!
  11. We run a Platform as a Service! Our whole premise is doing operations for our customers so they don’t have to. So how do we fix this problem for real?
  12. First we have to consider it more deeply than we did at the start. If the obvious fix were the long-term fix, it wouldn't keep coming up.
  13. It’s simple enough to fix one occurrence of this: just change the column to BIGINT. Data starts flowing again, and folks go back to business as usual.
  14. When this happened the second time we applied a similar fix, and we also poked around manually at other crucial DBs that might have this problem. We even caught a few before failure that way.
  15. We needed to fix this a lot more systemically. Fortunately there’s a good tool for that!
  16. So who here is familiar with retrospectives? I imagine most people here have been to or at least know about them as a place to reflect on past projects or incidents. One of the main things that SRE instituted at Heroku was retrospectives for all customer-affecting incidents.
  17. If you have been to a retrospective, you probably have been to a boring retrospective. I know I’ve run boring retrospectives. Sorry. I used to think that if you just got the right people in a room together to chat over an incident, things would naturally happen and we’d have a great, engaging conversation and leave with an amazing solution that would fix our problems. Maybe it would also solve world hunger. In reality, when you pop a one-hour meeting about an outage on a bunch of folks’ calendars with no context, this stuff happens: Some folks don’t show, because they are allergic to calendars, email, and meetings. The ones that do show might be there because they have an axe to grind, or because they feel like they have to defend themselves. Establishing the timeline in the meeting leads to bickering and “well, actually” statements that put everyone who wasn’t in a bad mood into a bad mood. Once everyone is sufficiently miserable, you’re most of the way through your time. You have about 5 minutes to give people some work to do as the cherry on top of the misery sundae. If that doesn’t happen, everyone is bored and tuned out. The engineers are all doing email. The facilitator is doing email. No one’s paying any attention. At the end of the meeting you have some cursory remediation items, and if you are lucky some might actually get done.
  18. For a retrospective to be useful, it can’t be boring. A retrospective is the pivot point between failure and learning. If it’s boring, no one is learning and you might as well give everyone back the time in their day they were sitting in the meeting. Putting a bunch of highly-paid engineers in a meeting for an hour in which they don’t learn anything is a waste of time, money, and morale.
  19. One problem we had with the first INT rollover is that we didn’t have a retrospective, because folks thought that retrospectives were a waste of time for something so trivial and easily understood. They were trying to avoid a boring, time-consuming meeting without a clear sense of what value it would have. This makes sense. I avoid boring meetings too. In this case, the problem was deceptive. Had we dug into it the first or even the second time, we would have been able to discover that. So how do you have non-boring, useful retrospectives?
  20. One way to create engagement during the retrospective is by preparing for the meeting. Don’t force people to watch the sausage being made. It is excruciating for someone to attend a meeting and then have to figure out the timeline, or to find that you don’t have the right people, or even that you have the wrong people in the room. Retrospectives are a big time commitment we expect people to make, and we need to make them count. People should know that when they show up to a retrospective, they're actually going to get something good out of it.
  21. The facilitator is the most crucial role in this meeting. The person should familiarize themselves with the facts of the incident -- so ideally they are someone who is adjacent to the incident but not a primary responder, because they're going to be talking a lot in the meeting, and they shouldn’t be asking questions of themselves. The facilitator should know who was involved and why they were involved in the incident.
  22. You should also build a timeline. This can be done by the facilitator while they're gathering all the facts for the retrospective. This is really important. When I say build a timeline, I don't mean have everything down to the second, with every tiny detail. It should be an overview. Think of it as a narrative: how would you tell the story of this event? If you were telling a story, you would have a beginning, middle, and end. You’d cover salient points. And you probably wouldn’t be going for microsecond precision.
  23. Any good engineer needs their tools. When I talk about tools, I don’t just mean stuff that you can check into a repo. I mean mental tools as well.
  24. Here’s an overview of the tools I most commonly use to create engaging retrospectives. There’s nothing magical about any of these -- you can use them too. I’ll take you through them.
  25. Why chat? Audio transcriptions are error-prone and time-consuming. We run all our incidents, and indeed our day-to-day communications, in chat. That means everything has a transcript that you can refer back to. People can communicate in parallel -- you don't have to worry about interrupting someone on the voice bridge, and you don’t need someone to transcribe what’s happening on a voice bridge. You can copy and paste commands as needed. I don’t care which type of chat you use, as long as you use chat.
  26. Bot tools include incident management tools built on top of our chat bots. One example is here, where we recorded something for the timeline of this incident. We deploy in chat, and deploys emit chat notifications. Pages alert in chat. We also have incident-management-specific tools we wrote that can create notes for building a timeline or questions to follow up on while the incident is ongoing. This makes the information-gathering process for the retrospective much easier. It’s also great for transparency and discoverability amongst our engineers.
  27. SitReps (or situation reports) are a common pattern in incident response anywhere. You just want a periodic summary of the situation. This isn't what you're telling customers -- this is what you're telling people internally. You can of course use jargon, you can use acronyms, and you don't need to polish it to the same level as you would customer-facing communications. The goal is to make sure that responders have checkpoints to guide themselves with as they work on the incident, especially as new folks come in. These are also very helpful when you try to understand what happened after an incident -- sitreps give you milestones of what happened and when.
  28. People underestimate the amount of time it takes to run a good retrospective. I'm not just talking about the time that it takes in the meeting. Prior preparation generally shortens the amount of time you all have to spend in a room together. Block out time for yourself to prepare at least one day before the retro is scheduled.
  29. Make sure all key players (including the incident coordinator and the communications people) are available and plan on attending the meeting. If someone crucial can’t attend, either reschedule or have someone who can speak for them show up instead (such as a team member). Make sure you have a note-taker, someone who isn’t a primary responder so they won’t have to talk and take notes simultaneously.
  30. In general, be organized. Send out the agenda, including the timeline, the day before. Make sure the room is booked ahead of time and A/V is working.
  31. When everyone shows up with context, retrospectives can get to the interesting bits faster. Who doesn’t love dissecting a failure in a complex system? I love doing this and I know a lot of us do, because that’s why we’re in SRE.
  32. So everyone is in the retrospective and the timeline is done. How do we start? We set context, we keep it short, and we don't do the litany of timeline reading. Think of telling a story.
  33. Have the most involved engineer give a brief summary of what happened. They should stick to the facts and take less than five minutes. The goal is to make sure that everyone orients themselves to what happened. One thing I should say is that a retrospective should happen within a week of the incident. People should still have this relatively fresh in their minds by the time you go to retrospect. Otherwise you're wasting people's time, and you miss a chance to strike while the iron is hot and folks are feeling motivated to tackle remediations.
  34. Once you are actually in the meeting you're going to want to read the room. As a facilitator you need to make sure that everyone is engaged. You yourself need to be a very present and active part of leading the discussion. Don't be the note taker -- make sure someone else is the note taker. You'll need to ask questions of everyone, especially the quiet folks. Some people will want to dominate the conversation and some people will never want to jump in, but that quiet person probably has some really good insights.
  35. You should talk about customer impact! You should have compassion for what your customers felt during the outage. It's not just that you woke up at 3 AM because your database ran out of numbers -- your customer, who might be running a business on your platform and may be anywhere in the world, could have lost some valuable business or some important work, and you need to be aware of that disruption.
  36. Take note of interesting questions, statements, and points of confusion. This gives you jumping-off points for deeper conversations. When we’ve established context we can start diving into these things.
  37. Once you have some starting points for your questioning, dive in. There are various methods you can use to formulate questions for investigation. A lot of people like the 5 Whys -- I think that it’s interesting (it was created at Toyota) and very logical for engineers to grasp, but I like more flexible methods. I really like John Allspaw’s Infinite Hows. Asking “why” can frame the conversation in a more blameful way than asking “how”. I don’t think this needs to be prescriptive, though. Simply don’t stop asking questions until you have gotten many layers deep.
  38. Really really important -- if you ever get to human error, keep digging. Your systems are created and operated by humans for humans. Human error is a constant.
  39. I cannot emphasize this enough! You have to work around and with human error. Have you ever heard the phrase “Linux is user-friendly, it's just picky about its friends”? I disagree. Linux is dangerous. Complex and powerful tools can be dangerous. If you can take out your system with a typo, your systems are too fragile, because someone is going to make a typo.
  40. If someone skips a step or makes a typo due to exhaustion or inattention, that’s not on the engineer. Always assume good intent. Humans get tired, humans get burnt out, humans get distracted. And humans run your systems. When we build and maintain complex systems we have to develop interfaces for them that are as tolerant as possible to human frailty. The bonus here is that we like working with systems like this. Less friction and stress over using your tools means happier engineers, and happy engineers mean better work. Usable, beautiful tools are an investment in scaling and reliability.
  41. A reason to be very careful about respecting human failings is that we don't want to make people feel defensive. When someone feels that they have to defend themselves, they throw up shields. After that point, you won’t get useful information out of that retrospective. Folks need to feel safe to disclose mistakes they have made. That's how we find out how to fix these gaps in our tools.
  42. One way you can tell a retrospective was good is that in the end you have a ridiculous list of remediation items.
  43. Remediations can range from big and sweeping, to tiny and tactical, to completely absurd. The ridiculous ones mean you made it to the end of the questioning line!
  44. Don’t feel you have to do every remediation that comes out of a retrospective. Give yourself the freedom to think about all the options and narrow them down afterwards. Narrow down what you can commit to only after you’ve been creative. Don’t discount big projects either! That’s the really interesting work. This is where it helps to understand your company’s process for bringing new work into engineering.
  45. All too often we focus on remediations we can do quickly and within one team. We should be thinking more holistically.
  46. Product is often really excited to hear new ideas. It’s their job to think about how to improve customer experience and what new things customers want. SREs are great at finding problems and Product is great at finding solutions.
  47. An example of something that came out of a common need for our engineers and our customers -- Heroku Pipelines. We use this for our own internal deployment flows! A lot of Heroku runs on Heroku. Apps in a pipeline are grouped into “review”, “development”, “staging”, and “production” stages representing different deployment steps in a continuous delivery workflow.
  48. You don’t have to build something huge to be customer facing. A lot of the time, SREs think of ourselves as internet plumbers (or janitors) -- no one knows we’re there until something’s broken. That’s valuable! It’s also gratifying to see your work in front of a customer.
  49. Don’t limit yourself to behind-the-scenes work. Don’t settle for tools that are unpleasant to use. Don’t prevent yourself from bringing up ideas because they will require cross-team or cross-functional collaboration. You can improve your customers’ experience and your own.
  50. Back to our war story. What did we actually do to fix our INT rollover problems?
  51. Well, we added tooling to easily detect rollover conditions and give you a heads up to fix them before your database comes to a halt. There’s a Heroku Postgres tool called pg:diagnose, and it will now alert you when 75% and then 90% of your integer sequence is consumed.
  52. We also added process. There’s a productionization checklist that services should be going through before they hit production. We added an item to ensure sequences are in BIGINT. There’s no reason for us to use integer rather than bigint columns for sequences in Heroku Postgres. https://www.flickr.com/photos/peretzpup/2361847171/
  53. And of course we could and will improve. We’d like to have this check scan our production databases automatically and alert before failure. Then of course, we could give that option to our customers. https://www.flickr.com/photos/nnova/2967902322/
  54. We also are sending pull requests to at least one common open source framework (yes, still looking at you, ActiveRecord) to set better defaults.
  55. Thanks for sticking with me while I explain why I love failure. We’re all going to fail at some point, and operating distributed systems means your odds get much higher. It’s way easier to fail when you remember that every failure is a chance to learn. Make them count!
  56. Some relevant links! I hope these help you. Thank you for your time today.
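
To make the rollover check from note 51 concrete, here is a minimal sketch of the arithmetic behind a 75% / 90% warning. This is not how pg:diagnose is implemented; the function name `sequence_status`, the constant names, and the status labels are hypothetical and exist only for illustration. The one hard fact it relies on is that a signed 32-bit integer tops out at 2,147,483,647.

```python
# Hypothetical sketch: classify how much of an INT sequence has been consumed.
# The thresholds mirror the 75% and 90% alerts mentioned in the notes.

INT4_MAX = 2**31 - 1  # largest signed 32-bit integer: 2,147,483,647
WARN_AT = 0.75        # first alert threshold
CRIT_AT = 0.90        # second alert threshold


def sequence_status(last_value: int, max_value: int = INT4_MAX) -> str:
    """Return "ok", "warning", or "critical" for a sequence's last value."""
    used = last_value / max_value  # fraction of the key space already consumed
    if used >= CRIT_AT:
        return "critical"
    if used >= WARN_AT:
        return "warning"
    return "ok"


if __name__ == "__main__":
    # Example last values for three imaginary sequences.
    for last in (1_000_000_000, 1_700_000_000, 2_000_000_000):
        print(last, sequence_status(last))
```

In practice the last value would come from the database itself, and the same function could be run on a schedule against every production sequence, which is the automatic scan described in note 53.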