SlideShare a Scribd company logo
How It All Goes Down 
Your next service outage will be during 
@ctosummit at 1:50pm. 
Daniel Doubrovkine 
db@artsy.net 
@dblockdotorg
Do you have a Syrian domain? 
somewhere here there’s your .sy TLD authority
Are you using MongoDB?
Do you use a PAAS?
Is there any logic involved? 
1. Restart Unicorns, no improvement. 
2. Restart a server, no improvement. 
3. Found an EC2 maintenance note buried in email. 
4. Patch responsible for 5x slower response times! 
5. Do a DB query from a production console. 
6. Identify a pattern (slow, then fast). 
7. Look in MongoDB logs.
Are there humans involved?
Are you using a beta version of a driver?
Post-Mortem
Subject
Summary 
We have experienced 3 separate 
outages over the last 72 hours that 
have affected api.artsy.net which is 
the backbone of all our 
applications, including Admin, 
CMS, artsy.net, m.artsy.net, etc. 
The first incident was slower 
response time 9/28 2AM-10AM 
EST. 
The second and third incidents 
were two outages, 9/29 11PM- 
12AM EST and 9/30 4AM-6:30AM 
EST.
Cause 
1. Trigger 
Servers rebooted. 
2. Unexpected behavior. 
Reboots should have been handled just fine by software. 
3. Cause. 
Human error + bug in the driver. 
4. Consequence. 
All front-ends down.
Resolution 
1. Ticket 
Woke up a contractor. 
2. Manual intervention. 
Kick servers.
Post-Mortem 
1. Human Error Preventable 
The dismissal of the alert was wrong. 
2. Failure to Plan. 
Reboot schedule needed a human to monitor it.
Outage History
Thanks! 
Daniel Doubrovkine 
db@artsy.net 
@dblockdotorg

More Related Content

Similar to How it All Goes Down

Lessons Learned Migrating 2+ Billion Documents at Craigslist
Lessons Learned Migrating 2+ Billion Documents at CraigslistLessons Learned Migrating 2+ Billion Documents at Craigslist
Lessons Learned Migrating 2+ Billion Documents at Craigslist
Jeremy Zawodny
 
Bp106 Worst Practices Final
Bp106   Worst Practices FinalBp106   Worst Practices Final
Bp106 Worst Practices Final
Bill Buchan
 
IBM ConnectED 2015 - BP103: Solving the Weird, the Obscure, and the Mind-Bending
IBM ConnectED 2015 - BP103: Solving the Weird, the Obscure, and the Mind-BendingIBM ConnectED 2015 - BP103: Solving the Weird, the Obscure, and the Mind-Bending
IBM ConnectED 2015 - BP103: Solving the Weird, the Obscure, and the Mind-Bending
Luis Guirigay
 
High performance Infrastructure Oct 2013
High performance Infrastructure Oct 2013High performance Infrastructure Oct 2013
High performance Infrastructure Oct 2013Server Density
 
Mike Krieger - A Brief, Rapid History of Scaling Instagram (with a tiny team)
Mike Krieger - A Brief, Rapid History of Scaling Instagram (with a tiny team)Mike Krieger - A Brief, Rapid History of Scaling Instagram (with a tiny team)
Mike Krieger - A Brief, Rapid History of Scaling Instagram (with a tiny team)
Jean-Luc David
 
Team 4 Presents: The Client Server Model
Team 4 Presents: The Client Server Model Team 4 Presents: The Client Server Model
Team 4 Presents: The Client Server Model
anniekate93
 
How it's made - MyGet (CloudBurst)
How it's made - MyGet (CloudBurst)How it's made - MyGet (CloudBurst)
How it's made - MyGet (CloudBurst)
Maarten Balliauw
 
89025069 mike-krieger-instagram-at-the-airbnb-tech-talk-on-scaling-instagram
89025069 mike-krieger-instagram-at-the-airbnb-tech-talk-on-scaling-instagram89025069 mike-krieger-instagram-at-the-airbnb-tech-talk-on-scaling-instagram
89025069 mike-krieger-instagram-at-the-airbnb-tech-talk-on-scaling-instagramMohit Jain
 
Sirt roundtable malicious-emailtrendmicro
Sirt roundtable malicious-emailtrendmicroSirt roundtable malicious-emailtrendmicro
Sirt roundtable malicious-emailtrendmicro
Sumit Tambe
 
Real Life Java EE Performance Tuning
Real Life Java EE Performance TuningReal Life Java EE Performance Tuning
Real Life Java EE Performance Tuning
C2B2 Consulting
 
Real Life Java EE Performance Tuning
Real Life Java EE Performance TuningReal Life Java EE Performance Tuning
Real Life Java EE Performance Tuningmbrasier
 
Evolving Prometheus for the Cloud Native World (FOSDEM 2018)
Evolving Prometheus for the Cloud Native World (FOSDEM 2018)Evolving Prometheus for the Cloud Native World (FOSDEM 2018)
Evolving Prometheus for the Cloud Native World (FOSDEM 2018)
Brian Brazil
 
STP201 Efficiency at Scale - AWS re: Invent 2012
STP201 Efficiency at Scale - AWS re: Invent 2012STP201 Efficiency at Scale - AWS re: Invent 2012
STP201 Efficiency at Scale - AWS re: Invent 2012
Amazon Web Services
 
Tuning Java Servers
Tuning Java Servers Tuning Java Servers
Tuning Java Servers
Srinath Perera
 
Wcl303 russinovich
Wcl303 russinovichWcl303 russinovich
Wcl303 russinovichconleyc
 
Linux server world
Linux server worldLinux server world
Linux server worldAkshat Singh
 
Cassandra at Zalando
Cassandra at ZalandoCassandra at Zalando
Cassandra at Zalando
Luis Mineiro
 
Micro services
Micro servicesMicro services
Micro services
Alex Punnen
 
IBM Connect 2014 presentation. BP106 Managed Mail File Replicas.– How to Win...
IBM Connect 2014 presentation.  BP106 Managed Mail File Replicas.– How to Win...IBM Connect 2014 presentation.  BP106 Managed Mail File Replicas.– How to Win...
IBM Connect 2014 presentation. BP106 Managed Mail File Replicas.– How to Win...
Don Martindale
 

Similar to How it All Goes Down (20)

Lessons Learned Migrating 2+ Billion Documents at Craigslist
Lessons Learned Migrating 2+ Billion Documents at CraigslistLessons Learned Migrating 2+ Billion Documents at Craigslist
Lessons Learned Migrating 2+ Billion Documents at Craigslist
 
Bp106 Worst Practices Final
Bp106   Worst Practices FinalBp106   Worst Practices Final
Bp106 Worst Practices Final
 
IBM ConnectED 2015 - BP103: Solving the Weird, the Obscure, and the Mind-Bending
IBM ConnectED 2015 - BP103: Solving the Weird, the Obscure, and the Mind-BendingIBM ConnectED 2015 - BP103: Solving the Weird, the Obscure, and the Mind-Bending
IBM ConnectED 2015 - BP103: Solving the Weird, the Obscure, and the Mind-Bending
 
High performance Infrastructure Oct 2013
High performance Infrastructure Oct 2013High performance Infrastructure Oct 2013
High performance Infrastructure Oct 2013
 
Mike Krieger - A Brief, Rapid History of Scaling Instagram (with a tiny team)
Mike Krieger - A Brief, Rapid History of Scaling Instagram (with a tiny team)Mike Krieger - A Brief, Rapid History of Scaling Instagram (with a tiny team)
Mike Krieger - A Brief, Rapid History of Scaling Instagram (with a tiny team)
 
Team 4 Presents: The Client Server Model
Team 4 Presents: The Client Server Model Team 4 Presents: The Client Server Model
Team 4 Presents: The Client Server Model
 
How it's made - MyGet (CloudBurst)
How it's made - MyGet (CloudBurst)How it's made - MyGet (CloudBurst)
How it's made - MyGet (CloudBurst)
 
89025069 mike-krieger-instagram-at-the-airbnb-tech-talk-on-scaling-instagram
89025069 mike-krieger-instagram-at-the-airbnb-tech-talk-on-scaling-instagram89025069 mike-krieger-instagram-at-the-airbnb-tech-talk-on-scaling-instagram
89025069 mike-krieger-instagram-at-the-airbnb-tech-talk-on-scaling-instagram
 
Sirt roundtable malicious-emailtrendmicro
Sirt roundtable malicious-emailtrendmicroSirt roundtable malicious-emailtrendmicro
Sirt roundtable malicious-emailtrendmicro
 
Real Life Java EE Performance Tuning
Real Life Java EE Performance TuningReal Life Java EE Performance Tuning
Real Life Java EE Performance Tuning
 
Real Life Java EE Performance Tuning
Real Life Java EE Performance TuningReal Life Java EE Performance Tuning
Real Life Java EE Performance Tuning
 
Evolving Prometheus for the Cloud Native World (FOSDEM 2018)
Evolving Prometheus for the Cloud Native World (FOSDEM 2018)Evolving Prometheus for the Cloud Native World (FOSDEM 2018)
Evolving Prometheus for the Cloud Native World (FOSDEM 2018)
 
STP201 Efficiency at Scale - AWS re: Invent 2012
STP201 Efficiency at Scale - AWS re: Invent 2012STP201 Efficiency at Scale - AWS re: Invent 2012
STP201 Efficiency at Scale - AWS re: Invent 2012
 
Tuning Java Servers
Tuning Java Servers Tuning Java Servers
Tuning Java Servers
 
Wcl303 russinovich
Wcl303 russinovichWcl303 russinovich
Wcl303 russinovich
 
Linux server world
Linux server worldLinux server world
Linux server world
 
Cassandra at Zalando
Cassandra at ZalandoCassandra at Zalando
Cassandra at Zalando
 
Micro services
Micro servicesMicro services
Micro services
 
Os Whitaker
Os WhitakerOs Whitaker
Os Whitaker
 
IBM Connect 2014 presentation. BP106 Managed Mail File Replicas.– How to Win...
IBM Connect 2014 presentation.  BP106 Managed Mail File Replicas.– How to Win...IBM Connect 2014 presentation.  BP106 Managed Mail File Replicas.– How to Win...
IBM Connect 2014 presentation. BP106 Managed Mail File Replicas.– How to Win...
 

More from Daniel Doubrovkine

The Future of Art @ Worlds Fair Nano
The Future of Art @ Worlds Fair NanoThe Future of Art @ Worlds Fair Nano
The Future of Art @ Worlds Fair Nano
Daniel Doubrovkine
 
Nasdaq CTO Summit: Inspiring Team Leads to Give Away Legos
Nasdaq CTO Summit: Inspiring Team Leads to Give Away LegosNasdaq CTO Summit: Inspiring Team Leads to Give Away Legos
Nasdaq CTO Summit: Inspiring Team Leads to Give Away Legos
Daniel Doubrovkine
 
Product Development 101
Product Development 101Product Development 101
Product Development 101
Daniel Doubrovkine
 
Open-Source by Default, UN Community.camp
Open-Source by Default, UN Community.campOpen-Source by Default, UN Community.camp
Open-Source by Default, UN Community.camp
Daniel Doubrovkine
 
Your First Slack Ruby Bot
Your First Slack Ruby BotYour First Slack Ruby Bot
Your First Slack Ruby Bot
Daniel Doubrovkine
 
Single Sign-On with Waffle
Single Sign-On with WaffleSingle Sign-On with Waffle
Single Sign-On with Waffle
Daniel Doubrovkine
 
Taking Over Open Source Projects @ GoGaRuCo 2014
Taking Over Open Source Projects @ GoGaRuCo 2014Taking Over Open Source Projects @ GoGaRuCo 2014
Taking Over Open Source Projects @ GoGaRuCo 2014
Daniel Doubrovkine
 
Mentoring Engineers & Humans
Mentoring Engineers & HumansMentoring Engineers & Humans
Mentoring Engineers & Humans
Daniel Doubrovkine
 
Tiling and Zooming ASCII Art @ iOSoho
Tiling and Zooming ASCII Art @ iOSohoTiling and Zooming ASCII Art @ iOSoho
Tiling and Zooming ASCII Art @ iOSoho
Daniel Doubrovkine
 
Artsy ♥ ASCII ART
Artsy ♥ ASCII ARTArtsy ♥ ASCII ART
Artsy ♥ ASCII ART
Daniel Doubrovkine
 
The Other Side of Your Interview
The Other Side of Your InterviewThe Other Side of Your Interview
The Other Side of Your Interview
Daniel Doubrovkine
 
Hiring Engineers (the Artsy Way)
Hiring Engineers (the Artsy Way)Hiring Engineers (the Artsy Way)
Hiring Engineers (the Artsy Way)
Daniel Doubrovkine
 
Mentoring 101 - the Artsy way
Mentoring 101 - the Artsy wayMentoring 101 - the Artsy way
Mentoring 101 - the Artsy way
Daniel Doubrovkine
 
Building and Scaling a Test Driven Culture
Building and Scaling a Test Driven CultureBuilding and Scaling a Test Driven Culture
Building and Scaling a Test Driven Culture
Daniel Doubrovkine
 
Introducing Remote Install Framework
Introducing Remote Install FrameworkIntroducing Remote Install Framework
Introducing Remote Install Framework
Daniel Doubrovkine
 
Taming the Testing Beast - AgileDC 2012
Taming the Testing Beast - AgileDC 2012Taming the Testing Beast - AgileDC 2012
Taming the Testing Beast - AgileDC 2012
Daniel Doubrovkine
 
GeneralAssemb.ly Summer Program: Tech from the Ground Up
GeneralAssemb.ly Summer Program: Tech from the Ground UpGeneralAssemb.ly Summer Program: Tech from the Ground Up
GeneralAssemb.ly Summer Program: Tech from the Ground UpDaniel Doubrovkine
 
Making Agile Choices in Software Technology
Making Agile Choices in Software TechnologyMaking Agile Choices in Software Technology
Making Agile Choices in Software Technology
Daniel Doubrovkine
 
From Zero to Mongo, Art.sy Experience w/ MongoDB
From Zero to Mongo, Art.sy Experience w/ MongoDBFrom Zero to Mongo, Art.sy Experience w/ MongoDB
From Zero to Mongo, Art.sy Experience w/ MongoDB
Daniel Doubrovkine
 

More from Daniel Doubrovkine (20)

The Future of Art @ Worlds Fair Nano
The Future of Art @ Worlds Fair NanoThe Future of Art @ Worlds Fair Nano
The Future of Art @ Worlds Fair Nano
 
Nasdaq CTO Summit: Inspiring Team Leads to Give Away Legos
Nasdaq CTO Summit: Inspiring Team Leads to Give Away LegosNasdaq CTO Summit: Inspiring Team Leads to Give Away Legos
Nasdaq CTO Summit: Inspiring Team Leads to Give Away Legos
 
Product Development 101
Product Development 101Product Development 101
Product Development 101
 
Open-Source by Default, UN Community.camp
Open-Source by Default, UN Community.campOpen-Source by Default, UN Community.camp
Open-Source by Default, UN Community.camp
 
Your First Slack Ruby Bot
Your First Slack Ruby BotYour First Slack Ruby Bot
Your First Slack Ruby Bot
 
Single Sign-On with Waffle
Single Sign-On with WaffleSingle Sign-On with Waffle
Single Sign-On with Waffle
 
Taking Over Open Source Projects @ GoGaRuCo 2014
Taking Over Open Source Projects @ GoGaRuCo 2014Taking Over Open Source Projects @ GoGaRuCo 2014
Taking Over Open Source Projects @ GoGaRuCo 2014
 
Mentoring Engineers & Humans
Mentoring Engineers & HumansMentoring Engineers & Humans
Mentoring Engineers & Humans
 
Tiling and Zooming ASCII Art @ iOSoho
Tiling and Zooming ASCII Art @ iOSohoTiling and Zooming ASCII Art @ iOSoho
Tiling and Zooming ASCII Art @ iOSoho
 
Artsy ♥ ASCII ART
Artsy ♥ ASCII ARTArtsy ♥ ASCII ART
Artsy ♥ ASCII ART
 
The Other Side of Your Interview
The Other Side of Your InterviewThe Other Side of Your Interview
The Other Side of Your Interview
 
Hiring Engineers (the Artsy Way)
Hiring Engineers (the Artsy Way)Hiring Engineers (the Artsy Way)
Hiring Engineers (the Artsy Way)
 
Mentoring 101 - the Artsy way
Mentoring 101 - the Artsy wayMentoring 101 - the Artsy way
Mentoring 101 - the Artsy way
 
Building and Scaling a Test Driven Culture
Building and Scaling a Test Driven CultureBuilding and Scaling a Test Driven Culture
Building and Scaling a Test Driven Culture
 
Introducing Remote Install Framework
Introducing Remote Install FrameworkIntroducing Remote Install Framework
Introducing Remote Install Framework
 
HackYale 0-60 in Startup Tech
HackYale 0-60 in Startup TechHackYale 0-60 in Startup Tech
HackYale 0-60 in Startup Tech
 
Taming the Testing Beast - AgileDC 2012
Taming the Testing Beast - AgileDC 2012Taming the Testing Beast - AgileDC 2012
Taming the Testing Beast - AgileDC 2012
 
GeneralAssemb.ly Summer Program: Tech from the Ground Up
GeneralAssemb.ly Summer Program: Tech from the Ground UpGeneralAssemb.ly Summer Program: Tech from the Ground Up
GeneralAssemb.ly Summer Program: Tech from the Ground Up
 
Making Agile Choices in Software Technology
Making Agile Choices in Software TechnologyMaking Agile Choices in Software Technology
Making Agile Choices in Software Technology
 
From Zero to Mongo, Art.sy Experience w/ MongoDB
From Zero to Mongo, Art.sy Experience w/ MongoDBFrom Zero to Mongo, Art.sy Experience w/ MongoDB
From Zero to Mongo, Art.sy Experience w/ MongoDB
 

Recently uploaded

Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
Peter Spielvogel
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
UiPathCommunity
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 

Recently uploaded (20)

Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 

How it All Goes Down

  • 1. How It All Goes Down Your next service outage will be during @ctosummit at 1:50pm. Daniel Doubrovkine db@artsy.net @dblockdotorg
  • 2.
  • 3. Do you have a Syrian domain? somewhere here there’s your .sy TLD authority
  • 4. Are you using MongoDB?
  • 5. Do you use a PAAS?
  • 6. Is there any logic involved? 1. Restart Unicorns, no improvement. 2. Restart a server, no improvement. 3. Found an EC2 maintenance note buried in email. 4. Patch responsible for 5x slower response times! 5. Do a DB query from a production console. 6. Identify a pattern (slow, then fast). 7. Look in MongoDB logs.
  • 7. Are there humans involved?
  • 8. Are you using a beta version of a driver?
  • 11. Summary We have experienced 3 separate outages over the last 72 hours that have affected api.artsy.net which is the backbone of all our applications, including Admin, CMS, artsy.net, m.artsy.net, etc. The first incident was slower response time 9/28 2AM-10AM EST. The second and third incidents were two outages, 9/29 11PM- 12AM EST and 9/30 4AM-6:30AM EST.
  • 12. Cause 1. Trigger Servers rebooted. 2. Unexpected behavior. Reboots should have been handled just fine by software. 3. Cause. Human error + bug in the driver. 4. Consequence. All front-ends down.
  • 13. Resolution 1. Ticket Woke up a contractor. 2. Manual intervention. Kick servers.
  • 14. Post-Mortem 1. Human Error Preventable The dismissal of the alert was wrong. 2. Failure to Plan. Reboot schedule needed a human to monitor it.
  • 16. Thanks! Daniel Doubrovkine db@artsy.net @dblockdotorg