Building Disaster Recovery via
Resilience Engineering
Michael Kehoe
Staff SRE - LinkedIn
Tonight’s
agenda
1 Introductions
2 What is Resilience Engineering
3 The Problem Statement
4 Project Overview
5 Testing Process
6 Project Outcomes
7 Key Takeaways
8 Q&A
Introduction
Michael Kehoe
/USR/BIN/WHOAMI
• Staff Site Reliability Engineer @ LinkedIn
• Production-SRE Team
• Funny accent = Australian + 4 years
American
• Former Network Engineer at the
University of Queensland
Who are we?
PRODUCTION-SRE TEAM AT LINKEDIN
• Disaster Recovery Planning and
Automation
• Incident Response and Automation
• Visibility Engineering
• Reliability Principles
LinkedIn
EVOLUTION OF THE INFRASTRUCTURE
[Timeline: 2003, 2010, 2011, 2013, 2014, 2015. Stages in order: Active & Passive; Active & Active; Multi-colo 3-way Active & Active; Multi-colo n-way Active & Active]
LinkedIn in 2018
• 4 Data Centers
• 21 PoPs
• 1000+ services
What is Resilience Engineering?
• Projects that directly demand increased
resilience from our applications and
infrastructure.
• Application Failure Injection
• Infrastructure Failure Injection
• Full Disaster-Recovery Tests
Problem Statement
How often have you heard stories where someone thought they had a
disaster strategy, never tested it, and it failed when they needed
it the most?
• How do we ensure that we always have
disaster recovery ability without incident?
• How do we consistently test for disaster
recovery ability without disrupting the
company?
Project Overview
• Build a process (with Automation) to facilitate disaster recovery
• Operate the process on regular cadence
• Report the outcomes of tests to engineering executives
Testing Process
What is Load Testing?
• 5x a week
• Peak hour traffic
• Fixed SLA
LinkedIn Traffic-Tier
[Diagram: request path Border Router -> IPVS -> ATS -> ATS -> Frontend, spanning the EDGE and FABRIC tiers, with Stickyrouting alongside]
[Diagram: Fabric Buckets, numbered 1 to 100]
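A minimal sketch of how a member could be pinned to one of the 100 buckets shown on this slide. The slides do not say how buckets are assigned, so the hashing scheme and the name `bucket_for_member` are assumptions made purely for illustration.

```python
import hashlib

NUM_BUCKETS = 100  # the slide shows buckets numbered 1 to 100


def bucket_for_member(member_id: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a member ID to a stable bucket in the range 1..num_buckets.

    Hypothetical helper: a hash-based assignment is assumed here, not
    taken from the talk.
    """
    digest = hashlib.sha256(member_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then fold it
    # into the bucket range. The same ID always lands in the same bucket.
    value = int.from_bytes(digest[:8], "big")
    return value % num_buckets + 1


if __name__ == "__main__":
    for member in ("member-1", "member-2", "member-3"):
        print(member, "->", bucket_for_member(member))
```

Because the mapping is deterministic, moving a bucket between fabrics moves the same set of members every time, which is what makes a percentage-based traffic shift repeatable.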
[Diagram: EDGE and FABRIC tiers with DC1 and DC2. The user's cookie names DC1; the edge gets the secondary fabric for the user from Stickyrouting and receives DC2.]
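The lookup in this diagram might be sketched as follows. This is an illustration under stated assumptions, not LinkedIn's implementation: the function name `choose_fabric`, the idea of an "online" set of fabrics, and the fallback order are all hypothetical.

```python
from typing import Optional, Set


def choose_fabric(
    cookie_fabric: Optional[str],
    secondary_fabric: str,
    online: Set[str],
) -> Optional[str]:
    """Pick the fabric (data center) that should serve a request.

    cookie_fabric:    fabric named in the user's cookie, e.g. "DC1"
    secondary_fabric: fallback returned by the sticky-routing lookup, e.g. "DC2"
    online:           fabrics currently accepting traffic (assumed input)
    """
    # Prefer the fabric recorded in the cookie while it is serving traffic.
    if cookie_fabric in online:
        return cookie_fabric
    # Otherwise fall back to the secondary fabric for this user.
    if secondary_fabric in online:
        return secondary_fabric
    # Neither candidate is available; the caller must decide what to do.
    return None


if __name__ == "__main__":
    print(choose_fabric("DC1", "DC2", {"DC1", "DC2"}))
    print(choose_fabric("DC1", "DC2", {"DC2"}))
```

Keeping a secondary fabric per user means a failed-out data center does not scatter its users arbitrarily; each one has a known place to land.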
TrafficShift Architecture
[Diagram: Web application, Salt master, Stickyrouting Service, Couchbase Backend, Worker Processes, Fabric Buckets]
Load Testing
[Diagram: traffic percentage per fabric (DC1, DC2, DC3), with a 60% figure shown]
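A rough sketch of how a load test could concentrate traffic on one fabric by reassigning buckets. It assumes 100 buckets that each carry roughly 1% of traffic, and it assumes DC3 is the fabric under test; the slide does not make clear which data center the 60% figure belongs to. `plan_load_test` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, List


def plan_load_test(
    fabrics: List[str],
    target: str,
    target_pct: int,
    num_buckets: int = 100,
) -> Dict[int, str]:
    """Assign buckets so that `target` receives about `target_pct` of traffic.

    The remaining buckets are spread round-robin over the other fabrics.
    """
    if target not in fabrics:
        raise ValueError(f"unknown fabric: {target}")
    if not 0 <= target_pct <= 100:
        raise ValueError("target_pct must be between 0 and 100")

    others = [f for f in fabrics if f != target]
    if not others and target_pct < 100:
        raise ValueError("no other fabric available for the remaining traffic")

    target_count = round(num_buckets * target_pct / 100)
    assignment: Dict[int, str] = {}
    for bucket in range(1, num_buckets + 1):
        if bucket <= target_count:
            assignment[bucket] = target
        else:
            # Rotate through the remaining fabrics.
            assignment[bucket] = others[(bucket - target_count - 1) % len(others)]
    return assignment


if __name__ == "__main__":
    plan = plan_load_test(["DC1", "DC2", "DC3"], target="DC3", target_pct=60)
    print(Counter(plan.values()))
```

A real shift would likely move buckets gradually and watch service health between steps; this sketch only computes the end state.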
Project Outcomes
Benefits of Load-testing
• Capacity Planning
• Identify Bugs
• Confidence
CAPACITY PLANNING
• Through this process, we continuously validate our infrastructure
capacity
• This is the best signal we can possibly get since we’re simulating a
real disaster
IDENTIFY BUGS
• Some bugs are only found at high load (under duress)
• Helps find inefficiencies that otherwise may not be found until it’s too late
• Gives us clues on how to make our code more resilient to potential failure
CONFIDENCE
• Through load-testing, we’ve built confidence in our disaster recovery
strategy
• We understand exactly:
• What process to follow
• How long it takes to avert disaster
• What risks are associated with a disaster incident
Key Takeaways
• Resilience Engineering is a must for
LinkedIn
• Design infrastructure to facilitate disaster
recovery
• Disaster-test regularly to avoid surprises
• Automate your testing process to reduce
engagement time
Q&A
SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engineering

Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
IJECEIAES
 
Manufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptxManufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptx
Madan Karki
 
Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...
bijceesjournal
 

SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engineering

Editor's Notes

  1. Anil: TrafficShift is a two-part application. A web application provides an easy way for engineers to create planned and emergency offline plans, and we use Couchbase as our key/value persistence store. Python backend worker processes talk to the Salt Master via the Salt API and instruct the stickyrouting service to turn buckets online and offline. We use this toolset to run load tests, or stress tests, of our datacenters. That was a lot of talk about how to mitigate issues with TrafficShift. But if you look closely, we are migrating live traffic across datacenters, so why not use the same mechanism to stress test a datacenter? How awesome is that? We don't stress test a single service; we stress the whole system. I'm going to talk about load testing next.
  2. Anil: As you can see, by turning a precise number of buckets offline in US-West and US-East, we can reroute that extra traffic to the target datacenter. We do this in a controlled manner, in steps, until the threshold level of 50% is reached. If for any reason an alert fires during the stress test, our TrafficShift tool acknowledges it, automatically rebalances the site traffic, and sends the stress test report to the SREs.
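
The stepped load test described in these notes can be illustrated with a minimal Python sketch. It is not the TrafficShift implementation: `run_stress_test`, `StressTestReport`, the in-memory `bucket_state` dictionary, and the `alert_firing` callback are all hypothetical names, and reading the 50% threshold as "half of the source buckets taken offline" is an assumption, since the notes do not say what the percentage is measured against. A real system would call the stickyrouting service and an alerting system instead.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class StressTestReport:
    """Hypothetical report: offline bucket count after each step."""
    steps: List[int] = field(default_factory=list)
    aborted: bool = False


def run_stress_test(
    bucket_state: Dict[int, str],
    source_buckets: List[int],
    step_size: int,
    threshold: float,
    alert_firing: Callable[[], bool],
) -> StressTestReport:
    """Take source buckets offline in steps until `threshold` is reached.

    `threshold` is interpreted (as an assumption) as the fraction of
    `source_buckets` to take offline. If `alert_firing()` returns True
    after any step, every bucket taken offline is turned back online.
    """
    if step_size <= 0:
        raise ValueError("step_size must be positive")

    report = StressTestReport()
    limit = int(len(source_buckets) * threshold)
    offline: List[int] = []

    while len(offline) < limit:
        # Never exceed the limit, even if the last step is a partial one.
        end = min(len(offline) + step_size, limit)
        batch = source_buckets[len(offline):end]
        for bucket in batch:
            bucket_state[bucket] = "offline"
        offline.extend(batch)
        report.steps.append(len(offline))

        if alert_firing():
            # Rebalance: restore every bucket this test took offline.
            for bucket in offline:
                bucket_state[bucket] = "online"
            report.aborted = True
            break

    # On a clean run the buckets stay offline; ending the test is left
    # to the caller in this sketch.
    return report


# Example: 100 buckets, 40 of them served by the source datacenters.
state = {bucket: "online" for bucket in range(1, 101)}
report = run_stress_test(
    state,
    source_buckets=list(range(1, 41)),
    step_size=5,
    threshold=0.5,
    alert_firing=lambda: False,
)
print(report.steps, report.aborted)
```

Checking for alerts after every step, rather than once at the end, keeps the extra load on the target datacenter to at most one step's worth of buckets beyond the last known-good level.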