SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engineering

Michael Kehoe, Architect of reliable, scalable infrastructure at LinkedIn
Building Disaster Recovery via
Resilience Engineering
Michael Kehoe
Staff SRE - LinkedIn
Tonight’s
agenda
1 Introductions
2 What is Resilience Engineering
3 The Problem Statement
4 Project Overview
5 Testing Process
6 Project Outcomes
7 Key Takeaways
8 Q&A
Introduction
Michael Kehoe
/USR/BIN/WHOAMI
• Staff Site Reliability Engineer @ LinkedIn
• Production-SRE Team
• Funny accent = Australian + 4 years
American
• Former Network Engineer at the
University of Queensland
Who are we?
PRODUCTION-SRE TEAM AT LINKEDIN
• Disaster Recovery Planning and
Automation
• Incident Response and Automation
• Visibility Engineering
• Reliability Principles
LinkedIn
EVOLUTION OF THE INFRASTRUCTURE
2003 | 2010 | 2011 | 2013 | 2014 | 2015
Active & Passive → Active & Active → Multi-colo 3-way Active & Active → Multi-colo n-way Active & Active
LinkedIn
2018
4 Data Centers | 21 PoPs | 1000+ services
What is Resilience
Engineering?
What is Resilience Engineering?
• Projects that directly demand increased
resilience from our applications and
infrastructure.
• Application Failure Injection
• Infrastructure Failure Injection
• Full Disaster-Recovery Tests
Problem Statement
How often have you heard stories where someone
thought they had a disaster recovery strategy, never tested it,
and it failed when they needed it the most?
Problem Statement
• How do we ensure that we always have
disaster recovery ability without incident?
• How do we consistently test for disaster
recovery ability without disrupting the
company?
Project Overview
Project Overview
• Build a process (with Automation) to facilitate disaster recovery
• Operate the process on regular cadence
• Report the outcomes of tests to engineering executives
Testing Process
What is Load Testing?
5x a week | Peak hour traffic | Fixed SLA
LinkedIn Traffic-Tier
[Diagram: request path Border Router → IPVS → ATS → ATS → Frontend, spanning the EDGE and FABRIC tiers, with the Stickyrouting service alongside]
LinkedIn Traffic-Tier
[Diagram: fabric buckets numbered 1, 2, 3, ... 10 through 91, 92, 93, ... 100]
LinkedIn Traffic-Tier
[Diagram: EDGE and FABRIC tiers with DC1 and DC2. The user has DC1 in the cookie; the edge gets the secondary fabric for the user from Stickyrouting and receives DC2 as the secondary fabric.]
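The bucket and secondary-fabric idea on the slides above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hash-based bucket assignment, the bucket count of 100, and the names bucket_for and route are hypothetical and are not taken from LinkedIn's stickyrouting service.

```python
import hashlib

NUM_BUCKETS = 100  # assumption: buckets are numbered 1..100, as the diagram suggests


def bucket_for(member_id: str) -> int:
    """Map a member to a stable bucket in 1..NUM_BUCKETS (hypothetical hashing scheme)."""
    digest = hashlib.sha256(member_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS + 1


def route(member_id: str, primary: str, secondary: str, offline: set) -> str:
    """Pick the fabric that should serve this member.

    offline holds (fabric, bucket) pairs that are currently marked offline.
    If the member's bucket is offline in the primary fabric, fall back to
    the secondary fabric; otherwise stay on the primary.
    """
    bucket = bucket_for(member_id)
    if (primary, bucket) in offline:
        return secondary
    return primary
```

Because the bucket is derived from the member ID alone, the same member keeps landing in the same bucket, so marking a bucket offline moves a predictable slice of users to their secondary fabric.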
TrafficShift Architecture
[Diagram: Web application, Couchbase, Backend Worker Processes, Salt master, Stickyrouting Service, FABRIC BUCKETS]
Load Testing
[Diagram: FABRIC tier with DC1, DC2 and DC3; traffic percentage shown: 60%]
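As a rough worked example of the arithmetic behind a slide like this one, the sketch below estimates how many buckets would need to go offline in each source datacenter to push a target datacenter to a given traffic share. Everything here is an assumption made for illustration: an even three-way traffic split, 100 equally weighted buckets per fabric, and the function name buckets_to_offline. The real tool may weigh traffic quite differently.

```python
from fractions import Fraction
from math import ceil


def buckets_to_offline(target_share, base_shares, target, buckets_per_fabric=100):
    """Buckets to turn offline in EACH source fabric to reach target_share.

    base_shares maps fabric name -> normal share of site traffic (as Fractions).
    Assumes every bucket carries an equal slice of its fabric's traffic and
    that all offlined traffic lands on the target fabric.
    """
    base_target = base_shares[target]
    movable = sum(share for name, share in base_shares.items() if name != target)
    if target_share <= base_target:
        return 0  # the target already carries at least this much traffic
    if target_share > base_target + movable:
        raise ValueError("target share exceeds total site traffic")
    needed_fraction = (target_share - base_target) / movable
    return ceil(needed_fraction * buckets_per_fabric)


even_split = {name: Fraction(1, 3) for name in ("DC1", "DC2", "DC3")}
count = buckets_to_offline(Fraction(60, 100), even_split, "DC3")
```

Under these assumptions, count is 40: taking 40 of 100 buckets offline in both DC1 and DC2 moves 40% of their combined two-thirds share to DC3, which lifts DC3 from one third to 60%.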
Load Testing
Project Outcomes
Benefits of Load-testing
Capacity
Planning
Identify Bugs Confidence
Benefits of Load-testing
CAPACITY PLANNING
• Through this process, we continuously validate our infrastructure
capacity
• This is the best signal we can possibly get since we’re simulating a
real disaster
Benefits of Load-testing
IDENTIFY BUGS
• Some bugs are only found at high load (under duress)
• Helps find inefficiencies that otherwise may not be found until it’s too late
• Gives us clues on how to make our code more resilient to potential failure
Benefits of Load-testing
CONFIDENCE
• Through load-testing, we’ve built confidence in our disaster recovery
strategy
• We understand exactly:
• What process to follow
• How long it takes to avert disaster
• What risks are associated with a disaster incident
Key Takeaways
Key Takeaways
• Resilience Engineering is a must for
LinkedIn
• Design infrastructure to facilitate disaster
recovery
• Disaster-test regularly to avoid surprises
• Automate your testing process to reduce
engagement time
Q&A

Editor's Notes

  1. Anil: TrafficShift is a two-part application. A web application provides an easy way for engineers to create planned and emergency offline plans. We leverage Couchbase as our key/value persistence store. Python backend worker processes talk to the Salt master via the Salt API and instruct the stickyrouting service to turn buckets online and offline. We leverage this toolset to run load tests, or stress tests, of our datacenters. Uff, that’s a lot of talk about how to mitigate issues by doing a traffic shift. But if you observe keenly, we are migrating live traffic across datacenters, so why not leverage the same mechanism to stress test a datacenter? How awesome is that? We don’t stress test a single service; we stress the whole system. I am going to talk about load testing next.
  2. Anil: As you can see, by turning a precise number of buckets offline in US-West and US-East, we can reroute that extra traffic to the target datacenter. We do this in a controlled manner, in steps, until the threshold level of 50% is reached. If for any reason an alert fires during this stress test, our TrafficShift tool acknowledges it, automatically rebalances the site traffic, and sends out the stress test report to SREs.
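The stepwise ramp described in note 2 can be sketched as a small control loop. This is a minimal illustration, not the TrafficShift implementation: the function name run_load_test, the four callbacks, the starting percentage, and the step size are all hypothetical. Only the 50% threshold and the behavior on alert (rebalance, then send a report) come from the note.

```python
def run_load_test(set_target_pct, alert_firing, rebalance, send_report,
                  start_pct=35, threshold_pct=50, step_pct=5):
    """Ramp traffic toward the target datacenter in steps and abort on an alert.

    set_target_pct(pct): hypothetical hook that shifts buckets so the target
        datacenter serves pct percent of site traffic.
    alert_firing(): hypothetical hook that returns True if any alert fired.
    rebalance(): hypothetical hook that restores the normal traffic split.
    send_report(report): hypothetical hook that delivers the report to SREs.
    """
    pct = start_pct
    steps = []
    while pct < threshold_pct:
        # Never overshoot the threshold, even if the step does not divide evenly.
        pct = min(pct + step_pct, threshold_pct)
        set_target_pct(pct)
        steps.append(pct)
        if alert_firing():
            # Abort path from the note: rebalance first, then report.
            rebalance()
            report = {"aborted": True, "steps": steps}
            send_report(report)
            return report
    return {"aborted": False, "steps": steps}
```

Integer percentages keep the step arithmetic exact. What happens after a clean run (for example, how long traffic is held at the threshold before it is restored) is not covered by the notes, so the sketch simply returns the steps it took.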