Making Software. Better.
Simple solutions to big business problems.
Equal Experts is a network of talented, experienced software consultants, specialising in agile delivery.
Embracing collaborative chaos
Running chaos days on large platforms
Lyndsay Prewer | @equalexperts
Photo by Darius Bashar on Unsplash
What is chaos engineering and why should we care?
Look at what I built today!
Google Cloud Dataflow In the Smart Home Data Pipeline
Operating on the edge of chaos
http://bit.ly/2ZavoyP
http://bit.ly/2QVeWzA
“Two normally-benign misconfigurations, and a specific software bug, combined to initiate the outage”
Predicting failure
● How many component parts does your system have?
● How are they connected?
● How reliable is each part?
● How reliable are the connections?
● What happens when X fails?
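A back-of-envelope calculation can make the questions above more concrete. The sketch below assumes every part sits in series on the request path and that parts fail independently; both are simplifying assumptions rather than claims from the talk, and the function name and example figures are purely illustrative.

```python
from math import prod


def series_availability(availabilities):
    """Availability of a chain in which every part must work.

    Assumes independent failures, so the individual availabilities multiply.
    """
    values = list(availabilities)
    for a in values:
        if not 0.0 <= a <= 1.0:
            raise ValueError(f"availability must be within [0, 1], got {a}")
    return prod(values)


if __name__ == "__main__":
    # Hypothetical figures: ten parts, each available 99.9% of the time.
    parts = [0.999] * 10
    print(f"{series_availability(parts):.4f}")
```

Under these assumptions, ten parts at 99.9% each give a chain that is only about 99.0% available, which is one reason the number of parts and connections matters.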
Addressing the risk of unexpected failure
[Diagram: a two-part system (A, B) next to a larger web of connected parts (A to I, plus Z)]
● Address risk by deliberately inducing failure
● Observe, reflect and improve
● Build resilience in (like quality)
● Think about production (and failure) all the time
Simples Hard
Chaos engineering approaches
Manual
In process
Automated
Unplanned
Manual chaos
● Chaos Days
● AWS Game Days
● Change specific chaos
Automated chaos
● Chaos monkey / Simian Army
● AWS spot instances / GCP Preemptible VMs
● Randomised pod killer
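As an illustration of the "randomised pod killer" idea, here is a minimal sketch of the selection step only. It is not the tool used on the platform described in this talk: `pick_victims`, its parameters and the pod names are all hypothetical, and the example prints what it would do instead of calling any cluster API.

```python
import random


def pick_victims(pods, fraction, protected=(), rng=None):
    """Choose a random subset of pods to terminate.

    pods      -- names of running pods (hypothetical input)
    fraction  -- share of eligible pods to kill, between 0 and 1
    protected -- pods that must never be chosen
    rng       -- optional random.Random, so a run can be replayed
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be within [0, 1]")
    rng = rng or random.Random()
    eligible = sorted(set(pods) - set(protected))
    count = int(len(eligible) * fraction)  # rounds down, so small pools stay safe
    return rng.sample(eligible, count)


if __name__ == "__main__":
    pods = ["api-1", "api-2", "api-3", "api-4", "db-1"]
    victims = pick_victims(pods, fraction=0.5, protected={"db-1"},
                           rng=random.Random(42))
    # Dry run: print instead of terminating anything.
    for name in victims:
        print(f"would terminate {name}")
```

Passing a seeded `random.Random` makes a run repeatable, which helps when the same failure needs to be reproduced during a review.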
In process chaos
● Part of normal engineering process
● Focus for all roles in the team
● Production thinking / building resilience in
Product
Owner
Dev QA Dev Ops
Focus on: Quality AND Production AND Resilience
Define Build Explore Deploy
Unplanned chaos
● Every day is a school day
● Handle incidents well
● Learn from incidents - post-incident reviews
● AWS podcast: http://bit.ly/31oQfAf
[Diagram: web of connected parts (A to I, plus Z)]
How does it help?
People
Process
Product
Knowledge
Behaviour
Expertise
Managing incidents
Learning from incidents
Engineering approach
Observability
Simplification
Alerting
Runbooks
Resilience
Running a Chaos Day - when and how?
Our context
Legacy systems
X00 million internal requests (busiest day)
X00 million log messages (busiest day)
x850
microservices
XXm Customers
60 Delivery teams
~1000 Microservices
6 Platform teams
(AWS PaaS)
When were we ready for chaos?
2013 2014 2015 2016 2017 2018
Cloud, Docker, Scala, Mongo, ELK
Fast growth (teams, services, traffic)
Multi active WIP
Multi active
More multi active (to AWS)
Self serve deploys
AWS
Ready for Chaos
When are you ready for chaos?
Manual
In process
Automated
Unplanned
Who, where and exactly how?
Agents of chaos
● Virtual, closed team
● Draw from component teams
● Experts / veterans
● Highest bus-factor risk
Chaos scope - know thyself
● Know your architecture
● Know your steady state
● Know your constraints
○ What’s in your control?
○ What’s not?
○ What needs protecting?
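"Know your steady state" can be turned into a simple check that runs before, during and after an experiment. The sketch below is one possible shape, assuming metrics are available as plain numbers; the metric names, the 20% tolerance and the function itself are illustrative assumptions, not details from the talk.

```python
def steady_state_deviations(baseline, observed, tolerance=0.2):
    """Return metrics whose observed value strays from the baseline.

    baseline, observed -- dicts mapping metric name to a number
    tolerance          -- allowed relative deviation (0.2 means +/-20%)
    """
    deviations = {}
    for name, expected in baseline.items():
        actual = observed.get(name)
        if actual is None:
            deviations[name] = "missing"
            continue
        if expected == 0:
            drift = 0.0 if actual == 0 else float("inf")
        else:
            drift = abs(actual - expected) / abs(expected)
        if drift > tolerance:
            deviations[name] = f"{drift:.0%} off baseline"
    return deviations


if __name__ == "__main__":
    # Hypothetical numbers for a quiet period and for mid-experiment.
    baseline = {"requests_per_s": 1000, "error_rate": 0.01}
    observed = {"requests_per_s": 950, "error_rate": 0.05}
    print(steady_state_deviations(baseline, observed))
```

An empty result means the system still looks like its usual self; anything else is a prompt to look closer or to stop the experiment.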
X00 million internal requests (busiest day)
X00 million log messages (busiest day)
Chaos scope - trust the brains-storm
http://bit.ly/2XzR7Q9
Chaos scope - brainstorm, then plan the detail
Team X Team Y Team Z
Chaos scope - hack in amongst the chaos
Team X Team Y Team Z
Deciding where
● Production or closest to it
● Production (like) load
● Production (like) telemetry
● Decide the blast radius
● Decide comms channel(s)
Production
Staging
QA
Development
Execution
Deciding when
● To warn or not
● What else is going on?
● It was just an ordinary day …
● Chaos cut-off
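Reading "chaos cut-off" as a time after which no new failures are injected (an interpretation, since the slides do not spell it out), the rule can be expressed as a small guard. The function name, the times and the 30-minute time-box below are hypothetical.

```python
from datetime import datetime, timedelta


def may_start(now, cutoff, timebox):
    """True if an experiment started now would finish by the cut-off."""
    return now + timebox <= cutoff


if __name__ == "__main__":
    cutoff = datetime(2020, 1, 1, 15, 0)   # hypothetical 15:00 cut-off
    timebox = timedelta(minutes=30)        # hypothetical experiment length
    for start in (datetime(2020, 1, 1, 14, 0), datetime(2020, 1, 1, 14, 45)):
        verdict = "go" if may_start(start, cutoff, timebox) else "too late"
        print(start.strftime("%H:%M"), verdict)
```

A guard like this leaves time for recovery and for the day's write-up before people go home.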
Keep calm and chaos on (agents)
● Co-locate the agents
● Collaborate and coordinate well
● Time-box, cover ground
● (Self) document well
Keep calm and chaos on (everyone else)
● Also (self) document well
● Pretend it’s Production on
● It was just an ordinary day ...
Retrospection
Divide and conquer, then regroup
● Major on engineering improvements (people, process, product)
● Minor on chaos day improvements
● Component teams' retros / incident reviews first
● Then team-of-teams retro
People
Process
Product
Team X
Team Y
Team Z
Team of teams
What did we learn?
● Manage/limit the pain
● Start small
● Production is a tough step
● Production-like is also hard!
● Have fun!
What next?
What’s your next chaos step?
Manual
In process
Automated
Unplanned
● Where are you at in the journey?
● What’s the next (baby) step?
● Need any help?
Thank You
United Kingdom
+44 203 603 7830
helloUK@equalexperts.com
Equal Experts UK Ltd
30 Brock Street
London NW1 3FG
India
+91 20 6607 7763
helloIndia@equalexperts.com
Equal Experts India Private Ltd
Office No. 4-C
Cerebrum IT Park No. B3
Kumar City, Kalyani Nagar
Pune, 411006
Canada
+1 403 775 4861
helloCanada@equalexperts.com
Equal Experts Devices Inc
205 - 279 Midpark way S.E.
T2X 1M2
Calgary, Alberta
Portugal
+351 211 378 414
helloPortugal@equalexperts.com
Equal Experts Portugal
Avenida Dom João II, Nº35
Edificio Infante 11ºA
1990-083 Parque das Nações
Lisboa – Portugal
USA
+1 866-943-9737
helloUSA@equalexperts.com
Equal Experts Inc
1460 Broadway
New York
NY 10036
 
LinkedIn
linkedin.com/company/equal-experts
Twitter
@EqualExperts
Web
www.equalexperts.com

May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
Adele Miller
 


Embracing collaborative chaos

  • 1. Making Software. Better. Simple solutions to big business problems. Equal Experts is a network of talented, experienced, software consultants, specialising in agile delivery.
  • 2. Embracing collaborative chaos Running chaos days on large platforms Lyndsay Prewer | @equalexperts
  • 3. Photo by Darius Bashar on Unsplash What is chaos engineering and why should we care?
  • 4. Look at what I built today! Google Cloud Dataflow In the Smart Home Data Pipeline
  • 5. Operating on the edge of chaos http://bit.ly/2ZavoyP http://bit.ly/2QVeWzA “Two normally-benign misconfigurations, and a specific software bug, combined to initiate the outage”
  • 6. Predicting failure Google Cloud Dataflow In the Smart Home Data Pipeline ● How many component parts does your system have? ● How are they connected? ● How reliable is each part? ● How reliable are the connections? ● What happens when X fails?
  • 7. Addressing the risk of unexpected failure ● Address risk by deliberately inducing failure ● Observe, reflect and improve ● Build resilience in (like quality) ● Think about production (and failure) all the time [Diagram labels: Simples (A, B); Hard (A, B, C, D, E, F, G, H, I, Z)]
  • 8. Chaos engineering approaches Manual In process Automated Unplanned
  • 9. Manual chaos ● Chaos Days ● AWS Game Days ● Change specific chaos
  • 10. Automated chaos ● Chaos Monkey / Simian Army ● AWS spot instances / GCP Preemptible VMs ● Randomised pod killer
  • 11. In process chaos ● Part of normal engineering process ● Focus for all roles in the team ● Production thinking / building resilience in Product Owner Dev QA Dev Ops Focus on: Quality AND Production AND Resilience Define Build Explore Deploy
  • 12. Unplanned chaos ● Every day is a school day ● Handle incidents well ● Learn from incidents: post-incident reviews ● AWS podcast: http://bit.ly/31oQfAf [Diagram: system nodes A, B, C, D, E, F, G, H, I, Z]
  • 13. How does it help? People Process Product Knowledge Behaviour Expertise Managing incidents Learning from incidents Engineering approach Observability Simplification Alerting Runbooks Resilience
  • 14. Running a Chaos Day - when and how?
  • 15. Our context Legacy systems X00 million internal requests (busiest day) X00 million log messages (busiest day) x850 microservices XXm Customers 60 Delivery teams ~1000 Microservices 6 Platform teams (AWS PaaS)
  • 16. When were we ready for chaos? 2013 2014 Cloud Docker Scala Mongo ELK Fast growth (teams, services, traffic)
  • 17. When were we ready for chaos? 2013 2014 2015 2016 Cloud Docker Scala Mongo ELK Fast growth (teams, services, traffic) Multi active WIP Multi active
  • 18. When were we ready for chaos? 2013 2014 2015 2016 2017 2018 Cloud Docker Scala Mongo ELK Fast growth (teams, services, traffic) Multi active WIP Multi active More multi active (to AWS) Self serve deploys AWS Ready for Chaos
  • 19. When are you ready for chaos? Manual In process Automated Unplanned
  • 20. Who, where and exactly how?
  • 21. Agents of chaos ● Virtual, closed team ● Draw from component teams ● Experts / veterans ● Highest bus factor
  • 22. Chaos scope - know thyself ● Know your architecture ● Know your steady state ● Know your constraints ○ What’s in your control? ○ What’s not? ○ What needs protecting? X00 million internal requests (busiest day) X00 million log messages (busiest day)
  • 23. Chaos scope - trust the brains-storm http://bit.ly/2XzR7Q9
  • 24. Chaos scope - brainstorm, then plan the detail Team X Team Y Team Z
  • 25. Chaos scope - hack in amongst the chaos Team X Team Y Team Z
  • 26. Deciding where ● Production or closest to it ● Production (like) load ● Production (like) telemetry ● Decide the blast radius ● Decide comms channel(s) Production Staging QA Development
  • 27. Execution
  • 28. Deciding when ● To warn or not ● What else is going on? ● It was just an ordinary day … ● Chaos cut-off
  • 29. Keep calm and chaos on (agents) ● Co-locate the agents ● Collaborate and coordinate well ● Time-box, cover ground ● (Self) document well
  • 30. Keep calm and chaos on (everyone else) ● Also (self) document well ● Pretend it’s Production on ● It was just an ordinary day ...
  • 31. Retrospection
  • 32. Divide and conquer, then regroup ● Major on engineering improvements (people, process, product) ● Minor on chaos day improvements ● Component teams' retros / incident reviews first ● Then team-of-teams retro People Process Product Team X Team Y Team Z Team of teams
  • 33. What did we learn? ● Manage/limit the pain ● Start small ● Production is a tough step ● Production-like is also hard! ● Have fun!
  • 34. What next?
  • 35. What’s your next chaos step? Manual In process Automated Unplanned ● Where are you at in the journey? ● What’s the next (baby) step? ● Need any help?
  • 36. Thank You. United Kingdom: +44 203 603 7830, helloUK@equalexperts.com, Equal Experts UK Ltd, 30 Brock Street, London NW1 3FG. India: +91 20 6607 7763, helloIndia@equalexperts.com, Equal Experts India Private Ltd, Office No. 4-C, Cerebrum IT Park No. B3, Kumar City, Kalyani Nagar, Pune, 411006. Canada: +1 403 775 4861, helloCanada@equalexperts.com, Equal Experts Devices Inc, 205 - 279 Midpark way S.E., T2X 1M2 Calgary, Alberta. Portugal: +351 211 378 414, helloPortugal@equalexperts.com, Equal Experts Portugal, Avenida Dom João II, Nº35, Edificio Infante 11ºA, 1990-083 Parque das Nações, Lisboa, Portugal. USA: +1 866-943-9737, helloUSA@equalexperts.com, Equal Experts Inc, 1460 Broadway, New York NY 10036. LinkedIn: linkedin.com/company/equal-experts. Twitter: @EqualExperts. Web: www.equalexperts.com
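
Two of the ideas in the slides lend themselves to a small illustration: the "Predicting failure" questions (how reliable is each part and each connection?) and the "Randomised pod killer" with a decided blast radius. The sketch below is not from the talk. It assumes parts fail independently and that a request needs every part on its path to work, and the function names, instance names, and availability figures are all hypothetical.

```python
import math
import random


def chain_availability(part_availabilities):
    """Availability of a request path in which every part must work.

    Assumes independent failures, so the availabilities simply multiply.
    """
    return math.prod(part_availabilities)


def pick_victims(instances, blast_radius, rng=None):
    """Randomly choose at most `blast_radius` instances to fail on purpose."""
    rng = rng or random.Random()
    count = max(0, min(blast_radius, len(instances)))
    return rng.sample(list(instances), count)


if __name__ == "__main__":
    # Ten hypothetical parts (services and connections), each 99.9% available.
    path = [0.999] * 10
    print(f"end-to-end availability: {chain_availability(path):.4f}")

    # Hypothetical instance names; a blast radius of 2 caps the damage.
    staging = ["svc-a-1", "svc-a-2", "svc-b-1", "svc-c-1"]
    print("fail these:", pick_victims(staging, blast_radius=2))
```

Under these assumptions, every extra part or connection on a path lowers its end-to-end availability, which is one way to see why the slides ask how many parts a system has and how they are connected. Capping the number of victims is one simple way to express a blast radius. A real chaos day would also need the comms channels and cut-off times the slides describe.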