SlideShare a Scribd company logo
Jorge Salamero Sanz <jsalamero@serverdensity.com>
CfgMgmtCamp 1 Feb 2016
War Games - Flight training for DevOps
How to Monitor MySQL
The Cost of Uptime
$ 3.55bn 2015 Q4
$ 1.21bn 2015 Q2
$ 4.1bn 2015 Q1
How much do you spend?
● Infrastructure automation
● Configuration automation
● Continuous testing
● Continuous deployment / delivery
● Monitoring
● Logs, error handling
● Feedback
● Human Ops
DevOps lifecycle
● Prepare
● Respond
● Postmortem
Expect downtime
● Power failure to half of our servers
● Automated failover unavailable
(known failure condition)
● Manual DNS switch required
● Expected impact: 20 min
● Actual impact: 43min
Incident example
● Unfamiliarity with the process
● Pressure of time sensitive event
(panic effect)
● Escalation introduces delays
The Human Factor
● Extended use of checklists
● Not to follow blindly, use knowledge
and experience
● Independent system
● Searchable
● List of known issues and
documented workarounds/fixes
Documented procedures
● Realistic incident simulation
● Practice general response process
● Practice specific incident response
● Deficiencies: practice and improve
the process
Practiced procedures
● First responder, acknowledge alert
● Load incident response checklist
● Log into #ops-war-room in Slack
● Log incident into JIRA
● Begin investigation
General response process
● The “limits of human memory and
attention”
○ Complexity
○ Stress and fatigue
○ Ego
● Pilots, doctors, divers:
Bruce Willis Ruins All Films
(BCD, weights, releases, air, final)
Pre-flight checklists
● Increase confidence
● Reduce panic
● Better coordination
● Trust relationships
● Improves time to resolution
Humans
● Replica environment
● or mock command line
● Record actions and timing
● Multiple failures
● Unexpected results
Realistic scenarios
● Team and individual test of response
● Run real commands
● Training the people
● Training the procedures
● Training the tools
Simulation goals
● Objective review
● Suggestions for improvements
● Do it again
● Scenario evolves
● People forget
loop(): review and repeat
● Failure sucks
● Fearless, blameless
● Significant learning
● Restores confidence
● Increases credibility
Postmortem
● Short regular updates
● Even “we’re still looking into it”
● ~1 week to publish full version
○ follow-up incidents
○ check with 3rd party providers
○ timeline for required changes
Postmortem Timing
● Root cause
● Turn of event led to failure
● Steps to identify & isolate the cause
● Services affected
● How we fixed it
● What we have learned and changed
Postmortem Content
Jorge Salamero Sanz
Chief Developer Advocate
@bencerillo
@serverdensity
our DevOps stories, no product spam
blog.serverdensity.com

More Related Content

Similar to Flight training for DevOps

PEX Week: iDatix Workshop Part 3
PEX Week: iDatix Workshop Part 3PEX Week: iDatix Workshop Part 3
PEX Week: iDatix Workshop Part 3
iDatix
 
Put "fast" back in "fast feedback"
Put "fast" back in "fast feedback"Put "fast" back in "fast feedback"
Put "fast" back in "fast feedback"
Lars Thorup
 
Data Integrity - Patryk Hes
Data Integrity - Patryk HesData Integrity - Patryk Hes
Data Integrity - Patryk Hes
PROIDEA
 
Winston - Netflix's event driven auto remediation and diagnostics tool
Winston - Netflix's event driven auto remediation and diagnostics toolWinston - Netflix's event driven auto remediation and diagnostics tool
Winston - Netflix's event driven auto remediation and diagnostics tool
Vinay Shah
 
WSO2Con ASIA 2016: Getting More 9s from Your Deployment
WSO2Con ASIA 2016: Getting More 9s from Your DeploymentWSO2Con ASIA 2016: Getting More 9s from Your Deployment
WSO2Con ASIA 2016: Getting More 9s from Your Deployment
WSO2
 
Performance testing with jmeter
Performance testing with jmeter Performance testing with jmeter
Performance testing with jmeter
Knoldus Inc.
 
MidCamp 2014 - A Perfect Launch, Every Time
MidCamp 2014 - A Perfect Launch, Every TimeMidCamp 2014 - A Perfect Launch, Every Time
MidCamp 2014 - A Perfect Launch, Every Time
Suzanne Aldrich
 
3 Keys to Performance Testing at the Speed of Agile
3 Keys to Performance Testing at the Speed of Agile3 Keys to Performance Testing at the Speed of Agile
3 Keys to Performance Testing at the Speed of Agile
Neotys
 
Monitoring &amp; alerting presentation sabin&amp;mustafa
Monitoring &amp; alerting presentation sabin&amp;mustafaMonitoring &amp; alerting presentation sabin&amp;mustafa
Monitoring &amp; alerting presentation sabin&amp;mustafa
Lama K Banna
 
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafka
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache KafkaStrata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafka
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafka
confluent
 
HowTo DR
HowTo DRHowTo DR
Moodle at scale why assigning a role can cause a catastrophe
Moodle at scale   why assigning a role can cause a catastropheMoodle at scale   why assigning a role can cause a catastrophe
Moodle at scale why assigning a role can cause a catastrophe
sammarshall_ou
 
Break Up the Monolith- Testing Microservices by Marcus Merrell
Break Up the Monolith- Testing Microservices by Marcus MerrellBreak Up the Monolith- Testing Microservices by Marcus Merrell
Break Up the Monolith- Testing Microservices by Marcus Merrell
Sauce Labs
 
Using SaltStack to DevOps the enterprise
Using SaltStack to DevOps the enterpriseUsing SaltStack to DevOps the enterprise
Using SaltStack to DevOps the enterprise
Christian McHugh
 
Gatling - Bordeaux JUG
Gatling - Bordeaux JUGGatling - Bordeaux JUG
Gatling - Bordeaux JUG
slandelle
 
Practical DevSecOps: Fundamentals of Successful Programs
Practical DevSecOps: Fundamentals of Successful ProgramsPractical DevSecOps: Fundamentals of Successful Programs
Practical DevSecOps: Fundamentals of Successful Programs
Matt Tesauro
 
8.6 Fine Tuning all of Your Salesforce Instruments and Processes
8.6 Fine Tuning all of Your Salesforce Instruments and Processes8.6 Fine Tuning all of Your Salesforce Instruments and Processes
8.6 Fine Tuning all of Your Salesforce Instruments and Processes
TargetX
 
Tactical Application Security: Getting Stuff Done - Black Hat Briefings 2015
Tactical Application Security: Getting Stuff Done - Black Hat Briefings 2015Tactical Application Security: Getting Stuff Done - Black Hat Briefings 2015
Tactical Application Security: Getting Stuff Done - Black Hat Briefings 2015
Cory Scott
 
8220 sad inquiry
8220 sad inquiry8220 sad inquiry
8220 sad inquiry
bbass03
 
Елена Панина - Drupal performance testing. Тестирование производительности, м...
Елена Панина - Drupal performance testing. Тестирование производительности, м...Елена Панина - Drupal performance testing. Тестирование производительности, м...
Елена Панина - Drupal performance testing. Тестирование производительности, м...
LEDC 2016
 

Similar to Flight training for DevOps (20)

PEX Week: iDatix Workshop Part 3
PEX Week: iDatix Workshop Part 3PEX Week: iDatix Workshop Part 3
PEX Week: iDatix Workshop Part 3
 
Put "fast" back in "fast feedback"
Put "fast" back in "fast feedback"Put "fast" back in "fast feedback"
Put "fast" back in "fast feedback"
 
Data Integrity - Patryk Hes
Data Integrity - Patryk HesData Integrity - Patryk Hes
Data Integrity - Patryk Hes
 
Winston - Netflix's event driven auto remediation and diagnostics tool
Winston - Netflix's event driven auto remediation and diagnostics toolWinston - Netflix's event driven auto remediation and diagnostics tool
Winston - Netflix's event driven auto remediation and diagnostics tool
 
WSO2Con ASIA 2016: Getting More 9s from Your Deployment
WSO2Con ASIA 2016: Getting More 9s from Your DeploymentWSO2Con ASIA 2016: Getting More 9s from Your Deployment
WSO2Con ASIA 2016: Getting More 9s from Your Deployment
 
Performance testing with jmeter
Performance testing with jmeter Performance testing with jmeter
Performance testing with jmeter
 
MidCamp 2014 - A Perfect Launch, Every Time
MidCamp 2014 - A Perfect Launch, Every TimeMidCamp 2014 - A Perfect Launch, Every Time
MidCamp 2014 - A Perfect Launch, Every Time
 
3 Keys to Performance Testing at the Speed of Agile
3 Keys to Performance Testing at the Speed of Agile3 Keys to Performance Testing at the Speed of Agile
3 Keys to Performance Testing at the Speed of Agile
 
Monitoring &amp; alerting presentation sabin&amp;mustafa
Monitoring &amp; alerting presentation sabin&amp;mustafaMonitoring &amp; alerting presentation sabin&amp;mustafa
Monitoring &amp; alerting presentation sabin&amp;mustafa
 
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafka
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache KafkaStrata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafka
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafka
 
HowTo DR
HowTo DRHowTo DR
HowTo DR
 
Moodle at scale why assigning a role can cause a catastrophe
Moodle at scale   why assigning a role can cause a catastropheMoodle at scale   why assigning a role can cause a catastrophe
Moodle at scale why assigning a role can cause a catastrophe
 
Break Up the Monolith- Testing Microservices by Marcus Merrell
Break Up the Monolith- Testing Microservices by Marcus MerrellBreak Up the Monolith- Testing Microservices by Marcus Merrell
Break Up the Monolith- Testing Microservices by Marcus Merrell
 
Using SaltStack to DevOps the enterprise
Using SaltStack to DevOps the enterpriseUsing SaltStack to DevOps the enterprise
Using SaltStack to DevOps the enterprise
 
Gatling - Bordeaux JUG
Gatling - Bordeaux JUGGatling - Bordeaux JUG
Gatling - Bordeaux JUG
 
Practical DevSecOps: Fundamentals of Successful Programs
Practical DevSecOps: Fundamentals of Successful ProgramsPractical DevSecOps: Fundamentals of Successful Programs
Practical DevSecOps: Fundamentals of Successful Programs
 
8.6 Fine Tuning all of Your Salesforce Instruments and Processes
8.6 Fine Tuning all of Your Salesforce Instruments and Processes8.6 Fine Tuning all of Your Salesforce Instruments and Processes
8.6 Fine Tuning all of Your Salesforce Instruments and Processes
 
Tactical Application Security: Getting Stuff Done - Black Hat Briefings 2015
Tactical Application Security: Getting Stuff Done - Black Hat Briefings 2015Tactical Application Security: Getting Stuff Done - Black Hat Briefings 2015
Tactical Application Security: Getting Stuff Done - Black Hat Briefings 2015
 
8220 sad inquiry
8220 sad inquiry8220 sad inquiry
8220 sad inquiry
 
Елена Панина - Drupal performance testing. Тестирование производительности, м...
Елена Панина - Drupal performance testing. Тестирование производительности, м...Елена Панина - Drupal performance testing. Тестирование производительности, м...
Елена Панина - Drupal performance testing. Тестирование производительности, м...
 

More from Server Density

Content marketing @ Server Density
Content marketing @ Server DensityContent marketing @ Server Density
Content marketing @ Server Density
Server Density
 
How to Monitor MySQL
How to Monitor MySQLHow to Monitor MySQL
How to Monitor MySQL
Server Density
 
Handling incidents
Handling incidentsHandling incidents
Handling incidents
Server Density
 
Scaling humans - Ops teams and incident management
Scaling humans - Ops teams and incident managementScaling humans - Ops teams and incident management
Scaling humans - Ops teams and incident management
Server Density
 
Briefing: Containers
Briefing: ContainersBriefing: Containers
Briefing: Containers
Server Density
 
Why puppet? Why now?
Why puppet? Why now?Why puppet? Why now?
Why puppet? Why now?
Server Density
 
Infrastructure choices - cloud vs colo vs bare metal
Infrastructure choices - cloud vs colo vs bare metalInfrastructure choices - cloud vs colo vs bare metal
Infrastructure choices - cloud vs colo vs bare metal
Server Density
 
Navigating the customer lifecycle
Navigating the customer lifecycleNavigating the customer lifecycle
Navigating the customer lifecycle
Server Density
 
Experiences from DevOps production: Deployment, performance, failure.
Experiences from DevOps production: Deployment, performance, failure.Experiences from DevOps production: Deployment, performance, failure.
Experiences from DevOps production: Deployment, performance, failure.
Server Density
 
DevOps Incident Handling - Making friends not enemies.
DevOps Incident Handling - Making friends not enemies.DevOps Incident Handling - Making friends not enemies.
DevOps Incident Handling - Making friends not enemies.
Server Density
 
How to monitor NGINX
How to monitor NGINXHow to monitor NGINX
How to monitor NGINX
Server Density
 
How to monitor MongoDB
How to monitor MongoDBHow to monitor MongoDB
How to monitor MongoDB
Server Density
 
High performance Infrastructure Oct 2013
High performance Infrastructure Oct 2013High performance Infrastructure Oct 2013
High performance Infrastructure Oct 2013
Server Density
 
Puppet at the centre of everything
Puppet at the centre of everythingPuppet at the centre of everything
Puppet at the centre of everything
Server Density
 
NoSQL Infrastructure - Late 2013
NoSQL Infrastructure - Late 2013NoSQL Infrastructure - Late 2013
NoSQL Infrastructure - Late 2013
Server Density
 
Remote startup - building a company from everywhere in the world
Remote startup - building a company from everywhere in the worldRemote startup - building a company from everywhere in the world
Remote startup - building a company from everywhere in the world
Server Density
 
NoSQL Infrastructure
NoSQL InfrastructureNoSQL Infrastructure
NoSQL Infrastructure
Server Density
 
StartOps: Growing an ops team from 1 founder
StartOps: Growing an ops team from 1 founderStartOps: Growing an ops team from 1 founder
StartOps: Growing an ops team from 1 founder
Server Density
 
MongoDB: Optimising for Performance, Scale & Analytics
MongoDB: Optimising for Performance, Scale & AnalyticsMongoDB: Optimising for Performance, Scale & Analytics
MongoDB: Optimising for Performance, Scale & Analytics
Server Density
 
Puppet Camp Ghent 2013
Puppet Camp Ghent 2013Puppet Camp Ghent 2013
Puppet Camp Ghent 2013
Server Density
 

More from Server Density (20)

Content marketing @ Server Density
Content marketing @ Server DensityContent marketing @ Server Density
Content marketing @ Server Density
 
How to Monitor MySQL
How to Monitor MySQLHow to Monitor MySQL
How to Monitor MySQL
 
Handling incidents
Handling incidentsHandling incidents
Handling incidents
 
Scaling humans - Ops teams and incident management
Scaling humans - Ops teams and incident managementScaling humans - Ops teams and incident management
Scaling humans - Ops teams and incident management
 
Briefing: Containers
Briefing: ContainersBriefing: Containers
Briefing: Containers
 
Why puppet? Why now?
Why puppet? Why now?Why puppet? Why now?
Why puppet? Why now?
 
Infrastructure choices - cloud vs colo vs bare metal
Infrastructure choices - cloud vs colo vs bare metalInfrastructure choices - cloud vs colo vs bare metal
Infrastructure choices - cloud vs colo vs bare metal
 
Navigating the customer lifecycle
Navigating the customer lifecycleNavigating the customer lifecycle
Navigating the customer lifecycle
 
Experiences from DevOps production: Deployment, performance, failure.
Experiences from DevOps production: Deployment, performance, failure.Experiences from DevOps production: Deployment, performance, failure.
Experiences from DevOps production: Deployment, performance, failure.
 
DevOps Incident Handling - Making friends not enemies.
DevOps Incident Handling - Making friends not enemies.DevOps Incident Handling - Making friends not enemies.
DevOps Incident Handling - Making friends not enemies.
 
How to monitor NGINX
How to monitor NGINXHow to monitor NGINX
How to monitor NGINX
 
How to monitor MongoDB
How to monitor MongoDBHow to monitor MongoDB
How to monitor MongoDB
 
High performance Infrastructure Oct 2013
High performance Infrastructure Oct 2013High performance Infrastructure Oct 2013
High performance Infrastructure Oct 2013
 
Puppet at the centre of everything
Puppet at the centre of everythingPuppet at the centre of everything
Puppet at the centre of everything
 
NoSQL Infrastructure - Late 2013
NoSQL Infrastructure - Late 2013NoSQL Infrastructure - Late 2013
NoSQL Infrastructure - Late 2013
 
Remote startup - building a company from everywhere in the world
Remote startup - building a company from everywhere in the worldRemote startup - building a company from everywhere in the world
Remote startup - building a company from everywhere in the world
 
NoSQL Infrastructure
NoSQL InfrastructureNoSQL Infrastructure
NoSQL Infrastructure
 
StartOps: Growing an ops team from 1 founder
StartOps: Growing an ops team from 1 founderStartOps: Growing an ops team from 1 founder
StartOps: Growing an ops team from 1 founder
 
MongoDB: Optimising for Performance, Scale & Analytics
MongoDB: Optimising for Performance, Scale & AnalyticsMongoDB: Optimising for Performance, Scale & Analytics
MongoDB: Optimising for Performance, Scale & Analytics
 
Puppet Camp Ghent 2013
Puppet Camp Ghent 2013Puppet Camp Ghent 2013
Puppet Camp Ghent 2013
 

Recently uploaded

4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
Gino153088
 
Curve Fitting in Numerical Methods Regression
Curve Fitting in Numerical Methods RegressionCurve Fitting in Numerical Methods Regression
Curve Fitting in Numerical Methods Regression
Nada Hikmah
 
Hematology Analyzer Machine - Complete Blood Count
Hematology Analyzer Machine - Complete Blood CountHematology Analyzer Machine - Complete Blood Count
Hematology Analyzer Machine - Complete Blood Count
shahdabdulbaset
 
john krisinger-the science and history of the alcoholic beverage.pptx
john krisinger-the science and history of the alcoholic beverage.pptxjohn krisinger-the science and history of the alcoholic beverage.pptx
john krisinger-the science and history of the alcoholic beverage.pptx
Madan Karki
 
Certificates - Mahmoud Mohamed Moursi Ahmed
Certificates - Mahmoud Mohamed Moursi AhmedCertificates - Mahmoud Mohamed Moursi Ahmed
Certificates - Mahmoud Mohamed Moursi Ahmed
Mahmoud Morsy
 
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
171ticu
 
Software Engineering and Project Management - Introduction, Modeling Concepts...
Software Engineering and Project Management - Introduction, Modeling Concepts...Software Engineering and Project Management - Introduction, Modeling Concepts...
Software Engineering and Project Management - Introduction, Modeling Concepts...
Prakhyath Rai
 
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsKuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
Victor Morales
 
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
Yasser Mahgoub
 
Engineering Drawings Lecture Detail Drawings 2014.pdf
Engineering Drawings Lecture Detail Drawings 2014.pdfEngineering Drawings Lecture Detail Drawings 2014.pdf
Engineering Drawings Lecture Detail Drawings 2014.pdf
abbyasa1014
 
Mechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdfMechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdf
21UME003TUSHARDEB
 
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
IJECEIAES
 
Software Quality Assurance-se412-v11.ppt
Software Quality Assurance-se412-v11.pptSoftware Quality Assurance-se412-v11.ppt
Software Quality Assurance-se412-v11.ppt
TaghreedAltamimi
 
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
insn4465
 
ISPM 15 Heat Treated Wood Stamps and why your shipping must have one
ISPM 15 Heat Treated Wood Stamps and why your shipping must have oneISPM 15 Heat Treated Wood Stamps and why your shipping must have one
ISPM 15 Heat Treated Wood Stamps and why your shipping must have one
Las Vegas Warehouse
 
Transformers design and coooling methods
Transformers design and coooling methodsTransformers design and coooling methods
Transformers design and coooling methods
Roger Rozario
 
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.pptUnit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
KrishnaveniKrishnara1
 
The Python for beginners. This is an advance computer language.
The Python for beginners. This is an advance computer language.The Python for beginners. This is an advance computer language.
The Python for beginners. This is an advance computer language.
sachin chaurasia
 
Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...
bijceesjournal
 
spirit beverages ppt without graphics.pptx
spirit beverages ppt without graphics.pptxspirit beverages ppt without graphics.pptx
spirit beverages ppt without graphics.pptx
Madan Karki
 

Recently uploaded (20)

4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
 
Curve Fitting in Numerical Methods Regression
Curve Fitting in Numerical Methods RegressionCurve Fitting in Numerical Methods Regression
Curve Fitting in Numerical Methods Regression
 
Hematology Analyzer Machine - Complete Blood Count
Hematology Analyzer Machine - Complete Blood CountHematology Analyzer Machine - Complete Blood Count
Hematology Analyzer Machine - Complete Blood Count
 
john krisinger-the science and history of the alcoholic beverage.pptx
john krisinger-the science and history of the alcoholic beverage.pptxjohn krisinger-the science and history of the alcoholic beverage.pptx
john krisinger-the science and history of the alcoholic beverage.pptx
 
Certificates - Mahmoud Mohamed Moursi Ahmed
Certificates - Mahmoud Mohamed Moursi AhmedCertificates - Mahmoud Mohamed Moursi Ahmed
Certificates - Mahmoud Mohamed Moursi Ahmed
 
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
 
Software Engineering and Project Management - Introduction, Modeling Concepts...
Software Engineering and Project Management - Introduction, Modeling Concepts...Software Engineering and Project Management - Introduction, Modeling Concepts...
Software Engineering and Project Management - Introduction, Modeling Concepts...
 
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsKuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
 
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
 
Engineering Drawings Lecture Detail Drawings 2014.pdf
Engineering Drawings Lecture Detail Drawings 2014.pdfEngineering Drawings Lecture Detail Drawings 2014.pdf
Engineering Drawings Lecture Detail Drawings 2014.pdf
 
Mechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdfMechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdf
 
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
 
Software Quality Assurance-se412-v11.ppt
Software Quality Assurance-se412-v11.pptSoftware Quality Assurance-se412-v11.ppt
Software Quality Assurance-se412-v11.ppt
 
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
 
ISPM 15 Heat Treated Wood Stamps and why your shipping must have one
ISPM 15 Heat Treated Wood Stamps and why your shipping must have oneISPM 15 Heat Treated Wood Stamps and why your shipping must have one
ISPM 15 Heat Treated Wood Stamps and why your shipping must have one
 
Transformers design and coooling methods
Transformers design and coooling methodsTransformers design and coooling methods
Transformers design and coooling methods
 
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.pptUnit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
 
The Python for beginners. This is an advance computer language.
The Python for beginners. This is an advance computer language.The Python for beginners. This is an advance computer language.
The Python for beginners. This is an advance computer language.
 
Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...
 
spirit beverages ppt without graphics.pptx
spirit beverages ppt without graphics.pptxspirit beverages ppt without graphics.pptx
spirit beverages ppt without graphics.pptx
 

Flight training for DevOps

  • 1. Jorge Salamero Sanz <jsalamero@serverdensity.com> CfgMgmtCamp 1 Feb 2016 War Games - Flight training for DevOps
  • 3. The Cost of Uptime $ 3.55bn 2015 Q4 $ 1.21bn 2015 Q2 $ 4.1bn 2015 Q1
  • 4. How much do you spend?
  • 5. ● Infrastructure automation ● Configuration automation ● Continuous testing ● Continuous deployment / delivery ● Monitoring ● Logs, error handling ● Feedback ● Human Ops DevOps lifecycle
  • 6. ● Prepare ● Respond ● Postmortem Expect downtime
  • 7.
  • 8. ● Power failure to half of our servers ● Automated failover unavailable (known failure condition) ● Manual DNS switch required ● Expected impact: 20 min ● Actual impact: 43min Incident example
  • 9. ● Unfamiliarity with the process ● Pressure of time sensitive event (panic effect) ● Escalation introduces delays The Human Factor
  • 10. ● Extended use of checklists ● Not to follow blindly, use knowledge and experience ● Independent system ● Searchable ● List of known issues and documented workarounds/fixes Documented procedures
  • 11. ● Realistic incident simulation ● Practice general response process ● Practice specific incident response ● Deficiencies: practice and improve the process Practiced procedures
  • 12. ● First responder, acknowledge alert ● Load incident response checklist ● Log into #ops-war-room in Slack ● Log incident into JIRA ● Begin investigation General response process
  • 13. ● The “limits of human memory and attention” ○ Complexity ○ Stress and fatigue ○ Ego ● Pilots, doctors, divers: Bruce Willis Ruins All Films (BCD, weights, releases, air, final) Pre-flight checklists
  • 14.
  • 15. ● Increase confidence ● Reduce panic ● Better coordination ● Trust relationships ● Improves time to resolution Humans
  • 16. ● Replica environment ● or mock command line ● Record actions and timing ● Multiple failures ● Unexpected results Realistic scenarios
  • 17. ● Team and individual test of response ● Run real commands ● Training the people ● Training the procedures ● Training the tools Simulation goals
  • 18. ● Objective review ● Suggestions for improvements ● Do it again ● Scenario evolves ● People forget loop(): review and repeat
  • 19. ● Failure sucks ● Fearless, blameless ● Significant learning ● Restores confidence ● Increases credibility Postmortem
  • 20. ● Short regular updates ● Even “we’re still looking into it” ● ~1 week to publish full version ○ follow-up incidents ○ check with 3rd party providers ○ timeline for required changes Postmortem Timing
  • 21. ● Root cause ● Turn of event led to failure ● Steps to identify & isolate the cause ● Services affected ● How we fixed it ● What we have learned and changed Postmortem Content
  • 22. Jorge Salamero Sanz Chief Developer Advocate @bencerillo @serverdensity our DevOps stories, no product spam blog.serverdensity.com