SlideShare a Scribd company logo
1 of 33
Michael Richardson
Twitter: @Mr_SPB
1© 2011 Energized Work - www.energizedwork.com
Availability and Recoverability
So what is High Availability?
• Five 9s?
• No Single point of failure?
• Multiple Data Centre’s?
• Fault Tolerance?
• Load Balancing?
• Uptime?
2© 2012 Energized Work - www.energizedwork.com
The 9’s of Availability
3© 2012 Energized Work - www.energizedwork.com
9
9
The 9’s of Availability
4© 2012 Energized Work - www.energizedwork.com
Availability Downtime per Year
One nine (90%) 36.5 days
Two nines (99%) 3.65 days
Three nines (99.9%) 8.76 hours
Four nines (99.99%) 52.56 minutes
Five nines (99.999%) 5.26 minutes
Problem with the 9’s
5© 2012 Energized Work - www.energizedwork.com
• What do they mean?
• Guaranteed or just an SLA
• Multiplicity
(99.9% * 99.9% * 99.9% = 99.7%)
SLA availability numbers:
just aim to provide a level of
confidence in a website’s
service
6© 2012 Energized Work - www.energizedwork.com
No Single Point of
Failure (SPOF)
7© 2012 Energized Work - www.energizedwork.com
two of everything?
8© 2012 Energized Work - www.energizedwork.com
Start with this
9© 2012 Energized Work - www.energizedwork.com
Index.html
Users
End with this
10© 2012 Energized Work - www.energizedwork.com
WEB1
switch 1 switch 2
WEB2 APP1 APP2 DB1 DB2
Firewall 1 Firewall 2
Users
• It’s expensive ££
• Where do you draw the line?
• Are failures independent
• Can you guarantee No SPOF?
• Increased complexity
11© 2012 Energized Work - www.energizedwork.com
Problems with
eliminating SPOF
Problem: Data Centre’s Fail
12© 2012 Energized Work - www.energizedwork.com
Solution: Get a 2nd
Data Centre
13© 2012 Energized Work - www.energizedwork.com
Hot/Hot Multisite
14© 2012 Energized Work - www.energizedwork.com
• Full range of services available in
multiple locations.
• Easy to automate failover of sites
• Data Consistency is hard.
• Capacity Planning concerns
+
Hot/Warm Multisite
15© 2012 Energized Work - www.energizedwork.com
• Simpler than Hot/Hot
• Read/write ratio dependant
• Synchronous or Asynchronously
replicate data?
+
Hot/Cold Multisite
16© 2012 Energized Work - www.energizedwork.com
• Easy to setup
• Will it work?
• Can it be trusted?
• Cold site rapidly become stale
• Is it actually valuable?
+
DR Multisite
17© 2012 Energized Work - www.energizedwork.com
• Fingers crossed you never need it.
• How can/should you test it?
• Cloud?
+
Problems with Multiple sites
18© 2012 Energized Work - www.energizedwork.com
• ££ - it’s expensive
• Managing more systems
• Managing consistency of Data
• Managing Capacity
• Is it still fail proof?
• Unless you test it, it’s just a plan
19© 2012 Energized Work - www.energizedwork.com
We now have a
Complex System
• More redundancy and automation leads
to more complexity.
• More complexity often adds more
points of failure.
20© 2012 Energized Work - www.energizedwork.com
Complex Systems
Author: Dr. Richard Cook
21© 2012 Energized Work - www.energizedwork.com
“How Complex Systems fail”
• Catastrophe is always just around the
corner.
• Human Operators have dual roles.
• Change introduces new forms of failure
Failure and Recovery
22© 2012 Energized Work - www.energizedwork.com
Questions for the Customer
23© 2012 Energized Work - www.energizedwork.com
• What is the cost of downtime?
• What are the RTO and RPO?
24© 2012 Energized Work - www.energizedwork.com
RTO = Recovery Time Objective
RPO = Recovery Point Objective
Aggressive RTO & RPO is
expensive and has a
performance impact.
25© 2012 Energized Work - www.energizedwork.com
RTO / RPO example
26© 2012 Energized Work - www.energizedwork.com
problem
•Simple DB
•Business can tolerate up to 15 minutes
downtime
•10 minute window of data lose.
RTO / RPO example
27© 2012 Energized Work - www.energizedwork.com
Possible solution
1.Continuously replicate data to 2nd
host
2.Continue with nightly backups and also
copy DB transaction logs from the primary
host to another system.
So what’s more important?
28© 2012 Energized Work - www.energizedwork.com
Increasing Availability
Or
Reducing Recovery Time
29© 2012 Energized Work - www.energizedwork.com
MTBF
Or
MTTR
What about MTTD??
30© 2012 Energized Work - www.energizedwork.com
Answer?
It Depends
31© 2012 Energized Work - www.energizedwork.com
Failure is inevitable
32© 2012 Energized Work - www.energizedwork.com
Ask anyone
33© 2011 Energized Work - www.energizedwork.com
Thank you
The End
Twitter - @Mr_SPB

More Related Content

Similar to System Availability Talk

MTBF / MTTR - Energized Work TekTalk, Mar 2012
MTBF / MTTR - Energized Work TekTalk, Mar 2012MTBF / MTTR - Energized Work TekTalk, Mar 2012
MTBF / MTTR - Energized Work TekTalk, Mar 2012Energized Work
 
Emc sql server 2012 overview
Emc sql server 2012 overviewEmc sql server 2012 overview
Emc sql server 2012 overviewsolarisyougood
 
Musings of an MSP - Why Some Things Never Change and Others Have To - Datacom
Musings of an MSP - Why Some Things Never Change and Others Have To - DatacomMusings of an MSP - Why Some Things Never Change and Others Have To - Datacom
Musings of an MSP - Why Some Things Never Change and Others Have To - DatacomAmazon Web Services
 
2012 Annual State of the Union for Mobile Ecommerce Performance [Velocity EU]
2012 Annual State of the Union for Mobile Ecommerce Performance [Velocity EU]2012 Annual State of the Union for Mobile Ecommerce Performance [Velocity EU]
2012 Annual State of the Union for Mobile Ecommerce Performance [Velocity EU]Strangeloop
 
Disaster Recovery with MySQL and Tungsten
Disaster Recovery with MySQL and TungstenDisaster Recovery with MySQL and Tungsten
Disaster Recovery with MySQL and TungstenJeff Mace
 
Walmart pagespeed-slide
Walmart pagespeed-slideWalmart pagespeed-slide
Walmart pagespeed-slideBitsytask
 
Walmart Web Performance Circa 2013
Walmart Web Performance Circa 2013Walmart Web Performance Circa 2013
Walmart Web Performance Circa 2013Cliff Crocker
 
Presentation virtualizing oracle unlocked enterprise wide benefits
Presentation   virtualizing oracle unlocked enterprise wide benefitsPresentation   virtualizing oracle unlocked enterprise wide benefits
Presentation virtualizing oracle unlocked enterprise wide benefitssolarisyourep
 
O'Reilly webcast: Joshua Bixby on Mobile Performance Trends and Predictions
O'Reilly webcast: Joshua Bixby on Mobile Performance Trends and PredictionsO'Reilly webcast: Joshua Bixby on Mobile Performance Trends and Predictions
O'Reilly webcast: Joshua Bixby on Mobile Performance Trends and PredictionsStrangeloop
 
Scaling mature systems
Scaling mature systemsScaling mature systems
Scaling mature systemsHanMorten
 
Automation & Cloud Evolution - Long View VMware Forum Calgary January 21 2014
Automation & Cloud Evolution - Long View VMware Forum Calgary January 21 2014Automation & Cloud Evolution - Long View VMware Forum Calgary January 21 2014
Automation & Cloud Evolution - Long View VMware Forum Calgary January 21 2014James Charter
 
Executing the Digital Strategy
Executing the Digital StrategyExecuting the Digital Strategy
Executing the Digital StrategyBen Turner
 
Optimizing Browser Rendering
Optimizing Browser RenderingOptimizing Browser Rendering
Optimizing Browser Renderingmichael.labriola
 
How to Choose the Right Cloud for Continuity
How to Choose the Right Cloud for ContinuityHow to Choose the Right Cloud for Continuity
How to Choose the Right Cloud for Continuitymarketingunitrends
 
Works on my machine, your problem now? - QCon 2014
Works on my machine, your problem now? - QCon 2014Works on my machine, your problem now? - QCon 2014
Works on my machine, your problem now? - QCon 2014Wolfgang Gottesheim
 
At bruxelles scaling agile - v1.5 slideshare
At bruxelles   scaling agile - v1.5 slideshareAt bruxelles   scaling agile - v1.5 slideshare
At bruxelles scaling agile - v1.5 slideshareHerve Lourdin
 
Dev talks Cluj 2018 : Java in the 21 Century: Are you thinking far enough ahead?
Dev talks Cluj 2018 : Java in the 21 Century: Are you thinking far enough ahead?Dev talks Cluj 2018 : Java in the 21 Century: Are you thinking far enough ahead?
Dev talks Cluj 2018 : Java in the 21 Century: Are you thinking far enough ahead?Steve Poole
 
Oracle primavera and bpm the power of integration ppt
Oracle primavera and bpm   the power of integration pptOracle primavera and bpm   the power of integration ppt
Oracle primavera and bpm the power of integration pptp6academy
 

Similar to System Availability Talk (20)

MTBF / MTTR - Energized Work TekTalk, Mar 2012
MTBF / MTTR - Energized Work TekTalk, Mar 2012MTBF / MTTR - Energized Work TekTalk, Mar 2012
MTBF / MTTR - Energized Work TekTalk, Mar 2012
 
Emc sql server 2012 overview
Emc sql server 2012 overviewEmc sql server 2012 overview
Emc sql server 2012 overview
 
Musings of an MSP - Why Some Things Never Change and Others Have To - Datacom
Musings of an MSP - Why Some Things Never Change and Others Have To - DatacomMusings of an MSP - Why Some Things Never Change and Others Have To - Datacom
Musings of an MSP - Why Some Things Never Change and Others Have To - Datacom
 
2012 Annual State of the Union for Mobile Ecommerce Performance [Velocity EU]
2012 Annual State of the Union for Mobile Ecommerce Performance [Velocity EU]2012 Annual State of the Union for Mobile Ecommerce Performance [Velocity EU]
2012 Annual State of the Union for Mobile Ecommerce Performance [Velocity EU]
 
Disaster Recovery with MySQL and Tungsten
Disaster Recovery with MySQL and TungstenDisaster Recovery with MySQL and Tungsten
Disaster Recovery with MySQL and Tungsten
 
Walmart pagespeed-slide
Walmart pagespeed-slideWalmart pagespeed-slide
Walmart pagespeed-slide
 
Walmart Web Performance Circa 2013
Walmart Web Performance Circa 2013Walmart Web Performance Circa 2013
Walmart Web Performance Circa 2013
 
Presentation virtualizing oracle unlocked enterprise wide benefits
Presentation   virtualizing oracle unlocked enterprise wide benefitsPresentation   virtualizing oracle unlocked enterprise wide benefits
Presentation virtualizing oracle unlocked enterprise wide benefits
 
O'Reilly webcast: Joshua Bixby on Mobile Performance Trends and Predictions
O'Reilly webcast: Joshua Bixby on Mobile Performance Trends and PredictionsO'Reilly webcast: Joshua Bixby on Mobile Performance Trends and Predictions
O'Reilly webcast: Joshua Bixby on Mobile Performance Trends and Predictions
 
Scaling mature systems
Scaling mature systemsScaling mature systems
Scaling mature systems
 
Why You Should Move to the Cloud
Why You Should Move to the CloudWhy You Should Move to the Cloud
Why You Should Move to the Cloud
 
Automation & Cloud Evolution - Long View VMware Forum Calgary January 21 2014
Automation & Cloud Evolution - Long View VMware Forum Calgary January 21 2014Automation & Cloud Evolution - Long View VMware Forum Calgary January 21 2014
Automation & Cloud Evolution - Long View VMware Forum Calgary January 21 2014
 
Executing the Digital Strategy
Executing the Digital StrategyExecuting the Digital Strategy
Executing the Digital Strategy
 
Optimizing Browser Rendering
Optimizing Browser RenderingOptimizing Browser Rendering
Optimizing Browser Rendering
 
How to Choose the Right Cloud for Continuity
How to Choose the Right Cloud for ContinuityHow to Choose the Right Cloud for Continuity
How to Choose the Right Cloud for Continuity
 
Works on my machine, your problem now? - QCon 2014
Works on my machine, your problem now? - QCon 2014Works on my machine, your problem now? - QCon 2014
Works on my machine, your problem now? - QCon 2014
 
At bruxelles scaling agile - v1.5 slideshare
At bruxelles   scaling agile - v1.5 slideshareAt bruxelles   scaling agile - v1.5 slideshare
At bruxelles scaling agile - v1.5 slideshare
 
Scaling CQ5
Scaling CQ5Scaling CQ5
Scaling CQ5
 
Dev talks Cluj 2018 : Java in the 21 Century: Are you thinking far enough ahead?
Dev talks Cluj 2018 : Java in the 21 Century: Are you thinking far enough ahead?Dev talks Cluj 2018 : Java in the 21 Century: Are you thinking far enough ahead?
Dev talks Cluj 2018 : Java in the 21 Century: Are you thinking far enough ahead?
 
Oracle primavera and bpm the power of integration ppt
Oracle primavera and bpm   the power of integration pptOracle primavera and bpm   the power of integration ppt
Oracle primavera and bpm the power of integration ppt
 

More from m_richardson

Persistence in the cloud with bosh
Persistence in the cloud with boshPersistence in the cloud with bosh
Persistence in the cloud with boshm_richardson
 
bootstrapping containers with confd
bootstrapping containers with confdbootstrapping containers with confd
bootstrapping containers with confdm_richardson
 
Docker Service Registration and Discovery
Docker Service Registration and DiscoveryDocker Service Registration and Discovery
Docker Service Registration and Discoverym_richardson
 
Puppetcamp Melbourne - puppetdb
Puppetcamp Melbourne - puppetdbPuppetcamp Melbourne - puppetdb
Puppetcamp Melbourne - puppetdbm_richardson
 
Node collaboration - sharing information between your systems
Node collaboration - sharing information between your systemsNode collaboration - sharing information between your systems
Node collaboration - sharing information between your systemsm_richardson
 
Node collaboration - Exported Resources and PuppetDB
Node collaboration - Exported Resources and PuppetDBNode collaboration - Exported Resources and PuppetDB
Node collaboration - Exported Resources and PuppetDBm_richardson
 
Serverspec and Sensu - Testing and Monitoring collide
Serverspec and Sensu - Testing and Monitoring collideServerspec and Sensu - Testing and Monitoring collide
Serverspec and Sensu - Testing and Monitoring collidem_richardson
 
Chef - managing yours servers with Code
Chef - managing yours servers with CodeChef - managing yours servers with Code
Chef - managing yours servers with Codem_richardson
 
Open Source Monitoring Tools
Open Source Monitoring ToolsOpen Source Monitoring Tools
Open Source Monitoring Toolsm_richardson
 

More from m_richardson (9)

Persistence in the cloud with bosh
Persistence in the cloud with boshPersistence in the cloud with bosh
Persistence in the cloud with bosh
 
bootstrapping containers with confd
bootstrapping containers with confdbootstrapping containers with confd
bootstrapping containers with confd
 
Docker Service Registration and Discovery
Docker Service Registration and DiscoveryDocker Service Registration and Discovery
Docker Service Registration and Discovery
 
Puppetcamp Melbourne - puppetdb
Puppetcamp Melbourne - puppetdbPuppetcamp Melbourne - puppetdb
Puppetcamp Melbourne - puppetdb
 
Node collaboration - sharing information between your systems
Node collaboration - sharing information between your systemsNode collaboration - sharing information between your systems
Node collaboration - sharing information between your systems
 
Node collaboration - Exported Resources and PuppetDB
Node collaboration - Exported Resources and PuppetDBNode collaboration - Exported Resources and PuppetDB
Node collaboration - Exported Resources and PuppetDB
 
Serverspec and Sensu - Testing and Monitoring collide
Serverspec and Sensu - Testing and Monitoring collideServerspec and Sensu - Testing and Monitoring collide
Serverspec and Sensu - Testing and Monitoring collide
 
Chef - managing yours servers with Code
Chef - managing yours servers with CodeChef - managing yours servers with Code
Chef - managing yours servers with Code
 
Open Source Monitoring Tools
Open Source Monitoring ToolsOpen Source Monitoring Tools
Open Source Monitoring Tools
 

Recently uploaded

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 

System Availability Talk

  • 1. Michael Richardson Twitter: @Mr_SPB 1© 2011 Energized Work - www.energizedwork.com Availability and Recoverability
  • 2. So what is High Availability? • Five 9s? • No Single point of failure? • Multiple Data Centre’s? • Fault Tolerance? • Load Balancing? • Uptime? 2© 2012 Energized Work - www.energizedwork.com
  • 3. The 9’s of Availability 3© 2012 Energized Work - www.energizedwork.com 9 9
  • 4. The 9’s of Availability 4© 2012 Energized Work - www.energizedwork.com Availability Downtime per Year One nine (90%) 36.5 days Two nines (99%) 3.65 days Three nines (99.9%) 8.76 hours Four nines (99.99%) 52.56 minutes Five nines (99.999%) 5.26 minutes
  • 5. Problem with the 9’s 5© 2012 Energized Work - www.energizedwork.com • What do they mean? • Guaranteed or just an SLA • Multiplicity (99.9% * 99.9% * 99.9% = 99.7%)
  • 6. SLA availability numbers: just aim to provide a level of confidence in a website’s service 6© 2012 Energized Work - www.energizedwork.com
  • 7. No Single Point of Failure (SPOF) 7© 2012 Energized Work - www.energizedwork.com
  • 8. two of everything? 8© 2012 Energized Work - www.energizedwork.com
  • 9. Start with this 9© 2012 Energized Work - www.energizedwork.com Index.html Users
  • 10. End with this 10© 2012 Energized Work - www.energizedwork.com WEB1 switch 1 switch 2 WEB2 APP1 APP2 DB1 DB2 Firewall 1 Firewall 2 Users
  • 11. • It’s expensive ££ • Where do you draw the line? • Are failures independent • Can you guarantee No SPOF? • Increased complexity 11© 2012 Energized Work - www.energizedwork.com Problems with eliminating SPOF
  • 12. Problem: Data Centre’s Fail 12© 2012 Energized Work - www.energizedwork.com
  • 13. Solution: Get a 2nd Data Centre 13© 2012 Energized Work - www.energizedwork.com
  • 14. Hot/Hot Multisite 14© 2012 Energized Work - www.energizedwork.com • Full range of services available in multiple locations. • Easy to automate failover of sites • Data Consistency is hard. • Capacity Planning concerns +
  • 15. Hot/Warm Multisite 15© 2012 Energized Work - www.energizedwork.com • Simpler than Hot/Hot • Read/write ratio dependant • Synchronous or Asynchronously replicate data? +
  • 16. Hot/Cold Multisite 16© 2012 Energized Work - www.energizedwork.com • Easy to setup • Will it work? • Can it be trusted? • Cold site rapidly become stale • Is it actually valuable? +
  • 17. DR Multisite 17© 2012 Energized Work - www.energizedwork.com • Fingers crossed you never need it. • How can/should you test it? • Cloud? +
  • 18. Problems with Multiple sites 18© 2012 Energized Work - www.energizedwork.com • ££ - it’s expensive • Managing more systems • Managing consistency of Data • Managing Capacity • Is it still fail proof? • Unless you test it, it’s just a plan
  • 19. 19© 2012 Energized Work - www.energizedwork.com We now have a Complex System
  • 20. • More redundancy and automation leads to more complexity. • More complexity often adds more points of failure. 20© 2012 Energized Work - www.energizedwork.com Complex Systems
  • 21. Author: Dr. Richard Cook 21© 2012 Energized Work - www.energizedwork.com “How Complex Systems fail” • Catastrophe is always just around the corner. • Human Operators have dual roles. • Change introduces new forms of failure
  • 22. Failure and Recovery 22© 2012 Energized Work - www.energizedwork.com
  • 23. Questions for the Customer 23© 2012 Energized Work - www.energizedwork.com • What is the cost of downtime? • What are the RTO and RPO?
  • 24. 24© 2012 Energized Work - www.energizedwork.com RTO = Recovery Time Objective RPO = Recovery Point Objective
  • 25. Aggressive RTO & RPO is expensive and has a performance impact. 25© 2012 Energized Work - www.energizedwork.com
  • 26. RTO / RPO example 26© 2012 Energized Work - www.energizedwork.com problem •Simple DB •Business can tolerate up to 15 minutes downtime •10 minute window of data lose.
  • 27. RTO / RPO example 27© 2012 Energized Work - www.energizedwork.com Possible solution 1.Continuously replicate data to 2nd host 2.Continue with nightly backups and also copy DB transaction logs from the primary host to another system.
  • 28. So what’s more important? 28© 2012 Energized Work - www.energizedwork.com Increasing Availability Or Reducing Recovery Time
  • 29. 29© 2012 Energized Work - www.energizedwork.com MTBF Or MTTR What about MTTD??
  • 30. 30© 2012 Energized Work - www.energizedwork.com Answer? It Depends
  • 31. 31© 2012 Energized Work - www.energizedwork.com Failure is inevitable
  • 32. 32© 2012 Energized Work - www.energizedwork.com Ask anyone
  • 33. 33© 2011 Energized Work - www.energizedwork.com Thank you The End Twitter - @Mr_SPB

Editor's Notes

  1. 28/10/10 © Energized Work Limited 2010 Agile Evangelists - LEANING
  2. 28/10/10 © Energized Work Limited 2010 Agile Evangelists - LEANING
  3. Ask any business how much downtime is acceptable and you will get a consistent answer. 28/10/10 © Energized Work Limited 2010 Agile Evangelists - LEANING
  4. 28/10/10 © Energized Work Limited 2010 Agile Evangelists - LEANING
  5. Found more in Marketing literature than technical literature 28/10/10 © Energized Work Limited 2010 Agile Evangelists - LEANING
  6. An SLA is just an instrument that makes business people comfortable (just like insurance) 28/10/10 © Energized Work Limited 2010 Agile Evangelists - LEANING
  7. 28/10/10 © Energized Work Limited 2010 Agile Evangelists - LEANING
  8. 28/10/10 © Energized Work Limited 2010 Agile Evangelists - LEANING
  9. 28/10/10 © Energized Work Limited 2010 Agile Evangelists - LEANING
  10. 28/10/10 © Energized Work Limited 2010 Agile Evangelists - LEANING
  11. 1 & 2 Diminishing returns Paradoxically, adding more components to an overall system design can undermine efforts to achieve high availability Cascading failures 28/10/10 © Energized Work Limited 2010 Agile Evangelists - LEANING
  12. 28/10/10 © Energized Work Limited 2010 Agile Evangelists - LEANING
  13. 28/10/10 © Energized Work Limited 2010 Agile Evangelists - LEANING
  14. Read & Write anywhere Global Server Load Balancing with DNS 28/10/10 © Energized Work Limited 2010 Agile Evangelists - LEANING
  15. Read intensive apps are well suited to this – Reads Hot/Hot 28/10/10 © Energized Work Limited 2010 Agile Evangelists - LEANING
  16. Cold site is so untrusted that perhaps spending hours restoring the primary DC is a better and safer bet. 28/10/10 © Energized Work Limited 2010 Agile Evangelists - LEANING
  17. Cold site is so untrusted that perhaps spending hours restoring the primary DC is a better and safer bet. 28/10/10 © Energized Work Limited 2010 Agile Evangelists - LEANING
  18. Talk about capacity planning Hot/Hot – config switches Most companies don ’ t thoroughly test DC failover. When failure occurs many companies will often focus on restoring the failure in the primary DC rather attempt a failover. So why bother having a 2 nd DC anyway. If you plan on having multiple DC ’ s or DR then test your procedures when you ’ re not in an emergency situation. Game Day events 28/10/10 © Energized Work Limited 2010 Agile Evangelists - LEANING
  19. 28/10/10 © Energized Work Limited 2010 Agile Evangelists - LEANING
  20. 28/10/10 © Energized Work Limited 2010 Agile Evangelists - LEANING
  21. Mention John Alspaw ’ s Qcon talk 2. Dual roles of humans Defenders against failure Producers of failure 3. Introduce a technology change To prevent low-consequence, but high frequency failures May introduce low frequency, but high consequence failure Introduce new pathways to large-scale, catastrophic failures. Focus of humans is on the beneficial charactistics of the change. New failure ’ s maybe difficult to foresee. Give config management example Knife Resolv.conf 3. Also covers maintenance and why many find it difficult. Build and forget mentality. 28/10/10 © Energized Work Limited 2010 Agile Evangelists - LEANING
  22. 28/10/10 © Energized Work Limited 2010 Agile Evangelists - LEANING
  23. Cost of downtime – easy or difficult to measure Can downtime actually be equated to lost revenue. Give online shopping example 28/10/10 © Energized Work Limited 2010 Agile Evangelists - LEANING
  24. RTO and RPO are often in competition Give eg of replication lag between 2 sites. Zero RPO example - If replication lags between systems and you have an aggressive RPO you maybe better off taking a few hours outage and focusing on restoring your primary site. Zero RTO example – if replication lags between DC ’ s you may decide to failover immediately and take the data loss for some inflight transactions Aggressive RTO & RPO is expensive and has a performance 28/10/10 © Energized Work Limited 2010 Agile Evangelists - LEANING
  25. 28/10/10 © Energized Work Limited 2010 Agile Evangelists - LEANING
  26. Typical nightly backups aren ’ t going to cut it. Common practice is to backup systems nightly. Is your business happy to lose up to 24 hours of data? Probably not. 28/10/10 © Energized Work Limited 2010 Agile Evangelists - LEANING
  27. Covers you for any catastrophic hardware failure 2 nd host has independent storage infrastructure. Data corruption would however result in 2 copies of crap 2. Covers you for data corruption Playing back transaction logs will also allow you to identify the place where corruption occurred. 28/10/10 © Energized Work Limited 2010 Agile Evangelists - LEANING
  28. 28/10/10 © Energized Work Limited 2010 Agile Evangelists - LEANING
  29. What about MTTD? 28/10/10 © Energized Work Limited 2010 Agile Evangelists - LEANING
  30. My experience tells me most companies focus on availability How many companies take nightly tape backups but have never bothered trying to restore or test them? If you think you can built a completely fail-proof system you are kidding yourself. How many companies have game days? 28/10/10 © Energized Work Limited 2010 Agile Evangelists - LEANING
  31. 28/10/10 © Energized Work Limited 2010 Agile Evangelists - LEANING
  32. 28/10/10 © Energized Work Limited 2010 Agile Evangelists - LEANING
  33. 28/10/10 © Energized Work Limited 2010 Agile Evangelists - LEANING