
AWS Community Day 2019 - Business Driven Availability

Rolf Koski's talk at the AWS Community Day 2019 event in Copenhagen. Business driven availability: why SLAs are not an excuse for poor architectures.


  2. 2. Rolf Koski – Business driven availability. Scandic Copenhagen, 18 February 2019. AWS Community Day Nordics 2019
  3. 3. Who am I: Rolf Koski, CTO, Cybercom AWS Business Group. rolf.koski@cybercom.com, rolle, therolle • “Guy with the sticker” • Cloud Advisor & Evangelist • Community Leader • AWS Partner Ambassador • Well-Architected Lead
  4. 4. Why SLAs are not an excuse for poor architectures Disclaimer: this presentation makes you ask questions more than it gives answers…
  5. 5. Everything Fails (so if you think you can have 100%, you are lying to yourself)
  6. 6. SL(A) ? Objective vs. Agreement
  7. 7. Service Level is not just nines
  8. 8. Service Level is not just nines • What service is provided • How it is supported • During which hours the service is to be provided • What performance is to be expected • What the responsibilities of the parties to the agreement are
  9. 9. Quick introduction to availability arithmetic
  10. 10. Aggregate availability – Series: two 99% components in series give ~98%
  11. 11. Aggregate availability – Parallel: two 99% components in parallel give 99.99%
  12. 12. Aggregate availability – Combination: four 99% components, 99.98% aggregate
  13. 13. Aggregate availability – Partial failure: four 99% components, 20% failing, 99.98% aggregate
  14. 14. Resilient Design
  15. 15. Resilient Design • People • Application implementation • Network & Data architecture • Infrastructure
  16. 16. SLA Credits Suck (and they have no real business value whatsoever)
  17. 17. Example: S3 SLA • Monthly Uptime Percentage equal to or greater than 99.0% but less than 99.9%: Service Credit Percentage 10% • Monthly Uptime Percentage less than 99.0%: Service Credit Percentage 25% • In literal terms: for 1 TB of data that was unavailable for up to 7 hours and 12 minutes, you get $2.34 in service credits
  18. 18. The Cost of Availability (and when enough is enough)
  19. 19. Total Cost of Service Level (chart: cost against number of 9’s; curves: cost of breach, cost of service level target)
  20. 20. Throwing a finite amount of money at the problem does not make it go away – at least not nearly every time
  21. 21. So, how to decide what to optimize?
  22. 22. Analyze, classify & decide
  23. 23. Analysis • How much is loss or corruption of data worth to you • How much is downtime worth to you • How much is a malicious breach worth to you • How much is your public image worth to you • How much are you willing to invest in advance • How much are you willing to set aside for corrective action • How much risk are you willing to accumulate with regard to legislation, compliance and similar
  24. 24. Classification
  25. 25. Classification • Business criticality • Data privacy / confidentiality • Availability • Consistency • Resiliency • Original or derivative
  26. 26. Not everything is equal
  27. 27. Take a look in the mirror (There is no one else to blame)
  28. 28. Your most valuable availability metric is probably not in %
  29. 29. Amazon: 100 ms of extra load time caused a 1% drop in sales (Greg Linden). Google: 500 ms of extra load time caused 20% fewer searches (Marissa Mayer). Yahoo!: 400 ms of extra load time caused a 5–9% increase in the number of people who clicked “back” before the page even loaded (Nicole Sullivan).
  30. 30. It’s actually not IF it works, but HOW it works
  31. 31. Some real advice
  32. 32. Some real advice • Automation and deployment pipeline • Infrastructure as Code • Versioning and ability to roll back • Deployment scenarios (A/B, B/G, Canary) • Immutable infrastructure • Origin data vs. recomputable data • Feature flags and support for partial failure • Multi-AZ, multi-region • Monitoring: shallow & deep
  33. 33. Humans fail too. (Actually, more than you’d like)
  34. 34. Who is responsible in the Cloud? (It’s You)
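
The availability arithmetic from slides 10 to 12 can be reproduced in a few lines. This is a minimal sketch, not the speaker's own material: the function names `series` and `parallel` are hypothetical, components are assumed to fail independently, and the layout used for the slide 12 combination (two parallel pairs placed in series) is an assumption chosen because it matches the 99.98% shown on the slide.

```python
def series(*availabilities: float) -> float:
    """Every component must be up, so the availabilities multiply."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def parallel(*availabilities: float) -> float:
    """One working component is enough, so the unavailabilities multiply."""
    all_down = 1.0
    for a in availabilities:
        all_down *= 1.0 - a
    return 1.0 - all_down


a = 0.99  # single-component availability used on the slides

series_two = series(a, a)      # slide 10: 0.9801, i.e. about 98%
parallel_two = parallel(a, a)  # slide 11: 0.9999, i.e. 99.99%

# Assumed layout for slide 12: two redundant pairs chained in series.
combination = series(parallel(a, a), parallel(a, a))  # about 0.9998

print(f"series:      {series_two:.4%}")
print(f"parallel:    {parallel_two:.4%}")
print(f"combination: {combination:.4%}")
```

The independence assumption matters: components that share a failure cause (same AZ, same deployment, same operator) will not reach the parallel figure in practice.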
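
The S3 example on slide 17 can be checked the same way. The sketch below uses only the two credit tiers quoted on the slide and treats uptime as simple wall-clock downtime over a 30-day month, which is a simplification of how the actual SLA measures uptime. The monthly bill of $23.40 is an assumption back-calculated from the slide's $2.34 credit at the 10% tier, and all function names are hypothetical.

```python
def uptime_percentage(downtime_minutes: float, days_in_month: int = 30) -> float:
    """Monthly uptime in percent, assuming plain wall-clock downtime."""
    total_minutes = days_in_month * 24 * 60
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def service_credit_percentage(monthly_uptime: float) -> float:
    """Credit tiers as quoted on slide 17."""
    if monthly_uptime < 99.0:
        return 25.0
    if monthly_uptime < 99.9:
        return 10.0
    return 0.0


# Assumption: monthly storage bill for 1 TB, inferred from the slide's $2.34.
ASSUMED_MONTHLY_BILL_USD = 23.40

downtime = 7 * 60 + 12                    # 7 hours 12 minutes, in minutes
uptime = uptime_percentage(downtime)      # 1% of a 30-day month was lost
credit_pct = service_credit_percentage(uptime)
credit = ASSUMED_MONTHLY_BILL_USD * credit_pct / 100.0

print(f"uptime {uptime:.2f}% -> credit {credit_pct:.0f}% = ${credit:.2f}")
```

Seven hours and twelve minutes is exactly 1% of a 30-day month, so the outage sits right at the 99.0% boundary and still earns only the 10% tier, which is the point the slide makes about credits having little business value.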
