Dependable Cloud Architecture - SWOCC Edition

Published in: Technology, Business

  1. Dependable Cloud Architecture. mwood@cerebrata.com, http://mvwood.com, @mikewo on Twitter. Image: xkcd.com
  2. “Failure is always an option.” Image: Discovery Channel, Fair Use
  3. What are we looking for? Hardware failure, data corruption, network failure, loss of facilities. Check out: http://bit.ly/wazbizcont Images: Office ClipArt & Godzilla Releasing Corp (Fair Use)
  4. Human Error. Image: FOX, Fair Use
  5. What we’re trying to achieve: 1. Monitoring 2. Resilient Solutions. Image: Cohdra
  6. Cost vs Risk: 99.999% against $1, … ,000.00. To get more 9’s in the first figure, add more 0’s to the second. Image: Office ClipArt
  7. Image: NASA
  8. Functional Transparency: logging messages, hardware health, dependent services health. Image: Office ClipArt
  9. Telemetry
  10. Analyze your Data. Image: NASA
  11. Image: Office ClipArt
  12. Remember: Failure is always an option. Common Points of Failure: • Machine/application crashes • Throttling (exceeding capacity) • Connectivity/Network • External service dependencies
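  13. Try/catch != Resilient
        // Requires: using System; using System.Diagnostics; using System.IO;
        private void createFile()
        {
            string fileName = @"c:\workingDirectory\someFileName.txt";
            try
            {
                // Dispose the returned FileStream so the file handle is released.
                using (File.Create(fileName)) { }
            }
            catch (DirectoryNotFoundException ex)
            {
                // Logging and rethrowing records the failure but does not recover from it.
                Trace.WriteLine(String.Format("Unable to create {0}. {1}", fileName, ex));
                throw;
            }
        }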
  14. Decompose your system… Image: Michael Wood
  15. Capacity Buffering: Content Delivery Networks (CDNs), Distributed Application Cache, Local Content Cache. Enables recovery during outages or spikes in load. Image: jepler
  16. Always carry a spare. Diagram labels: 0% capacity; half of all load; 75% capacity; redirect our load; 100% capacity; 150% capacity; 75% of load; half of our load; SYSTEM FAILURE!!!; 50% more capacity than needed; over allocated, but still functioning. • Can absorb temporary spikes • Degrade, but don’t fail • Time to react if need to add capacity. Image: Kevin Rosseel
  17. Request Buffering: Queues, Retry Policies, Async Workloads. Image: Joe Shlabotnik
  18. Dept. of Redundancy Dept. Have a backup, somewhere else. More than one? Cost to benefit ratio? Ready State: Hot = full capacity; Warm = scaled down, but ready to grow; Cold = mothballed, starts from zero. Image: Mr. White
  19. Redundancy - It’s about probability. Each box has 95% uptime, and failures are assumed to be independent. 1 box: 5% downtime, or 438 hrs per year (that’s 18 ¼ days!). 2 boxes: 5/100 * 5/100 = 25/10,000 = 0.25% downtime, or about 22 hrs per year. 4 boxes: 5/100 * 5/100 * 5/100 * 5/100 = 625/100,000,000 = 0.000625% downtime, or 3.285 MINUTES per year.
  20. Total Outage Duration = Time to Detect + Time to Diagnose + Time to Decide + Time to Act. Image: Office ClipArt
  21. Dynamic Addressing & Configuration
  22. What about your data? Image: barrymieny
  23. Availability via Degradation. Image: Michael Wood
  24. Virtualization and Automation. Images: Gizmodo
  25. Images: Orion Pictures owns Terminator Franchise
  26. The “HI” Point. Images: Office Clip Art
  27. Image: NASA
  28. “Don't be too proud of this technological terror you've constructed…” DO: • Root cause analysis • Read others’ root cause analyses • Plan for failure. ADMIT: • Your solution WILL fail at some point • You can learn from others just as well as from yourself. DON’T: • Get cocky • Stick your head in the sand. Images: LucasFilm, Fair Use
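
Slide 13 makes the point that catching, logging, and rethrowing is not resilience, and slide 17 lists retry policies as one form of request buffering. A minimal sketch of a retry with exponential backoff might look like the following. The `Retry` class, its `Execute` method, and the choice of `IOException` as the only transient failure are assumptions made for illustration; they are not from the deck or from any library.

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical helper: retries an operation that may fail transiently.
public static class Retry
{
    public static T Execute<T>(Func<T> operation, int maxAttempts, TimeSpan initialDelay)
    {
        if (operation == null) throw new ArgumentNullException(nameof(operation));
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        TimeSpan delay = initialDelay;
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            // Assumption: only IOException is treated as transient.
            // The filter stops matching on the last attempt, so the final
            // failure propagates to the caller unchanged.
            catch (IOException) when (attempt < maxAttempts)
            {
                Thread.Sleep(delay);
                delay = TimeSpan.FromTicks(delay.Ticks * 2); // exponential backoff
            }
        }
    }
}
```

A call such as `Retry.Execute(() => File.ReadAllText(path), 3, TimeSpan.FromMilliseconds(200))`, where `path` is a placeholder, would attempt the read up to three times, waiting 200 ms and then 400 ms between attempts. Which exceptions count as transient depends on the dependency being called, so a real policy would likely need a broader or configurable filter.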
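
Slide 15 says a local content cache enables recovery during outages, and slide 23 frames this as availability via degradation. One way to sketch the idea is a wrapper that serves the last known good value when the live source fails. The `CachedSource` type and its members are hypothetical names, and catching every `Exception` is a simplification for the example.

```csharp
using System;

// Hypothetical wrapper: serves the last known good value when the live source fails.
public sealed class CachedSource<T>
{
    private readonly Func<T> fetch;
    private readonly object gate = new object();
    private T lastGood;
    private bool hasValue;

    public CachedSource(Func<T> fetch)
    {
        this.fetch = fetch ?? throw new ArgumentNullException(nameof(fetch));
    }

    // Returns fresh data when possible; 'stale' reports whether the cache was used.
    public T Get(out bool stale)
    {
        try
        {
            T fresh = fetch();
            lock (gate) { lastGood = fresh; hasValue = true; }
            stale = false;
            return fresh;
        }
        // Degrade only if there is something to fall back to;
        // otherwise the original exception propagates.
        catch (Exception) when (HasCachedValue())
        {
            stale = true;
            lock (gate) { return lastGood; }
        }
    }

    private bool HasCachedValue()
    {
        lock (gate) { return hasValue; }
    }
}
```

Returning the `stale` flag lets the caller decide how to present degraded data, for example by showing a notice that the content may be out of date.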
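
The arithmetic on slide 19 follows from a single formula, assuming every box fails independently and has the same downtime fraction. The symbols $d$ (downtime fraction of one box), $n$ (number of boxes), and $D_n$ (fraction of time all $n$ boxes are down together) are introduced here for illustration, and a year is taken as 8760 hours.

```latex
\begin{align*}
D_n &= d^{\,n}, \qquad d = 0.05 \\
D_1 &= 0.05
    &&\Rightarrow\; 0.05 \times 8760\ \text{h} = 438\ \text{h per year} \\
D_2 &= 0.05^{2} = 0.0025
    &&\Rightarrow\; 0.0025 \times 8760\ \text{h} = 21.9\ \text{h per year} \\
D_4 &= 0.05^{4} = 6.25 \times 10^{-6}
    &&\Rightarrow\; 6.25 \times 10^{-6} \times 8760\ \text{h} \times 60\ \tfrac{\text{min}}{\text{h}} = 3.285\ \text{min per year}
\end{align*}
```

If failures are correlated, for example when all boxes share a power feed or a deployment, the combined downtime will be higher than $d^{\,n}$.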
