5 keys to high availability applications

Slides from Lee Atchison's Webinar at O'Reilly in February 2016.


  1. 5 Keys to Building High Availability Web Applications for Service and Microservice Based Systems. Lee Atchison, Principal Cloud Architect and Advocate. Confidential ©2008–16 New Relic, Inc. All rights reserved.
  5. You had power most of the time. Why are you complaining?
  6. How do you keep an application operational?
  7. 5 Keys to High Availability Web Applications: Key 1, Key 2, Key 3, Key 4, Key 5
  8. Key 1: Build applications keeping availability in mind OR Develop for failure
  9. Services will fail
  10. Services will fail… always.
  11. As a Service Developer… Your response to a dependency failure must be:
  12. As a Service Developer… Your response to a dependency failure must be: Understandable
  13. As a Service Developer… Your response to a dependency failure must be: Predictable; Understandable
  14. As a Service Developer… Your response to a dependency failure must be: Predictable; Reasonable for the given dependency failure; Understandable
  15. How should I respond when a dependency fails? Provide a graceful backoff. Don’t know something? Don’t show it!
      • Don’t show a drop-down list of accounts if you can’t contact the account service
      • Don’t show an image (or show a placeholder) if you can’t determine which image to show
  16. Example (Real Life): Our web application showing a page…
      • An avatar was representing the customer on each page
      • A 3rd party system generated the avatar
      • One day, that 3rd party system failed
      • The app didn’t know what to do, so it failed, too
      Our application was completely down, all because of a minor icon missing...
  17. Why did this cause your application to fail? It didn’t know how to respond. It could have:
      • Recognized the failure of the 3rd party provider as soon as possible
      • Substituted a generic image (or removed it) when the service failure was detected
      • Circuit Breaker pattern would help a lot here
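The Circuit Breaker idea from the avatar example can be sketched roughly as follows. This is a minimal illustration, not the implementation used in the talk: the class name `CircuitBreaker`, the helper `avatar_for`, the placeholder path `GENERIC_AVATAR`, and the threshold and cool-down values are all assumptions made for the example.

```python
import time

# Hypothetical placeholder shown when the avatar provider is unavailable.
GENERIC_AVATAR = "/static/generic-avatar.png"


class CircuitBreaker:
    """After repeated consecutive failures, stop calling the dependency
    for a cool-down period and return the fallback instead."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Circuit is open: do not touch the failing dependency.
                return fallback
            # Cool-down elapsed: allow one trial call. A single failure
            # re-opens the circuit because the count sits just below the limit.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback
        self.failures = 0
        return result


def avatar_for(customer_id, fetch, breaker):
    """Return the provider's avatar URL, or the generic image on failure.
    `fetch` stands in for the (hypothetical) 3rd party client call."""
    return breaker.call(lambda: fetch(customer_id), GENERIC_AVATAR)
```

With something like this in place, a failing avatar provider costs the page one generic icon rather than the whole request.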
  18. How should I respond when a dependency fails? Provide a graceful backoff. Fail as early as possible:
      • Don’t propagate bad data… once you determine a piece of data is invalid, discard it as soon as possible
      • Validate input given… reject bad input immediately
  19. Example (Real Life): Account service was having performance problems…
      • Customers felt a performance problem
      • Someone was sending bad requests
      • System had “browned out”
      • Service tried to process the request… (and eventually failed)
  20. So, what brought our application to its knees?
      • Input to the service was obviously bad
      • Yet, we attempted to use the input
      • Result was a failed service
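Rejecting bad input before doing any work might look like the sketch below. The request shape (`account_id`, `limit`), the `BadRequest` exception, the `handle` function, and the choice of HTTP-style status codes are assumptions for illustration; the slides do not describe the actual service interface.

```python
class BadRequest(ValueError):
    """Raised when input is rejected before any work is done."""


def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def parse_account_request(payload):
    """Validate a hypothetical account lookup request up front."""
    if not isinstance(payload, dict):
        raise BadRequest("payload must be an object")
    account_id = payload.get("account_id")
    if not _is_int(account_id) or account_id <= 0:
        raise BadRequest("account_id must be a positive integer")
    limit = payload.get("limit", 10)
    if not _is_int(limit) or not 1 <= limit <= 100:
        raise BadRequest("limit must be an integer from 1 to 100")
    return {"account_id": account_id, "limit": limit}


def handle(payload, lookup):
    """Return (status, body). Invalid input is answered immediately,
    without calling the expensive `lookup` at all."""
    try:
        request = parse_account_request(payload)
    except BadRequest as err:
        return 400, {"error": str(err)}
    return 200, lookup(request["account_id"], request["limit"])
```

The point of the design is that the costly path is only reachable with input that has already been checked, so a flood of malformed requests is cheap to turn away.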
  21. The Lesson
  22. Key 1: Build applications keeping availability in mind OR Develop for failure
      Key 2: Always think about scaling OR Just because your application works now does not mean it will work tomorrow…
  23. Just because your application works now does not mean it will work tomorrow… Why?
      • Most web applications have increasing traffic patterns
      • Traffic will increase, double, triple, 10x… sooner than you think
      • Don’t build it for today’s traffic; build it for tomorrow’s traffic
  24. Build for tomorrow might mean:
      • Build in the ability to increase the size and capacity of your databases.
      • Determine what logical limits exist to your data scaling. What happens when your database tops out in its capabilities?
      • Build your application so that you can add additional application servers easily. This often involves being observant about where and how state is maintained, and how traffic is routed.*
      • Think about caching. What information can be cached? What can't? Why can't it?
      • Redirect static traffic to offline providers.
      • Think about whether specific pieces of dynamic content can actually be generated statically.
      * This topic is large enough for an entire chapter, even an entire book, on its own.
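As one illustration of the caching bullet, here is a minimal time-to-live cache sketch. The `TTLCache` class and its `get_or_compute` method are hypothetical names invented for this example, and a production system would more likely use an existing cache service than hand-rolled code like this.

```python
import time


class TTLCache:
    """Keep computed values for a fixed number of seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the expensive work
        value = compute()
        self._entries[key] = (now + self.ttl, value)
        return value
```

Content that tolerates being a little stale (a banner, a product list) fits this pattern; content that must be exact per request (an account balance) usually does not, which is the "What can't? Why can't it?" question from the slide.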
  25. Example: Is It Static or Dynamic?
  26. Example: Is It Static or Dynamic? Non-static content
  27. Example: Is It Static or Dynamic? Non-static content. Banner is now static
  28. Example: Is It Static or Dynamic? Non-static content. Banner is now static. Personalized content can be added in browser
  29. Key 1: Build applications keeping availability in mind OR Develop for failure
      Key 2: Always think about scaling OR Just because your application works now does not mean it will work tomorrow…
      Key 3: Mitigate risk
  30. All Systems Have Risk in Them. Risk is a measure of the likelihood of a surprise occurring. There is risk that a…
      • Server will crash
      • Database will get corrupted
      • Returned answer will be incorrect
      • Network connection will fail
      • Newly deployed piece of software will fail
  31. Risk
      • Keeping a system available requires removing risk… Hence, removing surprise
      • But as systems become more and more complicated… this becomes less and less possible
  32. Risk: Risk Management is at the heart of building highly available systems
      • Managing what your risk is
      • Managing how much risk is acceptable
      • Knowing what you can do to mitigate the risk
  33. Risk: Knowing what you can do to mitigate the risk. Risk mitigation
  34. Risk Mitigation: Risk mitigation is part of risk management. Risk mitigation:
      • Knowing what to do when a problem occurs in order to reduce the impact of the problem
      • Making sure your application works as best and as completely as possible, even when services and resources fail
  35. Risk Mitigation: Risk mitigation requires thinking about the things that can go wrong… and putting a plan together, now… to be able to handle the situation when it does happen.
  36. Key 1: Build applications keeping availability in mind OR Develop for failure
      Key 2: Always think about scaling OR Just because your application works now does not mean it will work tomorrow…
      Key 3: Mitigate risk
      Key 4: Monitor availability OR Yes, we can help you
  37. Monitor Availability
      • Understand how your application is performing
      • Use application monitoring:
        • Keep an eye on how your app is performing
        • Generate notifications when the application performs in abnormal ways
      • Make sure your app is properly instrumented
        • Internal as well as external to your app
  38. Monitor Availability
      • Have your tools monitor continuously
      • Establish a baseline for how your application is performing
      • Look for trends and patterns
      • Look for outliers and deviations from the trends
        • Treat these as potential availability issues
      • As your system grows:
        • Examine how your baseline changes
        • Make sure your scalability plan will continue to work
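One simple way to compare recent measurements against a baseline is a standard-deviation check, sketched below. The function name `find_outliers`, the default threshold of three standard deviations, and the sample numbers are assumptions for illustration; real monitoring tools generally use more robust methods.

```python
from statistics import mean, stdev


def find_outliers(baseline, recent, threshold=3.0):
    """Return the recent samples that lie more than `threshold`
    standard deviations away from the mean of the baseline samples."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline samples")
    center = mean(baseline)
    spread = stdev(baseline)
    if spread == 0:
        # A perfectly flat baseline: anything different is a deviation.
        return [x for x in recent if x != center]
    return [x for x in recent if abs(x - center) / spread > threshold]


# Hypothetical response times in milliseconds.
baseline_ms = [100, 102, 98, 101, 99]
suspicious = find_outliers(baseline_ms, [101, 250, 97])
```

Each value returned would then be treated as a potential availability issue, and the baseline itself recomputed periodically as the system grows.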
  39. Service Level Agreements: Establish Internal SLAs
      • Quick diagnoses
      • “Hot spots” to optimize performance
  40. Service Level Agreements: Establish Internal SLAs
      • Quick diagnoses
      • “Hot spots” to optimize performance
      • Critical to building scalable application
      Only way to scale an organization in a reliable way is with reliable SLAs
  41. Key 1: Build applications keeping availability in mind OR Develop for failure
      Key 2: Always think about scaling OR Just because your application works now does not mean it will work tomorrow…
      Key 3: Mitigate risk
      Key 4: Monitor availability OR Yes, we can help you
      Key 5: Availability response OR Yes, that was your pager that went off
  42. Responsiveness: When a problem occurs…
      • Do you know what to do to fix the problem?
      • Does everyone on your team know what to do?
      • Do you have playbooks?
      • Does your pager rotation and notification system work?
  43. Responsiveness: You must be prepared to act on issues. This means:
      • Alerts that reach the needed individuals
      • Prepared processes and procedures for common failure modes (this is part of the risk mitigation process)
  44. Responsiveness: When an alert is triggered…
      • The owner of that service must be the first one alerted
      • Other teams may want to be alerted as well…
        • Services that are tightly dependent on the triggered service
        • Early warning notification for upstream or downstream issues
        • May want a “second level” notification for dependencies
  45. Responsiveness: BEFORE the problem occurs:
      • Well established plans
      • Documented processes and cheat sheets
      • Contact lists for critical consuming service owners
      • Clear, precise escalation plan:
        • Who to contact if the problem becomes too big for the responder to handle
        • If the scope of the problem extends significantly and critically beyond the failing system
        • Know who to escalate to if the first responder doesn’t respond
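The "who to escalate to if the first responder doesn't respond" rule can be written down as an ordered chain. The sketch below is an assumption-laden illustration: `escalate`, the `page` and `acknowledged` callbacks, and the responder names are all hypothetical, and a real paging system would also wait for a timeout between steps.

```python
def escalate(chain, page, acknowledged):
    """Page each responder in order until one acknowledges.

    chain:        ordered list of responder names, first responder first
    page:         callable that sends the notification to one responder
    acknowledged: callable returning True if that responder has responded
    Returns the responder who acknowledged, or None if nobody did.
    """
    for responder in chain:
        page(responder)
        if acknowledged(responder):
            return responder
    return None


# Hypothetical escalation plan for one service.
pages_sent = []
handled_by = escalate(
    ["first-responder", "team-lead", "engineering-manager"],
    pages_sent.append,
    lambda name: name == "team-lead",  # pretend only the lead answers
)
```

Writing the chain down ahead of time, even in a form this simple, is what makes the plan usable at the moment the pager goes off.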
  46. 5 Keys to High Availability Web Apps
      Key 1: Build applications keeping availability in mind
      Key 2: Always think about scaling
      Key 3: Mitigate risk
      Key 4: Monitor availability
      Key 5: Availability response
  47. Thank you for your time! Questions? Lee Atchison, lee@newrelic.com, www.leeatchison.com, @leeatchison, leeatchison. Architecting for Scale, published by O’Reilly Media, available May 2016, www.architectingforscale.com
