Applying SRE techniques to micro service design

6 hard-won lessons from the world of SRE applied to microservice software design. All of the lessons apply to all software design.

Published in: Software

  1. 1. Put some SRE in your microservices. Hard-won lessons from the world of SRE can be applied to the nascent world of microservices.
  2. 2. The many faces of Theo Schlossnagle (@postwait), CEO, Circonus
  3. 3. The nature of the problem: software sucks. Once you’ve run software at scale, you have a deep understanding of how it is all tied together with loose string and hope.
  4. 4. All software will fail, but good software fails well. • Consider the phrase: “Have you used X in anger?”
  5. 5. Never undervalue grace in failure. Rule 𝛌1: Crash landings should be both fast and controlled.
  6. 6. What it means to fail quickly & safely • The scope of failure should collapse completely. • The time to failure should be measured in small multiples of normal service time. • Nothing outside the scope of failure should be impacted. https://www.youtube.com/watch?v=5SL1A2d2e7M
  7. 7. Autopsies: not just for medicine. Rule 𝛌2: Post-mortems are fundamental.
  8. 8. Pragmatic analysis is required to understand failure’s true nature • Post-mortem analysis is critical • Stack traces • Forensic logs • Images (cores, dumps, etc.)
  9. 9. The difference between a shock and electrocution is real. Rule 𝛌3: Use circuit breakers.
  10. 10. Circuit breakers are designed to avoid cascading failure • it’s not all about, especially with microservices • protect yourselves and others • circuit breakers come in many types: • timing • queue depth • concurrency http://melissaomarkham.com
  11. 11. You cannot understand what you cannot measure. Rule 𝛌4: Behavior is complex. Understand it.
  12. 12. Don’t measure to assess availability; measure to understand. • Build robust models of behavior • Understand performance changes • Don’t use averages • Don’t use percentiles alone
  13. 13. Don’t measure to assess availability; measure to understand. • Build robust models of behavior • Understand performance changes • Don’t use averages • Don’t use percentiles alone
  14. 14. It’s easy to demand perfection; it’s also stupid. Rule 𝛌5: Have a failure budget.
  15. 15. Avoiding failure is simply impossible; expect and manage failure • use failure budgets • set expectations reasonably • define and reward successes on improvement and competency, not just uptime.
  16. 16. Justice should be blind; operations should not. Rule 𝛌6: Instrumentation & Observability have no equals.
  17. 17. For every “I wonder what X is right now?” in production, you must have answers • DTrace • eBPF • Instrument code for observability https://www.pinterest.com/pin/441775044670412234/
  18. 18. Thank you.
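
Slide 10 lists timing as one kind of circuit breaker. As a rough illustration only, a timing-based breaker might look like the sketch below. The class name, thresholds, and cooldown behavior are all assumptions made for this example and do not come from the slides; the clock is injectable so the behavior can be exercised without real delays.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without attempting it."""


class TimingCircuitBreaker:
    """Hypothetical breaker: opens after consecutive slow or failed calls."""

    def __init__(self, max_latency_s=0.5, trip_after=3, cooldown_s=5.0,
                 clock=time.monotonic):
        self.max_latency_s = max_latency_s  # slower calls count as bad
        self.trip_after = trip_after        # consecutive bad calls before opening
        self.cooldown_s = cooldown_s        # how long to fail fast once open
        self.clock = clock                  # injectable for testing
        self.bad_calls = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown_s:
                # Fail fast: the dependency is not touched at all.
                raise CircuitOpenError("circuit open; failing fast")
            # Cooldown elapsed: allow one trial call (half-open state).
            # One more bad call re-opens the circuit immediately.
            self.opened_at = None
            self.bad_calls = self.trip_after - 1
        start = self.clock()
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._record_bad()
            raise
        if self.clock() - start > self.max_latency_s:
            self._record_bad()
        else:
            self.bad_calls = 0  # a healthy call resets the streak
        return result

    def _record_bad(self):
        self.bad_calls += 1
        if self.bad_calls >= self.trip_after:
            self.opened_at = self.clock()
```

A caller would wrap each outbound request, for example `breaker.call(fetch_profile, user_id)` with a hypothetical `fetch_profile`, and treat `CircuitOpenError` as a signal to degrade gracefully rather than queue more work behind a struggling dependency. Queue-depth and concurrency breakers follow the same shape with a different trip condition.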

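The failure budget from Rule 𝛌5 comes down to simple arithmetic once a target is chosen. The sketch below assumes a request-based target (for example, 99.9% of requests succeed over some window); the function names and the example figures are illustrative assumptions, not something the slides specify.

```python
def failure_budget(slo_target: float, total_requests: int) -> float:
    """Number of requests that may fail in the window without missing the target."""
    if not 0.0 < slo_target < 1.0:
        # A target of exactly 1.0 would mean a zero budget, i.e. demanding perfection.
        raise ValueError("slo_target must be strictly between 0 and 1")
    return (1.0 - slo_target) * total_requests


def budget_consumed(slo_target: float, total_requests: int,
                    failed_requests: int) -> float:
    """Fraction of the failure budget already used; above 1.0 means overspent."""
    return failed_requests / failure_budget(slo_target, total_requests)
```

With an assumed 99.9% target over one million requests, the budget is roughly one thousand failed requests, so 250 failures would consume about a quarter of it. Tracking the consumed fraction over time gives a way to reward improvement rather than raw uptime.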