How microservices fail, and what to do about it

Four fundamental microservice message flows, how they fail, and how you can mitigate these failure modes.

Published in: Software

  1. How Microservices Fail… and what to do about it. Richard Rodger @rjrodger
  2. github.com/rjrodger/nodezoo https://aws.amazon.com/message/5467D2/
  5. Pattern Matching: Service discovery is an anti-pattern. Instead, make messages first-class citizens. Use message data to define patterns, and these patterns define a language! Transport Independence: Services should not know about each other, or how to send messages. Services are fully defined by the message patterns that they recognise and the message patterns that they emit.
  6. // a search message
     {
       "role": "search",   // a namespace
       "cmd": "search",    // this is a command
       "query": "ldap",    // some data
     }

     // the pattern to match
     role:search,cmd:search
  7. // some nodezoo message patterns
     role:search,cmd:search   // do a search
     role:search,cmd:insert   // insert into index
     role:info,cmd:get        // get module info
     role:npm,cmd:get         // get npm data
     role:npm,info:change     // module changed!
     role:info,req:part       // need module info
     role:info,res:part       // here's module info
  8. synchronous request/response: role:search,cmd:search and role:info,cmd:get
  9. asynchronous "winner-take-all" (actor): role:npm,info:change
  10. asynchronous "fire-and-forget" (publish/subscribe): role:info,req:part and role:info,res:part
  11. Message flows, classified as synchronous/asynchronous and consumed/observed: asynchronous "fire-and-forget" (publish/subscribe); synchronous request/response; asynchronous "winner-take-all" (actor); synchronous "sidewinder" (side effects!)
  12. senecajs.org
  13. code (branch: msdub201509)
  14. kintsugi 金継ぎ (golden joinery)
  15. asynchronous "fire-and-forget" (publish/subscribe); synchronous request/response; asynchronous "winner-take-all" (actor); synchronous "sidewinder" (side effects!). How do these break?
  16. asynchronous "fire-and-forget" (publish/subscribe); synchronous request/response; asynchronous "winner-take-all" (actor); synchronous "sidewinder" (side effects!)
  17. Failure mode: "Slow downstream". B's responses are getting slower, consuming A's resources. Mitigation: drop B. A should consider B dead. Or: the transport should handle this.
  18. Failure mode: "Upstream overload". A is sending messages to B at a higher rate than B can handle. Mitigation: back-pressure from B. A should accept back-pressure notifications and scale back. Assumes B is doing more work than A.
  19. asynchronous "fire-and-forget" (publish/subscribe); synchronous request/response; asynchronous "winner-take-all" (actor); synchronous "sidewinder" (side effects!)
  20. Failure mode: "Lost Actions". A is sending messages to B, and C is listening. But perhaps the latest version of C is broken? Mitigation: measure message flow rates. Do the flow ratios match the business rules?
  21. Failure mode: "Broken Contracts". A and B are using a newer message schema, but you forgot about C. Mitigation: don't use contracts! Message schemas are a net negative and hinder multi-version deployment.
  22. asynchronous "fire-and-forget" (publish/subscribe); synchronous request/response; asynchronous "winner-take-all" (actor); synchronous "sidewinder" (side effects!)
  23. Failure mode: "Poison Message". A is sending messages that crash B. B keeps restarting and trying to handle the poison message. Now nothing works! Mitigation: B should drop out-of-date messages on the floor. B should maintain a list of recently seen messages and ignore duplicates. B sends bad messages to the "Dead Letter" log.
  24. Failure mode: "Guaranteed Delivery ... ain't". B expects at-most-once, exactly-once, or at-least-once delivery of unique messages. This is not possible. Mitigation: idempotency. Where possible, duplicate messages should have no bad effects.
  25. asynchronous "fire-and-forget" (publish/subscribe); synchronous request/response; asynchronous "winner-take-all" (actor); synchronous "sidewinder" (side effects!)
  26. Failure mode: "Emergent Behaviour". Strange loops and unexplained message paths make it hard to understand what the system is doing. Mitigation: correlate messages. Attach correlation identifiers to messages so that you can trace the flow of causality.
  27. Failure mode: "Catastrophic Collapse". You've introduced feedback that grows exponentially, and you've no idea how to fix it. Mitigation: have a kill switch. Microservices aren't a silver bullet. Sometimes you need to selectively reboot.
  28. P(success) = 1 vs. P(failure) < ε
  29. // apoptosis: the process exits cleanly at a random time within the next hour
      setTimeout(function(){ process.exit(0) }, 60*60*1000*Math.random())
  30. Thanks! Richard Rodger @rjrodger
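
The pattern-matching idea from slides 5 to 7 can be sketched in plain JavaScript. This is a minimal illustration, not the Seneca API: `parsePattern`, `matches`, and `PatternRouter` are hypothetical names, and the rule that the most specific matching pattern wins is an assumption made for this example. The point is that the sender only builds a message, and never names the service that handles it.

```javascript
// Hypothetical helper: turn "role:search,cmd:search" into { role: 'search', cmd: 'search' }.
function parsePattern(text) {
  const pattern = {}
  for (const pair of text.split(',')) {
    const [key, value] = pair.split(':').map((s) => s.trim())
    pattern[key] = value
  }
  return pattern
}

// A message matches when every key in the pattern is present with the same value.
function matches(pattern, message) {
  return Object.keys(pattern).every(
    (key) => key in message && String(message[key]) === pattern[key]
  )
}

class PatternRouter {
  constructor() {
    this.routes = []
  }

  // Register a handler for a pattern string.
  add(patternText, handler) {
    this.routes.push({ pattern: parsePattern(patternText), handler })
  }

  // Deliver the message to the handler with the most specific matching pattern
  // (assumption: "most specific" means "most keys").
  act(message) {
    let best = null
    for (const route of this.routes) {
      if (!matches(route.pattern, message)) continue
      const size = Object.keys(route.pattern).length
      if (best === null || size > Object.keys(best.pattern).length) best = route
    }
    if (best === null) throw new Error('no pattern matches message')
    return best.handler(message)
  }
}

const router = new PatternRouter()
router.add('role:search,cmd:search', (msg) => ({ hits: [], query: msg.query }))
router.add('role:info,cmd:get', (msg) => ({ name: msg.name }))

// The caller knows only the message, not which handler receives it.
const result = router.act({ role: 'search', cmd: 'search', query: 'ldap' })
console.log(result)
```

Because routing depends only on message content, a handler could be moved to another process or replaced by a newer version without the caller changing, which is the transport independence that slide 5 describes.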

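The "Poison Message" mitigations from slide 23 (drop out-of-date messages, ignore recently seen duplicates, log bad messages to a dead-letter log) might look roughly like the sketch below. Everything here is an assumption for illustration: the `DefensiveConsumer` class, the `id` and `sentAt` message fields, and the `maxAgeMs` and `maxSeen` options do not come from the slides or from nodezoo.

```javascript
// Sketch of a defensive consumer for service B. All names are illustrative.
class DefensiveConsumer {
  constructor(handler, { maxAgeMs = 60 * 1000, maxSeen = 1000 } = {}) {
    this.handler = handler
    this.maxAgeMs = maxAgeMs // messages older than this are dropped
    this.maxSeen = maxSeen // size of the recently-seen list
    this.seen = new Set() // a Set iterates in insertion order
    this.deadLetters = [] // stand-in for a "Dead Letter" log
  }

  // Returns 'handled', 'stale', 'duplicate' or 'dead-letter'.
  receive(message, now = Date.now()) {
    if (now - message.sentAt > this.maxAgeMs) return 'stale'
    if (this.seen.has(message.id)) return 'duplicate'

    // Record the id before handling, so a failing message is not retried forever.
    this.seen.add(message.id)
    if (this.seen.size > this.maxSeen) {
      this.seen.delete(this.seen.values().next().value) // forget the oldest id
    }

    try {
      this.handler(message)
      return 'handled'
    } catch (err) {
      this.deadLetters.push({ message, error: String(err) })
      return 'dead-letter'
    }
  }
}

const consumer = new DefensiveConsumer((msg) => {
  if (msg.body === 'poison') throw new Error('cannot handle message')
})

const now = 1000000
const outcomes = [
  consumer.receive({ id: 'm1', sentAt: now, body: 'ok' }, now),
  consumer.receive({ id: 'm1', sentAt: now, body: 'ok' }, now),
  consumer.receive({ id: 'm2', sentAt: now, body: 'poison' }, now),
  consumer.receive({ id: 'm2', sentAt: now, body: 'poison' }, now),
  consumer.receive({ id: 'm3', sentAt: now - 120000, body: 'ok' }, now),
]
console.log(outcomes)
```

Ignoring duplicates also makes redelivery harmless here, which is the idempotency that slide 24 asks for. Note that this sketch keeps the seen list in memory, so it would not survive the crash-and-restart loop the slide describes; a real service would presumably need to persist it or rely on the transport.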