3. About to launch a project
• Microservice(s) in production or going into production soon
• Clearly separated business capabilities between microservices, following DDD (Domain-Driven Design) principles
• Automated continuous deployment pipeline
• DevOps culture
• Monitoring and alerting with detailed dashboards
Microservices & resilience 3
5. "Anything that can go wrong will go wrong."
Murphy's law
6. Resilience
"Resilient systems provide and maintain an acceptable level of service in the face of faults (unintentional, intentional, or naturally caused) affecting their normal operation."
8. Where does resilience begin?
It is not only about microservices!
Do not forget resilience during estimations.
9. Resiliency patterns (1)
• recovery
• retry: a very basic recovery mechanism
• self-healing: reinitialize components, either internally or by an external monitoring system (automatic service restart, starting the service on machine boot)
• retry budget: if the retry budget is exceeded, don’t retry; just fail the request to avoid overloading the downstream service (see circuit breaker)
• exponential backoff with jitter: use randomized exponential backoff when scheduling retries
• bulkheads: partition your system so that a failure in one part cannot take down everything else
• complete parameter checking: protection from broken or malicious calls (Postel’s law: be liberal in what you accept, and conservative in what you send)
• asynchronous communication: the sender does not need to wait for the receiver's response
• event-driven (event notification, event-carried state transfer, event sourcing, CQRS)
• location transparency: the sender does not need to know the receiver’s concrete location
• zero-downtime deployment: deployment is a bad reason for user-facing unavailability
• stateless services (service failover is hard with state)
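A minimal Python sketch of how retry, a retry budget, and exponential backoff with jitter can be combined. `RetryBudget`, `backoff_delay`, and `call_with_retry` are hypothetical names; the budget simply counts requests and retries over the object's lifetime (a production budget would more likely use a sliding time window), and only `ConnectionError` is assumed to be retryable.

```python
import random
import time


class RetryBudget:
    """Hypothetical retry budget: retries are allowed only while they stay
    below `ratio` of all requests seen so far, plus a small fixed reserve."""

    def __init__(self, ratio=0.1, min_retries=3):
        self.ratio = ratio
        self.min_retries = min_retries
        self.requests = 0
        self.retries = 0

    def record_request(self):
        self.requests += 1

    def can_retry(self):
        return self.retries < self.min_retries + self.ratio * self.requests

    def record_retry(self):
        self.retries += 1


def backoff_delay(attempt, base=0.1, cap=5.0):
    """Exponential backoff with "full jitter": a random delay in
    [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def call_with_retry(operation, budget, max_attempts=4, sleep=time.sleep):
    budget.record_request()
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            last_attempt = attempt == max_attempts - 1
            if last_attempt or not budget.can_retry():
                raise  # fail the request instead of adding more load
            budget.record_retry()
            sleep(backoff_delay(attempt))
```

The randomized delay spreads retries from many clients over time, so they do not hit a recovering service in synchronized waves; the budget caps the total extra load that retries can add.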
10. Resiliency patterns (2)
• relaxed temporal constraints: use a more relaxed consistency model to reduce coupling. The real world is not ACID; it is BASE.
• idempotence
• self-contained deployment: services are self-contained deployment units (compatibility between dependent components/APIs)
• timeouts
• circuit breakers: prevent a failure from constantly recurring
• failover
• fallback/graceful degradation: the ability to maintain functionality when portions of a system break down
• error handler: separate business logic and error handling
• fail fast (also called handshaking): add checks in front of expensive operations to avoid foreseeable failures
• fan out & quickest reply: send the request to multiple workers, use the quickest reply, and discard all other responses
• bounded queues: avoid latency caused by overloaded resources (use thread pools and connection pools)
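The circuit breaker idea can be sketched as a small state machine. This is a minimal, single-threaded Python sketch; the class names, the three states, and the `failure_threshold`/`reset_timeout` parameters are illustrative assumptions, not any particular library's API.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    """CLOSED -> OPEN after `failure_threshold` consecutive failures.
    After `reset_timeout` seconds the next call is let through as a trial
    (HALF_OPEN): success closes the circuit, failure re-opens it."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, operation):
        if self.state == "OPEN":
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open, failing fast")
            self.state = "HALF_OPEN"  # let one trial request through
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
                self.state = "OPEN"
                self.opened_at = self.clock()
            raise
        # Any success resets the breaker.
        self.state = "CLOSED"
        self.failures = 0
        return result
```

While the circuit is open, callers fail fast instead of waiting on a dependency that is known to be broken, which also gives that dependency time to recover.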
11. Resiliency patterns (3)
• shed load: shed requests based on resource load
• observability: the goal is to act automatically on detected failures
  • monitoring: health checks, synthetic transactions, metrics
  • alerting/visualization
  • distributed tracing
  • log aggregation/analytics
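Shed load and bounded queues work well together: when the queue in front of the workers is full, new requests are rejected immediately instead of waiting and piling up latency. A minimal Python sketch; `LoadShedder` and the mapping of a rejection to an HTTP 503 response are assumptions for illustration.

```python
import queue


class LoadShedder:
    """Hypothetical front door for a worker pool: requests go into a bounded
    queue; when it is full, new requests are rejected right away."""

    def __init__(self, max_pending=100):
        self.pending = queue.Queue(maxsize=max_pending)

    def submit(self, request):
        try:
            self.pending.put_nowait(request)
            return True
        except queue.Full:
            # Shed: the caller could answer e.g. HTTP 503 "try again later".
            return False

    def next_request(self, timeout=None):
        # Workers take requests from the queue in FIFO order.
        return self.pending.get(timeout=timeout)
```

Rejecting early keeps the requests that are accepted fast, rather than making every request slow once the system is saturated.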
Always think about the composition of patterns to achieve resilience.
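As an illustration of composition, this minimal Python sketch wraps a call in a timeout, retries attempts that time out or raise `ConnectionError`, and falls back to a default (for example, a cached value) when all attempts fail. `with_timeout`, `resilient_call`, and the parameter values are hypothetical.

```python
import concurrent.futures

_executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def with_timeout(operation, seconds):
    """Timeout: stop waiting after `seconds`. The worker thread itself keeps
    running, because Python threads cannot be forcibly stopped."""
    future = _executor.submit(operation)
    return future.result(timeout=seconds)


def resilient_call(operation, fallback, timeout=0.5, attempts=2):
    """Timeout + retry + fallback: attempts that time out or raise
    ConnectionError are retried; if every attempt fails, the fallback
    result is returned (graceful degradation)."""
    for _ in range(attempts):
        try:
            return with_timeout(operation, timeout)
        except (concurrent.futures.TimeoutError, ConnectionError):
            continue
    return fallback()
```

Each pattern alone is incomplete: a retry without a timeout can hang forever, and a timeout without a fallback only turns slowness into an error.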
12. Synchronous vs Asynchronous
Do not use synchronous communication! Asynchronous is better!
The event-driven approach is the de facto standard for implementing asynchronous communication in microservice architectures.
[Diagram: synchronous shipNewOrder call from Order Service to Shipping Service, versus Order Service producing an Order message to a message broker and Shipping Service consuming it]
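The produce/consume flow above can be sketched in Python with an in-memory stand-in for the message broker. `InMemoryBroker`, `OrderPlaced`, and the `"orders"` topic are hypothetical; a real broker would add persistence, multiple consumer groups, and delivery guarantees that this sketch does not model (here each event is delivered to a single consumer).

```python
import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderPlaced:
    """Hypothetical event published by the Order Service."""
    order_id: str
    customer_id: str
    total: float


class InMemoryBroker:
    """Stand-in for a real message broker: one FIFO queue per topic."""

    def __init__(self):
        self.topics = {}

    def publish(self, topic, event):
        self.topics.setdefault(topic, queue.Queue()).put(event)

    def poll(self, topic):
        q = self.topics.setdefault(topic, queue.Queue())
        try:
            return q.get_nowait()
        except queue.Empty:
            return None


class OrderService:
    def __init__(self, broker):
        self.broker = broker

    def place_order(self, order_id, customer_id, total):
        # No synchronous shipNewOrder call: publish and return immediately.
        self.broker.publish("orders", OrderPlaced(order_id, customer_id, total))


class ShippingService:
    def __init__(self, broker):
        self.broker = broker
        self.shipments = []

    def process_pending(self):
        # If this service was down, events simply wait in the broker.
        while (event := self.broker.poll("orders")) is not None:
            self.shipments.append(event.order_id)
```

The Order Service keeps accepting orders even while the Shipping Service is unavailable; shipping catches up once it is running again.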
14. "But what if we need to query data from another microservice?"
15. Command Query Responsibility Segregation
[Diagram: Order Service produces Order events and Customer Service produces Customer events to a message broker; a Customer View Service consumes both, keeps its own Customers and Orders data, and serves the findHighValueCustomers query]
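The query side of CQRS can be sketched in Python as a view service that builds its own read model purely from events. The event classes, field names, and the "total order value ≥ threshold" definition of a high-value customer are assumptions for illustration; `find_high_value_customers` plays the role of `findHighValueCustomers`.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerRegistered:
    customer_id: str
    name: str


@dataclass(frozen=True)
class OrderPlaced:
    order_id: str
    customer_id: str
    total: float


class CustomerViewService:
    """Query side: keeps its own denormalized copy of customers and their
    order totals, built only from consumed events, so answering a query
    never requires a synchronous call to the Customer or Order service."""

    def __init__(self):
        self.names = {}
        self.order_totals = defaultdict(float)

    def apply(self, event):
        if isinstance(event, CustomerRegistered):
            self.names[event.customer_id] = event.name
        elif isinstance(event, OrderPlaced):
            self.order_totals[event.customer_id] += event.total

    def find_high_value_customers(self, threshold):
        # Answered locally; the data is eventually consistent with the sources.
        return sorted(
            self.names[cid]
            for cid, total in self.order_totals.items()
            if total >= threshold and cid in self.names
        )
```

The trade-off is eventual consistency: the view lags slightly behind the source services, but it stays available when they are down.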
16. "But we still need to connect to other systems synchronously, for example using HTTP(S)!"
21. Important resilience patterns when implementing an HTTP client
• Timeouts
• Retry
• Retry budget
• Exponential backoff
• Circuit breaker
• Bulkheads
• Shed load
• Fail fast
• Fan out & quickest reply
• Caching
• Observability: monitoring, alerting, tracing, log aggregation
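Fan out & quickest reply can be sketched with Python's `concurrent.futures`: the same request goes to several replicas, and the first successful reply wins. Here `replicas` are plain callables standing in for HTTP clients of redundant service instances; `quickest_reply` and its parameters are hypothetical.

```python
import concurrent.futures


def quickest_reply(replicas, request, timeout=1.0):
    """Send `request` to every replica and return the first successful
    reply; slower replies are ignored. Raises RuntimeError if all replicas
    fail, or concurrent.futures.TimeoutError if no reply arrives in time."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(replicas))
    futures = [pool.submit(replica, request) for replica in replicas]
    try:
        for future in concurrent.futures.as_completed(futures, timeout=timeout):
            if future.exception() is None:
                return future.result()
        raise RuntimeError("all replicas failed")
    finally:
        # Don't wait for the slower replicas; their results are discarded.
        pool.shutdown(wait=False)
```

This trades extra load (every request is sent several times) for lower tail latency, so it fits best with cheap, idempotent reads.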
22. Build for production, not for the test environment.
Remember TANSTAAFL: there ain’t no such thing as a free lunch.
23. Do not over-engineer your system!
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it."
Brian W. Kernighan
24. Learn from production problems!
• “The cost of failure is education.” (Devin Carraway)
• Use blameless postmortems (https://landing.google.com/sre/book/chapters/postmortem-culture.html)
• Human error is never a root cause!
• Resilience should be a fourth management objective, alongside Better/Faster/Cheaper.