Resilient microservices

Maxim Shelest
Architect at PAYBACK GmbH
#architect #developer #programmer #software-craftsman #life-long-learner #leader #agile-practitioner

https://source.superherostuff.com/wp-content/uploads/2015/09/IEVaULq.png
About to launch a project
• Microservices are in production or will be soon
• Business capabilities are clearly separated between microservices using DDD principles
• Automated continuous deployment pipeline
• DevOps culture
• Monitoring and alerting with detailed dashboards

Does “feature complete” mean “production ready”?

Anything that can go wrong will go wrong.
Murphy's law

Resilience
"Resilient systems provide and maintain an acceptable level
of service in face of faults (unintentional, intentional, or naturally caused)
affecting their normal operation"

Do not forget resilience during estimation.
Where does resilience begin?
It is not only about microservices!

Resiliency patterns (1)
• recovery
  • retry: a very basic recovery mechanism
  • self-healing: reinitialize components, either internally or through an external monitoring system (automatic service restart, service start on machine boot)
  • retry budget: if the retry budget is exceeded, do not retry; fail the request to avoid overloading the callee (see circuit breaker)
  • exponential backoff with jitter: use randomized exponential backoff when scheduling retries
• bulkheads: partition your system so that a failure in one part cannot take down everything
• complete parameter checking: protection from broken or malicious calls (Postel’s law: be liberal in what you accept and conservative in what you send)
• asynchronous communication: the sender does not need to wait for the receiver's response
• event-driven: event notification, event-carried state transfer, event sourcing, CQRS
• location transparency: the sender does not need to know the receiver’s concrete location
• zero-downtime deployment: deployment is a bad reason for user-facing unavailability
• stateless: service failover is hard with state
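The retry, retry budget, and exponential backoff with jitter patterns above can be combined in a few lines. The following is a minimal sketch, not a reference implementation: the names `RetryBudget`, `backoff_delay`, and `call_with_retry` are hypothetical, the budget rule (a fixed allowance plus a fraction of all requests) is one common choice among several, and the "full jitter" variant is assumed for the backoff.

```python
import random
import time


class RetryBudget:
    """Allows retries only while they stay below a share of all requests.

    The formula (min_retries + ratio * requests) is an assumption made for
    this sketch; real budgets are usually computed over a time window.
    """

    def __init__(self, ratio=0.2, min_retries=3):
        self.ratio = ratio
        self.min_retries = min_retries
        self.requests = 0
        self.retries = 0

    def record_request(self):
        self.requests += 1

    def try_spend(self):
        allowed = self.min_retries + self.ratio * self.requests
        if self.retries < allowed:
            self.retries += 1
            return True
        return False


def backoff_delay(attempt, base=0.1, cap=5.0):
    """Full jitter: a random delay in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def call_with_retry(operation, budget, max_attempts=4, sleep=time.sleep):
    """Runs operation, retrying on failure while attempts and budget allow."""
    budget.record_request()
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            is_last_attempt = attempt == max_attempts - 1
            # Budget exhausted or attempts used up: fail the request
            # instead of adding more load to a struggling dependency.
            if is_last_attempt or not budget.try_spend():
                raise
            sleep(backoff_delay(attempt))
```

Sharing one `RetryBudget` instance across all calls to the same dependency is what limits the total retry traffic; a per-call attempt limit alone does not.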

• relaxed temporal constraints: use a more relaxed consistency model to reduce coupling. The real world is not ACID, it is BASE.
• idempotence
• self-contained deployment: services are self-contained deployment units (compatibility between dependent components/APIs)
• timeouts
• circuit breakers: prevent a failure from constantly recurring
• failover
• fallback/graceful degradation: the ability to maintain functionality when portions of a system break down
• error handler: separate business logic from error handling
• fail fast: add checks in front of expensive operations to avoid foreseeable failures (also called handshaking)
• fan out & quickest reply: send the request to multiple workers, use the quickest reply, and discard all other responses
• bounded queues: avoid latency caused by overloaded resources (use thread pools and connection pools)
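To make the circuit breaker and fallback patterns above more concrete, here is a minimal single-threaded sketch. The class name, the threshold and timeout parameters, and the injectable `clock` are assumptions made for illustration; production libraries add thread safety, sliding windows, and metrics.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the circuit is open and no fallback was supplied."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, operation, fallback=None):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: fail fast without touching the dependency.
                if fallback is not None:
                    return fallback()
                raise CircuitOpenError("circuit is open")
            # Reset timeout elapsed (half-open): let one trial call through.
        try:
            result = operation()
        except Exception:
            self.failures += 1
            trial_failed = self.opened_at is not None
            if trial_failed or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a trial
            if fallback is not None:
                return fallback()  # graceful degradation
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the circuit is open the dependency is not called at all, which gives it time to recover; the fallback (for example a cached value) keeps the caller functional in degraded form.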

• shed load: shed requests based on resource load
• observability: goal is to automatically act on detected failures
  • monitoring:
    • health checks
    • synthetic transactions
    • monitor metrics
  • alerting/visualization
  • distributed systems tracing
  • log aggregation/analytics
Always think about composition of patterns to achieve resilience
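As one small instance of composing patterns, the sketch below combines "shed load" with a basic metric for observability. It assumes that the number of requests currently in flight is an acceptable proxy for resource load; `LoadShedder`, `Overloaded`, and the counters are hypothetical names, not part of any particular framework.

```python
import threading
from contextlib import contextmanager


class Overloaded(Exception):
    """Signals that the request was shed; a web layer might map it to 503."""


class LoadShedder:
    def __init__(self, max_in_flight):
        self._slots = threading.BoundedSemaphore(max_in_flight)
        self._lock = threading.Lock()
        # Counters that a monitoring system could scrape.
        self.accepted = 0
        self.rejected = 0

    @contextmanager
    def admit(self):
        # Non-blocking acquire: reject immediately instead of queueing.
        if not self._slots.acquire(blocking=False):
            with self._lock:
                self.rejected += 1
            raise Overloaded("too many requests in flight")
        with self._lock:
            self.accepted += 1
        try:
            yield
        finally:
            self._slots.release()
```

A handler would wrap its work in `with shedder.admit():`, and an alert on a rising `rejected` count is one way to act on the failure automatically.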

Synchronous vs Asynchronous
Do not use synchronous communication! Asynchronous is better!
The event-driven approach is a de facto standard for implementing asynchronous communication in microservice architectures.

[Diagram: Order Service produces an Order event to the message broker; Shipping Service consumes the Order event and runs shipNewOrder.]
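The Order Service and Shipping Service flow can be imitated in a single process to show what "the sender does not wait" means in practice. This is a toy sketch under stated assumptions: a bounded `queue.Queue` stands in for the message broker, a thread stands in for the Shipping Service, and `place_order`, `ship_new_order`, and `run_demo` are illustrative names.

```python
import queue
import threading

# Stand-in for a broker topic; bounded so a slow consumer cannot
# make the producer side grow without limit.
broker = queue.Queue(maxsize=100)
shipped = []


def place_order(order_id):
    """Order Service: publish the Order event and return immediately."""
    broker.put({"type": "Order", "order_id": order_id})
    return "accepted"


def ship_new_order(order):
    """Shipping Service business logic (here it only records the order)."""
    shipped.append(order["order_id"])


def shipping_worker():
    """Shipping Service: consume events until the None sentinel arrives."""
    while True:
        event = broker.get()
        try:
            if event is None:
                return
            ship_new_order(event)
        finally:
            broker.task_done()


def run_demo():
    worker = threading.Thread(target=shipping_worker, daemon=True)
    worker.start()
    results = [place_order(i) for i in (1, 2, 3)]
    broker.put(None)  # tell the worker to stop
    broker.join()     # wait until every queued event was processed
    return results
```

`place_order` returns "accepted" before any shipping happens; with a real broker the Shipping Service could even be down at that moment and catch up later.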

"But we need to query data from
another microservice?"

Command Query Responsibility Segregation
[Diagram: Order Service and Customer Service each produce their events (Order, Customer) to the message broker. Customer View Service keeps its own Customers and Orders data and answers findHighValueCustomers.]
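A view service of this kind can be sketched as an event handler that maintains its own read model, so the query never calls the Order or Customer services. The event shapes (`type`, `customer_id`, `name`, `amount`) and the threshold-based definition of "high value" are assumptions made for this example.

```python
from collections import defaultdict


class CustomerViewService:
    """Builds a local read model from Customer and Order events."""

    def __init__(self):
        self.customers = {}                     # customer_id -> name
        self.order_totals = defaultdict(float)  # customer_id -> summed amount

    def on_event(self, event):
        """Called for every event consumed from the message broker."""
        if event["type"] == "Customer":
            self.customers[event["customer_id"]] = event["name"]
        elif event["type"] == "Order":
            self.order_totals[event["customer_id"]] += event["amount"]

    def find_high_value_customers(self, min_total):
        """Query side: answered entirely from the local read model."""
        return sorted(
            name
            for customer_id, name in self.customers.items()
            if self.order_totals.get(customer_id, 0.0) >= min_total
        )
```

The price of this independence is eventual consistency: the view lags behind the source services by however long event delivery takes.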

"But we still need to connect to other
systems synchronously, for example
using HTTP(S)!"

Important resilience patterns when implementing an HTTP client
Timeouts
Retry
Retry budget
Exponential backoff
Circuit breaker
Bulkheads
Shed load
Fail fast
Fan out & quickest reply
Caching
Observability
Monitoring
Alerting
Tracing
Log aggregation
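Two of the patterns listed here, timeouts and bulkheads, can be sketched together by giving each downstream dependency its own small worker pool and a deadline. This is an illustration only: `Bulkhead` and `slow_dependency` are hypothetical names, and the blocking call stands in for a synchronous HTTP request.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


class Bulkhead:
    """One isolated worker pool plus a deadline per downstream dependency."""

    def __init__(self, name, max_workers=4, timeout_seconds=1.0):
        self.timeout_seconds = timeout_seconds
        self._pool = ThreadPoolExecutor(
            max_workers=max_workers, thread_name_prefix=name
        )

    def call(self, operation, fallback=None):
        future = self._pool.submit(operation)
        try:
            return future.result(timeout=self.timeout_seconds)
        except FutureTimeout:
            # cancel() only helps if the call has not started yet; a running
            # call keeps its worker until it returns.
            future.cancel()
            return fallback

    def close(self):
        self._pool.shutdown(wait=False)


def slow_dependency():
    """Hypothetical stand-in for a slow synchronous HTTP call."""
    time.sleep(0.5)
    return "late"
```

Because a timed-out call still occupies its worker, the HTTP client underneath should also set its own connect and read timeouts. The executor's internal queue is unbounded, so a bounded queue or load shedding in front of it is a sensible addition.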

Build for production, not for the test environment.
Remember TANSTAAFL:
there ain’t no such thing as a free lunch.

Do not over-engineer your system!
"Debugging is twice as hard as
writing the code in the first place.
Therefore, if you write the code as
cleverly as possible, you are, by
definition, not smart enough to
debug it."
Brian W. Kernighan

Learn from production problems!
• “The cost of failure is education.” (Devin Carraway)
• Use blameless postmortems (https://landing.google.com/sre/book/chapters/postmortem-culture.html)
• Human error is never a root cause!
• Resilience should be a 4th management objective, alongside Better/Faster/Cheaper.
