An exception occurred…
Please try again
LAILA BOUGRIA
@noctovis
@lailabougria
Exceptions happen all the time
All exceptions are not equal
• Systemic exceptions
• Intentional exceptions
• Transient exceptions
Systemic exceptions
• Well-known
• Easy to reproduce
• Identify – Fix – Test
• Techniques: TDD, integration testing, acceptance testing
Intentional exceptions
• Intentional use of exceptions
• E.g. ValidationException
• An acceptable outcome
• The calling API is expected to handle it
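A minimal Python sketch of an intentional exception. It assumes a hypothetical `ValidationException` class and hypothetical `place_order`/`handle_request` functions, none of which come from the talk. The domain code raises on invalid input, and the calling layer turns that into an expected response instead of treating it as a failure.

```python
class ValidationException(Exception):
    """Hypothetical exception type that signals invalid input: an expected outcome, not a bug."""


def place_order(quantity: int) -> str:
    # Intentionally raise when the input breaks a business rule
    if quantity <= 0:
        raise ValidationException("quantity must be positive")
    return f"order placed for {quantity} item(s)"


def handle_request(quantity: int) -> tuple:
    # The calling API handles the intentional exception as a normal outcome (here: a 400-style status)
    try:
        return 200, place_order(quantity)
    except ValidationException as exc:
        return 400, str(exc)


print(handle_request(2))  # (200, 'order placed for 2 item(s)')
print(handle_request(0))  # (400, 'quantity must be positive')
```

Retrying `handle_request(0)` would fail the same way every time. That is why intentional exceptions are handled rather than retried.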
Transient exceptions
• Happen unexpectedly
• Hard to reproduce
• External systems
• Infrastructure
• Some persist longer than others
Transient exceptions
• Concurrency exceptions
• Failover of a database cluster
• System overload
• External services
Transient exceptions in cloud computing
• Latency
• Hardware recycling
• Throttling
• Network connectivity
Transient exceptions
• Hard to identify
• Hard to reproduce
• Hard to test
An example
[Diagram: Place order → Success | Failure → Call support → Move to competitor]
• User
• Support engineer
• Software engineer
What more can we do?
Build resilient systems
• Cope with failures
• Without data loss
• Without affecting users
Resilience strategies
• Retries
• Retries with exponential back-off
• Circuit breaker
• Fallback
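The circuit breaker and fallback strategies can be sketched in Python as below. This is a simplified, hypothetical `CircuitBreaker` class, not Polly's API. After `failure_threshold` consecutive failures the circuit opens. While it is open, calls fail fast, or return the optional fallback, until `reset_timeout` seconds have passed. The next call is then a trial call in the half-open state. The injectable `clock` parameter is an assumption that makes the behaviour testable.

```python
import time
from typing import Callable, Optional


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0                       # consecutive failures seen so far
        self.opened_at: Optional[float] = None  # time the circuit last opened

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"  # let one trial call through
        return "open"

    def call(self, action: Callable, fallback: Optional[Callable] = None):
        if self.state == "open":
            # Fail fast instead of hammering a dependency that is known to be down
            if fallback is not None:
                return fallback()
            raise CircuitOpenError("circuit is open")
        try:
            result = action()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the circuit
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            if fallback is not None:
                return fallback()
            raise
        # Success closes the circuit and resets the failure count
        self.failures = 0
        self.opened_at = None
        return result
```

A fallback, such as a cached value, keeps the user-facing path working while the circuit is open. Without one, the caller gets an immediate `CircuitOpenError` instead of waiting on a failing dependency.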
Retry pattern
Retries
• Wrap the code in a try-catch block
• Exception? Repeat!
• A configured number of times
• Perfect for concurrency issues and flaky connections
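A minimal Python sketch of an immediate retry. In production the talk's advice is to use a library such as Polly rather than writing your own retries, so this only shows the mechanics. The helper `retry_immediately`, and the choice of `ConnectionError` and `TimeoutError` as retryable, are assumptions for illustration.

```python
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_immediately(action: Callable[[], T], max_attempts: int = 3,
                      retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError)) -> T:
    """Call `action`, repeating it right away on a retryable exception, up to `max_attempts` calls."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return action()
        except retry_on:
            if attempt >= max_attempts:
                raise  # out of attempts: surface the last exception to the caller
            attempt += 1


# Usage: a hypothetical dependency that fails twice with a flaky connection, then succeeds
calls = []


def flaky_lookup() -> str:
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("flaky connection")
    return "ok"


print(retry_immediately(flaky_lookup))  # ok
```

Exceptions outside `retry_on`, such as a validation error, propagate on the first attempt, because repeating the call cannot fix them.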
Immediate retry
Retry with exponential back-off
Retry with exponential back-off and jitter
Combined immediate and back-off retries
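The retry schedules above can be sketched in Python as a list of delays: zero delays for the immediate retries, then exponential back-off capped at `cap`, optionally randomised with "full jitter" (a uniform draw between 0 and the back-off delay). Full jitter is one common jitter variant, chosen here as an assumption. `retry_delays` and `retry_with_delays` are hypothetical helpers, not Polly's `WaitAndRetry` API.

```python
import random
import time
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def retry_delays(immediate: int, delayed: int, base: float = 1.0, cap: float = 30.0,
                 jitter: bool = True, rng: Optional[random.Random] = None) -> List[float]:
    """Delays (seconds) before each retry: `immediate` zero delays, then exponential back-off."""
    rng = rng or random.Random()
    delays = [0.0] * immediate
    for n in range(delayed):
        delay = min(cap, base * 2 ** n)  # base, 2*base, 4*base, ... capped at `cap`
        if jitter:
            delay = rng.uniform(0, delay)  # full jitter: clients don't retry in lockstep
        delays.append(delay)
    return delays


def retry_with_delays(action: Callable[[], T], delays: Sequence[float],
                      retry_on=(ConnectionError, TimeoutError),
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Try `action`, waiting the next delay after each retryable failure: len(delays) + 1 attempts."""
    for delay in delays:
        try:
            return action()
        except retry_on:
            sleep(delay)
    return action()  # final attempt: any exception now propagates to the caller


print(retry_delays(2, 3, jitter=False))  # [0.0, 0.0, 1.0, 2.0, 4.0]
```

Without jitter, many clients that failed at the same moment retry at the same moments too, which can overload a recovering service again. Jitter spreads those retries out.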
Polly
• Transient-fault-handling library
• Policy-based
• Supports reactive and proactive resilience patterns
• Retry & WaitAndRetry
An example with Polly
Polly
• Different policies for different needs
• Use a policy only where you need retries
• Plug Polly into ASP.NET Core's outgoing HttpClient middleware (IHttpClientFactory)
• Microsoft.Extensions.Http.Polly package
Design considerations
• Nested retries
• Isolate change
• Idempotency
Nested retries
• Avoid nested layers of retries
• They cause an exponential number of requests
• The initiating request may time out
• Consider how the APIs you call implement retries
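A small Python simulation of why nested retries multiply. Three hypothetical layers (a data-access helper, a service and an API client; the names are illustrative) each retry three times. One failing call therefore reaches the dependency 3 × 3 × 3 = 27 times. `with_retries` is a hypothetical helper, not a library API.

```python
from typing import Callable


def with_retries(action: Callable[[], object], attempts: int) -> Callable[[], object]:
    """Wrap `action` so it is tried up to `attempts` times on ConnectionError."""
    def wrapped():
        for attempt in range(1, attempts + 1):
            try:
                return action()
            except ConnectionError:
                if attempt == attempts:
                    raise
    return wrapped


dependency_calls = 0


def failing_dependency():
    global dependency_calls
    dependency_calls += 1
    raise ConnectionError("database unavailable")


# Three layers, each adding its own retries around the layer below
data_access = with_retries(failing_dependency, 3)
service = with_retries(data_access, 3)
api_client = with_retries(service, 3)

try:
    api_client()
except ConnectionError:
    pass

print(dependency_calls)  # 27
```

If each layer also waited between attempts, the initiating request would be even more likely to time out before the innermost retries finished.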
Isolate change
• An action needs to be a single unit (of work)
• Retry attempts should be independent
• No shared state
• No left-over state
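A minimal Python sketch of isolating change between retry attempts. The helpers `FailsOnce`, `retry`, `leaky_attempt` and `isolated_attempt` are hypothetical. The leaky version writes to shared state before its risky step, so a failed attempt leaves partial results behind. The isolated version keeps its work local and commits it once, after the whole unit of work succeeded.

```python
from typing import Callable, List


class FailsOnce:
    """Hypothetical flaky step: raises ConnectionError on its first call only."""

    def __init__(self) -> None:
        self.failed = False

    def __call__(self) -> None:
        if not self.failed:
            self.failed = True
            raise ConnectionError("transient failure")


def retry(action: Callable[[], None], attempts: int = 3) -> None:
    for attempt in range(1, attempts + 1):
        try:
            action()
            return
        except ConnectionError:
            if attempt == attempts:
                raise


def leaky_attempt(store: List[str], items: List[str], risky_step: Callable[[], None]) -> None:
    # Writes to shared state before the risky step: a failed attempt leaves partial results behind
    store.extend(items)
    risky_step()


def isolated_attempt(store: List[str], items: List[str], risky_step: Callable[[], None]) -> None:
    pending = list(items)  # state owned by this attempt only
    risky_step()           # the risky step runs before anything is committed
    store.extend(pending)  # one commit, only after the whole unit of work succeeded


leaky_store: List[str] = []
leaky_step = FailsOnce()
retry(lambda: leaky_attempt(leaky_store, ["a", "b"], leaky_step))
print(leaky_store)  # ['a', 'b', 'a', 'b']: left-over state from the failed attempt

isolated_store: List[str] = []
isolated_step = FailsOnce()
retry(lambda: isolated_attempt(isolated_store, ["a", "b"], isolated_step))
print(isolated_store)  # ['a', 'b']
```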
Idempotency
• Repeating an action produces the same result
• Duplicate requests will occur
• Operations span multiple pieces of infrastructure
• No transactions across them
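A minimal Python sketch of an idempotent operation. It assumes the caller generates a unique `order_id` for each logical request, so a duplicate request, such as a client retry after a timeout, is recognised and returns the earlier result. `OrderService` is hypothetical and in-memory. Across real infrastructure without a shared transaction, the duplicate check and the write would also need to be made atomic (for example with a unique constraint), which this sketch does not cover.

```python
from typing import Dict


class OrderService:
    """Hypothetical service whose place_order is idempotent thanks to a caller-supplied order_id."""

    def __init__(self) -> None:
        self.orders: Dict[str, str] = {}
        self.charges = 0

    def place_order(self, order_id: str, item: str) -> str:
        if order_id in self.orders:
            # Duplicate request: return the earlier result instead of acting again
            return self.orders[order_id]
        self.charges += 1  # the side effect happens only once per order_id
        self.orders[order_id] = item
        return item


service = OrderService()
service.place_order("order-42", "book")
service.place_order("order-42", "book")  # e.g. a client retry after a timeout
print(len(service.orders), service.charges)  # 1 1
```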
Remember this?
[Diagram: Place order → Success | Failure → Call support → Move to competitor]
An example with retries
[Diagram: Place order → Success | Failure → Immediate retries → Success | Failure → Delayed retries → Success | Failure → Call support → Move to competitor]
All retries failed…
We’re having a really bad day
At that point…
• The request is lost
• The user needs to start over
• They might just give up
Adopting a message-based architecture
• Move away from request-response
• Capture the request
• Inform the user that the request will be handled
• Handle the request in a dedicated process
[Diagram: coupled system, Calling API → Handling API]
[Diagram: message-based system, Producer → Queue → Consumer]
[Diagram: successful processing, Consumer → Ack → Queue]
[Diagram: failed processing, Consumer → Nack → Queue]
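The producer/queue/consumer flow with ack and nack can be sketched in Python with a toy in-memory queue. `InMemoryQueue` and `consume_one` are hypothetical, not a real broker client. A received message stays "in flight" until the consumer acks it, which removes it for good, or nacks it, which puts it back for another attempt. The producer returns to the user right after `send`, so retries happen off the synchronous path.

```python
from collections import deque
from typing import Callable, Deque, Dict, Optional, Tuple

Message = Tuple[str, str]  # (message_id, body)


class InMemoryQueue:
    """Toy queue: received messages stay in flight until acked (removed) or nacked (redelivered)."""

    def __init__(self) -> None:
        self._ready: Deque[Message] = deque()
        self._in_flight: Dict[str, Message] = {}

    def send(self, message_id: str, body: str) -> None:
        self._ready.append((message_id, body))

    def receive(self) -> Optional[Message]:
        if not self._ready:
            return None
        message = self._ready.popleft()
        self._in_flight[message[0]] = message
        return message

    def ack(self, message_id: str) -> None:
        del self._in_flight[message_id]  # processed successfully: gone for good

    def nack(self, message_id: str) -> None:
        self._ready.append(self._in_flight.pop(message_id))  # put back for another attempt

    def __len__(self) -> int:
        return len(self._ready) + len(self._in_flight)


def consume_one(queue: InMemoryQueue, handler: Callable[[str], None]) -> bool:
    """Process one message: ack on success, nack on failure. Returns True if a message was acked."""
    message = queue.receive()
    if message is None:
        return False
    message_id, body = message
    try:
        handler(body)
    except Exception:
        queue.nack(message_id)
        return False
    queue.ack(message_id)
    return True


queue = InMemoryQueue()
queue.send("msg-1", "PlaceOrder:42")  # the producer returns to the user right after this

attempts = []


def handle_place_order(body: str) -> None:
    attempts.append(body)
    if len(attempts) == 1:
        raise ConnectionError("database failover in progress")


print(consume_one(queue, handle_place_order), len(queue))  # False 1
print(consume_one(queue, handle_place_order), len(queue))  # True 0
```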
Benefits from a recoverability perspective
• Retries don’t affect the synchronous path
• Fire-and-forget nature
• Retry forever
• Business retries
An example with messaging
[Diagram: Place order → Send message → Message in queue → Process message → Success: message acked | Failure: message nacked]
Recoverability with queues
• Only immediate retries: increases load on the system
• Retries with exponential back-off
• Persistent failures go to an error queue
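A minimal Python sketch of queue-based recoverability: immediate retries, then delayed retry rounds with back-off, then the error queue. `handle_with_recoverability` is a hypothetical function, not a messaging-framework API. In this sketch each round makes `1 + immediate_retries` attempts, and every round after the first starts with a wait. The default `wait` does nothing so the example runs instantly. A real broker would redeliver the message later rather than block a thread.

```python
from typing import Callable, List, Sequence, Tuple


def handle_with_recoverability(body: str, handler: Callable[[str], None],
                               error_queue: List[Tuple[str, str]],
                               immediate_retries: int = 2,
                               delays: Sequence[float] = (1.0, 2.0, 4.0),
                               wait: Callable[[float], None] = lambda seconds: None) -> bool:
    """Immediate retries, then delayed rounds with back-off, then park the message in the error queue."""
    last_error = ""
    for round_number, delay in enumerate([0.0, *delays]):
        if round_number > 0:
            wait(delay)  # delayed retry: give the transient problem time to clear
        for _ in range(1 + immediate_retries):
            try:
                handler(body)
                return True
            except Exception as exc:
                last_error = repr(exc)
    error_queue.append((body, last_error))  # persistent failure: keep it for inspection and replay
    return False


error_queue: List[Tuple[str, str]] = []
waits: List[float] = []
calls: List[str] = []


def always_failing(body: str) -> None:
    calls.append(body)
    raise ConnectionError("payment provider down")


handle_with_recoverability("PlaceOrder:42", always_failing, error_queue, wait=waits.append)
print(len(calls), waits, len(error_queue))  # 12 [1.0, 2.0, 4.0] 1
```

Nothing is lost when every retry fails: the message sits in the error queue with its last exception and can be replayed once the cause is fixed.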
NServiceBus
• Messaging middleware technology
• High scalability & flexibility
• Support for multiple queuing technologies and data stores
• Supports Outbox and Saga patterns
• Monitoring and debugging tools
Recoverability in NServiceBus
• Immediate and delayed retries, plus custom policies
• Centralized error queue
• Error notifications
• Unrecoverable exceptions
• Automatic rate limiting
• Circuit breaker support
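As a language-neutral illustration, the Python sketch below models the decision a recoverability policy makes for a failed message. This is not the NServiceBus API, which is configured in C#. `RecoverabilityPolicy`, `decide`, the action strings and `ValidationException` are all hypothetical names. Unrecoverable exceptions skip retries entirely. Other exceptions get immediate retries, then delayed retries, then go to the error queue.

```python
from dataclasses import dataclass
from typing import Tuple, Type


class ValidationException(Exception):
    """Hypothetical unrecoverable error: retrying cannot fix invalid input."""


@dataclass(frozen=True)
class RecoverabilityPolicy:
    immediate_retries: int = 3
    delayed_retries: int = 2
    unrecoverable: Tuple[Type[BaseException], ...] = (ValidationException,)

    def decide(self, error: BaseException, immediate_done: int, delayed_done: int) -> str:
        """Next step for a failed message, given how many retries of each kind already happened."""
        if isinstance(error, self.unrecoverable):
            return "move_to_error_queue"  # skip retries entirely
        if immediate_done < self.immediate_retries:
            return "immediate_retry"
        if delayed_done < self.delayed_retries:
            return "delayed_retry"
        return "move_to_error_queue"


policy = RecoverabilityPolicy()
print(policy.decide(ConnectionError(), immediate_done=0, delayed_done=0))      # immediate_retry
print(policy.decide(ConnectionError(), immediate_done=3, delayed_done=0))      # delayed_retry
print(policy.decide(ValidationException(), immediate_done=0, delayed_done=0))  # move_to_error_queue
```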
NServiceBus recoverability configuration
An example with NServiceBus
[Diagram: Place order → Send message → Message in queue → Process message → Success | Failure → Immediate retries → Success | Failure → Delayed retries → Success | Failure → Error queue]
Where do retries apply?
Polly outside message handlers
Remember idempotency?
• The same message might arrive more than once
• Message deduplication with Outbox
• Exactly-once processing of each message
• Consistency between message and data operations
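A minimal sketch of the general outbox pattern in Python, using the standard-library `sqlite3` module. This is not NServiceBus's Outbox implementation; the table names, `handle_place_order` and `dispatch_outbox` are hypothetical. The incoming message ID, the business data and the outgoing message are written in one local transaction. A duplicate delivery is therefore detected and skipped, and the data change and the outgoing message either both commit or both roll back. That rules out a record committed without its message, and a message sent for a change that was rolled back.

```python
import sqlite3
from typing import Callable


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE orders (order_id TEXT PRIMARY KEY, item TEXT NOT NULL);
        CREATE TABLE processed_messages (message_id TEXT PRIMARY KEY);
        CREATE TABLE outbox (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            body TEXT NOT NULL,
            dispatched INTEGER NOT NULL DEFAULT 0
        );
    """)


def handle_place_order(conn: sqlite3.Connection, message_id: str, order_id: str, item: str) -> bool:
    """Apply a PlaceOrder message once; returns False for a duplicate delivery."""
    with conn:  # one local transaction: commits on success, rolls back on any exception
        seen = conn.execute(
            "SELECT 1 FROM processed_messages WHERE message_id = ?", (message_id,)
        ).fetchone()
        if seen:
            return False  # deduplication: this message's effects were already stored
        conn.execute("INSERT INTO orders (order_id, item) VALUES (?, ?)", (order_id, item))
        # The outgoing message is stored, not sent, in the same transaction as the data change
        conn.execute("INSERT INTO outbox (body) VALUES (?)", (f"OrderPlaced:{order_id}",))
        conn.execute("INSERT INTO processed_messages (message_id) VALUES (?)", (message_id,))
    return True


def dispatch_outbox(conn: sqlite3.Connection, send: Callable[[str], None]) -> int:
    """Send stored messages, then mark them dispatched; a crash in between means a resend."""
    rows = conn.execute("SELECT id, body FROM outbox WHERE dispatched = 0 ORDER BY id").fetchall()
    for row_id, body in rows:
        send(body)
        with conn:
            conn.execute("UPDATE outbox SET dispatched = 1 WHERE id = ?", (row_id,))
    return len(rows)


conn = sqlite3.connect(":memory:")
create_schema(conn)
sent = []
print(handle_place_order(conn, "msg-1", "order-1", "book"))  # True
print(handle_place_order(conn, "msg-1", "order-1", "book"))  # False: duplicate delivery
print(dispatch_outbox(conn, sent.append), sent)  # 1 ['OrderPlaced:order-1']
```

Dispatch is still at-least-once. If the process crashes after `send` but before the row is marked, the message goes out again, and the receiving side relies on the same deduplication.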
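Atomicity problem: zombie record (data committed, but the message was never sent)
Atomicity problem: ghost message (message sent, but the data change was rolled back)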
Outbox
No Outbox guarantees here
Transactional session
Don’t ever catch exceptions again!
Let’s recap
• Multiple strategies to increase resilience
• Don’t force retries on your users: retry for them!
• Don’t write your own retries
• Embrace asynchronous messaging