Service resiliency in microservices

Afkham Azeez, Vice President, WSO2
November 2018

Resilience
(Random House Kernerman Webster’s College Dictionary)
1. Ability to return to the original form or position after being affected by a particular alteration.
2. Ability to recover from illness, depression, adversity or the like.

Because real systems need to run in production!

Availability is critical in production
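$$A = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}}$$
$$A = \frac{E[\mathrm{Uptime}]}{E[\mathrm{Uptime}] + E[\mathrm{Downtime}]}$$
where MTTF is the mean time to failure and MTTR is the mean time to repair (recover).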

Traditional approach to improving availability
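$$A = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}}$$
Raise availability by increasing MTTF, i.e. by preventing failures so that they happen as rarely as possible.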

Resilience approach to improving availability
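$$A = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}}$$
Raise availability by decreasing MTTR, i.e. by accepting that failures happen and recovering from them quickly.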
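To make the two approaches concrete, the sketch below evaluates the availability formula for a made-up baseline: one failure every 720 h and 8 h to repair. The numbers are illustrative assumptions, not measurements. Python is used here as a neutral language for this and the following sketches.

```python
def availability(mttf: float, mttr: float) -> float:
    """Steady-state availability A = MTTF / (MTTF + MTTR)."""
    if mttf < 0 or mttr < 0 or mttf + mttr == 0:
        raise ValueError("MTTF and MTTR must be non-negative and not both zero")
    return mttf / (mttf + mttr)


# Hypothetical baseline: a failure every 720 h (about 30 days), 8 h to repair.
baseline = availability(720, 8)
# Traditional approach: make failures twice as rare.
longer_mttf = availability(1440, 8)
# Resilience approach: recover twice as fast.
shorter_mttr = availability(720, 4)

print(f"baseline     {baseline:.4%}")
print(f"MTTF doubled {longer_mttf:.4%}")
print(f"MTTR halved  {shorter_mttr:.4%}")
```

Both improvements give exactly the same availability, because 1440 / 1448 = 720 / 724. Halving recovery time is worth as much as doubling the time between failures, and in practice it is often the cheaper lever.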

Failures are a fact of life.
Don’t avoid failures. Embrace them.

Resilience
○ The ability of an app to recover from certain types of failure and yet remain functional from the users’ perspective
○ Ideally, users don’t notice the failure
○ Graceful degradation
○ Resilience is how you achieve the outcome (availability)
○ Also called recoverability

Features of resilient systems
○ Failures are compartmentalized
○ A resilient system automatically cuts off failing components and reintegrates them once they are no longer failing

MSA and resilience
Microservice architecture (MSA) inherently helps with designing resilient systems, because a failure can be contained within an individual service.

Techniques
○ Timeout
○ Retry
○ Failover
○ Load balancing
○ Circuit breaker
○ Transactions

Don’t hide the network
○ Calls over the network should always be able to return an error in addition to the response
○ Network calls should be easily distinguishable from local calls

    // A remote call (->) returns either a response or an error
    var backendRes = backendClientEP->forward("/hello", request);
    match backendRes {
        http:Response res => {
            ...
        }
        error responseError => {
            log:printError("Error invoking backend", err = responseError);
        }
    }
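The same idea can be sketched outside Ballerina by making the error part of the return type, so a caller cannot treat a remote call as if it always succeeds. In the Python sketch below, `fetch_hello`, `Response` and `NetworkError` are hypothetical stand-ins that only simulate a backend. This is an analogy to the `http:Response|error` union in the sample, not Ballerina's API.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Response:
    status: int
    body: str


@dataclass
class NetworkError:
    message: str


# Hypothetical remote call: the return type says "response OR error".
def fetch_hello(backend_up: bool) -> Union[Response, NetworkError]:
    if backend_up:
        return Response(200, "hello")
    return NetworkError("connection refused")


def handle(backend_up: bool) -> str:
    result = fetch_hello(backend_up)
    if isinstance(result, NetworkError):
        # The failure path is explicit and must be handled by the caller.
        return f"error: {result.message}"
    return result.body
```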

Timeout
○ Hide downstream latency and stay responsive to upstream callers
○ Prevent waiting forever on a slow or unresponsive service

Timeout Sample
    endpoint http:Client backendClientEP {
        url: "http://localhost:8080",
        timeoutMillis: 2000  // give up on the backend after 2 seconds
    };
    var backendRes = backendClientEP->forward("/hello", request);
    match backendRes {
        http:Response res => {
            ...
        }
        error responseError => {
            // On a timeout (or any other error), fall back to a cached response
            string resp = cache.get("hello");
            ...
        }
    }
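The timeout-plus-fallback pattern above can be sketched in Python with `concurrent.futures`. The pool size, the `cache` dictionary and `slow_backend` are illustrative assumptions. One caveat applies: the timed-out call keeps running in its worker thread, and the caller simply stops waiting for it.

```python
import concurrent.futures
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Shared worker pool for outbound calls (the size is an arbitrary assumption).
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_with_timeout(call: Callable[[], T], timeout_s: float,
                      fallback: Callable[[], T]) -> T:
    """Run `call`, but stop waiting after `timeout_s` seconds and use `fallback`."""
    future = _pool.submit(call)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # The slow call is not cancelled; we only stop waiting for it.
        return fallback()


cache = {"hello": "cached hello"}  # hypothetical local cache


def slow_backend() -> str:
    time.sleep(0.5)  # simulate a backend that responds too slowly
    return "fresh hello"
```

For example, `call_with_timeout(slow_backend, 0.1, lambda: cache["hello"])` returns the cached value instead of blocking for half a second.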

Retry
○ Transient failures are not uncommon
○ In such cases, a simple retry is often sufficient
○ Can be handled by
  ○ Retrying immediately
  ○ Retrying with a delay
○ IMPORTANT: Idempotency must be handled at the application layer, because a retry may repeat a request that the backend already processed

Retry Sample
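    endpoint http:Client backendClientEP {
        url: "http://localhost:8080",  // assumed: same backend as the Timeout sample
        retryConfig: {
            interval: 3000,     // delay before the first retry (ms)
            count: 3,           // maximum number of retries
            backOffFactor: 2    // multiply the delay by 2 for each further retry
        },
        timeoutMillis: 2000
    };
With these settings, the client retries after 3000, 6000 and 12000 ms.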
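The backoff arithmetic behind those three delays, and a retry loop that uses it, can be sketched as follows. This is a generic exponential-backoff sketch, not Ballerina's implementation. It assumes that only `ConnectionError` counts as transient, and it takes an injectable `sleep` so the delays can be observed without waiting.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(interval_ms: float, count: int, factor: float) -> List[float]:
    """Delay before retry i (0-based) is interval_ms * factor**i."""
    return [interval_ms * factor ** i for i in range(count)]


def call_with_retry(call: Callable[[], T], interval_ms: float, count: int,
                    factor: float,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call once, then retry up to `count` times with exponential backoff.

    Only ConnectionError is treated as transient here, and retrying is only
    safe for idempotent operations.
    """
    delays = backoff_delays(interval_ms, count, factor)
    for attempt in range(count + 1):
        try:
            return call()
        except ConnectionError:
            if attempt == count:
                raise  # retries exhausted: surface the last error
            sleep(delays[attempt] / 1000)  # time.sleep expects seconds
```

With `interval_ms=3000`, `count=3` and `factor=2`, `backoff_delays` yields `[3000, 6000, 12000]`, matching the delays listed above.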

Failover
Switch to an alternate endpoint when the primary fails

Failover Sample
    endpoint http:FailoverClient foBackendEP {
        timeoutMillis: 5000,
        failoverCodes: [501, 502, 503],  // HTTP status codes that trigger a failover
        intervalMillis: 5000,            // delay between failover attempts (ms)
        targets: [
            { url: "http://localhost:3000/mock1" },
            { url: "http://localhost:8080/echo" },
            { url: "http://localhost:8080/mock" }
        ]
    };
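The failover behavior can be sketched as "try each target in order until one gives a usable answer". In the Python sketch below, the `(status, body)` tuple and the `send` callback are hypothetical stand-ins for an HTTP client. The delay between attempts that `intervalMillis` configures is omitted for brevity.

```python
from typing import Callable, Optional, Sequence, Tuple

Response = Tuple[int, str]  # (HTTP status, body): a stand-in for a real response


def call_with_failover(targets: Sequence[str],
                       send: Callable[[str], Response],
                       failover_codes: Sequence[int] = (501, 502, 503)) -> Response:
    """Try targets in order, moving on when one is unreachable or answers
    with a status code listed in `failover_codes`."""
    last_response: Optional[Response] = None
    for url in targets:
        try:
            response = send(url)
        except ConnectionError:
            continue  # unreachable: fail over to the next target
        if response[0] not in failover_codes:
            return response  # a usable answer
        last_response = response  # failover status: remember it, try the next target
    if last_response is None:
        raise ConnectionError("all failover targets are unreachable")
    return last_response
```

If every target fails, the caller gets the last failover status, or a `ConnectionError` when no target answered at all.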

Load Balancing
Distribute requests across multiple endpoints

Load Balancing Sample
    endpoint http:LoadBalanceClient lbBackendEP {
        targets: [
            { url: "http://localhost:8080/mock3" }
        ],
        algorithm: http:ROUND_ROBIN,
        timeoutMillis: 5000
    };
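Round robin, the algorithm named in the sample, hands out targets in a fixed rotation. The minimal thread-safe Python sketch below illustrates the idea. The target names used with it are hypothetical.

```python
import itertools
import threading
from typing import Iterator, List


class RoundRobin:
    """Hand out targets in a fixed rotation: t0, t1, ..., tn-1, t0, ..."""

    def __init__(self, targets: List[str]) -> None:
        if not targets:
            raise ValueError("at least one target is required")
        self._cycle: Iterator[str] = itertools.cycle(targets)
        self._lock = threading.Lock()  # requests may arrive on several threads

    def next_target(self) -> str:
        with self._lock:
            return next(self._cycle)
```

Round robin is simple and spreads load evenly when backends are similar. Because it ignores how busy each backend is, it is usually combined with health checks or failover so that unhealthy targets are skipped.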

Circuit breaker
○ Some transient failures take much longer to recover from
○ Repeated retries add load to an already struggling service and can delay its recovery
○ Retry up to a certain threshold, then cut off calls for a while

Circuit Breaker
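State diagram (states CLOSED, OPEN, HALF OPEN):
○ CLOSED → OPEN: failure threshold exceeded
○ OPEN → HALF OPEN: after a delay
○ HALF OPEN → CLOSED: success
○ HALF OPEN → OPEN: failure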

Circuit Breaker Sample
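    endpoint http:Client backendClientEP {
        url: "http://localhost:8080",  // assumed: same backend as the Timeout sample
        circuitBreaker: {
            rollingWindow: {
                timeWindowMillis: 16000,  // failures are counted over the last 16 s
                bucketSizeMillis: 2000    // in buckets of 2 s
            },
            failureThreshold: 0.2,        // open the circuit when the failure ratio exceeds 20%
            resetTimeMillis: 10000,       // stay OPEN for 10 s before moving to HALF OPEN
            statusCodes: [400, 404, 500]  // HTTP status codes counted as failures
        },
        timeoutMillis: 2000
    };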
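The state diagram above can be sketched as a small class. This is a simplified, single-threaded Python sketch, not Ballerina's implementation. Outcomes live in a rolling time window instead of fixed buckets, every exception counts as a failure, and `min_calls` is an added assumption that keeps a single early failure from tripping the circuit. An injectable clock makes the state changes testable.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is OPEN."""


class CircuitBreaker:
    def __init__(self, failure_threshold: float, window_s: float, reset_s: float,
                 min_calls: int = 5,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold  # e.g. 0.2 means 20% failures
        self.window_s = window_s      # length of the rolling window in seconds
        self.reset_s = reset_s        # how long to stay OPEN before a trial call
        self.min_calls = min_calls    # don't judge the ratio on too few calls
        self.clock = clock
        self.state = "CLOSED"
        self.opened_at = 0.0
        self.outcomes: Deque[Tuple[float, bool]] = deque()  # (timestamp, succeeded)

    def call(self, fn: Callable[[], T]) -> T:
        now = self.clock()
        if self.state == "OPEN":
            if now - self.opened_at < self.reset_s:
                raise CircuitOpenError("circuit is open; failing fast")
            self.state = "HALF_OPEN"  # delay elapsed: let one trial call through
        try:
            result = fn()
        except Exception:
            self._record(now, succeeded=False)
            raise
        self._record(now, succeeded=True)
        return result

    def _record(self, now: float, succeeded: bool) -> None:
        if self.state == "HALF_OPEN":
            if succeeded:
                self.state = "CLOSED"
                self.outcomes.clear()  # start a fresh window after recovery
            else:
                self._trip(now)
            return
        self.outcomes.append((now, succeeded))
        # Drop outcomes that have slid out of the rolling window.
        while self.outcomes and self.outcomes[0][0] <= now - self.window_s:
            self.outcomes.popleft()
        failures = sum(1 for _, ok in self.outcomes if not ok)
        if (len(self.outcomes) >= self.min_calls
                and failures / len(self.outcomes) > self.failure_threshold):
            self._trip(now)

    def _trip(self, now: float) -> None:
        self.state = "OPEN"
        self.opened_at = now
```

While the circuit is OPEN, callers fail fast with `CircuitOpenError` instead of adding load to the struggling backend, which is exactly the "cut off" step described above.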

Transactions
[Figure: a transaction spanning microservices µ1 to µ6, started by an initiator and involving several participants; labels S1, S2, S3]

2PC Coordination
Completion protocol (Initiator ↔ Coordinator)
○ Initiator → Coordinator: Create-Context(), which returns a Micro-Transaction-Context
○ Then one of (alt):
  ○ Commit() → Committed | Aborted | Mixed
  ○ Abort() → Aborted | Mixed

2PC Coordination
Durable protocol, triggered by commit (Coordinator ↔ Participant)
○ Coordinator → Participant: Prepare() → Prepared | Read-Only | Aborted | Committed
○ On success: notify(...Commit...) → Committed
○ Otherwise: notify(...Abort...) → Aborted

2PC Coordination
Durable protocol, triggered by abort (Coordinator ↔ Participant)
○ Coordinator → Participant: notify(...Abort...) → Aborted
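The core of the durable protocol, prepare everyone and then notify the single decision, can be sketched in a few lines. This is a heavily simplified Python sketch with hypothetical in-memory participants. It omits the "Committed" reply to Prepare(), the "Mixed" outcome, timeouts and the durable logging that a real coordinator needs.

```python
from enum import Enum
from typing import Dict, List


class Vote(Enum):
    PREPARED = "Prepared"
    READ_ONLY = "Read-Only"
    ABORTED = "Aborted"


class Participant:
    """Hypothetical in-memory participant, only here to exercise the coordinator."""

    def __init__(self, name: str, vote: Vote) -> None:
        self.name = name
        self.vote = vote
        self.outcome = "Pending"

    def prepare(self) -> Vote:
        return self.vote

    def notify(self, decision: str) -> None:
        self.outcome = "Committed" if decision == "Commit" else "Aborted"


def two_phase_commit(participants: List[Participant]) -> str:
    # Phase 1 (prepare): collect a vote from every participant.
    votes: Dict[str, Vote] = {p.name: p.prepare() for p in participants}
    commit = all(v is not Vote.ABORTED for v in votes.values())
    decision = "Commit" if commit else "Abort"
    # Phase 2 (notify): read-only participants changed nothing, so they are skipped.
    for p in participants:
        if votes[p.name] is not Vote.READ_ONLY:
            p.notify(decision)
    return "Committed" if commit else "Aborted"
```

A single "Aborted" vote is enough to abort everyone, which is what lets the participants stay consistent with each other.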

Transactions Sample - initiator
    transaction with retries = 2 {
        // Calling a local participant
        localParticipantDoSomething();
        // Calling a remote participant
        http:Response res = check participant->get("/ParticipantService/hi");
    } onretry {
        // Code here will execute before retrying
    } committed {
        // Code here will execute after the initiated transaction
        // has committed
    } aborted {
        // Code here will execute after the initiated transaction
        // has been aborted
    }

Transactions Sample - participant
    service<http:Service> ParticipantService bind listener {
        @transactions:participant {
            oncommit = onTxnCommit,
            onabort = onTxnAbort
        }
        hi(endpoint caller, http:Request req) {
            ...
        }
    }
    ...
    function onTxnCommit(string transactionId) {
        ...
    }
    function onTxnAbort(string transactionId) {
        ...
    }

https://github.com/afkham/ballerinacon2018_resiliency
