Service Mesh vs. Frameworks: Where to put the resilience?

Michael Hofmann
https://hofmann-itconsulting.de

Agenda
(1) Distributed Systems and Resilience
(2) Framework
(3) Service Mesh
(4) Framework and Service Mesh Characteristics
(5) Thoughts about Resilience
(6) Essential Requirements
(7) Conclusion

Distributed Systems
Typical Communication Errors
    slow response
    timeout
    aborted network connection
    ...
Fallacies of Distributed Computing
    The network is reliable.
    Latency is zero.
    Bandwidth is infinite.
    The network is secure.
    Topology doesn't change.
    There is one administrator.
    Transport cost is zero.
    The network is homogeneous.
➔ degree of distribution raises failure rate!
➔ compensation strategy: resilience!

Resilience
Framework "DEATH": Hystrix
Framework ACTIVE: Resilience4J, Failsafe, MicroProfile Fault Tolerance, …
Alternative: Service Mesh?!

Resilience Patterns
— Timeout
— Retry
— Fallback
— Circuit Breaker
— Bulkhead
Many more:
Uwe Friedrichsen: “Patterns of resilience” https://www.slideshare.net/ufried/patterns-of-resilience
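To make the circuit breaker pattern from the list concrete, here is a minimal plain-Java sketch. It is an illustration under simplifying assumptions, not how any of the frameworks implements it: the class name `SimpleCircuitBreaker` is hypothetical, it counts consecutive failures instead of a failure ratio, and it is not thread-safe.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical, simplified circuit breaker (illustration only, not thread-safe).
final class SimpleCircuitBreaker {
    private final int failureThreshold; // consecutive failures before the circuit opens
    private final long openMillis;      // how long calls are rejected once open
    private final LongSupplier clock;   // current time in ms, injectable for tests
    private int consecutiveFailures = 0;
    private long openedAt = -1;         // -1 means the circuit is closed

    SimpleCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    <T> T call(Supplier<T> action) {
        // Open: reject immediately without touching the failing partner.
        if (openedAt >= 0 && clock.getAsLong() - openedAt < openMillis) {
            throw new IllegalStateException("circuit open: call rejected");
        }
        try {
            T result = action.get(); // closed, or a trial call after the open period
            consecutiveFailures = 0;
            openedAt = -1;           // success closes the circuit
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                openedAt = clock.getAsLong(); // open (or re-open) the circuit
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        SimpleCircuitBreaker breaker =
                new SimpleCircuitBreaker(2, 1000, System::currentTimeMillis);
        for (int i = 1; i <= 3; i++) {
            try {
                breaker.<String>call(() -> {
                    throw new RuntimeException("partner down");
                });
            } catch (RuntimeException e) {
                System.out.println("call " + i + ": " + e.getMessage());
            }
        }
    }
}
```

With a threshold of 2, the third call in `main` never reaches the partner; it is rejected by the open circuit.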

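MicroProfile Fault Tolerance
    import org.eclipse.microprofile.faulttolerance.CircuitBreaker;
    import org.eclipse.microprofile.faulttolerance.Fallback;
    import org.eclipse.microprofile.faulttolerance.Retry;
    // Opens when at least half of the last 4 calls failed, stays open for
    // 1000 ms, and closes again after 10 successful trial calls.
    @CircuitBreaker(successThreshold = 10, requestVolumeThreshold = 4,
                    failureRatio = 0.5, delay = 1000)
    public Connection serviceA() {
        Connection conn = null;
        counterForInvokingServiceA++;
        conn = connectionService();
        return conn;
    }
    // Retries up to 3 times; if the call still fails, doFallback() is used.
    @Retry(maxRetries = 3)
    @Fallback(fallbackMethod = "doFallback")
    public Result doWork() {
        return callServiceA(); // fallback on RuntimeException
    }
    private Result doFallback() {
        return ...;
    }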
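What `@Retry(maxRetries = 3)` combined with `@Fallback` amounts to can be sketched in plain Java. This is a hedged illustration of the behavior only, not the MicroProfile implementation; the class `RetryWithFallback` and its `call` method are hypothetical names.

```java
import java.util.function.Supplier;

// Hypothetical helper showing retry-then-fallback semantics without a framework.
final class RetryWithFallback {
    /** Invokes action up to 1 + maxRetries times; uses the fallback if every attempt throws. */
    static <T> T call(Supplier<T> action, int maxRetries, Supplier<T> fallback) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                // attempt failed; loop continues with the next retry, if any is left
            }
        }
        return fallback.get();
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = RetryWithFallback.<String>call(() -> {
            calls[0]++;
            throw new IllegalStateException("service A unavailable");
        }, 3, () -> "fallback result");
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

Note that three retries mean up to four invocations of the partner, which matters for the load discussion later in the deck.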

Service Mesh
"The term service mesh is used to describe the network of microservices that make up such applications and the interactions between them." (istio.io)
Don’t manage a Service Mesh without tooling!
Requirements:
(1) manage calls on layer 7 (application layer, L7)
(2) resilience, routing, security and telemetry
(3) decentralized & transparent for services (implementation independent)

Resilience Patterns in Istio
✔ Timeout
✔ Retry
✔ CircuitBreaker
✔ Bulkhead
✗ Fallback?
    ✗ is a Fallback possible?
    ✗ less technical, more business driven
https://dzone.com/articles/fallbacks-are-overrated-architecting-for-resilienc

Resilience in Istio
$ kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v1
    retries:
      attempts: 3
      perTryTimeout: 2s
EOF
$ kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 1
      http:
        http1MaxPendingRequests: 1
        maxRequestsPerConnection: 1
    outlierDetection:
      consecutiveErrors: 1
      interval: 1s
      baseEjectionTime: 3m
      maxEjectionPercent: 100
EOF

Resilience in Istio
Apply to sidecar
Resilience rules
— transparent for service
— act globally on all sidecars
Fault Injection
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
  ...
spec:
  hosts:
  - ratings
  http:
  - fault:
      delay:
        fixedDelay: 7s
        percent: 100
MicroProfile with Istio setting
MP_Fault_Tolerance_NonFallback_Enabled = false
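An injected delay like the one above is mainly useful for checking how a caller's timeout reacts. The following plain-Java sketch simulates that check locally. It rests on assumptions: `TimeoutCheck`, `callWithTimeout` and `slowPartner` are hypothetical names, the partner is simulated with `Thread.sleep`, and the delay is scaled down from 7s to keep the run short.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical local simulation of "delayed partner vs. caller timeout".
final class TimeoutCheck {
    /** Runs action on another thread; returns fallbackValue if no result arrives in time. */
    static String callWithTimeout(Supplier<String> action, long timeoutMillis, String fallbackValue) {
        CompletableFuture<String> future = CompletableFuture.supplyAsync(action);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // stop waiting; the worker thread is not interrupted by this
            return fallbackValue;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    /** Stand-in for a partner service that answers after delayMillis. */
    static String slowPartner(long delayMillis) {
        try {
            Thread.sleep(delayMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "ratings";
    }

    public static void main(String[] args) {
        // Delayed partner, short timeout: the caller gives up and uses the fallback value.
        System.out.println(callWithTimeout(() -> slowPartner(700), 100, "no ratings"));
        // Fast partner, generous timeout: the real answer comes back.
        System.out.println(callWithTimeout(() -> slowPartner(0), 2000, "no ratings"));
    }
}
```

In a mesh, the delay would come from the sidecar rather than from the code, so the service under test stays unchanged.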

Framework Characteristics
— Java: a lot of different frameworks
— Team decides framework?!?
— Learning curve for every framework
— Different frameworks behave differently
— Same framework in different versions behaves differently
— Same framework in different versions in use in parallel

Framework Characteristics
➔ Change of framework:
    ➔ Replace all positions in code
    ➔ New behavior
    ➔ New deployment
    ➔ New tests
➔ Risk of chain reaction:
    framework ➔ load balancing ➔ service registry
➔ Multiple service registries for every different framework?

Service Mesh Characteristics
— Define new rule
— Same behavior (… no framework change)
— Unchanged deployed service
— New tests only for new rules
— Client-side load balancing in sidecar
— Service registry based on endpoints in K8S
$ kubectl apply -f ...

Thoughts about Resilience
Is the resilience pattern still correct if communication behavior changes?
— Modified behavior of partner
— Modified communication partner
— Modified infrastructure
— Load changes during the day
— Side effects from other systems
— Anticipate problems of tomorrow?

Thoughts about Resilience
— Main problem: choose the right resilience pattern
— Correct parameters for pattern?
— Measure resilience
— Mostly: trial and error for suitable pattern/params (main reason for end of life of Hystrix)
— Often: retry storm
— Often: missing musketeer principle (black sheep)
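Why retries so often end in a retry storm can be shown with simple arithmetic: if every hop in a call chain retries independently, the attempts multiply. The sketch below computes the worst case, assuming the deepest service fails every time and each caller uses the same retry count. `RetryStorm` and `worstCaseCalls` are hypothetical names for illustration.

```java
// Hypothetical worst-case estimate of retry amplification along a call chain.
final class RetryStorm {
    /**
     * Calls reaching the deepest service for one user request when each of the
     * `hops` callers makes 1 initial attempt plus `retriesPerHop` retries.
     */
    static long worstCaseCalls(int retriesPerHop, int hops) {
        long calls = 1;
        for (int i = 0; i < hops; i++) {
            calls *= (1L + retriesPerHop); // every caller multiplies the load below it
        }
        return calls;
    }

    public static void main(String[] args) {
        for (int hops = 1; hops <= 3; hops++) {
            System.out.println(hops + " hop(s), 3 retries each: "
                    + worstCaseCalls(3, hops) + " calls");
        }
    }
}
```

With 3 retries per hop and a chain of 3 hops, one user request can turn into 64 calls against a service that is already struggling.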

Essential Requirements
— Modification: quick and easy change of
    (1) params for chosen pattern
    (2) resilience pattern
— Test
— Monitoring
— No black sheep

Essential Requirements
                Istio                       Framework
Modification    $ kubectl apply -f ...      + Modify Params
                                            - Change Pattern: Lifecycle
Test            Fault Injection             complicated
Monitoring      +                           +
Black Sheep     No: rule in all sidecars

Conclusion
— Comparable resilience patterns
— Missing fallback in service mesh (but overrated)
— Higher flexibility in service mesh
— Fault injection easy in service mesh
Solve problems where they arise!
    Service Mesh for L4-L7
    Developer for L8 (original profession)
