An Introduction to ISTIO
Andrea Monacchi
Agenda
1. K8s refresher
2. Introduction to Istio
3. Telemetry
4. Traffic Management
5. Security
K8s refresher
K8s Architecture
● Master (Supervisor)
○ api-server - exposes K8s APIs
○ etcd - distributed key-value storage
○ scheduler - selects a node to run a new pod
○ controller-manager - detects changes on nodes, number of replicas, endpoint availability, changes to service accounts, etc.
○ kube-dns - DNS service mapping services to pods
● Worker Nodes (hosting workload pods)
○ kubelet - runs containers based on Pod specs
○ kube-proxy - implements network rules
● Simple Setup - kubeadm
○ kubeadm init (master) / kubeadm join (each worker)
Basic Concepts
● Pod
○ group of colocated containers - shared IP address and file system volumes
○ most commonly - 1 container : 1 pod, sidecar container pattern
● Pod Controllers - define a template to run multiple pods in a group
○ Deployment - to define multiple replicas (ReplicaSet)
○ DaemonSet - runs a copy of a pod on each node available to the cluster (i.e. daemon) - for monitoring, storage, etc.
○ StatefulSet - pods with persistent IDs to directly match resources (e.g. volumes)
● Service discovery and Load Balancing
○ Service - maps a DNS name to a set of Pods - can load balance them
● Storage
○ PersistentVolume (PV) - volume statically available to cluster or dynamically provisioned using a storage class
○ PersistentVolumeClaim (PVC) - request for storage resolved with a PV
○ StorageClass - describes a storage type through provisioner, parameters.type and reclaimPolicy
Introduction to Istio
● Open-source service mesh
● Unifies ways of securing, connecting and monitoring microservices
○ automatic load balancing for HTTP, gRPC, WebSocket and TCP traffic
○ fine-grained control of traffic (routing rules and policies, retries, failovers, fault injection)
○ policy layer supporting access control, rate limits and quotas
○ out-of-the-box metrics, logs and traces for all traffic across the cluster, including ingress and egress
○ secure service-to-service communication with strong identity-based authentication and authorization
● Collection of packages to be run on a K8s Cluster
Istio Architecture
1. Data Plane
a. Envoy Proxy injected as sidecar container
b. Traffic Routing + Telemetry
2. Control Plane
a. controls data plane configuration
b. comprises:
i. Pilot - manages the configuration of the Envoy sidecars
ii. Citadel - Identity & Access Management (IAM)
iii. Galley - overall configuration management
c. configuration propagation
i. input YAML detected by Galley
ii. configuration converted to Istio format
iii. Istio format passed to Pilot
iv. Pilot converts it to Envoy configuration
d. Citadel - manages TLS/SSL certificates
K8s Operators
● Software extensions to K8s meant for automation - original blog post
○ operator = custom resource definition (CRD) + controller
○ applicable if custom resources can be exposed via a CRUD (create, read, update, delete) REST API
■ resources no longer treated as collection of primitives (e.g. pods, deployment, services) but as a single
object exposing only what shall be controlled from outside
■ object integrated with K8s api (http+kubectl)
○ control loop monitoring changes on monitored resources
■ runs beside the control plane (running default controllers), e.g. as any deployment
■ very application specific - translates to primitive resources (e.g. pods)
● Implementation
○ As a client querying the kube api
○ using an SDK (e.g. KUDO, KubeBuilder, Metacontroller, Operator Framework)
○ e.g. Operator Framework - operators can be implemented in Go, Ansible or Helm
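The control loop of an operator can be sketched in plain Python, without any Kubernetes client. This is only an illustration: the `reconcile` function, its arguments and the pod naming scheme are hypothetical, and a real operator would watch the API server and create or delete resources through it.

```python
from typing import Dict, List


def reconcile(desired_replicas: int, running_pods: List[str], app: str) -> Dict[str, List[str]]:
    """Compare desired and observed state and return the actions needed to converge."""
    actions: Dict[str, List[str]] = {"create": [], "delete": []}
    diff = desired_replicas - len(running_pods)
    if diff > 0:
        # Too few pods: create the missing ones under names that are not in use yet.
        existing = set(running_pods)
        index = 0
        while len(actions["create"]) < diff:
            name = f"{app}-{index}"
            if name not in existing:
                actions["create"].append(name)
            index += 1
    elif diff < 0:
        # Too many pods: delete the surplus, starting from the end of the sorted list.
        actions["delete"] = sorted(running_pods)[diff:]
    return actions


# A custom resource asks for 3 replicas while only one pod is running.
plan = reconcile(3, ["demo-0"], "demo")
print(plan)
```

The loop never manipulates pods one by one on request; it only looks at the gap between the declared state (the custom resource) and the observed state, which is what lets the outside world control a single object.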
Data Plane: The Envoy Proxy
● automatically injected as sidecar container for each pod - helper container design pattern
● Envoy proxy
○ Reverse Proxy in C++ from Lyft
○ Istio automatically translates YAML-based resource definitions into Envoy configurations
Istio Setup
● Prerequisite - Running K8s cluster
○ Minikube/microK8s for dev
○ Rancher/Kubespray/KOPS for cloud-based
● Install istioctl CLI to
● Install istio (preferred as operator)
○ install using istioctl
○ install using Helm Chart
Tutorial on Ubuntu:
sudo snap install microk8s --classic
microk8s.start
microk8s enable dns registry istio
microk8s kubectl get all --all-namespaces
Shortcut to kubectl for microk8s:
kubectl --kubeconfig file
alias mkctl="microk8s kubectl"
mkctl get pods
Istio Setup
(base) pilillo@pilillo-ThinkPad-P70:~/Documents$ mkctl get pods -n istio-system
NAME                                      READY   STATUS      RESTARTS   AGE
istio-tracing-579d7647d9-hxxrp            1/1     Running     0          14m
grafana-d7994566f-wfjtl                   1/1     Running     0          14m
kiali-77f97f5b4d-nmgjk                    1/1     Running     0          14m
istio-citadel-67658cf6c-hdcjt             1/1     Running     0          14m
istio-sidecar-injector-589988b5d6-stmnb   1/1     Running     0          14m
istio-galley-567478fb94-lkqnf             1/1     Running     0          14m
istio-grafana-post-install-1.5.1-v8zg9    0/1     Completed   0          14m
istio-security-post-install-1.5.1-bpfbr   0/1     Completed   0          14m
prometheus-9d65f7646-qjcqk                1/1     Running     0          14m
istio-pilot-64d96677f8-l5gj7              2/2     Running     3          14m
istio-ingressgateway-56bb766b96-fph97     1/1     Running     0          14m
istio-egressgateway-756f9bc5b9-5jp2l      1/1     Running     0          14m
istio-telemetry-64d8c69d67-fx5m2          2/2     Running     6          14m
istio-policy-58d8b97644-8b4hc             2/2     Running     6          14m
Telemetry
● Kiali (mkctl port-forward -n istio-system svc/kiali 20001:20001 or directly istioctl dashboard kiali)
○ Istio Dashboard - entrypoint to all features we will be discussing
○ shows interconnection of services as a graph (mesh) - Example
● Jaeger tracing (microk8s.istioctl dashboard jaeger)
○ displays service interaction in terms of traces - Example
● Prometheus + Grafana metrics
○ microk8s.istioctl dashboard prometheus
■ check “Status” > “Targets” to see scraped HTTP endpoints
○ microk8s.istioctl dashboard grafana
■ click “dashboard” > “manage” > “istio” > “Istio name Dashboard”
Traffic Management
Traffic Management
● New entities introduced by Istio:
○ Virtual Service
■ a set of custom traffic routing rules to apply when a K8s service (host) is addressed on a specific protocol
■ the specification is per protocol (e.g. http, tcp) and can target a subset (i.e. a service version)
■ the mapping between subset and pod labels (e.g. subset name to version label) is defined in a destination rule
○ Destination Rule
■ defines policies applied to traffic after routing has taken place (linked to a virtual service through the host and subset names)
■ configures the load balancer, including settings for outlier detection to evict unhealthy hosts from the pool
○ Gateway
■ defines a load balancer placed at the edge of the mesh (i.e. as ingress/egress)
■ allows any virtual service in the same or a matching namespace (based on expression) to bind to it
Canary Releases
● Deploy new version alongside old version
○ Make the new version available to a percentage of requests only - to test it (like a pilot)
○ Useful for very busy services for which downtime is not an option - reduces the risk of deploying possibly faulty code
● No direct solution in K8s
○ Would mean a Service mapping to both the new and old version running on different pods
■ e.g. the pods of both deployments carry the label service-name and the service selects on service-name
■ any label can be used to group pods on a service, but Kiali assigns a special meaning to “app”
■ a version label can be used along with the app one, to distinguish the version used within the deployment
○ Default - (probabilistic) round robin on pods - to implement a percentage we would need a proportional number of pods
● Kiali UI: “Actions” > “Create Weighted Routing”
○ to create a virtual service and a destination rule to the pod groups (based on their version label)
● YAML definition for a VirtualService
VirtualService
apiVersion:
kind: VirtualService
metadata:
  name: just-a-name-for-your-virtual-service  # just a name for a routing configuration!
  namespace: default
spec:
  hosts:
  - your-service-name  # the service we apply the rules to - <svcname>.<nsname>.svc.cluster.local
  gateways: ~
  http:
  - route:
    - destination:
        host: your-target-service-name  # can be a subset of the same service host or even a different service
        subset: v1-group
      weight: 10
    - destination:
        host: your-target-service-name
        subset: v2-group
      weight: 90
  tcp: ~
  tls: ~
  exportTo: ~
● A Service maps a DNS name to a set of Pods (IP addrs)
● A VirtualService defines a set of routing rules (what/when to call)
● istio-pilot applies the VS spec as Envoy configuration on each Istio sidecar proxy
● we are intercepting traffic addressed to the host service and redirecting it to different destinations
DestinationRule
apiVersion:
kind: DestinationRule
metadata:
  name: just-a-name-for-a-destination-rule
  namespace: default
spec:
  host: service-name
  trafficPolicy: ~
  subsets:
  - labels:
      version: v1
    name: v1-group
  - labels:
      version: v2
    name: v2-group
● directly related to a virtual service
● can be used to affect the load balancing for the original service
define which pods should be part of each subset for the VS:
● host matches the name of the Service service-name
● labels.version selects the pods with the same value for the label key “version”
● the subset name is the one to be used in the VS
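How the two resources cooperate can be sketched in Python: the destination rule groups pods into subsets by label, and the virtual service splits requests across subsets by weight. The pod names and the data structures below are hypothetical, and the sketch assumes that v1-group selects pods labelled version: v1; it is a simulation of the idea, not of Envoy's implementation.

```python
import random
from collections import Counter
from typing import List

# Hypothetical pods with their labels, as seen behind the Service.
PODS = [
    {"name": "svc-v1-a", "labels": {"app": "service-name", "version": "v1"}},
    {"name": "svc-v1-b", "labels": {"app": "service-name", "version": "v1"}},
    {"name": "svc-v2-a", "labels": {"app": "service-name", "version": "v2"}},
]

# Subset name -> label selector (the DestinationRule part).
SUBSETS = {"v1-group": {"version": "v1"}, "v2-group": {"version": "v2"}}

# Subset name -> weight in percent (the VirtualService part).
WEIGHTS = {"v1-group": 10, "v2-group": 90}


def pods_in_subset(subset: str) -> List[str]:
    """Return the names of the pods whose labels match the subset selector."""
    selector = SUBSETS[subset]
    return [p["name"] for p in PODS
            if all(p["labels"].get(k) == v for k, v in selector.items())]


def pick_pod(rng: random.Random) -> str:
    """Choose a subset according to the weights, then any pod inside that subset."""
    subset = rng.choices(list(WEIGHTS), weights=list(WEIGHTS.values()), k=1)[0]
    return rng.choice(pods_in_subset(subset))


rng = random.Random(42)
# Count hits per version by stripping the trailing pod suffix from the name.
hits = Counter(pick_pod(rng).rsplit("-", 1)[0] for _ in range(10_000))
print(hits)
```

Note that the split follows the weights even though there are two v1 pods and only one v2 pod, which is exactly what plain round robin over pods cannot offer.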
Load Balancing
algorithms:
● round robin
● hashing-based (session affinity - same user to same svc/pod)
○ only for HTTP connections - a hash is used to route traffic
○ uses the hash of either an HTTP header, a cookie or the source IP
○ does not work with weighted routing (routing comes before hashing)
● locality-based - based on traffic origin
○ distribute policy - weight on zones/locations
○ failover policy - failover when an endpoint becomes unhealthy
round robin:
apiVersion:
kind: DestinationRule
metadata:
  name: just-a-name-for-a-destination-rule
  namespace: default
spec:
  host: service-name
  trafficPolicy:
    loadBalancer:
      simple: ROUND_ROBIN
  subsets:
  - labels:
      version: v1
    name: v1-group
  - labels:
      version: v2
    name: v2-group
hashing-based (fragment):
trafficPolicy:
  loadBalancer:
    consistentHash:
      httpCookie:
        name: user
        ttl: 0s
locality-based (fragment):
distribute:
- from: us-west/zone1/*
  to:
    "us-west/zone1/*": 80
    "us-west/zone2/*": 20
- from: us-west/zone2/*
  to:
    "us-west/zone1/*": 20
    "us-west/zone2/*": 80
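Session affinity through hashing can be illustrated with a small hash ring in Python. This is a generic sketch of consistent hashing, not Envoy's actual algorithm; the `HashRing` class, the pod names and the cookie value are made up for the example.

```python
import bisect
import hashlib
from typing import List, Tuple


def _hash(key: str) -> int:
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Places every pod at several points of a ring; a key goes to the next point clockwise."""

    def __init__(self, pods: List[str], replicas: int = 100) -> None:
        ring: List[Tuple[int, str]] = []
        for pod in pods:
            for i in range(replicas):
                ring.append((_hash(f"{pod}#{i}"), pod))
        ring.sort()
        self._keys = [h for h, _ in ring]
        self._pods = [p for _, p in ring]

    def pod_for(self, cookie_value: str) -> str:
        # First ring point after the key's hash, wrapping around at the end.
        idx = bisect.bisect(self._keys, _hash(cookie_value)) % len(self._keys)
        return self._pods[idx]


pods = ["pod-a", "pod-b", "pod-c"]
ring = HashRing(pods)
first = ring.pod_for("user=alice")
print(first, ring.pod_for("user=alice") == first)
```

The same cookie value always lands on the same pod, and removing some other pod from the ring does not move that user, which is the property that makes hashing suitable for sticky sessions.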
Ingress Gateways
● mesh edge - alternative to classic K8s Ingress Controllers
● istio-ingressgateway pod and service started during Istio installation
● adds monitoring and the usual Istio functionalities for traffic routing
● configured like any other Istio service rather than through a technology-specific ingress controller
● by default denies all traffic
apiVersion:
kind: Gateway
metadata:
  name: ingress-gateway-configuration
spec:
  selector:
    istio: ingressgateway  # selects the default Istio ingress gateway, which has this label set
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"  # list of DNS names we are listening for (or * if we are just testing locally); this has to be reflected in the virtual service
Exposed Virtual Service
apiVersion:
kind: VirtualService
metadata:
  name: just-a-name-for-your-virtual-service
  namespace: default
spec:
  hosts:
  - "*"  # a DNS name or star to catch all (testing only!)
  gateways:
  - ingress-gateway-configuration
  http:
  - route:
    - destination:
        host: your-target-service-name
        subset: v1-group
      weight: 10
    - destination:
        host: your-target-service-name
        subset: v2-group
      weight: 90
  tcp: ~
  tls: ~
  exportTo: ~
routing on request content (fragment):
http:
- name: "whatever-name"
  match:  # exact-, prefix- or regex-based routing on incoming requests, e.g. useful for testing a Dark Release directly online
  - uri:
      prefix: "/something"
  - uri:
      prefix: "/else"
  rewrite:  # HTTP rewrite
    uri: "/newuri"
  route:
  - destination:
      host: your-target-service-name
      subset: v2-group
    weight: 90
  fault:  # fault injection to test reliability - delay or abort requests
    delay:
      percentage:
        value: 10.0
      fixedDelay: 10s
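The effect of prefix matching, URI rewriting and delay injection on a single request can be sketched as follows. The constants mirror the values on the slide, while the `apply_rule` function is a hypothetical helper that only computes what would happen; it does not sleep or forward anything.

```python
import random
from typing import Optional, Tuple

PREFIXES = ("/something", "/else")  # match.uri.prefix entries of the rule
REWRITE_TO = "/newuri"              # rewrite.uri
DELAY_PERCENT = 10.0                # fault.delay.percentage.value
FIXED_DELAY_S = 10.0                # fault.delay.fixedDelay, in seconds


def apply_rule(path: str, rng: random.Random) -> Optional[Tuple[str, float]]:
    """Return (rewritten_path, injected_delay_seconds), or None if the rule does not match."""
    for prefix in PREFIXES:
        if path.startswith(prefix):
            # The matched prefix is replaced by the rewrite URI; the rest of the path is kept.
            rewritten = REWRITE_TO + path[len(prefix):]
            # Roughly DELAY_PERCENT of the matching requests get the fixed delay.
            delay = FIXED_DELAY_S if rng.random() * 100 < DELAY_PERCENT else 0.0
            return rewritten, delay
    return None


rng = random.Random(0)
print(apply_rule("/something/items", rng))
print(apply_rule("/unrelated", rng))
```

Requests that match neither prefix fall through to the next rule of the virtual service, which is how a dark release can be reached through a dedicated path while regular traffic is unaffected.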
Circuit Breakers
● Problem: Cascading Failures
○ Unpredictable failure on a service which affects all dependent services
○ When this happens, it’s difficult to understand root cause because many services perform badly
● Solution: Circuit Breaking
○ Design Pattern - the breaker acts as a relay between two services and is able to detect failing requests
○ Upon detected failures (e.g. timeouts on multiple requests) it can interrupt the connection and return an error from then on
○ By preventing access to the faulty service we give it enough time to recover from the failure (e.g. OOM)
○ The health of the target service is polled periodically; when it is available again, the connection can be restored
● Main Concept - Backpressure: reduce traffic to the faulty system, assuming failures result from a lack of resources
○ historically - circuit breakers as a library built with the application code (of the requesting service) - e.g. Netflix Hystrix
■ problems - multiple languages to maintain, and legacy code to which the library has to be added (needs redeployment)
○ Istio - circuit breakers can be managed directly by the proxy
■ stop making requests to a pod if multiple consecutive faulty requests were made (works on a pod level)
Circuit Breakers
● outlier detection on a DestinationRule
● configuration applied to a Service (i.e. host)
● metrics collected at pod level
● errors:
○ consecutiveGatewayErrors (HTTP 502, 503 and 504 - no 505!)
○ consecutive5xxErrors (all 5xx errors!)
● settings:
○ number of consecutive errors
○ time interval for consecutive errors
○ ejection duration
○ maxEjectionPercent - maximum % of hosts in the pool that can be ejected
○ minHealthPercent - outlier detection applies only while at least this % of hosts in the pool is healthy
● use a tool like fortio to generate load and test them
apiVersion:
kind: DestinationRule
metadata:
  name: reviews-cb-policy
spec:
  host:
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http2MaxRequests: 1000
        maxRequestsPerConnection: 10
    outlierDetection:
      consecutiveErrors: 7
      interval: 5m
      baseEjectionTime: 15m
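The ejection logic behind outlier detection can be approximated in a few lines of Python. This is a simplified sketch under stated assumptions: the `OutlierDetector` class is hypothetical, it counts every 5xx response, and it ignores the analysis interval, maxEjectionPercent and minHealthPercent. Time is passed in explicitly so that the behaviour is easy to follow.

```python
from typing import Dict


class OutlierDetector:
    """Ejects a pod after N consecutive 5xx responses, for a fixed amount of time."""

    def __init__(self, consecutive_errors: int = 7, base_ejection_time_s: float = 900.0) -> None:
        self.consecutive_errors = consecutive_errors
        self.base_ejection_time_s = base_ejection_time_s
        self._errors: Dict[str, int] = {}            # pod -> current streak of errors
        self._ejected_until: Dict[str, float] = {}   # pod -> time at which it may return

    def record(self, pod: str, status: int, now: float) -> None:
        if 500 <= status <= 599:
            self._errors[pod] = self._errors.get(pod, 0) + 1
            if self._errors[pod] >= self.consecutive_errors:
                # Streak reached the threshold: remove the pod from the pool for a while.
                self._ejected_until[pod] = now + self.base_ejection_time_s
                self._errors[pod] = 0
        else:
            # Any successful response breaks the streak.
            self._errors[pod] = 0

    def is_available(self, pod: str, now: float) -> bool:
        return now >= self._ejected_until.get(pod, 0.0)


detector = OutlierDetector(consecutive_errors=3, base_ejection_time_s=60.0)
for t in (0.0, 1.0, 2.0):
    detector.record("reviews-1", 503, now=t)
print(detector.is_available("reviews-1", now=3.0), detector.is_available("reviews-1", now=62.0))
```

Because the counters are kept per pod, one failing replica is taken out of rotation while the remaining pods of the same service keep receiving traffic.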
TLS Encryption
● enable mutual TLS (mTLS) for proxy-to-proxy communication
● managed by Citadel
● performed at transport layer - usable for HTTP, TCP, gRPC
● can be enforced as policy - prevents any non-TLS traffic across the cluster
● automatically enabled at Istio installation!
● check Kiali for the lock symbol on edges
● Permissive vs Strict mTLS
○ permissive - allows querying Istio-based services (i.e., their proxies) from other namespaces where proxy injection is not available - the connection can’t be upgraded to TLS, so it’s kept in unsecured plain text (e.g. HTTP)
○ strict - only allows mTLS traffic - this can be enabled with a PeerAuthentication with spec.mtls.mode set to STRICT
apiVersion:
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
