Multi-Cluster Kubernetes and
Service Mesh Patterns
Christian Posta
Field CTO – Solo.io
2 | Copyright © 2020
CHRISTIAN POSTA
Global Field CTO, Solo.io
@christianposta
christian@solo.io
https://blog.christianposta.com
https://slideshare.net/ceposta
Challenges
• Improve velocity of teams building and delivering code
• Decentralized implementations vs centralized operations
• Connect and include existing systems and investments
• Improve security posture
• Stay within regulations and compliance
More, smaller clusters
• High availability
• Compliance
• Isolation / Autonomy
• Scale
• Data locality, cost
• Public/DMZ/Private networks
Multiple clusters
• Exact replicas of each other, same fleet?
• Separate, non-uniform deployments?
• Single operational/administrative control
• Segmented by network? Segmented by team?
• Independent administration?
Cluster federation
• Autonomous clusters
• Different organizational/network/administrative boundaries
• Share pieces of configuration
• For those shared pieces, treat union as a single unit
• Uses an orchestrator to stitch together policies for federation
Example: Kubefed
[Diagram: the Kubefed control plane (Kubefed CP) in Cluster 0 watches Federated Resources and federates them to Cluster 1 and Cluster 2]
https://github.com/kubernetes-sigs/kubefed
Example: Kubefed
apiVersion: types.kubefed.io/v1beta1
kind: FederatedService
metadata:
  name: echo-server
spec:
  placement:
    clusterSelector:
      matchLabels: {}
  template:
    metadata:
      labels:
        app: echo-server
    spec:
      ports:
      - name: http
        port: 8080
      selector:
        app: echo-server
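A FederatedService only creates the Service in each member cluster; the pods it selects have to be federated as well. A minimal sketch of a matching FederatedDeployment follows. The cluster names (cluster1, cluster2), the image, and the replica counts are assumptions for illustration and do not come from the slides.

```yaml
apiVersion: types.kubefed.io/v1beta1
kind: FederatedDeployment
metadata:
  name: echo-server
spec:
  # Ordinary Deployment spec that Kubefed propagates to each selected cluster.
  template:
    metadata:
      labels:
        app: echo-server
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: echo-server
      template:
        metadata:
          labels:
            app: echo-server
        spec:
          containers:
          - name: echo-server
            image: example.com/echo-server:1.0   # hypothetical image
            ports:
            - containerPort: 8080
  # Explicit placement by (assumed) member cluster name.
  placement:
    clusters:
    - name: cluster1
    - name: cluster2
  # Per-cluster override: run more replicas in cluster2 only.
  overrides:
  - clusterName: cluster2
    clusterOverrides:
    - path: "/spec/replicas"
      value: 3
```

The placement and overrides sections are where a federated resource differs from a plain one: the template stays identical everywhere, and only the listed paths change per cluster.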
Demo
Simple Kubernetes federation
Services need to communicate with each other
Pattern: flat network across pods
[Diagram: Account, User, Products, and History services spread across Cluster 1 and Cluster 2]
Pattern: Different network, expose all services
[Diagram: Account, User, Products, and History services spread across Cluster 1 and Cluster 2]
Pattern: Different network, controlled gateway
[Diagram: Account, User, Products, and History services spread across Cluster 1 and Cluster 2]
Forces to balance
• Security (authz/authn/encryption/identity)
• Service discovery
• Failover / traffic shifting / transparent routing
• Observability
• Separate networks
• Well-defined fault domains
• Building for scale
Could you build these patterns just using Kubernetes?
Service Mesh can help
Envoy is the magic behind service mesh
http://envoyproxy.io
Envoy implements:
• zone aware, priority/locality load balancing
• circuit breaking, outlier detection
• timeouts, retries, retry budgets
• traffic shadowing
• request racing
• rate limiting
• RBAC, TLS origination/termination
• access logging, statistics collection
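Several of the features listed above are plain cluster-level settings in Envoy. The following is a sketch of an Envoy v3 cluster fragment with circuit breaking, a retry budget, and outlier detection. The cluster name, address, port, and all thresholds are illustrative assumptions, not values from the talk.

```yaml
clusters:
- name: products
  type: STRICT_DNS
  connect_timeout: 1s
  load_assignment:
    cluster_name: products
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: products.service   # hypothetical upstream host
              port_value: 8080
  # Circuit breaking: cap concurrent work sent to this upstream.
  circuit_breakers:
    thresholds:
    - priority: DEFAULT
      max_connections: 100
      max_pending_requests: 50
      max_requests: 200
      # Retry budget: retries are limited to a share of active requests.
      retry_budget:
        budget_percent:
          value: 20
        min_retry_concurrency: 3
  # Outlier detection: temporarily eject hosts that keep returning 5xx.
  outlier_detection:
    consecutive_5xx: 5
    interval: 10s
    base_ejection_time: 30s
    max_ejection_percent: 50
```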
Envoy to do application networking heavy lifting
[Diagram: the Account workload and other workloads communicating over mTLS]
• Transparent client-side routing decisions
• TLS orig/termination
• Circuit breaking
• Stats collection
Envoy as backbone for multi-cluster communication federation
[Diagram: Account, User, Products, History, and a second User service across Cluster 1 and Cluster 2]
Other key Envoy proxying features
• Request hedging
• Retry Budgets
• Load balancing priorities
• Locality weighted load balancing
• Zone aware routing
• Degraded endpoints (fallback)
• Aggregated clusters
Exploring Envoy failover routing capabilities: Request racing
[Diagram: Account calls http://products.service/, served by workloads in us-west-1 and us-west-2. On a timeout, a second request is raced; the first to return is the response to the caller.]
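In Envoy, request racing is configured on the route as a hedge policy combined with a per-try timeout. The fragment below is a sketch under assumptions: the route and cluster names and all timeout values are illustrative.

```yaml
route_config:
  name: local_route
  virtual_hosts:
  - name: products
    domains: ["products.service"]
    routes:
    - match:
        prefix: "/"
      route:
        cluster: products
        timeout: 2s                 # overall request deadline
        retry_policy:
          retry_on: 5xx
          num_retries: 1
          per_try_timeout: 0.5s     # when this fires, a hedged request is sent
        hedge_policy:
          # Keep the original request in flight instead of resetting it;
          # the first attempt to complete successfully is returned.
          hedge_on_per_try_timeout: true
```

Hedging only takes effect when a retry policy with at least one retry is present, which is why both blocks appear together.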
Exploring Envoy failover routing capabilities: Zone aware routing (Envoy decides)
[Diagram: Account calls http://products.service/, served by workloads in us-west-1 and us-west-2. Not enough healthy hosts in the same zone: spill over to another zone.]
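A sketch of the Envoy settings involved in zone aware routing, assuming the proxy sits next to Account in zone us-west-1. The node id, cluster names, and the use of ADS for endpoint discovery are assumptions; the two percentages and sizes shown are Envoy's documented defaults written out explicitly.

```yaml
# Bootstrap fragment for the Envoy next to Account (names are hypothetical).
node:
  id: account-1
  cluster: account
  locality:
    zone: us-west-1              # zone this proxy runs in
cluster_manager:
  local_cluster_name: account    # must also exist as a cluster whose endpoints carry zones
---
# Upstream cluster fragment, for example delivered over CDS.
name: products
type: EDS
connect_timeout: 1s
eds_cluster_config:
  eds_config:
    resource_api_version: V3
    ads: {}
common_lb_config:
  zone_aware_lb_config:
    routing_enabled:
      value: 100         # percent of requests eligible for zone aware routing
    min_cluster_size: 6  # below this many upstream hosts, zone aware routing is skipped
```

Here the proxy itself compares the zone distribution of the local and upstream clusters and decides how much traffic stays in-zone, which is the "Envoy decides" part of the slide title.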
Exploring Envoy failover routing capabilities: Locality aware (Control plane decides)
[Diagram: Account calls http://products.service/, served by workloads in us-west-1 and us-west-2. Not enough healthy hosts in the same zone: spill over to another zone. Workloads carry weights W=1, W=1, W=1, W=5, W=5.]
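With locality weighted load balancing, the weights come from the control plane in the EDS response rather than being computed by the proxy. A sketch, assuming the weights 5 and 1 from the slide are applied per locality; which zone gets which weight, the endpoint IPs, and the ports are illustrative guesses.

```yaml
# Cluster fragment: opt in to locality weighted load balancing.
name: products
type: EDS
connect_timeout: 1s
eds_cluster_config:
  eds_config:
    resource_api_version: V3
    ads: {}
common_lb_config:
  locality_weighted_lb_config: {}
---
# ClusterLoadAssignment the control plane returns over EDS.
cluster_name: products
endpoints:
- locality:
    zone: us-west-1
  load_balancing_weight: 5      # preferred locality
  lb_endpoints:
  - endpoint:
      address:
        socket_address:
          address: 10.0.1.10    # hypothetical endpoint
          port_value: 8080
- locality:
    zone: us-west-2
  load_balancing_weight: 1      # receives a smaller share
  lb_endpoints:
  - endpoint:
      address:
        socket_address:
          address: 10.0.2.10    # hypothetical endpoint
          port_value: 8080
```

Changing the split is then a control plane operation: it pushes a new ClusterLoadAssignment with different weights and the proxies follow.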
Exploring Envoy failover routing capabilities: Aggregate Cluster (for routing to gateways)
[Diagram: Account calls http://products.service/ across us-west-1 and us-west-2, with local workloads, an edge gateway (Edge gw), and cluster types EDS and Strict DNS]
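An aggregate cluster lets one logical upstream combine clusters of different discovery types, for example local endpoints from EDS and a remote gateway resolved by DNS. The sketch below assumes hypothetical cluster names and a hypothetical gateway hostname.

```yaml
clusters:
# Logical cluster that routes reference; it owns no endpoints itself.
- name: products
  connect_timeout: 1s
  lb_policy: CLUSTER_PROVIDED
  cluster_type:
    name: envoy.clusters.aggregate
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.clusters.aggregate.v3.ClusterConfig
      clusters:            # ordered: earlier entries are preferred
      - products-local
      - products-remote-gw
# Local workloads, discovered through EDS.
- name: products-local
  type: EDS
  connect_timeout: 1s
  eds_cluster_config:
    eds_config:
      resource_api_version: V3
      ads: {}
# Remote edge gateway, resolved through DNS.
- name: products-remote-gw
  type: STRICT_DNS
  connect_timeout: 1s
  load_assignment:
    cluster_name: products-remote-gw
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: edge-gw.us-west-2.example.com   # hypothetical gateway address
              port_value: 443
```

Traffic stays on products-local while it is healthy and fails over to the gateway cluster as local health degrades.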
Multi-cluster examples
Service mesh examples using Envoy Proxy
Istio shared control plane, flat network
[Diagram: Account, User, Products, History, and a second User service across Cluster 1 and Cluster 2, with a single Istiod]
Thoughts about shared control plane/flat network
• Simplest set up for Istio multi-cluster
• No special Envoy routing (though may use zone-aware)
• Shared control plane increases the failure domain to multiple clusters
• Use flat networking if possible (simpler) but may not have/want that option
• No special considerations for identity (identity domain is shared)
• Still need to federate telemetry collection
Istio shared control plane, separate networks
[Diagram: Account, User, Products, History, and a second User service across Cluster 1 and Cluster 2, with a single Istiod]
Thoughts about shared control plane/separate network
• Uses a gateway to allow communication between networks
• Uses Envoy Locality Weighted LB (for the gateway endpoints). Istio calls this “split horizon EDS”.
• Shares same failure domain across all clusters
• Use the gateways to facilitate communication AND control plane
• Slight increase in burden on operator to label networks and gateway endpoints correctly so Istio has that information
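The labeling mentioned in the last bullet can be sketched as an IstioOperator fragment using the meshNetworks setting, as in Istio releases from around the time of this talk. The network names, cluster names, and the remote gateway address are assumptions, and newer Istio versions express some of this differently.

```yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    global:
      multiCluster:
        clusterName: cluster1        # hypothetical name of this cluster
      network: network1              # network the local workloads belong to
      meshNetworks:
        network1:
          endpoints:
          - fromRegistry: cluster1   # endpoints from this registry are on network1
          gateways:
          - registryServiceName: istio-ingressgateway.istio-system.svc.cluster.local
            port: 443
        network2:
          endpoints:
          - fromRegistry: cluster2
          gateways:
          - address: 192.0.2.10      # hypothetical externally reachable gateway IP
            port: 443
```

With this information the control plane can hand each proxy either the real pod endpoints (same network) or the gateway address (other network) for the same service.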
Istio separate control planes, separate networks
[Diagram: Account, User, Products, History, and a second User service across Cluster 1 and Cluster 2, each cluster with its own Istiod]
Thoughts about separate control plane/separate network
• Uses a gateway to allow communication between networks
• Uses Istio’s ServiceEntry mechanism to enable cross-network discovery
• Independent control planes
• Separate, independent failure domains
• Doesn’t solve where trust domains MUST be separate (with federation at the boundaries)
• Increased burden on operator to maintain service discovery, identity federation, and multi-cluster configuration across meshes
Example multi-cluster routing with ServiceEntry
[Diagram: Cluster 1 and Cluster 2 each run their own Istiod. Account in Cluster 1 reaches the local User service at http://users.default.svc.cluster.local and the User service in Cluster 2 at http://users.default.cluster-2, backed by a ServiceEntry for users.default.cluster-2.]
ServiceEntry for service discovery
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: users-cluster2
spec:
  hosts:
  - users.default.cluster2
  location: MESH_INTERNAL
  ports:
  - name: http1
    number: 8000
    protocol: http
  resolution: DNS
  addresses:
  - 240.0.0.2
  endpoints:
  - address: 10.0.2.5
    ports:
      http1: 15443
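Once such a ServiceEntry exists, the remote host can be referenced like any other mesh destination. The following sketch shifts a small share of traffic for the local User service to the remote one and originates Istio mutual TLS toward it. The 90/10 split and resource names are assumptions, as is the premise that the remote gateway port expects Istio mutual TLS.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: users
spec:
  hosts:
  - users.default.svc.cluster.local
  http:
  - route:
    - destination:
        host: users.default.svc.cluster.local   # local cluster
      weight: 90
    - destination:
        host: users.default.cluster2            # host from the ServiceEntry
      weight: 10
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: users-cluster2
spec:
  host: users.default.cluster2
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL   # use mesh-issued certificates toward the remote gateway
```

Every remote service needs its own ServiceEntry and related rules in every calling cluster, which is the operator burden the talk turns to next.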
Forces to balance
• Security (authz/authn/encryption/identity)
• Service discovery
• Failover / traffic shifting / transparent routing
• Observability
• Separate networks
• Well-defined fault domains
• Building for scale
What to do about the added burden for the operator?
[Diagram: Cluster 1 and Cluster 2, each with Istiod, an Ingress Gateway, and workloads; Service Mesh Hub acts as the management plane]
Demo
Service Mesh Hub
[Diagram: Service Mesh Hub with a management plane cluster and a remote cluster, each running Istiod, an Ingress Gateway, and workloads]
[Diagram: Service Mesh Hub with a management plane cluster and a remote cluster, each running Istiod, an Ingress Gateway, workloads, and a CSR agent. The agent creates a cert/key and a CSR; the cert is signed with the shared root.]
[Diagram: Service Mesh Hub with a management plane cluster and a remote cluster, each running Istiod, an Ingress Gateway, workloads, and a CSR agent. Both clusters hold a certificate chain with the same shared root.]
THANK YOU FOR ATTENDING!
@christianposta
christian@solo.io
https://blog.christianposta.com
https://slideshare.net/ceposta
• https://solo.io
• https://slack.solo.io
• https://gloo.solo.io
• https://envoyproxy.io
• https://istio.io
• https://webassemblyhub.io
• https://servicemeshhub.io
• https://blog.christianposta.com


Editor's Notes

  • #3 How does Solo help do this?
    • Help pick the right tech when it’s warranted (Envoy)
    • Hedge when the market is still volatile (SMH)
    • Simplify adoption
    • Enterprise focus (security, heterogeneous)
    • Solve the problem everywhere, regardless of technology, infrastructure, or footprint: on prem/public cloud/hybrid; any service mesh technology; VMs, containers, et al.
  • #11 Kubernetes is the de facto way to build and deploy containerized microservices … but not everything runs in Kubernetes, and not everything will run on premises
  • #18 Need a way to automate handling of explosive numbers of workloads (microservices):
    • Placement of workloads, AKA deployments
    • Autoscale, health check, start/stop, rebalance, scale up/down
    Building applications for Kubernetes (or any cloud native platform) is fundamentally different.
    Why Kubernetes won:
    • Community
    • Right level of API
    • Extensible
    • Declarative configuration model
    • Foundation of DevOps and automation model
    Adopting microservices to go fast!