Unclouding Container Challenges

Apr 21st, 2021
Harpratap Singh Layal
Cloud Platform Department
Rakuten Group, Inc.

Background – Compute platforms
Bare metal as a Service (BMaaS)
• 16 core, 32 GB, 1 Gbps
• 16 core, 32 GB, 1 Gbps
• 32 core, 128 GB, 10 Gbps
• 16 core, 64 GB, 10 Gbps
Container as a Service (CaaS)
• Cluster X: App 1, App 2, App 2
• Cluster Y: App 3, App 2, App 4

Background – What is CaaS?
Spectrum from opinionated (less flexibility) to developer control & responsibility:
• PaaS (Heroku, 12-factor apps)
• Simple container scheduler (Fargate, Cloud Run)
• Only expose selected K8s API (CaaS)
• Managed K8s control plane (GKE, EKS, AKS; full customization)
Default container networking, CI/CD, monitoring, security for stateless & stateful apps, cron, GPU workloads

Challenge #1 : Communication Cost

Doing it the traditional way:
1. Communication lag – it takes too long to gather requirements from developers
2. XY problem – users ask about their attempted solution, so the real problem stays unknown
3. Validation and policy injection are done manually

Solution: Create an opinionated Internal Developer Platform and form an API-based contract with users
Philosophy:
• When you have APIs and their documentation, users rarely need to communicate with you
• Easier to explicitly define what you provide and what you don’t
• Standardization = less reinventing the wheel, fewer pets, easier to propagate tech culture
Implementation:
• In CaaS we use K8s APIs to expose features to users. Custom Resource Definitions (CRDs) and Operators fit us well.
• Admission control webhooks, PodSecurityPolicy and NetworkPolicy
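As a rough illustration of how an admission control webhook can replace manual validation, the sketch below takes a Kubernetes `AdmissionReview` request (as a plain dict) for a Pod and returns the matching response. The two rules it enforces (no privileged containers, resource limits required) and the function name `review` are assumptions chosen for the example; they are not taken from the deck and are not necessarily what the CaaS platform checks.

```python
def review(admission_review):
    """Validate a Pod from an AdmissionReview request and build the response.

    The policy here is hypothetical: reject privileged containers and
    containers without resource limits.
    """
    request = admission_review["request"]
    pod = request["object"]
    problems = []

    for container in pod.get("spec", {}).get("containers", []):
        name = container.get("name", "<unnamed>")
        security = container.get("securityContext") or {}
        resources = container.get("resources") or {}
        if security.get("privileged", False):
            problems.append(f"container {name}: privileged mode is not allowed")
        if "limits" not in resources:
            problems.append(f"container {name}: resource limits are required")

    # The response must echo the request uid so the API server can match it.
    response = {"uid": request["uid"], "allowed": not problems}
    if problems:
        response["status"] = {"code": 403, "message": "; ".join(problems)}

    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```

In a real deployment this function would sit behind an HTTPS endpoint registered through a `ValidatingWebhookConfiguration`; the rejection message is what the user sees from `kubectl`, which is how the contract is communicated without a human in the loop.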

Jiange: Validation without human communication
Diagram components: Jiange, K8s API, etcd

Challenge #2: Day 2 Ops
Day 1 Ops:
• Provisioning: Step 1, Step 2, Step 3 … N
• Procedural – easy to automate
Day 2 Ops:
• Maintenance
• Not always the same
• Improvements – need to keep an eye on various components (metrics, logs, traces)

Solution: Infrastructure as Data instead of Infrastructure as Code
IaC – run scripts one by one
• Script for X
• Script for Y
• Script for Z
IaD – store the state as data and reconcile until the state is achieved
• Diagram components: Data Store, Infra Control Loop, Infra
• The control loop reconciles spec and reconciles status
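The reconcile idea can be sketched in a few lines. In this toy version the desired state lives in `Resource.spec`, the "infrastructure" is just a dict, and the loop keeps reconciling until a pass makes no changes. All names (`Resource`, `reconcile_once`, `control_loop`) are made up for illustration; a real controller would call provisioning APIs where this sketch assigns dict entries.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    spec: dict                                   # desired state, stored as data
    status: dict = field(default_factory=dict)   # last observed state


def reconcile_once(resource, infra):
    """Run one pass; return True if the infrastructure had to be changed."""
    changed = False
    # Reconcile spec: create or update whatever differs from the desired state.
    for key, want in resource.spec.items():
        if infra.get(key) != want:
            infra[key] = want  # stand-in for a real provisioning call
            changed = True
    # Remove anything that exists but is no longer declared.
    for key in list(infra):
        if key not in resource.spec:
            del infra[key]
            changed = True
    # Reconcile status: record what was observed after this pass.
    resource.status = {"observed": dict(infra), "ready": infra == resource.spec}
    return changed


def control_loop(resource, infra, max_passes=10):
    """Reconcile until a pass makes no changes; return the number of passes."""
    for passes in range(1, max_passes + 1):
        if not reconcile_once(resource, infra):
            return passes
    raise RuntimeError("state did not converge")
```

Unlike a script, the loop does not care how the infrastructure drifted: running it again after any manual change brings the actual state back to the stored spec.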

Solution: Infrastructure as Data instead of Infrastructure as Code
In CaaS we have written controllers based on the same approach
• Klone – Binary that provisions master nodes and system components based on git configs (written in Go)
• Node operator – used for creating worker nodes
• Namespace operator – used for creating user namespaces with correct permissions, good defaults, Jenkins repositories, Harbor projects, etc. when a user onboards
• Gateway controller – for creating Istio ingress gateways
• Wildcard instant domain controller – for instantly creating simple domains to test out services
• Cloud controller manager – for creating load balancers
• Endpoints controller – for creating container native load balancers
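To make the namespace operator idea more concrete, here is a minimal sketch of the kind of objects such an operator might generate when a user onboards. The function name, the object names and the specific defaults (an `edit` role binding for the owner and a default-deny ingress policy) are assumptions for illustration only; the deck does not say which defaults the real operator applies, and the Jenkins and Harbor parts are left out because they are not Kubernetes objects.

```python
def namespace_defaults(namespace, owner):
    """Return the manifests a hypothetical namespace operator would apply."""
    return [
        {
            "apiVersion": "v1",
            "kind": "Namespace",
            "metadata": {"name": namespace},
        },
        {
            # Correct permissions: bind the onboarding user to the built-in
            # "edit" ClusterRole, scoped to this namespace only.
            "apiVersion": "rbac.authorization.k8s.io/v1",
            "kind": "RoleBinding",
            "metadata": {"name": "owner-edit", "namespace": namespace},
            "roleRef": {
                "apiGroup": "rbac.authorization.k8s.io",
                "kind": "ClusterRole",
                "name": "edit",
            },
            "subjects": [
                {
                    "apiGroup": "rbac.authorization.k8s.io",
                    "kind": "User",
                    "name": owner,
                }
            ],
        },
        {
            # Good default: deny all ingress until the user opens it up.
            "apiVersion": "networking.k8s.io/v1",
            "kind": "NetworkPolicy",
            "metadata": {"name": "default-deny-ingress", "namespace": namespace},
            "spec": {"podSelector": {}, "policyTypes": ["Ingress"]},
        },
    ]
```

Because the output is plain data, the operator can compare it against what exists in the cluster on every pass and re-apply anything that was deleted or changed.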

Diagram: Internet → Load Balancer → K8s cluster nodes
Cloud Controller Manager ← Node List ← K8s API
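The components on this slide (Load Balancer, K8s API, Node List, Cloud Controller Manager) suggest a controller that keeps the load balancer's backends in line with the cluster's node list. A minimal sketch of that comparison is below; the node dict shape (`address`, `ready`) and the function name are hypothetical, and the calls that would actually change the load balancer are left out.

```python
def plan_backend_changes(node_list, lb_backends):
    """Compare ready nodes with current LB backends and plan the difference."""
    ready = {node["address"] for node in node_list if node.get("ready")}
    current = set(lb_backends)
    return {
        "add": sorted(ready - current),       # ready nodes missing from the LB
        "remove": sorted(current - ready),    # backends that are gone or not ready
    }
```

For example, with nodes `10.0.0.1` (ready), `10.0.0.2` (not ready) and `10.0.0.3` (ready) and current backends `10.0.0.1` and `10.0.0.2`, the plan adds `10.0.0.3` and removes `10.0.0.2`.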

Challenge #3: Container networks

• Kubernetes network != host network
• Pods are not first-class citizens on the network (it is not a flat network)
• Pods are ephemeral
• Fair load balancing does not happen when using NodePorts
• Additional hops (through the K8s node's iptables)
• Source IP is not preserved
• The network is difficult to use

Solution: No one size fits all; provide all solutions with good defaults and let users choose

                          Shared Gateway +         Dedicated Gateway +
                          Auto Assigned Domain     Custom Domain
Domain                    Auto assigned            Any domain
Performance               Not isolated             Isolated
Maintenance (for users)   Zero                     High
Customization             Low                      Fully customizable
Cost                      Low                      High

Solution: Container Native Load Balancing

                    Legacy Load Balancer     Container Native Load Balancer
Number of hops      2                        1
IP preservation     Remote IP lost           Remote IP preserved
Load balancing      Across nodes             Across containers
Health checks       Only for nodes           Application-level health checks
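The "across nodes" versus "across containers" row can be made concrete with a little arithmetic. The sketch below assumes that traffic reaching a node is served only by the pods on that node (as happens when traffic is kept node-local); under that assumption, balancing across nodes gives pods on crowded nodes a smaller share. The function names and the pod layout in the usage note are made up for the example.

```python
from fractions import Fraction


def shares_node_level(pods_per_node):
    """Traffic share per pod when the LB splits evenly across nodes.

    Assumes each node then splits its share evenly among its local pods.
    Nodes without pods are treated as receiving no traffic.
    """
    nodes = [count for count in pods_per_node if count > 0]
    shares = []
    for count in nodes:
        shares.extend([Fraction(1, len(nodes)) / count] * count)
    return shares


def shares_container_level(pods_per_node):
    """Traffic share per pod when the LB targets pods directly."""
    total = sum(pods_per_node)
    return [Fraction(1, total)] * total
```

With pods spread as `[1, 1, 4]` over three nodes, node-level balancing gives the two lone pods one third of the traffic each and the four co-located pods one twelfth each, while container-level balancing gives every pod one sixth.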

Future Challenges:
• Multicluster CaaS: network, deployments
• IPv4 not enough (need IPv6 and/or VPCs)
• Stateful apps: local persistence, remote persistence
• GPU
• SR-IOV
• CPU pinning
• Single Data proxy
