Unclouding Container Challenges

Apr 21st, 2021
Harpratap Singh Layal
Cloud Platform Department
Rakuten Group, Inc.

  1. Unclouding Container Challenges (Apr 21st, 2021; Harpratap Singh Layal, Cloud Platform Department, Rakuten Group, Inc.)
  2. Background – Compute platforms
     Bare metal as a Service (BMaaS):
       • 16 core, 32 GB, 1 Gbps
       • 16 core, 32 GB, 1 Gbps
       • 32 core, 128 GB, 10 Gbps
       • 16 core, 64 GB, 10 Gbps
     Container as a Service (CaaS):
       • Cluster X: App 1, App 2, App 2
       • Cluster Y: App 3, App 2, App 4
  3. Background – What is CaaS?
     Options shown on the diagram:
       • PaaS (Heroku, 12-factor apps)
       • Managed K8s control plane (GKE, EKS, AKS; full customization)
       • Simple container scheduler (Fargate, Cloud Run)
       • Only expose selected K8s API (CaaS)
     Axis: opinionated (less flexibility) vs. developer control & responsibility
     Default: container networking, CI/CD, monitoring, security for stateless & stateful apps, cron, GPU workloads
  4. Challenge #1: Communication Cost
  5. Challenge #1: Communication Cost
     Doing it the traditional way:
       1. Communication lag: it takes too long to formulate requirements from developers
       2. XY problem: no idea what the real problem is
       3. Validation and policy injection are done manually
  6. Challenge #1: Communication Cost
     Solution: create an opinionated Internal Developer Platform and form an API-based contract with users
     Philosophy:
       • When you have APIs and their documentation, users rarely need to communicate with you
       • Easier to explicitly define what you provide and what you don't
       • Standardization = less reinventing the wheel, fewer pets, easier to propagate tech culture
     Implementation:
       • In CaaS we use K8s APIs to expose features to users. Custom Resource Definitions (CRDs) and Operators fit us well.
       • Admission control webhooks, PodSecurityPolicy and NetworkPolicy
  7. Challenge #1: Communication Cost
     Jiange: validation without human communication
     (Diagram labels: Jiange, etcd, K8s API)
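
The slides do not show how Jiange is implemented. As a minimal sketch of the general idea, assuming the validation runs as a Kubernetes validating admission webhook, the function below takes an `AdmissionReview` request body and builds the response the API server expects. The function name `review_pod` and both policy rules are hypothetical and chosen only for illustration; a real webhook would also need an HTTPS server and a `ValidatingWebhookConfiguration`.

```python
from typing import Any


def review_pod(admission_review: dict[str, Any]) -> dict[str, Any]:
    """Build an AdmissionReview response for a Pod admission request."""
    request = admission_review["request"]
    pod_spec = request["object"].get("spec", {})
    problems: list[str] = []

    # Hypothetical rule 1: workloads may not share the host network.
    if pod_spec.get("hostNetwork", False):
        problems.append("hostNetwork is not allowed")

    # Hypothetical rule 2: every container must declare resource limits.
    for container in pod_spec.get("containers", []):
        limits = container.get("resources", {}).get("limits")
        if not limits:
            name = container.get("name", "?")
            problems.append(f"container {name!r} has no resource limits")

    # The response must echo the request uid and say whether to admit.
    response: dict[str, Any] = {"uid": request["uid"], "allowed": not problems}
    if problems:
        # The message is returned to the user by the API server on rejection.
        response["status"] = {"message": "; ".join(problems)}
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```

Because the rejection message comes back directly from the API server, users learn what the platform allows without having to ask the platform team.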
  8. Challenge #2: Day 2 Ops
  9. Challenge #2: Day 2 Ops
     Day 1 Ops:
       • Provisioning
       • Step 1
       • Step 2
       • Step 3 … N
       • Procedural, so easy to automate
     Day 2 Ops:
       • Maintenance
       • Not always the same
       • Improvements: need to keep an eye on various components
       • Metrics
       • Logs
       • Traces
  10. Challenge #2: Day 2 Ops
      Solution: Infrastructure as Data instead of Infrastructure as Code
        • IaC: run scripts one by one (Script for X, Script for Y, Script for Z)
        • IaD: store the state as data and reconcile until the state is achieved (diagram labels: Data Store, Infra, Control Loop, Reconcile Spec, Reconcile Status)
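
The "store the state as data and reconcile" idea can be sketched in a few lines. This is an illustration only, not the CaaS controllers themselves: the `Cluster` class, the node names, and the in-memory set standing in for real infrastructure are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical infrastructure: names of worker nodes that currently exist."""
    nodes: set[str] = field(default_factory=set)


def reconcile(desired: set[str], cluster: Cluster) -> list[str]:
    """Take one step toward the desired state and return the actions taken."""
    actions = []
    for name in sorted(desired - cluster.nodes):
        cluster.nodes.add(name)      # stands in for provisioning a node
        actions.append(f"create {name}")
    for name in sorted(cluster.nodes - desired):
        cluster.nodes.remove(name)   # stands in for decommissioning a node
        actions.append(f"delete {name}")
    return actions


def control_loop(desired: set[str], cluster: Cluster, max_rounds: int = 10) -> int:
    """Reconcile until a round needs no actions; return the rounds used."""
    for round_number in range(1, max_rounds + 1):
        if not reconcile(desired, cluster):
            return round_number
    raise RuntimeError("state did not converge")
```

Unlike a script that runs its steps once, the loop compares the desired state with the observed state on every pass, so it handles first-time provisioning and later drift with the same code path.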
  11. Challenge #2: Day 2 Ops
      Solution: Infrastructure as Data instead of Infrastructure as Code
      In CaaS we have written controllers based on the same approach:
        • Klone: binary that provisions master nodes and system components based on Git configs (written in Go)
        • Node operator: creates worker nodes
        • Namespace operator: creates user namespaces with correct permissions, good defaults, Jenkins repositories, Harbor projects, etc. when a user onboards
        • Gateway controller: creates Istio ingress gateways
        • Wildcard instant domain controller: instantly creates simple domains to test out services
        • Cloud controller manager: creates load balancers
        • Endpoints controller: creates container native load balancers
  12. Challenge #2: Day 2 Ops
      (Diagram labels: Internet, Load Balancer, K8s API, Node List, Cloud Controller Manager, K8s cluster nodes)
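
The diagram suggests that the cloud controller manager reads the node list from the K8s API and keeps the load balancer pointed at the cluster nodes. A rough sketch of that sync step follows, under the assumption that only ready nodes should receive traffic. The function name, the node dictionary shape, and the addresses are hypothetical and not taken from the slides.

```python
def plan_backend_changes(
    nodes: list[dict], backends: set[str]
) -> tuple[list[str], list[str]]:
    """Compare the node list with the load balancer's current backends.

    Returns (addresses to add, addresses to remove).
    """
    # Only nodes reported as ready should receive traffic (an assumption).
    ready = {node["address"] for node in nodes if node.get("ready", False)}
    to_add = sorted(ready - backends)
    to_remove = sorted(backends - ready)
    return to_add, to_remove
```

For example, with ready nodes `10.0.0.1` and `10.0.0.3`, an unready node `10.0.0.2`, and current backends `10.0.0.1`, `10.0.0.2`, and `10.0.0.9`, the plan adds `10.0.0.3` and removes `10.0.0.2` and `10.0.0.9`.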
  13. Challenge #3: Container Networks
  14. Challenge #3: Container Networks
        • Kubernetes network != host network
        • Pods are not first-class citizens (not a flat network)
        • Pods are ephemeral
        • Fair load balancing does not happen when using NodePorts
        • Additional hops (through K8s node iptables)
        • Source IP is not preserved
        • Network is difficult to use
  15. Challenge #3: Container Networks
      Solution: no one size fits all; provide all solutions with good defaults and let users choose

                                Shared Gateway + Auto Assigned Domain   Dedicated Gateway + Custom Domain
      Domain                    Auto assigned                           Any domain
      Performance               Not isolated                            Isolated
      Maintenance (for users)   Zero                                    High
      Customization             Low                                     Fully customizable
      Cost                      Low                                     High
  16. Challenge #3: Container Networks
      Solution: container native load balancing

                        Legacy Load Balancer   Container Native Load Balancer
      Number of hops    2                      1
      IP preservation   Remote IP lost         Remote IP preserved
      Load balancing    Across nodes           Across containers
      Health checks     Only for nodes         Application-level health checks
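
A small worked example shows why balancing across nodes can be unfair when pods are spread unevenly. It assumes the balancer splits traffic evenly across nodes that run at least one pod, and that each node serves requests only with its local pods (for instance a NodePort Service with `externalTrafficPolicy: Local`). The function names and pod counts are illustrative only.

```python
from fractions import Fraction


def share_per_pod_node_level(pods_per_node: dict[str, int]) -> dict[str, Fraction]:
    """Traffic share of one pod on each node when the balancer targets nodes."""
    serving = {node: count for node, count in pods_per_node.items() if count > 0}
    node_share = Fraction(1, len(serving))  # each serving node gets an equal slice
    return {node: node_share / count for node, count in serving.items()}


def share_per_pod_container_level(pods_per_node: dict[str, int]) -> Fraction:
    """Traffic share of every pod when the balancer targets pods directly."""
    return Fraction(1, sum(pods_per_node.values()))
```

With one pod on `node-a` and three pods on `node-b`, node-level balancing gives the single pod on `node-a` 1/2 of all traffic while each pod on `node-b` gets 1/6. Balancing across containers gives every pod 1/4.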
  17. Future Challenges
        • Multicluster CaaS: network, deployments
        • IPv4 not enough (need IPv6 and/or VPCs)
        • Stateful apps: local persistence, remote persistence
        • GPU
        • SR-IOV
        • CPU pinning
        • Single data proxy
