A recap of customer stories, case studies and use cases from KubeCon / CloudNativeCon 2019: a summary of Kubernetes & CNCF project use cases presented at the Bangalore CNCF Meetup.
1. 1
Cloud Native Use Cases
From KubeCon 2019 San Diego – A Recap
by Krishna Kumar,
https://www.meetup.com/Bangalore-CNCF-Meetup
2. 2
KubeCon/CloudNativeCon 2019 San Diego
●
The largest CNCF event ever
●
November 2019 in San Diego, US
●
12,000+ attendees
●
100+ vendors
●
100+ announcements
●
300+ sessions/presentations
●
CNCF: 20+ projects; 500+ members; 100+ big vendors
●
200+ members joined in 2019
● Videos & slides from the event:
https://github.com/cloudyuga/kubecon19-NA#case%20studies
https://www.youtube.com/playlist?list=PLj6h78yzYM2NDs-iu8WU5fMxINxHXlien
Top 10 Announcements:
1. Helm 3 Is Launched
2. AWS, Intuit and Weaveworks Collaborate on Argo Flux
3. Confidential Computing for Kubernetes from Microsoft
4. Red Hat Launches CodeReady Workspaces 2.0
5. Mirantis Launches Kubernetes as a Service (KaaS)
6. O’Reilly Acquires Katacoda
7. Portworx Launches PX-Autopilot
8. Diamanti Announces Spektra Hybrid Cloud Solution
9. Buoyant Announces Dive, a SaaS Control Plane for Kubernetes
10. Rancher Extends Kubernetes to the Edge
https://www.forbes.com/sites/janakirammsv/2019/11/24/10-most-interesting-announcements-from-kubecon--cloudnativecon-2019/#38d26962583b
3. 3
What do we have today?
●
A quick recap of some case studies from KubeCon 2019 San Diego:
(1) Cruise - Multi-tenancy
(2) Slack - DB migration to Vitess
(3) Yahoo - Istio & k8s on-prem
(4) Gusto - Moving a startup to k8s
(5) Reddit - k8s in production
(6) Tinder - The journey to k8s
(7) Spotify - Envoy migration
(8) Airbnb - Scaling to 1000s of nodes across multiple clusters
(9) eBay - Setting up search on k8s
(10) Uber - Kubernetes migration journey
(11) Lyft - Large-scale stateful workloads in k8s
(12) Grape Up - Continuous deployments to cars
(13) PlanetScale - DB service on k8s
(14) Salesforce - Enterprise cloud
(15) Goldman Sachs - k8s policy & OPA implementation
(16) Fidelity - Finance-grade k8s with GitOps
(17) Freddie Mac - Istio journey, brownfield to greenfield
(18) Govt of Ottawa - Moving legacy to cloud
(19) Ministry of Defense, Israel - AI in k8s production
(20) US Department of Defense - Moved to k8s & Istio
4. 4
Cruise – Multi tenancy
●
Builds autonomous vehicles
●
Clusters – 12 to 26
●
Large cluster – 1000 nodes, 32 or 64 vCPUs each
●
Uses G Suite & GKE; tools: Daytona, Vault, Krail, Isopod, Juno (proprietary)
●
Built a scalable multi-tenant system, mostly on shared clusters, keeping both downtime and cost low
●
Domain isolation – environmental vs. organizational; project-based namespaces
●
Permission isolation – RBAC & Google Groups; secrets at the application level
●
System isolation – machine, node pool, cluster, network
●
Resource isolation – Storage volumes & quotas
●
Network isolation – Shared Tunnels (NAT gateways); Shared observability logs
●
More here: https://www.youtube.com/watch?v=m19D9vZ1QFQ
5. 5
Slack – DB Migration to Vitess
●
Migrating datasets to Vitess – a database clustering system for horizontally scaling MySQL
●
Storage: 7.5+ PB; queries: 53+ billion
●
Small shards vs. big shards; durability through replication
●
Fault tolerance & isolation – minimal blast radius; isolated topologies
●
Moved from a single cell to multiple cells
●
More here: https://www.youtube.com/watch?v=aTItjMJE17c
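Vitess scales MySQL horizontally by mapping a sharding key to a keyspace ID and giving each shard a contiguous range of that keyspace; shards are named after their range bounds, e.g. `-80` and `80-`. A minimal sketch, assuming a stand-in SHA-256 hash (not Vitess's own hash vindex) and hypothetical function names, of why "big shards vs. small shards" is a local choice: splitting a range in two only moves rows from a parent shard into its children.

```python
import bisect
import hashlib

KEYSPACE = 1 << 64  # keyspace IDs treated as unsigned 64-bit values here


def keyspace_id(sharding_key: int) -> int:
    """Map a sharding key to a 64-bit keyspace ID.

    Stand-in hash (first 8 bytes of SHA-256); Vitess's hash vindex uses a
    different function, so the values differ and only the idea carries over.
    """
    digest = hashlib.sha256(str(sharding_key).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def even_ranges(num_shards: int) -> list[tuple[int, int]]:
    """Split the keyspace into num_shards contiguous [start, end) ranges."""
    step = KEYSPACE // num_shards
    bounds = [i * step for i in range(num_shards)] + [KEYSPACE]
    return list(zip(bounds[:-1], bounds[1:]))


def range_name(start: int, end: int) -> str:
    """Name a range by its leading byte in hex, e.g. "-80" or "80-c0".

    Exact only when the bounds are multiples of 2**56 (power-of-two shard counts).
    """
    lo = "" if start == 0 else f"{start >> 56:02x}"
    hi = "" if end == KEYSPACE else f"{end >> 56:02x}"
    return f"{lo}-{hi}"


def shard_for(sharding_key: int, ranges: list[tuple[int, int]]) -> int:
    """Return the index of the range that holds the key's keyspace ID."""
    starts = [start for start, _ in ranges]
    return bisect.bisect_right(starts, keyspace_id(sharding_key)) - 1


if __name__ == "__main__":
    big, small = even_ranges(2), even_ranges(4)
    print([range_name(*r) for r in big])    # ['-80', '80-']
    print([range_name(*r) for r in small])  # ['-40', '40-80', '80-c0', 'c0-']
```

With four shards, every key that lived in `-80` now lands in either `-40` or `40-80`, so resharding copies data only between a parent range and its children rather than reshuffling the whole dataset.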
6. 6
Yahoo – Istio & k8s on Prem
●
990+ apps; 1K+ stateful apps; 18 clusters (9 prod & 9 canary); 7 DCs; 2900+ nodes; 1.5M+ RPS on ingress
●
Yahoo built several components in-house, e.g. Athenz (identity service) and an auth webhook
●
Mapped RBAC to Athenz domains
●
Soft multi-tenancy – isolated namespaces; dedicated clusters for only some tenants
●
Istio – network transparent to applications; mutual TLS
●
K8s identity provider gives every pod an identity (SPIFFE X.509); Envoy RBAC
●
Proprietary templates and template engine generate expanded YAML in the CI/CD pipeline
●
Developers are happy, and an efficient deployment mechanism is in place
●
More here: https://www.youtube.com/watch?v=fEaVU1i-fOQ
7. 7
Gusto – Moving a startup to k8s
●
Gusto – payroll management; 100K customers
●
GoSpotCheck – 200K tasks/day
●
Started on the Heroku PaaS and eventually moved to GKE: AWS to Google Cloud, Heroku to k8s
●
20 months in total, starting with a team of 2
●
Containerizing existing apps started with trial & error
●
Terraform for the GKE cluster; Docker Hub used extensively
●
Rails, Ambassador, Envoy, gRPC, SuperGloo, Harness for CD (no Spinnaker), logging with Sumo Logic carried over from the traditional environment
●
Developers are happy – moved a monolith within a 6-week window, very efficiently
●
Management is happy – costs cut from $110K+/month to $40K/month
●
More here: https://www.youtube.com/watch?v=AqMxaxJsJKY
8. 8
Reddit – k8s in production
●
A home for discussion on the web
●
330M+ monthly users; 16M+ posts/month
●
30K members in the r/kubernetes community
●
Successfully rolled out an org-wide onboarding process; empowered service owners to design their own services
●
Moved from a single-AZ cluster to multi-AZ on AWS for reliability and better traffic handling; mirrored clusters prevented outages
●
CDN + LB handle unhealthy clusters; 19 clusters, with OPA running in all of them
●
Spinnaker + autogenerated Helm charts + template-based YAML + Terraform to keep clusters in sync
●
Dev environment: started with Skaffold + minikube; now remote dev clusters & a Starlark resource generator
●
More here: https://www.youtube.com/watch?v=WTbIBqNcjoQ
9. 9
Tinder – The Journey to k8s
●
Tinder is an app for meeting new people
●
Legacy: AWS instances + Puppet + Prometheus; 30 source repos in various languages
●
2,000 nodes, 18,000 cores, 6 control planes, 30K pods, 130K containers
●
Prometheus: 750K samples/sec; 5 TB/day of ingestion on AWS K8s
●
Terraform + kube-aws + peered VPCs + ELB endpoints
●
1000+ CoreDNS pods (DaemonSets), one Envoy per AZ, frontend TCP ELB, 2-6 sidecars per pod, Thanos
●
Issues faced: ARP exhaustion, DNS timeouts, unbalanced load, etc.
●
Planning multi-cluster deployments from CI/CD, plus Prometheus metrics across clusters
●
More here: https://www.youtube.com/watch?v=o3WXPXDuCSU
10. 10
Spotify – Envoy migration
●
Audio streaming platform – 248M users; 8M+ RPS; 1200 microservices; 3B+ playlists
●
GCP – US, Europe, Asia
●
Moved from an NGINX & HAProxy based environment to Envoy
●
Transparent migration – traffic shifted gradually at the edge – almost zero downtime
●
GCP LB; you need to know your traffic flow well to achieve zero downtime
●
Rate limiting & auth schemes need careful attention
●
Achieved an automated migration with a reliable strategy
●
More here: https://www.youtube.com/watch?v=I_oa8l0j-yM
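Shifting traffic gradually from the legacy NGINX/HAProxy path to Envoy can be pictured as a percentage split that is raised step by step. A hypothetical sketch, not Spotify's tooling: each client key is bucketed deterministically with a hash, so a client moved to Envoy at 5% stays on Envoy at 25%, 50% and 100%. The ramp schedule and all function names are assumptions for illustration.

```python
import hashlib

# Hypothetical ramp schedule: percentage of traffic sent to Envoy at each step.
RAMP = [1, 5, 25, 50, 100]


def bucket(request_key: str, buckets: int = 100) -> int:
    """Deterministically map a client/request key to a bucket in [0, buckets)."""
    digest = hashlib.sha256(request_key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % buckets


def choose_proxy(request_key: str, envoy_percent: int) -> str:
    """Route envoy_percent % of keys to the new Envoy path, the rest to legacy."""
    if not 0 <= envoy_percent <= 100:
        raise ValueError("envoy_percent must be in [0, 100]")
    return "envoy" if bucket(request_key) < envoy_percent else "legacy"


def envoy_share(keys: list[str], envoy_percent: int) -> float:
    """Fraction of the given keys that would be routed to Envoy."""
    return sum(choose_proxy(k, envoy_percent) == "envoy" for k in keys) / len(keys)


if __name__ == "__main__":
    keys = [f"client-{i}" for i in range(10_000)]
    for step in RAMP:
        # In a real rollout, error rates and latency would be checked before the next step.
        print(f"{step:3d}% target -> {envoy_share(keys, step):.1%} observed")
```

Hashing on a stable key (rather than picking randomly per request) keeps each client on one path at a given step, which makes regressions easier to attribute and roll back.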
11. 11
Airbnb – Scaling 1000s of nodes across multiple clusters
●
Massive k8s adoption from legacy infrastructure (not greenfield); 1200 services
●
2.4K nodes at Airbnb now (Alibaba ran a 10K-node cluster)
●
EC2, Chef, Terraform; in-house Kubegen converts Airbnb config to k8s config
●
etcd v3; not using KubeFed for now; kops, kubeadm, Helm; deploys take < 10 min
●
SmartStack service mesh – equivalent to various VPC CNIs (AWS, Lyft)
●
Services are placed in a random cluster; clusters are usually kept to at most 400 nodes
●
Now: 22 cluster types; 36 clusters; 7000+ nodes
●
More here: https://www.youtube.com/watch?v=ay7NibpRAYU
12. 12
eBay – Setting Up Search on k8s
●
Own search engine, Cassini; 1.4B+ listings; 300K QPS
●
40% of the data center is used for search; other workloads: web, DB, Hadoop, AI
●
60+ production clusters; clusters of 2K+ nodes; 160K+ pods; 30K+ hosts
●
Selected K8s for speed, scale, flexibility and automation
●
Matrics deployment operator; mutating webhook; multi-cluster support
●
Performance comparison with bare metal – kernel, CPU turbo boost, ipvlan networking
●
More here: https://www.youtube.com/watch?v=chGN44Kqpd8
13. 13
Uber – Kubernetes Migration Journey
●
Multi-region & multi-zone; moving from bare-metal Mesos to k8s; needed sidecar-style pods
●
15M+ trips per day; 65 countries/700 cities; 1K microservices; 10K instances; 100K service containers per cluster
●
1M+ batch containers; 35+ clusters; 5K+ builds per day; clusters larger than 5K nodes; Kafka, Elastic, SPIRE
●
Benchmarked: etcd at 50K writes & 150K reads/sec with value size > 256 bytes; 40K pods scheduled onto 8K nodes in 30 sec
●
Peloton, Uber's custom scheduler, runs as a k8s plugin; 1M containers launched per day, 1K per sec; also shared with Mesos
●
Large volume of batch workloads; stateless and batch on shared clusters; distributed deep learning on GPUs
●
More here: https://www.youtube.com/watch?v=91c3iUI2K7M
14. 14
Lyft – Large Scale Stateful Workloads in k8s
●
Flyte – custom orchestrator for data pipelines, data science jobs, ETL, backups, ride simulations
●
Serverless, REST/gRPC, multi-tenant, runs on AWS & Google
●
A Flyte workflow is a k8s custom resource; several other CRDs, e.g. Spark
●
1000s of containers started/min; 10M+ containers/month; high API server load (~90/min)
●
Uses ResourceQuota, periodic GC of CRDs, fewer etcd writes
●
Performance – discoverable tasks & node affinity; cost optimization – QoS, kube-batch scheduler
●
Scaling beyond a single cluster to meet SLOs; FlyteAdmin intelligently distributes workloads
●
More here: https://www.youtube.com/watch?v=ECeVQoble0g
15. 15
Grape Up – Continuous Deployments to Cars
●
Tried KubeEdge (https://kubeedge.io/en/) and k3s (https://k3s.io/), then a modified model
●
Custom car controller using the digital twin pattern
●
RSocket (byte stream transport), custom Docker images
●
Direct deployment from Jenkins to the car using the digital twin pattern
●
More here: https://www.youtube.com/watch?v=zmuOxFp3CAk
16. 16
PlanetScale – DB Service on k8s
●
PlanetScale CNDb – a cloud native database built on top of Vitess & MySQL
●
Journey: from inconsistent deployments to containers; bringing stateful workloads into a stateless world
●
Vitess – a great management system for large distributed (mainly SQL) databases, but challenging to configure
●
Wrote a Vitess operator; etcd also uses this operator; lots of auto-provisioning, including a Grafana plugin
●
PlanetScale cluster CRDs + lots of meta-infrastructure built on top
●
Prometheus, Grafana; OpenResty as the proxy instead of NGINX
●
Looking at multi-cloud clusters – master in AWS and replica in GCP; BYOD k8s
●
More here: https://www.youtube.com/watch?v=469NOldFOgw
17. 17
Salesforce – Enterprise Cloud
●
Private DC, bare metal, internal PKI with mTLS, OPA, RBAC
●
Each tenant has its own namespace; internal secret management system
●
Container image scanning for forensics
●
Jsonnet in Git, Operator CRDs, Spinnaker templates, Helm charts
●
Sloop, a Kubernetes history visualization tool, is open source!
●
Deployment model: testbed to canary to production
●
More here: https://www.youtube.com/watch?v=M5H4SrUM5BU
18. 18
Goldman Sachs – K8s Policy & OPA implementation
●
12 clusters running on VMs; 150 namespaces per cluster
●
Prometheus, Grafana, Ceph, Rook, CoreDNS, OPA
●
Tenancy at the namespace level; group roles, RBAC, quotas, NFS shares, NGINX
●
OPA controls – admission control to prohibit changes; provisioning of resources
●
24 rules per namespace; cluster state fixed within 5 min; weekly maintenance. All decisions are offloaded to OPA, so any environment change is handled there.
●
5 min turnaround for global application policy implementation (version controlled)
●
More here: https://www.youtube.com/watch?v=lYHr_UaHsYQ
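In the talk the admission rules are OPA policies (normally written in Rego). The sketch below is not Goldman Sachs' code; it uses Python only to show the shape of a validating admission decision. The rule (every container must set CPU and memory limits) is a hypothetical example, while the request/response fields follow the Kubernetes `admission.k8s.io/v1` AdmissionReview API.

```python
from typing import Any


def containers_missing_limits(pod: dict[str, Any]) -> list[str]:
    """Names of containers that do not set both a CPU and a memory limit."""
    missing = []
    for container in pod.get("spec", {}).get("containers", []):
        limits = container.get("resources", {}).get("limits", {})
        if "cpu" not in limits or "memory" not in limits:
            missing.append(container.get("name", "<unnamed>"))
    return missing


def review(admission_review: dict[str, Any]) -> dict[str, Any]:
    """Answer an admission.k8s.io/v1 AdmissionReview: deny Pods without limits."""
    request = admission_review["request"]
    allowed, message = True, ""
    if request.get("kind", {}).get("kind") == "Pod":
        missing = containers_missing_limits(request.get("object", {}))
        if missing:
            allowed = False
            message = "containers without cpu/memory limits: " + ", ".join(missing)
    # The response must echo the request uid so the API server can match it.
    response: dict[str, Any] = {"uid": request["uid"], "allowed": allowed}
    if not allowed:
        response["status"] = {"message": message}
    return {"apiVersion": "admission.k8s.io/v1", "kind": "AdmissionReview", "response": response}


if __name__ == "__main__":
    sample = {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "request": {
            "uid": "example-uid",
            "kind": {"group": "", "version": "v1", "kind": "Pod"},
            "object": {"spec": {"containers": [{"name": "web", "image": "nginx"}]}},
        },
    }
    print(review(sample)["response"])
    # {'uid': 'example-uid', 'allowed': False, 'status': {'message': 'containers without cpu/memory limits: web'}}
```

Keeping the rule as data-driven, version-controlled policy (rather than per-team manual review) is what makes a fast, global policy rollout like the 5-minute turnaround above feasible.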
19. 19
Fidelity – Finance grade K8s with GitOps
●
Highly regulated industry – policy & security
●
FIDEKS – custom augmented k8s platform; Helm; Flux CD deploys workloads
●
Updates rolled out using GitOps – standard workflow with a Git repo
●
AWS, EKSManager, eksctl, EKS Connect
●
Flux Helm Operator, AD groups, Jenkins, Cucumber
●
More here: https://www.youtube.com/watch?v=9xIG4lze7Uo
20. 20
Freddie Mac – Istio Journey Brownfield to Greenfield
●
Istio Journey
•
600+ applications, legacy apps, CI/CD pipelines, GitOps
•
VMware, Java, SQL, NoSQL, hardware load balancers initially
•
Service sidecars mix and match, PKI, HA autoscaling, traffic flow control
•
Istio – zero trust, DNS aware, mTLS, security as code, cloud LBs
•
Centralized compliance, locality-aware multi-AZ k8s, Istio based instead of hardware LBs
•
Not the org CA but an intermediate CA, kept in FIPS-compliant hardware rather than in memory
●
More here: https://www.youtube.com/watch?v=Rako7zKXquU
21. 21
Govt of Ottawa – Moving Legacy to Cloud
●
Support federal government workers, their concerns, etc.
●
Need to migrate old Linux servers; 17K+ employees, 120+ business lines, 400+ apps (Java, .NET, Perl)
●
GitOps + Flux CD + smart templates; Azure App Service and VMs are still in use
●
Looking forward – corporate container security standards, cloud governance, automation tooling
●
More here: https://www.youtube.com/watch?v=oBuOf-IvHWQ
22. 22
MoD Israel – AI in k8s production
●
Self Service Cloud experience for data scientists
●
Multi-tenancy with OpenShift + AutoML setup + Ceph, PostgreSQL, JupyterHub, RabbitMQ
●
Working with several ML communities
●
Open Data Hub – reference architecture for ML services; several components deployed using the Open Data Hub operator
●
Achieved CI/CD to production for AI workloads
●
More here: https://www.youtube.com/watch?v=LnXlZN8J6w0
23. 23
DoD US – Moved to k8s & Istio
●
Lots of silos in DoD.
●
DoD DevSecOps is now open source; centralized Artifactory repo; zero-trust security
●
Knative, OPA, EFK
●
STIG compliance & OpenSCAP, Twistlock, Anchore
●
K8s has been adopted in fighter jets and is running smoothly!
●
More here: https://www.youtube.com/watch?v=YjZ4AZ7hRM0
24. 24
For the latest weekly open source news, see:
https://github.com/krishna-mk/Top-10-OpenSource-News-Weekly