Kubinception: using Kubernetes to run Kubernetes

More information and detailed schemas > http://bit.ly/2KjOSi7 (unclickable link - please copy and paste)

One of the most structural choices we made while building the OVH Managed Kubernetes service was to deploy our customers’ clusters on top of our own. Kubinception indeed…
In this post we relate our experience of running Kubernetes over Kubernetes, with hundreds of customers’ clusters. Why did we choose this architecture? What are the main stakes of such a design? What problems did we encounter? How did we deal with them? And, even more important, if we had to make the decision today, would we choose Kubinception again?

Published in: Technology

Kubinception: using Kubernetes to run Kubernetes

  1. CNCF Bordeaux: Kubinception
  2. Kevin GEORGES (@0xd33d33): Passionate about distributed systems. Loves to share experience and to contribute to tech communities.
  3. Pierre PERONNET: DevOps @OVH. Automation adept. Do not craft it, code it.
  4. Sebastien JARDIN (Sebastien_Jard1): SRE Engineer. Tech lover.
  5. [No text on this slide]
  6. OVH: Our solutions
     Cloud Web Hosting ▪ Dedicated Server ▪ Data Storage ▪ Network and Security ▪ Licences Mobile Hosting Telecom VoIP SMS/Fax Virtual desktop Cloud HubiC OverTheBox Containers Compute Database Object Storage Securities Messaging VPS Public Cloud Private Cloud Dedicated server Cloud Desktop Hybrid Cloud Domain names Email CDN Web hosting MS Office MS solutions
  7. OVH: Platform
     Metrics Platform, Logs Platform, Managed Kubernetes
  8. OVH: Managed Kubernetes
     Launched at OVH Summit, end of 2018
  9. OVH: Managed Kubernetes
     ● 2,000+ clusters
     ● 10k+ pods running
     ● 1,500+ CPU cores
     ● 22+ TB of memory
  10. Kubernetes: way more than a buzzword!
  11. Masters and nodes
  12. Some more details
  13. Key components
      ● ETCD
      ● ApiServer
      ● Kube proxy
  14. Managed Kubernetes: don't try it at home, folks!
  15. Kubinception: running K8s on K8s
      Using Kubernetes to run Kubernetes
  16. Kubinception: where are the nodes?
  17. Kubinception with several customers
  18. Kubernetes communication
  19. Kubinception client side networking: Node port
      Incoming connections
      ● Access through the node IP (or with round-robin DNS)
      ● Routed by port number
      ● Sensitive to node failures
  20. Kubinception client side networking: K8S LB
      Incoming connections
      ● Based on the OVH IPLB product
      ● Routed by destination IP
      ● HA by design
      ● Highly scalable
  21. Kubinception client side networking: API Server connections
      External → API Server
      ● OVH IPLB
      ● Node Port
      ● Ingress nginx + SNI routing
      ● Full TCP connection, no SSL termination before the API Server
      [Diagram labels: kubectl commands, OVH IPLB, Node Port, Ingress Nginx, Admin node]
  22. Kubinception client side networking: API Server connections
      Nodes initiate a TCP tunnel
      ● called WormHole, from the Kubernikus project
      Connections from Kube components are routed through the tunnel
      Not accessible from client nodes due to the private network
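
One way to read the "Ingress nginx + SNI routing" bullet on slide 21: since TLS is not terminated before the API Server, the only routing key available is the server name that the client sends in clear text in its TLS ClientHello. The Python sketch below illustrates that idea under this assumption; it is not OVH's implementation. The function names `extract_sni` and `pick_backend`, the host name and the backend address are all hypothetical.

```python
import struct
from typing import Dict, Optional


def extract_sni(record: bytes) -> Optional[str]:
    """Return the SNI host name found in a TLS ClientHello record, or None."""
    try:
        if record[0] != 0x16:            # TLS record type: 0x16 = handshake
            return None
        pos = 5                          # skip record header (type, version, length)
        if record[pos] != 0x01:          # handshake type: 0x01 = ClientHello
            return None
        pos += 4                         # handshake type + 3-byte length
        pos += 2 + 32                    # client version + random
        pos += 1 + record[pos]           # session id
        (suites_len,) = struct.unpack_from("!H", record, pos)
        pos += 2 + suites_len            # cipher suites
        pos += 1 + record[pos]           # compression methods
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = pos + ext_total
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0:            # server_name extension
                # Layout: list length (2), name type (1), name length (2), name.
                if record[pos + 2] != 0:  # 0 = host_name
                    return None
                (name_len,) = struct.unpack_from("!H", record, pos + 3)
                return record[pos + 5:pos + 5 + name_len].decode("ascii")
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        return None
    return None


def pick_backend(record: bytes, routes: Dict[str, str]) -> Optional[str]:
    """Choose a backend from the SNI name without decrypting anything."""
    name = extract_sni(record)
    return routes.get(name) if name else None
```

The proxy only peeks at the first bytes of the connection and then forwards the whole TCP stream unchanged, so the TLS session still ends at the customer's API server.
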
  23. The long road to prod: a journey not for the faint of heart
  24. ETCD
  25. ETCD as a pod?
  26. ETCD as a pod?
      ● Easy to bootstrap
      ● Hard to maintain thousands of them
      ● Uses local, non-persistent storage by default
      ● Too much risk of total quorum loss using persistent volumes
  27. ETCD with an operator?
  28. ETCD with an operator?
      ● Built by the community
      ● Handles the etcd lifecycle:
        ○ creation
        ○ destruction
        ○ resizing
        ○ failover
        ○ rolling upgrades, backups…
      ● Uses local, non-persistent storage by default
      ● Too much risk of total quorum loss using persistent volumes
  29. ETCD with stateful sets?
  30. ETCD with stateful sets?
      ● Can handle a full cluster disruption
      ● No lifecycle management
      ● Resource cost
      ● Requires remote persistent volumes
      ● Performance issues
  31. ETCD multi-tenant?
      Deployed as a multi-tenant etcd cluster on dedicated servers
  32. ETCD multi-tenant?
      ● Dedicated hardware
      ● Can handle a full cluster disruption
      ● Easy to manage
      ● High performance
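
The "total quorum loss" risk on slides 26 and 28 follows from how etcd reaches consensus: it uses Raft, so a cluster only accepts writes while a majority of its members is available. A minimal sketch of that arithmetic, with hypothetical helper names that do not come from the talk:

```python
def quorum_size(members: int) -> int:
    """Smallest majority of a cluster with the given number of members."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be lost while a majority still remains."""
    return members - quorum_size(members)


def has_quorum(members: int, healthy: int) -> bool:
    """True while enough members are healthy to keep accepting writes."""
    return healthy >= quorum_size(members)
```

A three-member cluster tolerates one failure and a five-member cluster tolerates two. When members keep their data on local, non-persistent storage, losing a majority at the same time also loses the data that a majority would be needed to recover.
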
  33. Network
  34. Network
  35. Network: CNI?
      Calico, Canal, Flannel, Cilium, Contiv, Romana
  36. Network
      CVE
      ● 2019-03-28 - CVE-2019-9946
      ● 2018-11-13 - TTA-2018-001
      Young
      ● Many implementations
      ● Need something battle-tested
  37. Network: No CNI :)
  38. Network - First L2
      [Diagram labels: CBR0, BREX, Node, ETCD, Pod; 00:00:00:00:21:42 10.10.0.1/32; 00:00:00:00:25:45 10.100.0.1/24; 00:00:00:00:00:02 10.100.0.2/32; 00:00:00:00:00:01 10.0.0.1/32]
  39. Network: Why BREX?
  40. Network - First L2 (same diagram)
  41. Network - First L2: who has 10.0.0.1 ?
  42. Network - First L2: who has 10.0.0.1 ?
  43. Network - First L2: who has 10.0.0.1 ?
  44. Network - First L2: who has 10.0.0.1 ?
  45. Network - First L2: 10.0.0.1 is-at 00:00:00:00:00:01
  46. Network - First L2: 10.0.0.1 is-at 00:00:00:00:00:01 | ARP entry shown: 10.0.0.1 → 00:00:00:00:00:01
  47. Network - First L2: 10.0.0.1 is-at 00:00:00:00:00:01 | ARP entry shown: 10.0.0.1 → 00:00:00:00:00:01
  48. Network - First L2: 10.0.0.1 is-at 00:00:00:00:00:01 | ARP entry shown: 10.0.0.1 → 00:00:00:00:00:01
  49. Network - Then L3 (same diagram): SYN 10.0.0.1 | ARP entry shown: 10.0.0.1 → 00:00:00:00:00:01
  50. Network - Then L3: SYN 10.0.0.1 | net.ipv4.ip_forward=1 | ARP entry shown: 10.0.0.1 → 00:00:00:00:00:01
  51. Network - Then L3: SYN 10.0.0.1 | net.ipv4.ip_forward=1 | ARP entry shown: 10.0.0.1 → 00:00:00:00:00:01
  52. Network - Then L3: SYN 10.0.0.1 | net.ipv4.ip_forward=1 | ARP entries shown: 10.0.0.1 → 00:00:00:00:00:01, 10.100.0.2 → 00:00:00:00:21:42
  53. Network - Then L3: who has 10.100.0.2 ? | ARP entries shown: 10.0.0.1 → 00:00:00:00:00:01, 10.100.0.2 → 00:00:00:00:21:42
  54. Network - Then L3: who has 10.100.0.2 ? | ARP entries shown: 10.0.0.1 → 00:00:00:00:00:01, 10.100.0.2 → 00:00:00:00:21:42
  55. Network - Then L3: who has 10.100.0.2 ? | ARP entries shown: 10.0.0.1 → 00:00:00:00:00:01, 10.100.0.2 → 00:00:00:00:21:42
  56. Network - Then L3: who has 10.100.0.2 ? | ARP entries shown: 10.0.0.1 → 00:00:00:00:00:01, 10.100.0.2 → 00:00:00:00:21:42
  57. Network - Then L3: 10.100.0.2 is-at 00:00:00:00:00:02 | ARP entries shown: 10.0.0.1 → 00:00:00:00:00:01, 10.100.0.2 → 00:00:00:00:21:42
  58. Network - Then L3: 10.100.0.2 is-at 00:00:00:00:00:02 | ARP entries shown: 10.0.0.1 → 00:00:00:00:00:01, 10.100.0.2 → 00:00:00:00:21:42
  59. Network - Then L3: 10.100.0.2 is-at 00:00:00:00:00:02 | ARP entries shown: 10.0.0.1 → 00:00:00:00:00:01, 10.100.0.2 → 00:00:00:00:21:42
  60. Network - Then L3: 10.100.0.2 is-at 00:00:00:00:00:02 | ARP entries shown: 10.0.0.1 → 00:00:00:00:00:01, 10.100.0.2 → 00:00:00:00:21:42
  61. Network - Then L3: SYN/ACK 10.100.0.2 | ARP entries shown: 10.0.0.1 → 00:00:00:00:00:01, 10.100.0.2 → 00:00:00:00:21:42, 10.100.0.2 → 00:00:00:00:00:02
  62. Network - Then L3: SYN/ACK 10.100.0.2 | ARP entries shown: 10.0.0.1 → 00:00:00:00:00:01, 10.100.0.2 → 00:00:00:00:21:42
  63. Network: How to fix the storm?
  64. Network - Proxy ARP (same diagram, with an added "ARP" label)
  65. Network - Proxy ARP: who has 10.100.0.2 ? | ARP entries shown: 10.0.0.1 → 00:00:00:00:00:01, 10.100.0.2 → 00:00:00:00:21:42
  66. Network - Proxy ARP: who has 10.100.0.2 ? | ARP entries shown: 10.0.0.1 → 00:00:00:00:00:01, 10.100.0.2 → 00:00:00:00:21:42
  67. Network - Proxy ARP: 10.100.0.2 is-at 00:00:00:00:21:42 | ARP entries shown: 10.0.0.1 → 00:00:00:00:00:01, 10.100.0.2 → 00:00:00:00:21:42
  68. Network - Proxy ARP: 10.100.0.2 is-at 00:00:00:00:21:42 | ARP entries shown: 10.0.0.1 → 00:00:00:00:00:01, 10.100.0.2 → 00:00:00:00:21:42
  69. Network - Proxy ARP: SYN/ACK 10.100.0.2 | ARP entries shown: 10.0.0.1 → 00:00:00:00:00:01, 10.100.0.2 → 00:00:00:00:21:42 (shown twice)
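
In the Proxy ARP slides, the "who has 10.100.0.2 ?" request is answered with the MAC address 00:00:00:00:21:42 rather than the pod's own 00:00:00:00:00:02. That is the general behaviour of proxy ARP: a host replies to ARP requests for addresses it does not own but can route to through another interface. On Linux this is controlled per interface by the `net.ipv4.conf.<interface>.proxy_arp` sysctl. The Python model below is only a sketch of that decision rule. The function name is hypothetical, and the address and prefix in the usage example are my reading of the flattened diagram labels, not confirmed values.

```python
import ipaddress
from typing import Iterable


def answers_arp(target_ip: str,
                own_ips: Iterable[str],
                routes_via_other_interfaces: Iterable[str],
                proxy_arp: bool) -> bool:
    """Decide whether an interface replies to an ARP request for target_ip.

    Simplified model: the interface always answers for its own addresses.
    With proxy ARP enabled it also answers for addresses that the host
    can reach through a route on a different interface.
    """
    target = ipaddress.ip_address(target_ip)
    if any(target == ipaddress.ip_address(ip) for ip in own_ips):
        return True
    if not proxy_arp:
        return False
    return any(target in ipaddress.ip_network(net)
               for net in routes_via_other_interfaces)
```

For example, `answers_arp("10.100.0.2", ["10.10.0.1"], ["10.100.0.0/24"], proxy_arp=True)` returns `True`, while the same call with `proxy_arp=False` returns `False`.
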
  70. KubeProxy
  71. KubeProxy: 3 modes
      ● Userspace
      ● IPTables
      ● IPVS
  72. KubeProxy - Userspace: Deprecated :(
  73. KubeProxy - IPTables
      ● Chained processing / not incremental
      ● Locked during updates
      ● Time to add one rule grows as the service count increases
      ● For 20k services (160k rules): 5 hours!
  74. KubeProxy - IPTables: routing performance
                       1 service   1k services   10k services   50k services
      First service    575 μs      614 μs        1023 μs        1821 μs
      Middle service   575 μs      602 μs        1048 μs        4174 μs
      Last service     575 μs      631 μs        1050 μs        7077 μs
  75. KubeProxy - IPVS
      ● Hashing vs chains
      ● Better load-balancing algorithms
      ● Weighted / RR / LeastConn / src&dst hashing / …
      ● Health checks / connection retries…
      ● IPTables is a Swiss Army knife, whereas IPVS is a purpose-built tool
  76. KubeProxy - IPVS vs IPTables: reload performance
      # Services   1      5 000     20 000
      # Rules      8      40 000    160 000
      IPTables     2 ms   11 min    5 hours
      IPVS         2 ms   2 ms      2 ms
  77. KubeProxy - IPVS vs IPTables: bandwidth performance
      # Services        1      1k            5k            10k           25k           50k
                        First  First  Last   First  Last   First  Last   First  Last   First  Last
      IPTables (MB/s)   67     64     56     50     39     15     6      0      0      0      0
      IPVS (MB/s)       65     62     54     54     54     43     43     30     29     24     24
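
The "Hashing vs chains" bullet on slide 75 suggests why the figures diverge as the service count grows: a chain is walked rule by rule, so the last service pays for every rule before it, while a hash table finds any service in roughly constant time. The toy Python model below only illustrates that difference. It is not how kube-proxy is implemented (its iptables mode spreads rules over several chains per service), and the names and addresses are made up.

```python
from typing import List, Optional, Tuple

Rule = Tuple[str, str]  # (destination "ip:port", backend name)


def build_services(count: int) -> List[Rule]:
    """Create `count` hypothetical services with unique virtual addresses."""
    return [(f"10.96.{i // 256}.{i % 256}:443", f"backend-{i}")
            for i in range(count)]


def chain_lookup(rules: List[Rule], dst: str) -> Tuple[Optional[str], int]:
    """Walk the rules in order, as a chain does.

    Returns the matching backend and the number of rules examined.
    """
    for examined, (match, backend) in enumerate(rules, start=1):
        if match == dst:
            return backend, examined
    return None, len(rules)


def hash_lookup(table: dict, dst: str) -> Optional[str]:
    """Look the destination up in a hash table: one probe, whatever the size."""
    return table.get(dst)
```

With 10,000 services, `chain_lookup` examines 1 rule for the first service and 10,000 for the last, whereas `hash_lookup(dict(rules), dst)` does the same amount of work for both.
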
  78. The security journey
  79. We are hiring!
      ● Open source database expert
      ● Site Reliability Engineers (Private cloud, OpenStack, DNS, Observability)
      ● Software engineers (containers, bare metal, web hosting)
      ● Back-end developers (Go, Python)
      ● Engineering manager, web hosting
  80. Kubinception: Thank you!
