
Rancher 2.0 Technical Deep Dive


Yuki Nishiwaki (LINE) / Rancher 2.0 Technical Deep Dive
2018/7/28 LINE Developer Meetup in Tokyo #40 -Kubernetes-
https://line.connpass.com/event/92049/


  1. Rancher 2.0 Technical Deep Dive. LINE Corporation IT Service Center Verda 2, Yuki Nishiwaki
  2. Who Are You: Yuki Nishiwaki. Current Role: ● Develop/Operate an OpenStack-based Private Cloud ● Plan/Develop/Operate Kubernetes as a Service. Recent Talk: ● "Excitingly simple multi-path OpenStack Networking" (May 2018, OpenStack Summit)
  3. Brief introduction of ourselves (Verda)
  4. My position with respect to k8s: I am an operator/developer of our OpenStack-based Private Cloud (servers, network, databases, ...), and Kubernetes is a new resource type for that cloud.
  5. Verda Kubernetes as a Service - Background ● We've seen users deploy/use about 600 k8s nodes on our Private Cloud ● Many teams struggle to find an easy way to use/operate k8s. Problem description: ● Operating k8s is not a burden so small that every developer can handle it in their spare time ● Knowledge of k8s operation is fragmented. (K8s is a new resource type for Verda users.)
  6. Verda Kubernetes as a Service - Target ● Provide stable Kubernetes clusters to Verda users ○ We don't have to automate everything, but we take responsibility for operation ● Provide an API to Verda users ○ CREATE/DELETE a Kubernetes cluster (UPGRADE is not a target at the moment) ○ ADD/REMOVE a node ● Provide a "Service Desk" ○ To advise Verda users (application developers) and work out with them how to use it
  7. Verda Kubernetes as a Service - Status of Project ● Project started in May 2018 ○ Pretty late, relatively ● Decided to utilize existing software (OSS) ○ To reduce lead time as much as possible ○ Rancher 2.0 is one of the candidates ■ We are thinking of using Rancher 2.0 for Phase 1 ● Still deciding what is best for the k8s-management part ○ Or will we have to develop it from scratch?
  8. Less-dependent design - Still under consideration. The Verda Kubernetes as a Service layer exposes our own API schema (Cluster, Node) and talks to the Rancher API through a provider plugin; the remaining components are still open (????). We use Rancher as a tool to: create k8s, monitor k8s, update k8s, add a node to a k8s cluster, and remove a node from a k8s cluster.
  9. Roadmap. Phase 1 (2018/09/01): the first release; no changes to Kubernetes/Rancher; support only basic k8s clusters; support a limited number of clusters; train ourselves on Rancher (because we depend on it). Phase 2 (planning): enhance the VKS control plane (tune Rancher); support more clusters; enhance monitoring items; enhance the GUI; enhance k8s support (Type LoadBalancer for the in-house LB, Persistent Volumes, optimized container networking); train ourselves on Kubernetes and etcd. Phase 3 (planning): enhance k8s support (CRDs/Controllers for in-house components, a skeleton template to make it easy to start development); consider how k8s can cover the whole system including VMs (KubeVirt).
  10. About Rancher 2.0
  11. 11. What’s Rancher? ● Container Management Tool ● Support to deploy Container Orchestration Tool itself like Kubernetes ● Make “Container Orchestration itself” abstract and Provide rich UI ● UI allow you to deploy your container workload easier than native console ● UI allow you to use well-tested catalog
  12. Rancher 2.0 Released (May 1, 2018) ● Focuses on using Kubernetes as the container orchestration platform ● Re-designed from scratch to work on Kubernetes ● Re-implemented from scratch ● Introduces the Rancher Kubernetes Engine (RKE) ● Unified cluster management covering GKE, EKS, etc. as well as RKE ● Application workload management
  13. Rancher 2.0 ● Focuses on using Kubernetes as the container orchestration platform ● Re-designed from scratch to work on Kubernetes ● Re-implemented from scratch ● Introduces the Rancher Kubernetes Engine (RKE) ● Unified cluster management covering GKE, EKS, etc. as well as RKE ● Application workload management. (Highlighted: our interest as a backend for our "k8s as a service".)
  14. Rancher 2.0 ● Focuses on using Kubernetes as the container orchestration platform ● Re-designed from scratch to work on Kubernetes ○ No need to understand multiple container orchestrators ● Re-implemented from scratch ○ A readable amount of code (about 50,000~80,000 lines excluding vendoring) ● Introduces the Rancher Kubernetes Engine (RKE) ○ Supports deploying/upgrading/monitoring a Kubernetes cluster ○ Fewer requirements on the environment where k8s is built ● Unified cluster management covering GKE, EKS, etc. as well as RKE ● Application workload management. (Our interest as a backend for our "k8s as a service".)
  15. As context: the backend for Verda K8s as a Service ● In our use case, the User and the Operator of Rancher are different people ○ Operator: cloud operators (us) ○ Users: application developers for LINE services ● Downtime of Rancher affects many users → We need to know well how Rancher works
  16. But the documentation….
  17. Only the high-level architecture is documented. To be honest, it's difficult to operate Rancher 2.0 with just the official documents ¯\_(ツ)_/¯
  18. 18. Let’s unbox Rancher 2.0 <v2.0.0> 本資料は、調査資料を一部抜粋して作成しています。 フルバージョンをご覧になりたい方は、後日slideshareでご確認をお願いします。
  19. Agenda: 1. Rancher Overview: 1.1. Casts in Rancher 2.0 / 1.2. What does the Server do? / 1.3. What does the Agent do? / 1.4. Summary. 2. Rancher Server Internals: 2.1. Rancher API / 2.2. Rancher Controllers / 2.3. Example Controllers
  20. 1.1. Casts in Rancher 2.0 ➢ Rancher Server ➢ Rancher Cluster Agent ➢ Rancher Node Agent. Terminology: parent k8s is the k8s cluster Rancher itself runs on; child k8s is a k8s cluster deployed by Rancher. Every node (Node1, Node2, Node3) of a child cluster runs a rancher node-agent, and each child cluster additionally runs one rancher cluster-agent.
  21. 1.2. What does the Server do? Point 1: provide the API. Point 2: all data is stored as CRDs (Kind: Cluster, Kind: Node) in the parent k8s. Point 3: controllers watch those CRDs. Point 4: deploy and monitor clusters and sync data. Point 5: call the docker/k8s APIs via websocket sessions when needed; the Rancher server never accesses the docker/k8s APIs directly. (A sketch of inspecting those CRDs follows below.)
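Since all Rancher state lives as CRDs in the parent cluster, it can be inspected with ordinary Kubernetes tooling. A minimal sketch, assuming the management.cattle.io/v3 API group that Rancher 2.0 registers and a kubeconfig path pointing at the parent cluster (both placeholders here):

```go
// List Rancher's Cluster CRD objects from the parent k8s cluster.
// Sketch only: assumes the management.cattle.io/v3 group registered by
// Rancher and a kubeconfig with access to the parent cluster.
package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/parent-kubeconfig")
	if err != nil {
		log.Fatal(err)
	}
	dyn, err := dynamic.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}
	// Rancher stores each managed cluster as a clusters.management.cattle.io object.
	gvr := schema.GroupVersionResource{Group: "management.cattle.io", Version: "v3", Resource: "clusters"}
	list, err := dyn.Resource(gvr).List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	for _, c := range list.Items {
		fmt.Println(c.GetName()) // cluster IDs, e.g. "c-xxxxx"
	}
}
```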
  22. 1.2. What does the Server do? Point 6: provide unified access to multiple k8s clusters.
  23. 1.3. What does the Agent do? Point 1: establish a websocket session to the server's Dialer API (/v3/connect, pkg/dialer). Point 2: provide a TCP proxy via that websocket, which the server uses to access k8s and docker. Point 3: periodically check the RkeNodeConfig API (/v3/connect/config, pkg/rkenodeconfigserver) to see whether a file or container needs to be created/run. The Rancher agent basically just establishes the websocket for the TCP proxy and polls NodeConfig; almost all configuration is done/triggered by the controllers through the websocket. (A single-connection sketch of the tunnel idea follows below.)
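To make Points 1 and 2 concrete, here is a single-connection sketch of the idea. This is NOT Rancher's actual remotedialer implementation, which multiplexes many TCP streams over one authenticated session; it only shows the direction of the connection (the agent dials out) and the byte shoveling:

```go
// Sketch of the agent-side TCP-over-websocket proxy idea. Not Rancher's
// remotedialer: the real one multiplexes many streams and authenticates
// with cluster/node tokens; this tunnels a single connection.
package main

import (
	"log"
	"net"

	"github.com/gorilla/websocket"
)

func main() {
	// The agent dials OUT to the server, so only outbound connectivity is needed.
	ws, _, err := websocket.DefaultDialer.Dial("wss://rancher.example.com/v3/connect", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer ws.Close()

	// Local endpoint the server wants to reach, e.g. the child k8s API server.
	tcp, err := net.Dial("tcp", "127.0.0.1:6443")
	if err != nil {
		log.Fatal(err)
	}
	defer tcp.Close()

	// websocket -> TCP: forward whatever the server sends to the local socket.
	go func() {
		for {
			_, data, err := ws.ReadMessage()
			if err != nil {
				return
			}
			if _, err := tcp.Write(data); err != nil {
				return
			}
		}
	}()

	// TCP -> websocket: stream local responses back to the server.
	buf := make([]byte, 32*1024)
	for {
		n, err := tcp.Read(buf)
		if err != nil {
			return
		}
		if err := ws.WriteMessage(websocket.BinaryMessage, buf[:n]); err != nil {
			return
		}
	}
}
```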
  24. 1.4. Rancher 2.0 overview summary. Almost all the logic is in the Rancher Server; the Agent just sits in the child k8s as a TCP proxy for the Rancher Server. ● Rancher Server: a. All Rancher data is stored as CRDs in the parent Kubernetes (Rancher resources are translated into CRDs) b. Rancher's API is a kind of proxy to the Kubernetes API c. Rancher has various controllers that watch CRD resources in the parent k8s to deploy child k8s clusters (Management Controllers) d. Rancher has various controllers that watch CRD resources in the parent k8s to inject data into the deployed k8s (User Controllers) e. Websocket sessions are used to access deployed nodes and k8s clusters. ● Rancher Agent: a. Establishes the websocket that provides the TCP proxy b. Periodically checks whether the node needs to create a file or run a container. (Parent k8s: the k8s Rancher runs on; child k8s: a k8s deployed by Rancher.) If we want to know more about how Rancher maintains a Kubernetes cluster, it's enough to look at the Rancher Server, because the Agent only provides the proxy.
  25. 2. Rancher Server Internals: 2.1. Rancher API / 2.2. Rancher Controllers / 2.3. Example Controllers
  26. 2.1. Rancher API. Point 1: the Server provides the API. Point 2: all data is stored as CRDs (Kind: Cluster, Kind: Node) in the parent k8s.
  27. 2.1. Rancher API: 5 types of API ➢ The API can be classified into 5 types ➢ Some APIs are only for the agents ○ APIs for users ■ Management (/v3/) ■ Auth (/v3-public, /v3/token) ■ K8s Proxy (/k8s/clusters) ○ APIs for agents ■ Dialer (/v3/connect, /v3/connect/register) ■ RKE Node Config (/v3/connect/config)
  28. 2.1. Rancher API: Management API (/v3/). Creates/updates/gets resources. Depending on the path, the request either manipulates a CRD in the parent k8s (e.g. POST /v3/cluster creates a Cluster CRD) or is forwarded into the child k8s over the TCP proxy the cluster agent provides (e.g. POST /v3/project/<cluster-id>:<project-id>/pods creates a Pod).
  29. 2.1. Rancher API: K8s Proxy API (/k8s/clusters). Authenticates the user via the Auth API against Rancher's Token CRD resources, then calls the child k8s API through the TCP proxy over the websocket session to the agent: e.g. GET /k8s/clusters/<cluster>/api/v1/componentstatuses is forwarded to the child cluster as GET /api/v1/componentstatuses. (A sketch of calling both user-facing APIs follows below.)
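Both user-facing APIs authenticate with a Rancher API token sent as a bearer token. The following sketch uses only the endpoints named on these slides; the server URL, token value, and cluster ID are placeholders, not real values:

```go
// Sketch: calling the user-facing Rancher APIs with a bearer token.
// Server URL, token, and cluster ID below are placeholders.
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

func get(client *http.Client, url, token string) string {
	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("Authorization", "Bearer "+token) // token obtained via the Auth API (/v3/token)
	resp, err := client.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	return string(body)
}

func main() {
	c := &http.Client{}
	token := "token-xxxxx:secret" // hypothetical Rancher API token

	// Management API: served from the CRDs in the parent k8s.
	fmt.Println(get(c, "https://rancher.example.com/v3/clusters", token))

	// K8s Proxy API: forwarded into the child cluster over the agent websocket.
	fmt.Println(get(c, "https://rancher.example.com/k8s/clusters/c-xxxxx/api/v1/componentstatuses", token))
}
```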
  30. 2.1. Rancher API: Dialer API (/v3/connect, /v3/connect/register). The agent starts a websocket session to wss://<rancher-server>/v3/connect; the server checks which cluster the agent belongs to (using the ClusterRegisterToken CRD) and adds the websocket session to the pool that the K8s Proxy API and the controllers use as a TCP proxy.
  31. 2.1. Rancher API: RKE Node Config API (/v3/connect/config). The agent periodically checks its config; the server generates a NodeConfig from the Cluster CRD via the RKE library, and according to that NodeConfig the agent creates files and creates/runs containers via docker. (A docker-side sketch follows below.)
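What the agent does with a NodeConfig entry ultimately boils down to docker API calls. A rough sketch with the Docker Engine Go SDK (signatures as in moby v20+); the image name and container settings are illustrative, since the real agent derives them from the generated NodeConfig:

```go
// Sketch of turning a NodeConfig entry into a running container via the
// docker API. Image/name/settings are illustrative, not Rancher's actual
// NodeConfig contents.
package main

import (
	"context"
	"log"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/api/types/container"
	"github.com/docker/docker/client"
)

func main() {
	ctx := context.Background()
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		log.Fatal(err)
	}

	// The agent first checks whether the container already exists; here we
	// just create and start it.
	resp, err := cli.ContainerCreate(ctx,
		&container.Config{Image: "rancher/hyperkube:v1.10.1", Cmd: []string{"kubelet"}},
		&container.HostConfig{NetworkMode: "host", Privileged: true},
		nil, nil, "kubelet")
	if err != nil {
		log.Fatal(err)
	}
	if err := cli.ContainerStart(ctx, resp.ID, types.ContainerStartOptions{}); err != nil {
		log.Fatal(err)
	}
	log.Println("started container", resp.ID)
}
```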
  32. 2.2. Rancher Controllers. Point 3: watch the CRDs (Kind: Cluster, Kind: Node). Point 4: deploy and monitor clusters and sync data. Point 5: call the docker/k8s APIs via the websocket sessions when needed; never access the docker/k8s APIs directly from the Rancher server.
  33. 2.2. Rancher Controllers: 4 types of controllers ➢ Rancher controllers can be classified into 4 groups ➢ Each group has its own start trigger ➢ Triggered when the Server starts ○ API Controllers ○ Management Controllers ➢ Triggered when a new cluster is detected ○ Cluster (User) Controllers ○ Workload Controllers. All of them create and watch resources in the parent Kubernetes. (A watch-loop sketch follows below.)
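All four groups follow the same controller pattern: watch a resource, react in a handler. A minimal watch-loop sketch with client-go's dynamic informers against the Cluster CRD; Rancher itself wraps this pattern in the Norman framework, so this only shows the underlying mechanics (the kubeconfig path is a placeholder):

```go
// Minimal controller-style watch loop over Rancher's Cluster CRD using
// client-go dynamic informers. Rancher wraps this pattern in Norman; this
// just illustrates the mechanics.
package main

import (
	"fmt"
	"log"
	"time"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/dynamic/dynamicinformer"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/parent-kubeconfig")
	if err != nil {
		log.Fatal(err)
	}
	dyn, err := dynamic.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	gvr := schema.GroupVersionResource{Group: "management.cattle.io", Version: "v3", Resource: "clusters"}
	factory := dynamicinformer.NewDynamicSharedInformerFactory(dyn, 30*time.Second)
	informer := factory.ForResource(gvr).Informer()

	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			u := obj.(*unstructured.Unstructured)
			fmt.Println("cluster added:", u.GetName()) // a real controller would provision here
		},
		UpdateFunc: func(_, newObj interface{}) {
			u := newObj.(*unstructured.Unstructured)
			fmt.Println("cluster updated:", u.GetName())
		},
	})

	stop := make(chan struct{})
	factory.Start(stop)
	select {} // block forever; handlers run on the informer's goroutines
}
```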
  34. 2.2. Rancher Controllers: API Controllers ➢ Watch the CRD resources related to API server configuration ○ settings ○ dynamicschemas ○ nodedrivers ➢ Configure the API server according to changes in those resources
  35. 2.2. Rancher Controllers: Management Controllers ➢ Watch the Cluster/Node-related CRDs ➢ Provision/update clusters according to changes in those resources ➢ After provisioning, start the Cluster (User) and Workload Controllers to begin data sync and monitoring of the child k8s
  36. 2.2. Rancher Controllers: Cluster (User) Controllers. Resource sync between the parent and the child k8s: they watch resources in the child k8s and update/create CRDs in the parent k8s accordingly (e.g. Node status fed back into the Cluster CRD), and they watch CRDs in the parent k8s (e.g. Alerts, Secrets) and update/create resources, including Pods, in the child k8s.
  37. 2.2. Rancher Controllers: Workload Controllers. Simple custom controllers that extend k8s ➢ Watch only resources in the child k8s ➢ Create/update additional resources there ➢ These controllers are more like enhancements of k8s features themselves
  38. 2.3. Examples: Controllers. Management Controllers: ➢ Cluster Controller ➢ Node Controller. Cluster (User) Controllers: ➢ ClusterLogging Controller
  39. 2.3. Example Controllers: Cluster Controller (pkg/controllers/management), one of the management controllers. An informer watches the Cluster CRD (e.g. Cluster A) in the parent k8s and executes handlers/lifecycles: cluster-provisioner-controller, cluster-agent-controller, cluster-scoped-gc, cluster-deploy, cluster-stats. These deploy the node agents and the cluster agent into child k8s Cluster A, run the Cluster (User) Controllers (alerts, ingress, ...) for Cluster A, collect status, and update the Cluster CRD (via the Node CRDs as well). (A sketch of the lifecycle pattern follows below.)
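The "lifecycles" named above follow a create/updated/remove shape. A hypothetical sketch of that pattern; the method shape mirrors Norman's lifecycle handlers, but all names and types here are invented for illustration, not Norman's exact API:

```go
// Hypothetical sketch of the lifecycle-handler pattern used by the Cluster
// Controller. Names/types are invented; only the Create/Updated/Remove
// shape reflects the pattern described on the slide.
package main

import "fmt"

// Cluster stands in for Rancher's v3 Cluster CRD object.
type Cluster struct {
	Name  string
	Phase string
}

type clusterLifecycle interface {
	Create(obj *Cluster) (*Cluster, error)  // first seen: provision the cluster
	Updated(obj *Cluster) (*Cluster, error) // spec changed: reconcile
	Remove(obj *Cluster) (*Cluster, error)  // deleted: tear down (finalizer-style)
}

// provisioner is a toy handler in the spirit of cluster-provisioner-controller.
type provisioner struct{}

func (p *provisioner) Create(obj *Cluster) (*Cluster, error) {
	obj.Phase = "provisioning" // real code would drive RKE / deploy agents here
	return obj, nil
}

func (p *provisioner) Updated(obj *Cluster) (*Cluster, error) { return obj, nil }

func (p *provisioner) Remove(obj *Cluster) (*Cluster, error) {
	obj.Phase = "removing"
	return obj, nil
}

func main() {
	var lc clusterLifecycle = &provisioner{}
	c, _ := lc.Create(&Cluster{Name: "cluster-a"})
	fmt.Println(c.Name, c.Phase)
}
```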
  40. 2.3. Example Controllers: Node Controller (pkg/controllers/management), one of the management controllers. An informer watches the Node CRDs (Node A, Node B) and executes handlers/lifecycles: node-controller, cluster-provisioner-controller, cluster-stats, nodepool-provisioner. The node-controller creates the VM via docker-machine if it doesn't exist and runs the node agent, which calls wss://<server>/v3/connect/register to register the node into its cluster. Other management controllers (Cluster Controller, NodePool Controller) just trigger these handlers.
  41. 2.3. Example Controllers: Cluster(Project)Logging Controller (pkg/controller/user/logging/). An informer watches the ClusterLogging CRD in the parent k8s, and the cluster-logging-controller lifecycle deploys/updates a logging DaemonSet in the child k8s together with cluster.conf and project.conf ConfigMaps. The DaemonSet mounts those ConfigMaps and host paths (/var/lib/docker/containers/, /var/log/containers/, /var/log/pods, /var/lib/rancher/rke/log) and sends the logs out (the destination is out of scope here). The ProjectLogging Controller is almost the same as the ClusterLogging Controller.
  42. How I see Rancher 2.0, in the context of a backend for Verda k8s as a Service
  43. Good things we are thinking to utilize ● Few requirements on the environment it runs in ○ though this causes some scalability limitations at the same time... ● There are some interesting controllers such as alert, logging, and eventsync ○ We can utilize these features to manage k8s clusters ● It is easy to modify/extend Rancher's behaviour thanks to the Norman framework ○ We will utilize this framework to write extensions, even for k8s
  44. Not-so-good things we are thinking to improve ● Poor documentation (currently, reading the code is the only way to learn it) ○ The Norman framework that Rancher uses heavily is also poorly documented ● No Active-Active HA support ○ So the scalability limitation cannot be avoided ● One binary with a ton of features makes performance tuning difficult ● Even the K8s Proxy API cannot be deployed as multiple processes, because that feature depends on the websocket session to the cluster-agent ● Poor monitoring, relying on kubelet and componentstatus ● The upgrade strategy is just to replace the old container with a new one. Is that enough? https://github.com/rancher/rke/blob/master/services/kubeapi.go#L15 , https://github.com/rancher/rke/blob/master/docker/docker.go#L72
  45. Future Work: Verda Kubernetes as a Service
  46. Future Works ● Use Rancher 2.0 as a backend to manage k8s without any changes in Phase 1 ○ Modify Rancher and give feedback to the community in the long run, after the Phase 1 release ■ Enhance scalability ■ Enhance monitoring ■ Cut (or disable) the many features we don't need ● Enrich the Kubernetes clusters deployed by RKE ○ Support Type LoadBalancer for our XDP-based load balancer ○ Support Persistent Volumes ○ Add CRDs/Controllers to support our in-house components like Kafka and Database as a Service ■ We want Kubernetes to be an orchestration tool for the whole system, not only for containers ● Need more knowledge of k8s/etcd themselves ○ !!!!Read the code!!!! Not just books/documents!! ■ Kubernetes ■ etcd
  47. We are hiring!!! ● People who love to understand/customize OSS at the source-code level ○ Kubernetes ○ Etcd ○ OpenStack ○ Rancher ○ Ceph... https://linecorp.com/ja/career/position/827 https://linecorp.com/ja/career/position/564
  48. Appendix. I organized my understanding as diagrams; they are available at https://github.com/ukinau/rancher-analyse
  49. Appendix: Rancher Scalability Improvement. Phase 1: run the VKS-API servers in front of Rancher without touching anything; after the service starts, watch the performance and consider separating/scaling. Phase 2 ideas: Point 1: separate the K8s Proxy and other APIs from the VKS-API servers; Point 2: scheduling: if we cannot scale a Rancher server any more, add one more cluster; Point 3: use another datastore for some of the data; Point 4: extra monitoring (enhance monitoring).
  50. Appendix: Support Type LoadBalancer for the in-house LB. We have our own LB implementation built for scaling (*1). The goal: a user deploys a Service with Type LoadBalancer for our in-house LB, and the in-house LB is deployed/configured automatically. *1 https://www.janog.gr.jp/meeting/janog40/application/files/6115/0105/4928/janog40_sp6lb.pdf (A client-side sketch follows below.)
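From the user's side this is just an ordinary Service of type LoadBalancer, which an in-house LB controller could watch to program the XDP load balancer. A sketch with client-go; the kubeconfig path, namespace, names, and ports are illustrative:

```go
// Sketch: the user-facing side of "Type LoadBalancer for the in-house LB"
// is an ordinary Service object. Names/ports here are illustrative.
package main

import (
	"context"
	"log"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/child-kubeconfig")
	if err != nil {
		log.Fatal(err)
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	svc := &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: "my-app"},
		Spec: corev1.ServiceSpec{
			Type:     corev1.ServiceTypeLoadBalancer, // picked up by the in-house LB controller
			Selector: map[string]string{"app": "my-app"},
			Ports: []corev1.ServicePort{
				{Port: 80, TargetPort: intstr.FromInt(8080)},
			},
		},
	}
	if _, err := cs.CoreV1().Services("default").Create(context.TODO(), svc, metav1.CreateOptions{}); err != nil {
		log.Fatal(err)
	}
	log.Println("service created; the LB controller assigns an external IP asynchronously")
}
```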
  51. Appendix: Be friends with in-house components. We provide many types of managed services. Today, a user deploys an application with information for the managed service, then configures the managed service (update IP ACLs, and so on) separately from the application lifecycle.
  52. Appendix: Be friends with in-house components. With a CRD for the in-house component and a custom controller that deploys/configures the managed service, the user deploys the application together with the CRD and can configure the managed service within the application lifecycle. (A hypothetical CRD sketch follows below.)
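A hypothetical sketch of what such a CRD-backed type might look like in Go. Everything here, group, kind, and fields included, is invented for illustration; it is not an actual Verda API:

```go
// Hypothetical CRD type for an in-house managed service. All names (kind,
// fields) are invented to illustrate the idea from the slide, not a real API.
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// KafkaTopicSpec lets an application declare the managed resource it needs,
// so a custom controller can configure the service (including IP ACLs)
// within the application's own lifecycle.
type KafkaTopicSpec struct {
	ClusterRef string   `json:"clusterRef"`      // which managed Kafka cluster
	Partitions int32    `json:"partitions"`
	IPACL      []string `json:"ipACL,omitempty"` // CIDRs allowed to connect
}

type KafkaTopicStatus struct {
	Ready bool `json:"ready"`
}

type KafkaTopic struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   KafkaTopicSpec   `json:"spec"`
	Status KafkaTopicStatus `json:"status,omitempty"`
}
```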
