Cluster management with Kubernetes

On Friday 5 June 2015 I gave a talk called Cluster Management with Kubernetes to a general audience at the University of Edinburgh. The talk includes an example of a music store system with a Kibana front end UI and an Elasticsearch based back end which helps to make concrete concepts like pods, replication controllers and services.

  1. Cluster Management with Kubernetes. Satnam Singh, satnam@google.com. Work of the Google Kubernetes team and many open source contributors. University of Edinburgh, 5 June 2015.
  2. The promise of cloud computing
  3. Cloud software deployment is soul destroying. Typically a cloud cluster node is a VM running a specific version of Linux. User applications comprise components, each of which may have different and conflicting requirements for libraries, runtimes and kernel features. Applications are coupled to the version of the host operating system: bad. Evolution of the application components is coupled to (and in tension with) the evolution of the host operating system: bad. You also need to deal with node failures, spinning up and turning down replicas to handle varying load, updating components without disruption ... You thought you were a programmer but you are now a sys-admin.
  4. Docker. Source: Google Trends
  5. What is Docker? An implementation of the container idea. A package format. Resource isolation. An ecosystem.
  6. Virtual machine workloads? We need to isolate the application components from the host environment.
  7. VM vs. Docker
  8. Docker: “build once, run anywhere”
  9. Resource isolation. Implemented by a number of Linux APIs:
     • cgroups: restrict the resources a process can consume (CPU, memory, disk IO, ...)
     • namespaces: change a process’s view of the system (network interfaces, PIDs, users, mounts, ...)
     • capabilities: limit what a user can do (mount, kill, chown, ...)
     • chroots: determine what parts of the filesystem a user can see
  10. We need more than just packing and isolation:
     Scheduling: where should my containers run?
     Lifecycle and health: keep my containers running despite failures.
     Discovery: where are my containers now?
     Monitoring: what’s happening with my containers?
     Auth{n,z}: control who can do things to my containers.
     Aggregates: compose sets of containers into jobs.
     Scaling: making jobs bigger or smaller.
     ...
  11. Everything at Google runs in containers:
     • Gmail, Web Search, Maps, ...
     • MapReduce, MillWheel, Pregel, ...
     • Colossus, BigTable, Spanner, ...
     • Even Google’s cloud computing product GCE itself: VMs run in containers
  12. Open Source Containers: Kubernetes. Greek for “helmsman”; also the root of the words “governor” and “cybernetic”.
     • Container orchestrator
     • Builds on Docker containers (also supporting other container technologies)
     • Multiple cloud and bare-metal environments
     • Supports existing OSS apps: cannot require apps to become cloud-native
     • Inspired and informed by Google’s experiences and internal systems
     • 100% open source, written in Go
     Let users manage applications, not machines.
  13. Primary concepts:
     Container: a sealed application package (Docker).
     Pod: a small group of tightly coupled containers.
     Labels: identifying metadata attached to objects.
     Selector: a query against labels, producing a set result.
     Controller: a reconciliation loop that drives current state towards desired state.
     Service: a set of pods that work together.
  14. The Kubernetes API: a unified compute substrate. Application containers run on a homogeneous machine fleet (virtual or physical).
  15. Kubernetes architecture (diagram): etcd, the API Server, the Scheduler and the Controller Manager on the master; a Kubelet and Service Proxy on each node; clients such as kubectl, ajax, etc. talk to the API Server.
  16. Modularity. Loose coupling is a goal everywhere:
     • simpler
     • composable
     • extensible
     Code-level plugins where possible. Multi-process where possible. Isolate risk with interchangeable parts. Examples: ReplicationController, Scheduler.
  17. Reconciliation between declared and actual state
  18. Control loops: observe -> diff -> act.
     Drive current state -> desired state. Act independently. APIs - no shortcuts or back doors. Observed state is truth. A recurring pattern in the system. Example: ReplicationController.
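The observe -> diff -> act pattern can be sketched in a few lines of Python. This is a hypothetical illustration, not Kubernetes code; the `reconcile` function and its `start`/`stop` callbacks are invented for the example:

```python
# A minimal sketch of a Kubernetes-style control loop (illustrative only).
# One iteration: observe actual state, diff it against desired state,
# and act to close the gap. Observed state is the truth.

def reconcile(desired, observed, start, stop):
    """Drive observed state towards desired state."""
    missing = desired - observed   # should exist but doesn't: start it
    extra = observed - desired     # exists but shouldn't: stop it
    for item in missing:
        start(item)
    for item in extra:
        stop(item)
    return missing, extra

# Example: desired pods vs. pods actually observed running.
desired = {"pod-a", "pod-b", "pod-c"}
observed = {"pod-a", "pod-x"}
started, stopped = reconcile(desired, observed, start=print, stop=print)
```

In the real system each controller runs such a loop continuously and independently, always re-observing rather than trusting its own previous actions.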
  19. Atomic storage. The backing store for all master state. Hidden behind an abstract interface. Stateless means scalable. Watchable: this is a fundamental primitive - don’t poll, watch. Using CoreOS etcd.
  20. Pods: grouping containers (diagram: containers Foo and Bar sharing namespaces: net, IPC, ...)
  21. Pods: networking (diagram)
  22. Pods: volumes (diagram)
  23. Pods: labels (diagram)
  24. Persistent volumes. A higher-level abstraction - insulation from any one cloud environment. Admins provision them (admin owned), users claim them (user owned). Independent lifetime and fate: a volume can be handed off between pods and lives until the user is done with it. Dynamically “scheduled” and managed, like nodes and pods. (Diagram: a Pod’s ClaimRef points to a PVClaim, which binds to a PersistentVolume backed by GCE PD, AWS EBS, NFS or iSCSI.)
  25. Labels. Arbitrary metadata, attached to any API object. Generally represent identity. Queryable by selectors - think SQL ‘select ... where ...’. The only grouping mechanism. Used to determine which objects an operation applies to:
     • pods under a ReplicationController
     • pods in a Service
     • capabilities of a node (scheduling constraints)
     (Diagram: four pods labelled App: Nifty, with Phase: Dev or Test and Role: FE or BE.)
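The SQL-like “select ... where ...” behaviour of label selectors can be mimicked with a small Python sketch. This is not the Kubernetes implementation; the `select` helper and the pod dictionaries are made up to mirror the four labelled pods on the slide:

```python
# Illustrative sketch: selecting objects by label, like 'select ... where ...'.
pods = [
    {"App": "Nifty", "Phase": "Dev",  "Role": "FE"},
    {"App": "Nifty", "Phase": "Dev",  "Role": "BE"},
    {"App": "Nifty", "Phase": "Test", "Role": "FE"},
    {"App": "Nifty", "Phase": "Test", "Role": "BE"},
]

def select(objects, **labels):
    """Return the objects whose labels match every key=value in the query."""
    return [o for o in objects if all(o.get(k) == v for k, v in labels.items())]

everything = select(pods, App="Nifty")            # all four pods
frontends = select(pods, App="Nifty", Role="FE")  # the two FE pods
dev_pods = select(pods, App="Nifty", Phase="Dev") # the two Dev pods
```

Because selection is a query rather than a fixed membership list, the same pod can belong to many overlapping groups at once, which is what the next few slides illustrate.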
  26. Selectors (diagram: the four pods labelled App: Nifty, Phase: Dev/Test, Role: FE/BE)
  27. Selectors: App == Nifty matches all four pods
  28. Selectors: App == Nifty, Role == FE matches the two front-end pods
  29. Selectors: App == Nifty, Role == BE matches the two back-end pods
  30. Selectors: App == Nifty, Phase == Dev matches the two Dev pods
  31. Selectors: App == Nifty, Phase == Test matches the two Test pods
  32. Pod lifecycle. Once scheduled to a node, pods do not move; the restart policy means restart in-place. Pods can be observed as pending, running, succeeded, or failed. Failed is really the end - no more restarts, no complex state machine logic. Pods are not rescheduled by the scheduler or apiserver, even if a node dies; controllers are responsible for this, which keeps the scheduler simple. Apps should consider these rules; Services hide this and make pod-to-pod communication more formal.
  33. Replication controllers (diagram: a replication controller maintains N “production backend” pods)
  34. Replication controllers. A type of controller (control loop). Ensures N copies of a pod are always running: if too few, start new ones; if too many, kill some. The group is defined by a selector. Cleanly layered on top of the core - all access is via public APIs. Replicated pods are fungible: no implied ordinality or identity. Other kinds of controllers are coming, e.g. a job controller for batch. (Diagram: a ReplicationController with Name = “nifty-rc”, Selector = {“App”: “Nifty”}, PodTemplate = { ... }, NumReplicas = 4 asks the API Server “How many?”, hears “3”, starts 1 more, and on the next check hears “4”.)
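The “How many? 3. Start 1 more.” exchange on the slide is just the diff step of the control loop, specialised to a replica count. A hypothetical sketch (the `resize_decision` helper and the pod dictionaries are invented for illustration):

```python
# Illustrative sketch of a ReplicationController's resize decision:
# count the pods matching the selector and compare against the desired count.

def resize_decision(pods, selector, want):
    """Positive result: start that many pods; negative: kill that many."""
    have = sum(1 for p in pods if all(p.get(k) == v for k, v in selector.items()))
    return want - have

# Three pods match App=Nifty but four are wanted: "How many? 3" -> start 1 more.
pods = [{"App": "Nifty"}, {"App": "Nifty"}, {"App": "Other"}, {"App": "Nifty"}]
delta = resize_decision(pods, {"App": "Nifty"}, want=4)
```

Because replicas are fungible, it does not matter which pod is started or killed; any member of the selected set is as good as any other.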
  35. Services (diagram: a service with a stable name and IP, e.g. 1.2.3.4, and port(s) fronting three production backend pods)
  36. Services (diagram: a client connects to the portal IP 10.0.0.1:9376. kube-proxy, watching the apiserver, uses iptables DNAT to forward the TCP/UDP traffic to the pod endpoints 10.240.1.1:8080, 10.240.2.2:8080 and 10.240.3.3:8080. The Service has Name = “nifty-svc”, Selector = {“App”: “Nifty”}, Port = 9376, ContainerPort = 8080; the portal IP is assigned automatically.)
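The core idea of the service proxy, a stable address fronting a changing set of pod endpoints, can be sketched as a toy round-robin picker. This is an illustrative stand-in, not kube-proxy’s actual mechanism (which rewrites packets with iptables DNAT rather than running application code); the `ServiceProxy` class is invented for the example:

```python
# Illustrative sketch: a stable "portal" fronting rotating pod endpoints.
import itertools

class ServiceProxy:
    """Toy stand-in for kube-proxy: pick a backend for each connection."""
    def __init__(self, endpoints):
        self._cycle = itertools.cycle(endpoints)

    def pick(self):
        # Round-robin over the current endpoints of the service.
        return next(self._cycle)

# The endpoints from the slide's diagram.
proxy = ServiceProxy(["10.240.1.1:8080", "10.240.2.2:8080", "10.240.3.3:8080"])
first, second = proxy.pick(), proxy.pick()
```

The point of the indirection is that clients only ever see the portal address; pods can come and go behind it as controllers restart or reschedule them.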
  37. A Kubernetes cluster on Google Compute Engine
  38. A Kubernetes cluster on Google Compute Engine
  39. A fresh Kubernetes cluster
  40. Node 0f64: logging
  41. Node 02ej: logging, monitoring
  42. Node pk22: logging, DNS
  43. Node 27gf: logging
  44. A counter pod
     apiVersion: v1
     kind: Pod
     metadata:
       name: counter
       namespace: demo
     spec:
       containers:
       - name: count
         image: ubuntu:14.04
         args: [bash, -c, 'for ((i = 0; ; i++)); do echo "$i: $(date)"; sleep 1; done']
  45. A counter pod
     $ kubectl create -f counter-pod.yaml --namespace=demo
     pods/counter
     $ kubectl get pods
     NAME                                          READY  REASON   RESTARTS  AGE
     fluentd-cloud-logging-kubernetes-minion-1xe3  1/1    Running  0         5m
     fluentd-cloud-logging-kubernetes-minion-p6cu  1/1    Running  0         5m
     fluentd-cloud-logging-kubernetes-minion-s2dl  1/1    Running  0         5m
     fluentd-cloud-logging-kubernetes-minion-ypau  1/1    Running  0         5m
     kube-dns-v3-55k7n                             3/3    Running  0         6m
     monitoring-heapster-v1-55ix9                  0/1    Running  12        6m
  46. Node 27gf: logging, counter
  47. Observing the output of the counter
     $ kubectl logs counter --namespace=demo
     0: Tue Jun 2 21:37:31 UTC 2015
     1: Tue Jun 2 21:37:32 UTC 2015
     2: Tue Jun 2 21:37:33 UTC 2015
     3: Tue Jun 2 21:37:34 UTC 2015
     4: Tue Jun 2 21:37:35 UTC 2015
     5: Tue Jun 2 21:37:36 UTC 2015
     ...
  48. ssh onto the node and “ps”
     # docker ps
     CONTAINER ID  IMAGE                                     COMMAND             CREATED             STATUS             NAMES
     532247036a78  ubuntu:14.04                              "bash -c 'i=0; whi  About a minute ago  Up About a minute  k8s_count.dca54bea_counter_demo_479b8894-0971-11e5-a784-42010af00df1_f6159d40
     8cd07658287d  gcr.io/google_containers/pause:0.8.0      "/pause"            About a minute ago  Up About a minute  k8s_POD.e4cc795_counter_demo_479b8894-0971-11e5-a784-42010af00df1_7de2fec0
     b2dc87db6608  gcr.io/google_containers/fluentd-gcp:1.6  "/bin/sh -c '/usr/  16 minutes ago      Up 16 minutes      k8s_fluentd-cloud-logging.463ca0af_fluentd-cloud-logging-kubernetes-minion-27gf_default_4ab77985c0cb4f28a020d3b097af9654_3e908886
     c5d8641d884d  gcr.io/google_containers/pause:0.8.0      "/pause"            16 minutes ago      Up 16 minutes      k8s_POD.e4cc795_fluentd-cloud-logging-kubernetes-minion-27gf_default_4ab77985c0cb4f28a020d3b097af9654_2b980b91
  49. Example: Music DB + UI (diagram: a music-ui pod at http://music-ui:5601 queries four music-db pods behind http://music-db:9200)
  50. Example: Elasticsearch + Kibana Music DB & UI
     apiVersion: v1
     kind: ReplicationController
     metadata:
       labels:
         app: music-db
       name: music-db
     spec:
       replicas: 4
       selector:
         app: music-db
       template:
         metadata:
           labels:
             app: music-db
         spec:
           containers:
           - name: es
             image: kubernetes/elasticsearch:1.0
             env:
             - name: "CLUSTER_NAME"
               value: "mytunes-db"
             - name: "SELECTOR"
               value: "name=music-db"
             - name: "NAMESPACE"
               value: "mytunes"
             ports:
             - name: es
               containerPort: 9200
             - name: es-transport
               containerPort: 9300
  51. Music DB Replication Controller
     apiVersion: v1
     kind: ReplicationController
     metadata:
       labels:
         app: music-db
       name: music-db
     spec:
       replicas: 4
       selector:
         app: music-db
       template:
         metadata:
           labels:
             app: music-db
         spec:
           containers:
           ...
  52. Music DB container
     containers:
     - name: es
       image: kubernetes/elasticsearch:1.0
       env:
       - name: "CLUSTER_NAME"
         value: "mytunes-db"
       - name: "SELECTOR"
         value: "name=music-db"
       - name: "NAMESPACE"
         value: "mytunes"
       ports:
       - name: es
         containerPort: 9200
       - name: es-transport
         containerPort: 9300
  53. Music DB Service
     apiVersion: v1
     kind: Service
     metadata:
       name: music-db
       labels:
         app: music-db
     spec:
       selector:
         app: music-db
       ports:
       - name: db
         port: 9200
         targetPort: es
  54. Music DB (diagram: four music-db pods behind http://music-db:9200)
  55. Music DB Query
  56. Music UI Pod
     apiVersion: v1
     kind: Pod
     metadata:
       name: music-ui
       labels:
         app: music-ui
     spec:
       containers:
       - name: kibana
         image: kubernetes/kibana:1.0
         env:
         - name: "ELASTICSEARCH_URL"
           value: "http://music-db:9200"
         ports:
         - name: kibana
           containerPort: 5601
  57. Music UI Service
     apiVersion: v1
     kind: Service
     metadata:
       name: music-ui
       labels:
         app: music-ui
     spec:
       selector:
         app: music-ui
       ports:
       - name: kibana
         port: 5601
         targetPort: kibana
       type: LoadBalancer
  58. Music DB + UI (diagram: the music-ui pod, exposed externally at http://104.197.86.235:5601, queries the four music-db pods via http://music-db:9200)
  59. Music UI Query
  60. Scale the DB and UI independently (diagram: three music-db pods and two music-ui pods)
  61. Monitoring. An optional add-on to Kubernetes clusters.
     Run cAdvisor as a pod on each node: gather stats from all containers, export via REST.
     Run Heapster as a pod in the cluster: just another pod, no special access; aggregates stats.
     Run Influx and Grafana in the cluster: more pods. Alternately: store in Google Cloud Monitoring.
  62. Logging. An optional add-on to Kubernetes clusters.
     Run fluentd as a pod on each node: gather logs from all containers, export to Elasticsearch.
     Run Elasticsearch as a pod in the cluster: just another pod, no special access; aggregates logs.
     Run Kibana in the cluster: yet another pod. Alternately: store in Google Cloud Logging.
  63. Example: rolling upgrade with labels (diagram: a backend service selects all pods labelled “backend” across versions; a v1.2 replication controller is stepped down from 4 replicas to 0 while a v1.3 controller is stepped up from 0 to 4, one replica at a time).
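The replica counts stepping across the slide can be simulated in a few lines. This is a hypothetical sketch of the schedule only, not the upgrade mechanism itself; the `rolling_update` function is invented for illustration:

```python
# Illustrative sketch: the replica schedule of a label-driven rolling upgrade.
# The old (v1.2) controller is stepped down while the new (v1.3) controller
# is stepped up, keeping the total replica count constant throughout.

def rolling_update(total):
    """Return the sequence of (old_replicas, new_replicas) pairs."""
    return [(total - new, new) for new in range(total + 1)]

schedule = rolling_update(4)
# -> [(4, 0), (3, 1), (2, 2), (1, 3), (0, 4)]
```

Because the backend service selects on the shared “backend” label rather than the version label, clients keep hitting a full set of pods at every step of the schedule.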
  64. ISA
  65. ISA?
  66. Open source: contribute!
  67. Pets vs. Cattle
  68. Questions? Images by Connie Zhou. http://kubernetes.io
