1 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Running Cloudbreak on Kubernetes
Richard Doktorics
Krisztian Horvath

Who are we?
• Krisztian Horvath
  – Staff Engineer at Hortonworks
  – Has worked on Cloudbreak from the beginning
  – @keyki
• Richard Doktorics
  – Senior Software Engineer
  – Has worked on Cloudbreak from the beginning
  – @doktoric

Agenda
• Cloudbreak
• Kubernetes
• Helm
• Cloudbreak Rolling Update
• Log collection
• Monitoring & Alerting
• Questions

What is Cloudbreak?
Cloudbreak is a tool for provisioning Hadoop clusters on cloud infrastructure
• Simplified Cluster Provisioning
• Automated Cluster Scaling
  – AMS (Ambari Metrics System)
  – Prometheus based metrics
• Highly Extensible
  – Recipes for scripting extensions that run before/after cluster provisioning
  – Custom cloud images
• Multiple platforms are supported
  – AWS
  – GCP
  – Azure
  – OpenStack
  – BYOS (Bring Your Own Stack)

Single node deployment
• Cloudbreak Deployer (CBD)
  – Written in Go and Bash (go-basher)
  – Compiled into a single binary
• Micro-service architecture
  – Each service runs in a Docker container
  – Each container is replaceable with a custom one
  – Services are handled with docker-compose

IMAGE                                    NAMES
traefik:v1.3.8-alpine                    cbreak_traefik_1
hortonworks/cloudbreak:2.1.0             cbreak_cloudbreak_1
postgres:9.6.1-alpine                    cbreak_commondb_1
hortonworks/cloudbreak-uaa               cbreak_identity_1
hortonworks/hdc-auth:2.1.0               cbreak_sultans_1
hortonworks/cloudbreak-autoscale:2.1.0   cbreak_periscope_1
hortonworks/hdc-web:2.1.0                cbreak_uluwatu_1
gliderlabs/consul-server:0.5             cbreak_consul_1
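
The container names in the listing follow docker-compose's `<project>_<service>_<index>` pattern, so a compose file roughly like the one below would produce them. This is only a sketch: the service names are inferred from the container names, the project name `cbreak` is assumed, and the file that CBD actually generates also carries environment variables, ports and volumes that are left out here.

```yaml
# docker-compose.yml (sketch; run with the project name "cbreak")
version: "2"
services:
  traefik:
    image: traefik:v1.3.8-alpine
  cloudbreak:
    image: hortonworks/cloudbreak:2.1.0
  commondb:
    image: postgres:9.6.1-alpine
  identity:
    image: hortonworks/cloudbreak-uaa   # no tag is given in the listing
  sultans:
    image: hortonworks/hdc-auth:2.1.0
  periscope:
    image: hortonworks/cloudbreak-autoscale:2.1.0
  uluwatu:
    image: hortonworks/hdc-web:2.1.0
  consul:
    image: gliderlabs/consul-server:0.5
```

Because every service is just an image reference, swapping a container for a custom one amounts to changing a single `image:` line.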

Our goal was to...
• Run Cloudbreak in HA (High Availability) mode
  – Ability to recover flows in case of node failure
  – Avoid master-slave design / leader election problems
• Scale Cloudbreak as we desire
  – Distribute each cluster-related flow
  – Cannot run 2 flows for the same cluster at the same time (e.g. 2 upscale flows)
  – Flow cancellation must be handled
• Scale the Web UI
  – Had to introduce a Redis cluster for the session store
• Scale every other service as well
• Find a tool that makes it easy to deploy these services to multiple nodes
• Offer Cloudbreak as a Service that is accessible by everyone and can start clusters anywhere
Kubernetes

What is Kubernetes?
Kubernetes is an open-source platform designed to automate deploying, scaling and operating application containers
• Deploy your applications quickly and predictably
• Scale your applications on the fly
• Roll out new features seamlessly
• Limit hardware usage to required resources only
• Portable: public, private, hybrid, multi-cloud
• Extensible: modular, pluggable, hookable, composable
• Self-healing: auto-placement, auto-restart, auto-replication, auto-scaling

Why Kubernetes?
• Not because it's fancy...
• Evaluated Kubernetes, Swarm, Mesos, Rancher
• Open source / Active community with hands-on experience
• Many cloud providers already support it
• Lots of tooling behind it / API / CLI / Helm / Ansible / Salt
• Integration with most of the cloud providers
  – Provision Load Balancer (GCP, AWS, Azure)
  – Use object stores to share data (Ceph, S3, GCP bucket, Azure Storage Account)
  – Dynamic volume provisioning / Persistent disk (EBS, Azure Blob)

Running Kubernetes on Azure
az aks create --resource-group k8srg --name k8s --agent-count 5 --agent-osdisk-size 100 --agent-vm-size Standard_D12_v2 \
  --service-principal sp --client-secret cs --dns-name-prefix k8s --location westus --ssh-key-value ~/.ssh/id_rsa.pub

ACS / AKS / ACI
• ACS (Azure Container Service)
  – Can run Kubernetes, Swarm, DC/OS
• AKS (Managed Kubernetes)
  – No master VMs (at least on your side)
  – Multiple agent pools with different VM types
  – Scale the agent pools independently
  – Automatic upgrades
• ACI (Azure Container Instances)
  – No VMs to provision
  – “Endless” resource pool
  – Pay per second
  – Can act “as a node” in the Kubernetes cluster

Kubernetes resources
• Pod
  – Group of one or more containers with shared storage/network
  – Its containers are always co-located and co-scheduled, and run in a shared context
• Deployment
  – Provides declarative updates for Pods
• StatefulSet
  – Manages the deployment and scaling of a set of Pods and provides guarantees about the ordering and uniqueness of these Pods
  – Each Pod has a persistent identifier that it keeps across any rescheduling
• Service
  – Abstraction which defines a logical set of Pods and a policy by which to access them
• Declared in YAML files
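
To make the StatefulSet description more concrete, here is a minimal sketch of one that runs a single PostgreSQL pod with its own persistent volume. It is purely illustrative: the talk does not say which Cloudbreak components, if any, were run as StatefulSets. The `commondb` name, the 10Gi size and the `apps/v1` API version (older clusters used a beta API group) are all assumptions.

```yaml
# Hypothetical StatefulSet; names and sizes are illustrative only.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: commondb
spec:
  serviceName: commondb        # a headless Service with this name must exist
  replicas: 1
  selector:
    matchLabels:
      app: commondb
  template:
    metadata:
      labels:
        app: commondb
    spec:
      containers:
      - name: postgres
        image: postgres:9.6.1-alpine
        ports:
        - containerPort: 5432
          name: postgres
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:        # one PersistentVolumeClaim per pod
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
```

The pod created from this would be named `commondb-0` and would keep that name, and its volume claim, when it is rescheduled to another node.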

Deployment and Service example

Deployment
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: cloudbreak
spec:
  replicas: 5
  selector:
    matchLabels:
      app: cloudbreak
  template:
    metadata:
      labels:
        app: cloudbreak
    spec:
      containers:
      - name: cloudbreak
        image: hortonworks/cloudbreak:2.1.0
        ports:
        - containerPort: 8080
          name: http-port
        - containerPort: 20105
          name: jmx-port

Service (cloudbreak.default.svc.cluster.local)
apiVersion: v1
kind: Service
metadata:
  name: cloudbreak
  annotations:
    # annotation values must be strings, so they are quoted
    prometheus.io/scrape: "true"
    prometheus.io/path: "/"
    prometheus.io/port: "20105"
spec:
  selector:
    app: cloudbreak
  ports:
  - name: http
    protocol: TCP
    port: 8080
  - name: jmx
    protocol: TCP
    port: 20105
Helm

Why Helm?
• No real competitor
• Helps you manage Kubernetes applications
• Officially approved by the community
• Official Charts
• Rolling upgrade
• Helm is the client, Tiller is the server
• Tiller runs as a Kubernetes pod

Running Helm on Kubernetes
• Helm package ~= Chart
  – Define
  – Install
  – Upgrade
• Chart
  – values.yaml: stores the variables used by the template files in the templates directory
  – Chart.yaml: describes the chart (its name, description and version)
  – templates/*.yaml: Kubernetes manifests with Go template support
• Separate Charts for every component
  – Cloudbreak
  – Monitoring
  – Analytics
Deployment and Service example

Deployment
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: cloudbreak
spec:
  replicas: 5
  selector:
    matchLabels:
      app: cloudbreak
  template:
    metadata:
      labels:
        app: cloudbreak
    spec:
      containers:
      - name: cloudbreak
        image: hortonworks/cloudbreak:2.1.0
        ports:
        - containerPort: 8080
          name: http-port
        - containerPort: 20105
          name: jmx-port

Service
apiVersion: v1
kind: Service
metadata:
  name: cloudbreak
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: "/"
    prometheus.io/port: "20105"
spec:
  selector:
    app: cloudbreak
  ports:
  - name: http
    protocol: TCP
    port: 8080
  - name: jmx
    protocol: TCP
    port: 20105

Deployment template (Helm)
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-cloudbreak
spec:
  replicas: {{ .Values.replicas }}
  selector:
    matchLabels:
      app: cloudbreak
      release: {{ .Release.Name }}
  template:
    metadata:
      labels:
        app: cloudbreak
        release: {{ .Release.Name }}
    spec:
      containers:
      - name: cloudbreak
        image: {{ .Values.cbImage }}
        ports:
        - containerPort: 8080
          name: http-port
        - containerPort: 20105
          name: jmx-port

Service template (Helm)
apiVersion: v1
kind: Service
metadata:
  name: {{ .Release.Name }}-cloudbreak
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: "/"
    prometheus.io/port: "20105"
spec:
  selector:
    app: cloudbreak
    release: {{ .Release.Name }}
  ports:
  - name: http
    protocol: TCP
    port: 8080
  - name: jmx
    protocol: TCP
    port: 20105
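
The Helm templates above read two chart values, `.Values.replicas` and `.Values.cbImage`. A values file that reproduces the plain Deployment shown beside them might look like this sketch; the defaults are simply copied from the non-templated manifest, and whether the real chart used these exact defaults is not stated.

```yaml
# values file for the Cloudbreak chart (sketch)
replicas: 5
cbImage: hortonworks/cloudbreak:2.1.0
```

Individual values can be overridden at install or upgrade time with Helm's `--set` flag, for example to roll out a different image tag without editing the chart.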
Rolling Update

Rolling Update
• The goal is to have zero-downtime updates
• Ability to roll back in case something goes wrong
• Rolling Update strategy with Readiness Probe
• Canary releasing
• Prepare for running 2 versions of the application at the same time

Strategy
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1
    maxUnavailable: 0

Readiness Probe
readinessProbe:
  httpGet:
    path: /cb/info
    port: 8080
  initialDelaySeconds: 90
  failureThreshold: 5
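
The two fragments above live at different levels of the Deployment: the strategy belongs to the Deployment spec, while the readiness probe belongs to the container. A sketch of how they might be combined with the earlier Cloudbreak Deployment follows. It reuses the values from the slides and is an assembled illustration, not necessarily the exact manifest the authors deployed.

```yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: cloudbreak
spec:
  replicas: 5
  strategy:                  # Deployment-level setting
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1            # start at most one extra pod during the update
      maxUnavailable: 0      # never drop below the desired replica count
  selector:
    matchLabels:
      app: cloudbreak
  template:
    metadata:
      labels:
        app: cloudbreak
    spec:
      containers:
      - name: cloudbreak
        image: hortonworks/cloudbreak:2.1.0
        ports:
        - containerPort: 8080
          name: http-port
        readinessProbe:      # container-level setting
          httpGet:
            path: /cb/info
            port: 8080
          initialDelaySeconds: 90
          failureThreshold: 5
```

With these settings Kubernetes adds one new pod at a time and removes an old pod only after the new one passes its readiness probe, which is what keeps the update free of downtime.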

Canary releasing
• Run a new version of the application along with the stable one and route some of the users to this version
• Run your tests against the new version and once you are happy with the results shut down the old version
• Maintain backward compatibility or you'll break the update
• Hard to change the database schema
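
One common way to get this behaviour on Kubernetes is to run a second, smaller Deployment whose pods share the label the Service selects on. The sketch below assumes that pattern; the talk does not show how the canary was actually wired up. The `track` label, the `cloudbreak-canary` name and the `NEW_VERSION` tag are placeholders, and the stable Deployment would need a matching `track: stable` label in its selector and pod template so the two selectors do not overlap.

```yaml
# Hypothetical canary Deployment running next to the stable one
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: cloudbreak-canary
spec:
  replicas: 1                # small share of the pods behind the Service
  selector:
    matchLabels:
      app: cloudbreak
      track: canary
  template:
    metadata:
      labels:
        app: cloudbreak      # matched by a Service that selects on app only
        track: canary
    spec:
      containers:
      - name: cloudbreak
        image: hortonworks/cloudbreak:NEW_VERSION   # placeholder tag
        ports:
        - containerPort: 8080
          name: http-port
```

With five stable replicas and one canary replica, roughly one connection in six would reach the new version, assuming the Service spreads connections evenly across pods.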
Logging and Monitoring

Logging
• Logspout
  – Collects the logs from the Docker socket
• Logstash
  – Redirects the logs to file outputs
• Azure File Share
  – Stores the log files on an SMB (Samba) share
• LogSearch
  – Owned by Hortonworks
  – Uses Solr under the hood

Monitoring
• Prometheus
  – Java metrics (Custom metrics)
  – Provider per cluster
  – REST status codes
  – Response times
  – Active flows per node
  – Go metrics
  – Consul metrics
  – Linux / Host metrics
  – NodeJS metrics
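
The `prometheus.io/*` annotations on the Cloudbreak Service only take effect if Prometheus is configured to honour them. A scrape job along the lines of the widely used Kubernetes example configuration is sketched below. The job name is arbitrary, and whether the authors used exactly these relabeling rules is an assumption.

```yaml
# prometheus.yml fragment (sketch)
scrape_configs:
- job_name: kubernetes-service-endpoints
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  # keep only endpoints whose Service has prometheus.io/scrape: "true"
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: "true"
  # use prometheus.io/path as the metrics path
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  # use prometheus.io/port as the scrape port
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    action: replace
    target_label: __address__
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
```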

Alerting
ALERT successful_stack_creation_aws
  IF sum(changes(org_springframework_metrics_cloudbreak_value{value=~"stack.creation.successful.aws"}[5m])) > 0
  ANNOTATIONS {
    status="INFO",
    description="A new stack has been created on AWS."
  }

ALERT stack_creation_failed_aws
  IF sum(changes(org_springframework_metrics_cloudbreak_value{value=~"stack.creation.failed.aws"}[5m])) > 0
  ANNOTATIONS {
    status="WARN",
    description="Failed to create a stack on AWS."
  }

ALERT node_down
  IF up{job='node_exporter'} == 0
  FOR 5m
  ANNOTATIONS {
    status="ERROR",
    description="Node {{ $labels.instance }} has been down for more than 5 minutes"
  }
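
The rules above use the Prometheus 1.x rule syntax. Prometheus 2.x reads alerting rules from YAML rule files instead, so on a newer server two of them would look roughly like the sketch below. The group name `cloudbreak` is an assumption, and the `node_down` description is worded to match the 5m `for` clause.

```yaml
# rule file for Prometheus 2.x (sketch)
groups:
- name: cloudbreak
  rules:
  - alert: stack_creation_failed_aws
    expr: sum(changes(org_springframework_metrics_cloudbreak_value{value=~"stack.creation.failed.aws"}[5m])) > 0
    annotations:
      status: WARN
      description: "Failed to create a stack on AWS."
  - alert: node_down
    expr: up{job="node_exporter"} == 0
    for: 5m
    annotations:
      status: ERROR
      description: "Node {{ $labels.instance }} has been down for more than 5 minutes"
```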
Questions?
Thank you!
Instagram (@hortonworks.hungary)
