SQLDay 2018
GOLD SPONSORS
SILVER SPONSORS
BRONZE SPONSOR
STRATEGIC PARTNER
Kubernetes for data scientist
Łukasz Kałużny
MVP: Microsoft Azure
Cloud Technology Leader @ BlueSoft
kaluzny.io | @kaluzaaa
https://www.explainxkcd.com/1988
Containers
[Diagram: virtualization vs. containers]
Virtualization (Type 1 hypervisor): Hardware, Hypervisor, then VMs.
Each virtual machine: Guest OS, Dependencies, Application.
Containers: Hardware, Host OS, Docker Engine, Dependency 1 / Dependency 2, then containers (C).
Each container: App dependencies, Application XYZ.
The container advantage
For developers
Fast iteration
Agile delivery
Immutability
For IT
Cost savings
Efficient deployment
Elastic bursting
Dockerfile
FROM node:9.4.0-alpine
WORKDIR /usr/src/app
COPY package*.json ./
RUN npm install
COPY . .
EXPOSE 3000
CMD [ "npm", "run", "container" ]
GPU and Docker
https://devblogs.nvidia.com/nvidia-docker-gpu-server-application-deployment-made-easy/
The elements of orchestration
• Scheduling
• Affinity/anti-affinity
• Health monitoring
• Failover
• Scaling
• Networking
• Service discovery
• Coordinated app upgrades
Kubernetes: the de-facto orchestrator
• Portable: public, private, hybrid, multi-cloud
• Extensible: modular, pluggable, hookable, composable
• Self-healing: auto-placement, auto-restart, auto-replication, auto-scaling
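The self-healing items above (auto-restart, auto-replication) rest on a reconciliation idea: compare the desired state with the observed state and act on the difference. The following is a toy sketch in plain Python, not Kubernetes code; the function name `reconcile` and the pod naming scheme are assumptions made for illustration only.

```python
from typing import List, Tuple


def reconcile(desired_replicas: int, running: List[str],
              prefix: str = "web") -> Tuple[List[str], List[str]]:
    """Return (pods_to_start, pods_to_stop) so the pod count converges to desired_replicas.

    Toy model only: a real controller watches the API server and acts on pod objects.
    """
    diff = desired_replicas - len(running)
    if diff > 0:
        # Too few pods: pick fresh names that are not in use yet.
        existing = set(running)
        to_start: List[str] = []
        i = 0
        while len(to_start) < diff:
            name = f"{prefix}-{i}"
            if name not in existing:
                to_start.append(name)
            i += 1
        return to_start, []
    if diff < 0:
        # Too many pods: stop the surplus (here simply the last ones in the list).
        return [], running[diff:]
    return [], []


# One pod out of three has disappeared, so one replacement is requested.
start, stop = reconcile(3, ["web-0", "web-2"])
print(start, stop)
```

Running the same function repeatedly against the current state is what makes the approach robust: each pass only needs to know the desired count and what is running now.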
Kubernetes Architecture Components
Master components: api-server, etcd, controller-manager, scheduler
Node components: kubelet, kube-proxy, docker, dns
Pod
• rating-api (port 3000)
• rating-web (port 8080)
• rating-db (port 27017)
heroes-web
• label: app=heroes-web, tier=web
• image: azurecr.io/rating-web:v1
• heroes-web-1 (port 8080)
• heroes-web-2 (port 8080)
Heroes-web (in front of heroes-web-1 and heroes-web-2)
• selector: app=heroes-web
• port: 8080:8080
• IP: 10.0.2.20
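The selector app=heroes-web on this slide is what ties the Heroes-web entry to the heroes-web-1 and heroes-web-2 pods: traffic goes to every pod whose labels contain all of the selector's key/value pairs. A minimal sketch of that equality-based matching in plain Python follows; the helper names and the extra `heroes-db-1` pod are hypothetical and only there to show a non-matching case.

```python
from typing import Dict, List

Labels = Dict[str, str]


def selector_matches(selector: Labels, labels: Labels) -> bool:
    """Equality-based matching: every selector key/value must be present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def select_pods(selector: Labels, pods: Dict[str, Labels]) -> List[str]:
    """Return the names of pods whose labels satisfy the selector, sorted for stable output."""
    return sorted(name for name, labels in pods.items()
                  if selector_matches(selector, labels))


pods = {
    "heroes-web-1": {"app": "heroes-web", "tier": "web"},
    "heroes-web-2": {"app": "heroes-web", "tier": "web"},
    "heroes-db-1": {"app": "heroes-db", "tier": "db"},  # hypothetical pod, not on the slide
}

print(select_pods({"app": "heroes-web"}, pods))
```

Because the match is on labels rather than pod names, pods that are replaced or added with the same labels are picked up without changing the selector.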
• micro-web (port 8000)
• rating-web (port 8080), five instances
• api-server
k8s objects - computing
• CronJobs
• DaemonSets
• Deployments
• Jobs
• Pods
• StatefulSets

k8s objects - network
• Ingresses
• Services

k8s objects - config
• ConfigMaps
• Secrets

YAML - Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx
spec:
  selector:
    matchLabels:
      run: my-nginx
  replicas: 2
  template:
    metadata:
      labels:
        run: my-nginx
    spec:
      containers:
      - name: my-nginx
        image: nginx
        ports:
        - containerPort: 80
https://kubernetes.io/docs/concepts/services-networking/connect-applications-service/#exposing-pods-to-the-cluster
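For readers who prefer to generate manifests from code, the same Deployment can be built as a plain Python dictionary and written out as JSON, which `kubectl apply -f` accepts alongside YAML. This is a sketch using only the standard library; the helper names `make_deployment` and `selector_covers_template` are assumptions, not part of any Kubernetes client. The check mirrors the requirement that the selector's matchLabels agree with the pod template's labels.

```python
import json
from typing import Any, Dict


def make_deployment(name: str, image: str, replicas: int, port: int) -> Dict[str, Any]:
    """Build a Deployment manifest equivalent to the YAML on the slide."""
    labels = {"run": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "selector": {"matchLabels": dict(labels)},
            "replicas": replicas,
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {
                    "containers": [
                        {"name": name, "image": image,
                         "ports": [{"containerPort": port}]}
                    ]
                },
            },
        },
    }


def selector_covers_template(manifest: Dict[str, Any]) -> bool:
    """True if every matchLabels entry also appears in the pod template labels."""
    match = manifest["spec"]["selector"]["matchLabels"]
    template_labels = manifest["spec"]["template"]["metadata"]["labels"]
    return all(template_labels.get(k) == v for k, v in match.items())


deployment = make_deployment("my-nginx", "nginx", replicas=2, port=80)
assert selector_covers_template(deployment)
print(json.dumps(deployment, indent=2))
```

Generating the manifest this way makes it easy to vary the replica count or image per environment while keeping selector and labels in sync.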
YAML - Services
apiVersion: v1
kind: Service
metadata:
  name: my-nginx
  labels:
    run: my-nginx
spec:
  ports:
  - port: 80
    protocol: TCP
  selector:
    run: my-nginx
https://kubernetes.io/docs/concepts/services-networking/connect-applications-service/#creating-a-service
DEMO: nginx
How to deploy k8s in Azure
• Azure Container Service (ACS)
• acs-engine
• Azure Kubernetes Service (AKS)
AKS and GPU – How to start?
az group create --name myGPUCluster --location eastus
az aks create --resource-group myGPUCluster --name myGPUCluster \
    --node-vm-size Standard_NC6
az aks get-credentials --resource-group myGPUCluster --name myGPUCluster
https://docs.microsoft.com/en-us/azure/aks/gpu-cluster
Sample job
apiVersion: batch/v1
kind: Job
metadata:
  labels:
    app: samples-tf-mnist-demo
  name: samples-tf-mnist-demo
spec:
  template:
    metadata:
      labels:
        app: samples-tf-mnist-demo
    spec:
      containers:
      - name: samples-tf-mnist-demo
        image: microsoft/samples-tf-mnist-demo:gpu
        args: ["--max_steps", "500"]
        imagePullPolicy: IfNotPresent
        resources:
          limits:
            # Resource name as used at the time of the talk (2018); later
            # Kubernetes releases expose GPUs as nvidia.com/gpu via the device plugin.
            alpha.kubernetes.io/nvidia-gpu: 1
        volumeMounts:
        - name: nvidia
          mountPath: /usr/local/nvidia
      restartPolicy: OnFailure
      volumes:
      - name: nvidia
        hostPath:
          path: /usr/local/nvidia
https://docs.microsoft.com/en-us/azure/aks/gpu-cluster
Autoscaling in Kubernetes
https://medium.com/@wbuchwalter/autoscaling-a-kubernetes-cluster-created-with-acs-engine-on-azure-5e24ddc6402e
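Autoscaling in Kubernetes happens at two levels: the number of nodes (the cluster autoscaler covered by the linked article) and the number of pod replicas (the Horizontal Pod Autoscaler). As a rough illustration of the pod-level side, the sketch below applies the replica formula from the Kubernetes documentation, desired = ceil(current * currentMetric / targetMetric). The function name, the min/max defaults, and the simplified tolerance handling are assumptions for this example, not the actual controller code.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Simplified Horizontal Pod Autoscaler calculation (illustrative defaults)."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # Skip scaling when the metric is already close enough to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, wanted))


# Two replicas at 90% average CPU with a 50% target: ceil(2 * 1.8) = 4.
print(desired_replicas(2, current_metric=90.0, target_metric=50.0))
```

When the extra replicas no longer fit on the existing nodes, node-level autoscaling is what adds capacity for them.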
kubeflow
“The Kubeflow project is dedicated to making deployments of machine
learning (ML) workflows on Kubernetes simple, portable and scalable. Our
goal is not to recreate other services, but to provide a straightforward way
to deploy best-of-breed open-source systems for ML to diverse
infrastructures. Anywhere you are running Kubernetes, you should be able
to run Kubeflow.”
kubeflow
• JupyterHub to create and manage interactive Jupyter notebooks. Project
Jupyter is a non-profit, open-source project to support interactive data
science and scientific computing across all programming languages.
• TensorFlow Training Controller that can be configured to use either CPUs
or GPUs and dynamically adjusted to the size of a cluster with a single
setting.
• TensorFlow Serving container to export trained TensorFlow models to
Kubernetes.
JupyterHub
• A single hub & proxy for managing interactive sessions
• Can run entirely within Kubernetes - notebooks are backed by
Kubernetes pods
• Can request required resources - CPUs, GPUs, etc.
• Has pluggable authentication (OAuth, KDC, etc.)
Made possible by: https://github.com/jupyterhub/kubespawner
TensorFlow Training Controller
• A Kubernetes “operator” to help run distributed/non-distributed TF
training.
• Exposes an API through a CustomResourceDefinition
• Controller manages complexity of distributed training using TensorFlow.
Made possible by: https://github.com/tensorflow/k8s
TensorFlow Serving
• A Kubernetes Deployment that can serve saved models
• Deployment - replicas can be scaled.
DEMO: JupyterHub
DEMO: Autoscaling in Kubernetes
THANKS!