Bdc from bare metal to k8s

Chris Adkin
@ChrisAdkin8
cadkin@purestorage.com
Big Data Clusters: From Bare Metal to Kubernetes

Who Am I?
▪ SQL Server Solution Architect at Pure Storage
▪ SQL Server user for 20 years
▪ Was heavily involved in the SQL Server 2019 EAP
▪ Co-author of the Microsoft workshop: Big Data Clusters: From Bare Metal to Kubernetes

Why This Session?
▪ "I’d like to deploy a Big Data Cluster, are there any gotchas I need to be aware of?"
▪ Most orgs are familiar with Windows and VMware as platforms; Kubernetes and Linux, not so much

What We Will End Up With
▪ Cluster build host: kubespray, ansible, git, kubectl and azdata
▪ Kubernetes cluster: K8s Master 1, K8s Master 2, K8s Worker 1, K8s Worker 2, K8s Worker 3
▪ 3 node etcd cluster
▪ SQL Server 2019 Big Data Cluster running on the three worker nodes

Cluster Node Host Sizing
Environment            Node hosts       Memory  vCPU
Development / Testing  K8s Master 1-2   4 GB    2 each
Development / Testing  K8s Worker 1-3   64 GB   8 each
Production             K8s Master 1-2   16 GB   4 each
Production             K8s Worker 1-3   64 GB   8 each

Cluster Upgrades - The Immutable Infrastructure Way
▪ Kubernetes 1.15 cluster: K8s Master 1, K8s Master 2, K8s Worker 1, K8s Worker 2, K8s Worker 3
▪ Kubernetes 1.19 cluster: K8s Master 1, K8s Master 2, K8s Worker 1, K8s Worker 2, K8s Worker 3

Part 1: Building out the base infrastructure

Template Creation – ISO
▪ Get the Ubuntu 16.04 AMD64 server image: https://releases.ubuntu.com/16.04/
▪ Upload the image to your VMware ISO datastore
▪ Create a virtual machine with a DVD drive that boots from this ISO
▪ Next up: creating an Ubuntu guest

Template Creation – Initial Network Configuration

Kernel Update Gotcha
sudo apt-get install --install-recommends linux-generic-hwe-16.04 -y
DO THIS ON EACH NODE HOST BEFORE YOU CREATE YOUR KUBERNETES CLUSTER, OTHERWISE YOU WILL BREAK YOUR CLUSTER

Post Seed VM Creation Steps
▪ sudo apt-get update
▪ sudo apt-get install yamllint
▪ sudo reboot
▪ VMware vCenter -> virtual machine -> Template -> Convert to Template

Infrastructure Build Out From The Template
▪ Ubuntu VM template -> Cluster Build Host, K8s Master 1, K8s Master 2, K8s Worker 1, K8s Worker 2, K8s Worker 3
As we create each host, we need to do two things:
▪ Give each host a unique name
▪ Give each host a unique IP address
Tip: We could do this with Terraform and the VMware vSphere provider (very popular)

Hostname Configuration
sudo hostnamectl set-hostname <hostname>

iSCSI Gotcha
▪ If you are using an iSCSI based storage solution and cloned virtual machines . . .
▪ InitiatorName value in /etc/iscsi/initiatorname.iscsi needs to be unique for each node host
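
One way to satisfy this on a cloned node host is to generate a fresh initiator name. The sketch below assumes the open-iscsi package is installed, since it provides the iscsi-iname utility. The helper name new_initiator_line is hypothetical, and the commands that write the file and restart the iSCSI services are left as comments because they should only be run on the node host itself.

```shell
# Sketch only: build a replacement line for /etc/iscsi/initiatorname.iscsi.
# Assumes open-iscsi is installed (it ships the iscsi-iname utility).
new_initiator_line() {
    # iscsi-iname prints a newly generated IQN each time it is called
    printf 'InitiatorName=%s\n' "$(iscsi-iname)"
}

# On each cloned node host, something along these lines:
#   new_initiator_line | sudo tee /etc/iscsi/initiatorname.iscsi
#   sudo systemctl restart iscsid open-iscsi
```

Any sessions already logged in to the storage array would need to be re-established after the name changes.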

IP Address Configuration
1. Get the name of your network adapter (e.g. with ip link show); it should be prefixed by ens
For iSCSI storage, you will need two adapters – here we just have the one

IP Address Configuration
2. Edit the network configuration file /etc/network/interfaces (ifupdown syntax)
auto <primary network interface>
iface <primary network interface> inet static
    address <ip address>
    netmask <netmask>
    gateway <gateway ip address>
# Secondary NIC, required only if iSCSI storage is used
iface <secondary network interface> inet static
    address <ip address>
    netmask <netmask>
    dns-nameservers <ip address>
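
As an illustration of step 2, the sketch below prints filled-in stanzas for a primary and a secondary adapter. The interface names ens160 and ens192, the netmask, the gateway and the 10.10.10.5 iSCSI address are made-up example values, and render_static_iface is a hypothetical helper. It emits an auto line for every adapter so that each one is brought up at boot, which is an assumption on top of the template above.

```shell
# Sketch: print an ifupdown stanza for one statically addressed adapter.
# Usage: render_static_iface <interface> <ip address> <netmask> [gateway]
render_static_iface() {
    printf 'auto %s\n' "$1"
    printf 'iface %s inet static\n' "$1"
    printf '    address %s\n' "$2"
    printf '    netmask %s\n' "$3"
    # Only the primary adapter normally carries the default gateway
    if [ -n "${4:-}" ]; then
        printf '    gateway %s\n' "$4"
    fi
}

# Example values only (assumptions, not taken from the slides)
render_static_iface ens160 192.168.123.5 255.255.255.0 192.168.123.1
render_static_iface ens192 10.10.10.5 255.255.255.0
```

Review the output and substitute your own values before copying it into /etc/network/interfaces.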

Part 2: Install and Configure Kubespray

Kubespray – What Is It?
▪ A tool based on Ansible playbooks and kubeadm for managing a Kubernetes cluster’s life cycle:
    ▪ Cluster creation
    ▪ Cluster removal
    ▪ Upgrading a cluster
    ▪ Adding nodes
    ▪ Removing nodes
    ▪ Rebuilding master nodes
    ▪ Etc.

Kubespray Installation
▪ sudo apt-get install git python3-pip
▪ git clone https://github.com/kubernetes-sigs/kubespray.git
▪ pip3 install -r kubespray/requirements.txt

Part 3: Kubernetes Cluster Creation

Kubespray – Our (Example) Cluster Topology
Host               IP address      Role
z-ca-bdc-master1   192.168.123.3   master node
z-ca-bdc-master2   192.168.123.4   master node
z-ca-bdc-worker1   192.168.123.5   worker node
z-ca-bdc-worker2   192.168.123.6   worker node
z-ca-bdc-worker3   192.168.123.7   worker node
Diagram legend: master nodes, worker nodes, nodes for etcd instances

Kubespray – Create An Ansible Inventory
▪ cp -r kubespray/inventory/sample kubespray/inventory/<cluster name>
▪ Edit the inventory.ini file to describe your node hosts
▪ Inventory file path: kubespray/inventory/<cluster name>/inventory.ini
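
For the example topology, the inventory might look roughly like the output of the sketch below. The group names (kube-master, etcd, kube-node, k8s-cluster) follow the sample inventory shipped with Kubespray releases contemporary with Kubernetes 1.19; later releases renamed them, so check inventory/sample/inventory.ini in your own clone. Placing etcd on the two masters plus the first worker is an assumption, as is writing the addresses without leading zeros. render_inventory is a hypothetical helper.

```shell
# Sketch: print an inventory.ini for the five-node example topology.
# Group names and etcd placement are assumptions (see the text above).
render_inventory() {
    cat <<'EOF'
[all]
z-ca-bdc-master1 ansible_host=192.168.123.3 ip=192.168.123.3 etcd_member_name=etcd1
z-ca-bdc-master2 ansible_host=192.168.123.4 ip=192.168.123.4 etcd_member_name=etcd2
z-ca-bdc-worker1 ansible_host=192.168.123.5 ip=192.168.123.5 etcd_member_name=etcd3
z-ca-bdc-worker2 ansible_host=192.168.123.6 ip=192.168.123.6
z-ca-bdc-worker3 ansible_host=192.168.123.7 ip=192.168.123.7

[kube-master]
z-ca-bdc-master1
z-ca-bdc-master2

[etcd]
z-ca-bdc-master1
z-ca-bdc-master2
z-ca-bdc-worker1

[kube-node]
z-ca-bdc-worker1
z-ca-bdc-worker2
z-ca-bdc-worker3

[k8s-cluster:children]
kube-master
kube-node
EOF
}

# Review the output, then redirect it into the inventory file, for example:
#   render_inventory > kubespray/inventory/<cluster name>/inventory.ini
```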

Kubespray – Configure ssh Connectivity
The following commands are all to be run on the server hosting ansible
▪ ssh-keygen
▪ Carry the following out for each node host:
ssh-copy-id <username>@<hostname>
▪ ssh-agent /bin/bash
▪ ssh-add ~/.ssh/id_rsa
▪ Test ssh connectivity from the ansible server:
ansible -i inventory/<cluster name>/inventory.ini all -m ping
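
The per-host ssh-copy-id step lends itself to a loop. A minimal sketch, assuming the same user name exists on every node host; copy_key_to_nodes is a hypothetical helper and the host names in the usage comment are the ones from the example topology.

```shell
# Sketch: copy the public key to each node host in turn.
# Usage: copy_key_to_nodes <username> <hostname> [<hostname> ...]
copy_key_to_nodes() {
    user=$1
    shift
    for host in "$@"; do
        # Stop at the first host that cannot be reached
        ssh-copy-id "${user}@${host}" || return 1
    done
}

# Example use from the server hosting ansible:
#   copy_key_to_nodes <username> z-ca-bdc-master1 z-ca-bdc-master2 \
#       z-ca-bdc-worker1 z-ca-bdc-worker2 z-ca-bdc-worker3
```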

Storing ssh Passphrases With keychain
On the server you intend to run Kubespray from:
▪ sudo apt install keychain
▪ Add the following two lines to your .bashrc file, ~cadkin/.bashrc in my case:
/usr/bin/keychain $HOME/.ssh/id_rsa
source $HOME/.keychain/$HOSTNAME-sh

Kubespray – Run Playbook !!!
ansible-playbook -i inventory/<cluster name>/inventory.ini \
    --become --become-user=root -K cluster.yml

Cluster Creation Post Deployment Steps

Freeze The containerd.io Package !!!
sudo apt-mark hold containerd.io

Post Deployment Steps
▪ Install kubectl on the Kubespray server:
sudo snap install kubectl --classic
▪ Create a directory on the Kubespray server to hold the context:
cd ~
mkdir .kube
▪ ssh onto a master node (admin.conf only resides on master node hosts) and then run:
sudo chmod 755 /etc/kubernetes/admin.conf
▪ On the Kubespray server, copy admin.conf from that master node:
sudo scp <username>@<hostname>:/etc/kubernetes/admin.conf ~/.kube/config
▪ ssh back onto the master node you copied the admin.conf file from and issue:
sudo chmod 620 /etc/kubernetes/admin.conf

Some Quick Post Cluster Creation Sanity Checks
▪ Check the health of the cluster nodes
kubectl get nodes -o wide
▪ Check the health of the system pods
kubectl get po -n kube-system
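
The node check can also be scripted rather than read by eye. A minimal sketch that lists every node whose STATUS column is anything other than plain Ready; not_ready_nodes is a hypothetical helper, and it deliberately treats a cordoned node (Ready,SchedulingDisabled) as worth reporting.

```shell
# Sketch: print the names of nodes that are not in a plain Ready state.
# Relies on the NAME and STATUS columns being the first two fields.
not_ready_nodes() {
    kubectl get nodes --no-headers | awk '$2 != "Ready" { print $1 }'
}

# Example use: an empty result means every node reported Ready
#   [ -z "$(not_ready_nodes)" ] && echo "all nodes Ready"
```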

A Word On Storage
▪ We need a storage plugin that supports persistent volumes
▪ Never ever use ephemeral storage in production
▪ Free options:
    ▪ Portworx Essentials
    ▪ VMware Cloud Native Storage

Check That You Have A Storage Plugin Installed
kubectl get sc

Perform A Simple Test
test-pvc.yaml file contents:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: test-pvc
spec:
  storageClassName: <storage class>
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
And then . . .
kubectl apply -f test-pvc.yaml
kubectl get pvc
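
To turn the test into a pass/fail check, the claim's phase can be queried directly. A minimal sketch; pvc_is_bound is a hypothetical helper. Note that with storage classes that use WaitForFirstConsumer volume binding, a claim stays Pending until a pod uses it, so Pending on its own does not prove the plugin is broken.

```shell
# Sketch: succeed only if the named PersistentVolumeClaim is Bound.
# Usage: pvc_is_bound <claim name>
pvc_is_bound() {
    phase=$(kubectl get pvc "$1" -o jsonpath='{.status.phase}') || return 1
    [ "$phase" = "Bound" ]
}

# Example use once the claim has been applied:
#   pvc_is_bound test-pvc && echo "storage plugin provisioned a volume"
#   kubectl delete pvc test-pvc    # clean up the test claim afterwards
```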

Part 4: Big Data Cluster Creation

Sizing Your Cluster
▪ "Can you give me a reference architecture for the infrastructure I need for a Big Data Cluster?"
▪ What you need really depends on your workload, but . . .

Storage Gotchas
Persistent volume extension
▪ As of CU6, persistent volumes (PVs) cannot be resized through either azdata or Azure Data Studio
▪ Pro tip: size PVs upfront to allow for data growth

Install azdata
https://docs.microsoft.com/en-us/sql/azdata/install/deploy-install-azdata-linux-package?view=sql-server-ver15

Working With Configuration Profiles
▪ Create a profile
azdata bdc config init --path ca-bdc-kubeadm-dev-test --source kubeadm-dev-test
▪ Specify the storage class for data
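azdata bdc config replace --path ca-bdc-kubeadm-dev-test/control.json \
    --json-values "$.spec.storage.data.className=pure-block"
▪ Specify the size for data persistent volumes
azdata bdc config replace --path ca-bdc-kubeadm-dev-test/control.json \
    --json-values "$.spec.storage.data.size=10Gi"
▪ Specify the storage class for logs
azdata bdc config replace --path ca-bdc-kubeadm-dev-test/control.json \
    --json-values "$.spec.storage.logs.className=pure-block"
▪ Specify the size for log persistent volumes
azdata bdc config replace --path ca-bdc-kubeadm-dev-test/control.json \
    --json-values "$.spec.storage.logs.size=5Gi"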
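
Since the four storage settings always go together, they can be wrapped in one function. A minimal sketch, assuming the same storage class is used for data and logs as in the slide; set_bdc_storage is a hypothetical helper, while the azdata sub-command, flags and JSON paths are the ones shown above.

```shell
# Sketch: apply storage class and volume sizes to a profile's control.json.
# Usage: set_bdc_storage <profile dir> <storage class> <data size> <log size>
set_bdc_storage() {
    profile_dir=$1
    class=$2
    data_size=$3
    log_size=$4
    for kv in \
        "data.className=$class" \
        "data.size=$data_size" \
        "logs.className=$class" \
        "logs.size=$log_size"
    do
        # \$ keeps the leading dollar sign of the JSON path literal
        azdata bdc config replace --path "$profile_dir/control.json" \
            --json-values "\$.spec.storage.$kv" || return 1
    done
}

# Example use with the values from the slide:
#   set_bdc_storage ca-bdc-kubeadm-dev-test pure-block 10Gi 5Gi
```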

Configuring The HDFS Replication Factor
▪ By default data is replicated three times
▪ If the storage platform has built-in resilience, e.g. erasure coding, we can reduce the replication factor to 1:
azdata bdc config replace --path ca-bdc-kubeadm-dev-test/bdc.json \
    --json-values "$.spec.services.hdfs.settings={\"hdfs-site.dfs.replication\":\"1\"}"
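
Because the JSON value in the replication setting itself contains double quotes, how the argument is quoted in the shell matters. The sketch below only demonstrates the shell behaviour with plain variables; whether azdata accepts the value with or without the inner quotes is not something it tests.

```shell
# Inner double quotes left unescaped: the shell strips them
unescaped="$.spec.services.hdfs.settings={"hdfs-site.dfs.replication":"1"}"
# Inner double quotes escaped: they survive
escaped="$.spec.services.hdfs.settings={\"hdfs-site.dfs.replication\":\"1\"}"
# Single quotes around the whole argument: nothing inside is interpreted
single='$.spec.services.hdfs.settings={"hdfs-site.dfs.replication":"1"}'

printf '%s\n' "$unescaped" "$escaped" "$single"
```

The first printed line has lost the quotes around the key and the value, while the second and third are identical and keep the JSON intact.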

‘Affinitize’ Worker Nodes To The Storage Pool
1. Label up the worker nodes
kubectl label node z-ca-bdc-worker1 mssql-cluster=bdc mssql-resource=bdc-shared --overwrite=true
kubectl label node z-ca-bdc-worker2 mssql-cluster=bdc mssql-resource=bdc-shared --overwrite=true
kubectl label node z-ca-bdc-worker3 mssql-cluster=bdc mssql-resource=bdc-storage --overwrite=true

‘Affinitize’ Worker Nodes To The Storage Pool
2. Assign pools to worker nodes in the configuration profile
kubectl label node z-ca-bdc-worker1 mssql-cluster=bdc mssql-resource=bdc-shared --overwrite=true
kubectl label node z-ca-bdc-worker2 mssql-cluster=bdc mssql-resource=bdc-shared --overwrite=true
kubectl label node z-ca-bdc-worker3 mssql-cluster=bdc mssql-resource=bdc-storage --overwrite=true
azdata bdc config add -p ca-bdc-kubeadm-dev-test/control.json -j "$.spec.clusterLabel=bdc"
azdata bdc config add -p ca-bdc-kubeadm-dev-test/control.json -j "$.spec.nodeLabel=bdc-shared"
azdata bdc config add -p ca-bdc-kubeadm-dev-test/bdc.json -j "$.spec.resources.master.spec.nodeLabel=bdc-shared"
azdata bdc config add -p ca-bdc-kubeadm-dev-test/bdc.json -j "$.spec.resources.compute-0.spec.nodeLabel=bdc-shared"
azdata bdc config add -p ca-bdc-kubeadm-dev-test/bdc.json -j "$.spec.resources.data-0.spec.nodeLabel=bdc-shared"
azdata bdc config add -p ca-bdc-kubeadm-dev-test/bdc.json -j "$.spec.resources.storage-0.spec.nodeLabel=bdc-storage"
azdata bdc config add -p ca-bdc-kubeadm-dev-test/bdc.json -j "$.spec.resources.nmnode-0.spec.nodeLabel=bdc-shared"
azdata bdc config add -p ca-bdc-kubeadm-dev-test/bdc.json -j "$.spec.resources.sparkhead.spec.nodeLabel=bdc-shared"
azdata bdc config add -p ca-bdc-kubeadm-dev-test/bdc.json -j "$.spec.resources.zookeeper.spec.nodeLabel=bdc-shared"
azdata bdc config add -p ca-bdc-kubeadm-dev-test/bdc.json -j "$.spec.resources.gateway.spec.nodeLabel=bdc-shared"
azdata bdc config add -p ca-bdc-kubeadm-dev-test/bdc.json -j "$.spec.resources.appproxy.spec.nodeLabel=bdc-shared"
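
The run of azdata bdc config add calls differs only in the resource name and the label, so it can be expressed as a loop. A minimal sketch; assign_node_labels is a hypothetical helper, while the resource names, JSON paths and label values are the ones listed above.

```shell
# Sketch: pin every pool except storage-0 to the bdc-shared nodes,
# and the storage pool to the bdc-storage node.
# Usage: assign_node_labels <profile dir>
assign_node_labels() {
    profile_dir=$1
    azdata bdc config add -p "$profile_dir/control.json" \
        -j "\$.spec.clusterLabel=bdc" || return 1
    azdata bdc config add -p "$profile_dir/control.json" \
        -j "\$.spec.nodeLabel=bdc-shared" || return 1
    for res in master compute-0 data-0 nmnode-0 sparkhead zookeeper gateway appproxy; do
        azdata bdc config add -p "$profile_dir/bdc.json" \
            -j "\$.spec.resources.$res.spec.nodeLabel=bdc-shared" || return 1
    done
    azdata bdc config add -p "$profile_dir/bdc.json" \
        -j "\$.spec.resources.storage-0.spec.nodeLabel=bdc-storage"
}

# Example use:
#   assign_node_labels ca-bdc-kubeadm-dev-test
```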

Deploy Your Cluster
azdata bdc create --config-profile <profile name> --accept-eula yes
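
For an unattended deployment, azdata can pick up the controller credentials from the AZDATA_USERNAME and AZDATA_PASSWORD environment variables. A guard like the one below checks that both are set before starting the deployment. This is a minimal sketch; deploy_bdc is a hypothetical helper, and the create command is the one shown above.

```shell
# Sketch: refuse to start a deployment unless the credentials are set.
# Usage: deploy_bdc <profile name>
deploy_bdc() {
    if [ -z "${AZDATA_USERNAME:-}" ] || [ -z "${AZDATA_PASSWORD:-}" ]; then
        echo "AZDATA_USERNAME and AZDATA_PASSWORD must be set" >&2
        return 1
    fi
    azdata bdc create --config-profile "$1" --accept-eula yes
}

# Example use:
#   export AZDATA_USERNAME=<username>
#   export AZDATA_PASSWORD=<password>
#   deploy_bdc ca-bdc-kubeadm-dev-test
```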

We’ve Covered The Basics - Where To Next ?
▪ Load balancer installation and configuration - MetalLB is the easiest option
▪ Deploying the Kubernetes dashboard in a secure manner
▪ Backup and recovery
▪ Using production profiles, which include HA and Active Directory integration
▪ Kubernetes cluster upgrades
▪ Monitoring a Kubernetes cluster via its built-in Prometheus exporter

Bill Of Materials
Component                          Version
VMware vSphere                     6.7
Linux distribution                 Ubuntu server edition 16.04.7 LTS
Linux kernel                       4.15.0-118-generic
Kubernetes                         1.19.1
SQL Server 2019 Big Data Cluster   CU6
Kubernetes storage plugin          Pure Service Orchestrator 6.0.2

Any Questions . . .
twitter: @ChrisAdkin8
email : cadkin@purestorage.com
