Clustering TensorFlow with Kubernetes and Raspberry Pi

Andres L Martinez
@davilagrau

almo
Google Developer Program Lead PAN EU
@davilagrau
https://www.linkedin.com/in/aleonar
https://www.instagram.com/davilagrau
https://github.com/almo
https://www.facebook.com/davilagrau

Lucas Käldström’s Motivation
● Kubernetes’ Dev Community
○ committer
○ maintainer
● Main motivations
○ Learning Google’s technologies
○ Developing Open Source
○ Re-use old/cheap HW

● Hyperparameter tuning
○ Auto ML
● Scaling on QPS
○ ML API
● Ensemble learning
● Data Parallelism
Model Parallelism?
Don’t ask, don’t tell

Raspberry Pi 3
● Single-board computer
● ARM 1.2 GHz 64/32-bit quad-core
○ VFPv4 Floating Point Unit onboard (per core)
○ Hardware virtualization support
● 1 GB LPDDR2 RAM at 900 MHz
● MicroSDHC slot
● Bluetooth 4.1
● 2.4 GHz WiFi 802.11n & Ethernet 10/100

Kubernetes
● Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications.
● Horizontal scaling
● Service discovery and load balancing
● Self-healing

TensorFlow
● TensorFlow is an open source software library for numerical computation using data flow graphs.
● TensorFlow has APIs available in C++, Python, Java and Go.
● TensorFlow also has bindings for C#, Haskell, Julia, Ruby, Rust, and Scala.
● TensorFlow Lite is TensorFlow’s lightweight solution for mobile and embedded devices.
Raspberry Pi version is coming soon!

Architecture
[Diagram: four boards running HypriotOS, hosting a Kubernetes cluster, which in turn hosts a TensorFlow cluster]

Setting Kubernetes up!
[Diagram: a master set up with Kubeadm, plus Node #1, Node #2 and Node #3, each running Kubelet and Docker]

Setting up the master
Settings
● OS: install Docker on Raspbian, OR
○ Download and flash HypriotOS v1.7.1 from https://goo.gl/y9Jyzd
● Setting up Kubernetes repositories
○ Key / Source
Commands
● apt-get update && apt-get install -y kubeadm
● echo `cat /boot/cmdline.txt` cgroup_enable=cpuset > /boot/cmdline.txt
● swapoff -a # Note: needed from Kubernetes 1.8, where the kubelet refuses to start with swap enabled
● kubeadm init --pod-network-cidr 10.244.0.0/16

Setting up the node (each)
Settings
● OS: install Docker on Raspbian, OR
○ Download and flash HypriotOS v1.7.1 from https://goo.gl/y9Jyzd
● Setting up Kubernetes repositories
○ Key / Source
Commands
● apt-get update && apt-get install -y kubeadm
● echo `cat /boot/cmdline.txt` cgroup_enable=cpuset > /boot/cmdline.txt
● swapoff -a # Note: needed from Kubernetes 1.8, where the kubelet refuses to start with swap enabled
● kubeadm join --token=XXXXX Master-IP

Setting the network
Flannel: flannel is a virtual network that gives a subnet to each host for use with container runtimes.
“Platforms like Google's Kubernetes assume that each container (pod) has a unique, routable IP inside the cluster. The advantage of this model is that it reduces the complexity of doing port mapping.”
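
To make the subnet-per-host idea concrete, here is a minimal sketch in plain Python. It assumes the 10.244.0.0/16 pod network passed to kubeadm init and one /24 per host, which is flannel's default subnet length. The node names are hypothetical, and the real leases are handed out by flannel itself, not by code like this.

```python
import ipaddress

# Pod network given to `kubeadm init --pod-network-cidr` in this deck.
POD_NETWORK = ipaddress.ip_network("10.244.0.0/16")

# Assumption: one /24 per host (flannel's default subnet length).
HOST_PREFIX = 24

# Hypothetical node names, for illustration only.
nodes = ["master", "node-1", "node-2", "node-3"]


def assign_subnets(network, prefix, hosts):
    """Pair each host with the next free subnet of the given prefix length."""
    subnets = network.subnets(new_prefix=prefix)
    return {host: next(subnets) for host in hosts}


leases = assign_subnets(POD_NETWORK, HOST_PREFIX, nodes)
for host, subnet in leases.items():
    print(f"{host}: {subnet} ({subnet.num_addresses - 2} usable pod IPs)")
```

Because every host owns a distinct subnet, any pod IP identifies its host's range directly, which is why no port mapping is needed.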

Let’s scale with Ansible scripts: https://github.com/lahsivjar/kube-arm

Parallelization Strategies
Distributed TensorFlow
Explicit (device block): TensorFlow will insert the appropriate data transfers between the jobs.
import tensorflow as tf

# Pin the variables and the product to the first CPU device.
with tf.device("/cpu:0"):
    a = tf.Variable(3.0)
    b = tf.Variable(3.0)
    c = a * b
Parallelization strategies:
● In-graph replication
● Between-graph replication
● Asynchronous training
● Synchronous training
TensorFlow Serving
The device argument might also be a function
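
The difference between synchronous and asynchronous training can be sketched without TensorFlow. The toy below is only an illustration under stated assumptions: a single parameter w, two workers, each worker's data shard summarised by one hypothetical target value, and a one-update staleness for the asynchronous case. It is not how TensorFlow implements either mode.

```python
# Toy comparison of synchronous and asynchronous data-parallel training.
# Assumption: a one-parameter model w, two workers, and each worker's data
# shard is summarised by one target value; the shared optimum is w = 3.0.

TARGETS = [2.0, 4.0]   # hypothetical data shards, one per worker
LEARNING_RATE = 0.1
STEPS = 200


def gradient(w, target):
    """Gradient of the squared error (w - target)**2 with respect to w."""
    return 2.0 * (w - target)


def train_synchronous(w=0.0):
    """All workers read the same w; their gradients are averaged per step."""
    for _ in range(STEPS):
        grads = [gradient(w, t) for t in TARGETS]
        w -= LEARNING_RATE * sum(grads) / len(grads)
    return w


def train_asynchronous(w=0.0):
    """Workers take turns; each applies a gradient computed from the value
    of w it read one update earlier (a stale read), without waiting."""
    stale_w = w
    for step in range(STEPS):
        target = TARGETS[step % len(TARGETS)]
        update = LEARNING_RATE * gradient(stale_w, target)
        stale_w = w          # what the next worker will have read
        w -= update
    return w


w_sync = train_synchronous()
w_async = train_asynchronous()
print(f"synchronous:  w = {w_sync:.4f}")
print(f"asynchronous: w = {w_async:.4f}")
```

In this toy the synchronous run settles on the shared optimum, while the asynchronous run keeps hovering close to it, because each update is based on a slightly stale value and on one worker's shard only.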

TensorFlow Cluster
A TensorFlow "cluster" is a set of "tasks" that participate in the distributed execution of a TensorFlow graph.
Steps:
1. Create a tf.train.ClusterSpec that describes all of the tasks in the cluster. This should be the same for each task.
2. Create a tf.train.Server, passing the tf.train.ClusterSpec to the constructor, and identifying the local task with a job name and task index.

TensorFlow Cluster
tf.train.ClusterSpec({
    "worker": [
        "worker0.example.com:2222",
        "worker1.example.com:2222"
    ],
    "ps": [
        "ps0.example.com:2222",
        "ps1.example.com:2222"
    ]})
[Diagram: Worker on Node #0, Worker on Node #1, PS on Node #0, PS on Node #1]
Setting up TensorFlow Cluster I
PS, Node #0:
$ python trainer.py \
    --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
    --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
    --job_name=ps --task_index=0
PS, Node #1 (same --ps_hosts and --worker_hosts flags):
$ python trainer.py \
    --job_name=ps --task_index=1
Example: Between-graph replication / Asynchronous training
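
The slides do not show the inside of trainer.py, so the following is only a guess at how such flags could be turned into a cluster description. The function name parse_cluster is hypothetical, and the TensorFlow calls are left out on purpose so the sketch runs with the standard library alone.

```python
import argparse


def parse_cluster(argv):
    """Turn trainer.py-style flags into a cluster dictionary plus the
    identity (job name, task index) of the local task."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--ps_hosts", default="")
    parser.add_argument("--worker_hosts", default="")
    parser.add_argument("--job_name", choices=["ps", "worker"], required=True)
    parser.add_argument("--task_index", type=int, default=0)
    flags = parser.parse_args(argv)

    cluster = {
        "ps": [h for h in flags.ps_hosts.split(",") if h],
        "worker": [h for h in flags.worker_hosts.split(",") if h],
    }
    # Fail early if the task index does not exist in the chosen job.
    if not 0 <= flags.task_index < len(cluster[flags.job_name]):
        raise ValueError("task_index out of range for job " + flags.job_name)
    return cluster, flags.job_name, flags.task_index


cluster, job_name, task_index = parse_cluster([
    "--ps_hosts=ps0.example.com:2222,ps1.example.com:2222",
    "--worker_hosts=worker0.example.com:2222,worker1.example.com:2222",
    "--job_name=ps",
    "--task_index=1",
])
# `cluster` has the shape expected by tf.train.ClusterSpec(...); the job name
# and task index are what tf.train.Server(...) uses to identify the local task.
print(job_name, task_index, cluster[job_name][task_index])
```

Every task receives the same host lists and differs only in its job name and task index, which matches the requirement that the cluster description be identical on each task.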

Setting up TensorFlow Cluster II
Worker, Node #0 (same --ps_hosts and --worker_hosts flags as for the PS tasks):
$ python trainer.py \
    --job_name=worker --task_index=0
Worker, Node #1:
$ python trainer.py \
    --job_name=worker --task_index=1

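Sharding Variables across Multiple Parameter Servers
with tf.device(tf.train.replica_device_setter()):
    # Build the model here. Pass cluster=... to replica_device_setter
    # so that variables are spread over the ps tasks.
    pass
with tf.train.MonitoredTrainingSession(master=server.target,
                                       is_chief=(FLAGS.task_index == 0),
                                       checkpoint_dir="/tmp/train_logs",
                                       hooks=hooks) as mon_sess:
    # Run the training loop here.
    pass
[Diagram: PS on Node #0, Node #1 and Node #2]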
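
By default, tf.train.replica_device_setter places variables on the ps tasks in round-robin order as they are created. The sketch below imitates that policy in plain Python; the variable names and the helper place_round_robin are hypothetical, and real placement is decided by TensorFlow.

```python
from itertools import cycle


def place_round_robin(variable_names, num_ps_tasks):
    """Assign each variable to a parameter-server task in turn."""
    ps_devices = cycle(f"/job:ps/task:{i}" for i in range(num_ps_tasks))
    return {name: next(ps_devices) for name in variable_names}


# Hypothetical variable names for a small two-layer model.
variables = ["hidden/weights", "hidden/biases", "output/weights", "output/biases"]

placement = place_round_robin(variables, num_ps_tasks=3)
for name, device in placement.items():
    print(f"{name:16s} -> {device}")
```

With three parameter servers, the fourth variable wraps around to the first task, so the load is spread by variable count rather than by variable size.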

Docker Image
CPU only, i.e. Raspberry Pi
● TensorFlow 1.1 https://goo.gl/URUpko
● Official TensorFlow Lite for Raspberry Pi, coming soon! https://goo.gl/viqtuQ
● resin/rpi-raspbian + TensorFlow 1.4, coming soon!

PET! Putting Everything Together

Pod Controllers: Stateful Sets (PS & Workers)
● Manages the deployment and scaling of a set of Pods.
● Provides guarantees about the ordering and uniqueness of these Pods.
● StatefulSet manages Pods that are based on an identical container spec.
● StatefulSet maintains a sticky identity for each of its Pods.
[Diagram: StatefulSet #0 with PS pods on Node #0 and Node #1; StatefulSet #1 with Worker pods on Node #0 and Node #1]
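
The sticky identity is what makes StatefulSets a natural fit for a TensorFlow cluster: pods are named after the StatefulSet plus an ordinal, and with a headless Service each pod gets a stable DNS name. The sketch below builds the host lists from that naming scheme. The StatefulSet and Service names (tf-ps, tf-worker), the default namespace and port 2222 are assumptions for illustration.

```python
def statefulset_hosts(name, service, replicas, port,
                      namespace="default", cluster_domain="cluster.local"):
    """List the stable DNS names of a StatefulSet's pods, in ordinal order."""
    return [
        f"{name}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}:{port}"
        for ordinal in range(replicas)
    ]


# Hypothetical StatefulSet and headless Service names.
cluster = {
    "ps": statefulset_hosts("tf-ps", "tf-ps", replicas=2, port=2222),
    "worker": statefulset_hosts("tf-worker", "tf-worker", replicas=2, port=2222),
}

for job, hosts in cluster.items():
    for task_index, host in enumerate(hosts):
        print(f"/job:{job}/task:{task_index} -> {host}")
```

Since the pod ordinal lines up with the task index, a rescheduled pod comes back under the same name and the cluster description does not have to change.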

Cluster Scalability
[Diagram: adding one node (+1) and a rolling update; PS and Worker pods (TensorFlow) running on Kubernetes]

Thank you!
Questions?
Andres L Martinez
@davilagrau

Setting the cluster I
$ kubectl run hypriot --image=hypriot/rpi-busybox-httpd --replicas=3 --port=80
$ kubectl expose deployment hypriot --port 80
$ kubectl get endpoints hypriot

Setting the cluster II
$ kubectl apply -f traefik-ingress-controller.yaml
$ kubectl label node IP nginx-controller=traefik
$ kubectl apply -f cluster-ingress.yaml
$ cat cluster-ingress.yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: hypriot
spec:
  rules:
  - http:
      paths:
      - path: /
        backend:
          serviceName: hypriot
          servicePort: 80

Kubernetes: orchestrating Raspberry Pi images to deploy TensorFlow Server.
Description:
Structure:
- Use case: IoT / processing information
- Architecture: Kubernetes + TensorFlow + Raspberry Pi
- Python
- Introduction to Kubernetes (Ansible?) (Laura)
- Master / Slave architecture (Laura)
- Container description: labelling and pod matching (Laura)
- Configuration (Laura)
- Load balancer (Laura)
- Round-robin HTTP requests (Laura)
- Monitoring load of the replicas (Laura)
- Failure tolerance (Laura)
- Introduction to TensorFlow
- Introduction to TensorFlow / Machine Learning
- Computation graph
- Introduction to TensorFlow Server
- Development of the use case
Dashboard
codemotion-1 192.168.1.76 B8:27:EB:E5:9D:A5
codemotion-3 192.168.1.75 B8:27:EB:68:5B:F9
codemotion-4 192.168.1.77 B8:27:EB:1F:EF:29
codemotion-2 192.168.1.78 B8:27:EB:19:38:C3

Laura Morillo-Velarde
● Backend engineer at seedtag
● Twitter: @Laura_Morillo
● WTM Lead at GDG Madrid
