All AI Roads Lead to Distribution
jim_dowling
dotAI Paris 2018
2/38
“Methods that scale with computation
are the future of AI”*
- Rich Sutton (Founding Father of Reinforcement Learning)
* https://www.youtube.com/watch?v=EeMCEQa85tw
Massive Increase in Compute for AI*
Distributed Systems
3.5-month doubling time
*https://blog.openai.com/ai-and-compute
3/38
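As a rough sanity check on this chart (the speaker notes below cite a ~300,000-fold increase in training compute since AlexNet), a 3.5-month doubling time compounds to roughly that factor in a little over five years. A minimal sketch in Python, assuming simple compounding:

import math

doublings = math.log2(300_000)   # ~18.2 doublings for a 300,000x increase
months = doublings * 3.5         # ~64 months at a 3.5-month doubling time
print(round(doublings, 1), round(months / 12, 1))   # ~18.2 doublings over ~5.3 years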
Model Parallelism vs. Data Parallelism
[Diagram: Model parallelism – one big model split across 3 GPUs (GPU1: layers 1-100, GPU2: layers 100-199, GPU3: layers 200-299), taking training/test data in and producing predictions. Data parallelism – a copy of the model on each of GPU0..GPU99, each reading 32 files from a distributed filesystem for a batch size of 3200; gradients are aggregated at a barrier to produce the new model.]
4/38
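To make the data-parallel half of the diagram above concrete, here is a minimal sketch of synchronous data parallelism with gradient averaging. It uses plain NumPy and a toy linear model rather than the GPU frameworks shown later in the deck, so the model, data, and learning rate are illustrative assumptions:

import numpy as np

num_workers = 4          # stand-in for GPU0..GPU99
lr = 0.1
w = np.zeros(3)          # one model, replicated on every worker

def shard_gradient(w, X, y):
    # Gradient of mean squared error for a linear model on one data shard.
    return 2.0 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=400)

for step in range(100):
    shards = zip(np.array_split(X, num_workers), np.array_split(y, num_workers))
    grads = [shard_gradient(w, Xs, ys) for Xs, ys in shards]  # computed in parallel on real GPUs
    g = np.mean(grads, axis=0)    # "aggregate gradients" at the barrier
    w -= lr * g                   # every replica applies the identical update

print(w)    # converges to roughly [1.0, -2.0, 0.5]

Because every replica applies the same averaged gradient behind the barrier, all copies of the model stay identical – the same property the Horovod slide later in the deck relies on.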
Mini AI Arms Race: Facebook vs. Google – CNNs, ImageNet, Distribution
5/38
ImageNet
[Image from https://www.slideshare.net/xavigiro/image-classification-on-imagenet-d1l4-2017-upc-deep-learning-for-computer-vision ]
6/38
Improved Accuracy for ImageNet
Improvements in ImageNet – Accuracy
[Chart: ImageNet Top-1 accuracy (%), y-axis 81.5–84.5 – Google-1 (May '17, AutoML with RL), Google-2 (March '18, AutoML with GA), Facebook (May '18, more data and more compute: 1 Bn images).]
Facebook: https://goo.gl/ERpJyr
Google-1: https://goo.gl/EV7Xv1
Google-2: https://goo.gl/eidnyQ
8/38
Methods for Improving Model Accuracy – Reduced Generalization Error
-Hyperparameter optimization
-Larger training datasets
-Better regularization methods
-Better optimization algorithms
-Design better models
Distribution
9/38
Faster Training of ImageNet: from a single GPU to clustered GPUs
[Image from https://www.matroid.com/scaledml/2018/jeff.pdf]
Improvements in ImageNet – Training Time
[Chart: training time (mins) to >75% top-1 on ImageNet, y-axis 0–60 – Facebook (June '17, 256 Nvidia P100s), Google-1 (Dec '17, 256 TPUv2s), Google-2 (March '18, 256 TPUv2s).]
FB: https://goo.gl/ERpJyr
Google-1: https://goo.gl/EV7Xv1
Google-2: https://goo.gl/eidnyQ
11/38
ImageNet – Files/Sec Processed
[Chart: files processed (K/s), y-axis 0–160 – Facebook (June '17), Google-1 (Dec '17), Google-2 (March '18). HDFS: ~80K files/s.]
FB: https://goo.gl/ERpJyr
Google-1: https://goo.gl/EV7Xv1
Google-2: https://goo.gl/eidnyQ
12/38
Distributed Filesystem Read Throughput – Remove the Bottleneck and Keep Scaling
[Chart: files processed (K/s), y-axis 0–1200 – Facebook, Google-1, Google-2, HopsFS.]
HopsFS: https://goo.gl/yFCsGc – Scale Challenge Winner (2017)
(Plug for Hops)
13/38
All AI Roads Lead to Distribution
Distributed Deep Learning:
-Hyperparameter optimization
-Distributed training
-Larger training datasets
-Elastic model serving
-Parallel experiments
-(Commodity) GPU clusters
-AutoML
14/38
THEY WILL IGNORE YOU, DISTRIBUTION,
UNTIL THEY NEED YOU, DISTRIBUTION
Top 10 List: Why Big Data/Distribution sucks
1. I don’t know which is scarier: reading Stephen King’s horror novel IT or reading a file from HDFS in Python.
2. MapReduce – how can a programming model with only 2 parts have O(N²) complexity?
3. I managed to install Hadoop – at the cost of my last relationship… the Spark died!
4. “Jupyter’s down again” – some service you never heard of stopped it working…
5. I still don’t know which is worse – a dead service or a dead slow service.
16/38
Top 10 List: Why Big Data/Distribution sucks
6. How do I debug this thing on my laptop?
7. Enough with the Dockerfiles and YAML files, when can I start writing *real* code?
8. Isn’t it supposed to be faster when I add a new server?
9. Do I really have to install the library on *all* of the servers?
10. Distributed? I don’t even do multi-threaded!
17/38
GPU/TPU Services in the Cloud and On-Premise
•Managed Cloud Platforms
-RiseML
-FloydHub
-Google Cloud ML
-Microsoft Batch AI
-AWS SageMaker
•On-Premise/Cloud Platforms
-KubeFlow
-Hops
18/38
Hops Open-Source Data Science Platform
19/38
PySpark
End-to-End ML Pipeline in Python on Hops
20/38
[Pipeline stages: Data Collection, Data Transformation & Verification, Feature Extraction, Experimentation, Training, Test, Serving (TensorFlow / TF Serving)]
FLIP-O-RAMA with Python and HDFS/Hops
(Left Hand Here / Right Thumb Here)
[Pipeline stages: Data Collection, Data Transformation & Verification, Feature Extraction, Experimentation, Training, Test, Serving]
import hops.hdfs as hdfs
import pandas as pd

features = ["Age", "Occupation", "Sex", …, "Country"]

with open(hdfs.project_path() +
          "/TestJob/data/census/adult.data", "r") as trainFile:
    train_data = pd.read_csv(trainFile, names=features,
                             sep=r'\s*,\s*', engine='python', na_values="?")
Data Ingestion with Pandas
(Right Thumb Here)
import hops.hdfs as hdfs
import pandas as pd

features = ["Age", "Occupation", "Sex", …, "Country"]

h = hdfs.get_fs()
with h.open_file(hdfs.project_path() +
                 "/TestJob/data/census/adult.data", "r") as trainFile:
    train_data = pd.read_csv(trainFile, names=features,
                             sep=r'\s*,\s*', engine='python', na_values="?")
Data Ingestion with Pandas on Hops
(Right Thumb Here)
Google Facets
24/38
[Pipeline stages: Data Collection, Data Transformation & Verification, Feature Extraction, Experimentation, Training, Test, Serving]
train_data = pd.read_csv(…)
test_data = pd.read_csv(…)
facets.overview(train_data, test_data)
facets.dive(test_data.to_json(orient='records'))
Google Facets
25/38
Big Data Preparation with PySpark
images = spark.readImages(IMAGE_PATH, recursive = True,
                          numPartitions=10, sampleRatio = 0.1).cache()

tr = (ImageTransformer().setOutputCol("transformed")
      .resize(height = 200, width = 200)
      .crop(0, 0, height = 180, width = 180))

smallImages = tr.transform(images).select("transformed")
dfutil.saveAsTFRecords(smallImages, OUTPUT_DIR)
26/38
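Downstream, those TFRecords are typically consumed from TensorFlow. A minimal sketch with tf.data, assuming a recent TensorFlow and a single serialized "transformed" column – the actual schema written by dfutil.saveAsTFRecords and the "part-*" file pattern are assumptions, not taken from the slide:

import tensorflow as tf

# Hypothetical schema: one serialized image tensor per record.
feature_spec = {'transformed': tf.io.FixedLenFeature([], tf.string)}

def parse(record):
    return tf.io.parse_single_example(record, feature_spec)

files = tf.io.gfile.glob(OUTPUT_DIR + "/part-*")   # OUTPUT_DIR as on the slide
dataset = tf.data.TFRecordDataset(files).map(parse).batch(32)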
Experimentation and Training
[Pipeline stages: Data Collection, Data Transformation & Verification, Feature Extraction, Experimentation, Training, Test, Serving]
GridSearch with TensorFlow/Spark on Hops
from hops import experiment

def train(learning_rate, dropout):
    [TensorFlow Code here]

args_dict = {'learning_rate': [0.001, 0.005, 0.01],
             'dropout': [0.5, 0.6]}

experiment.launch(spark, train, args_dict)
Launch 6 Spark Executors
28/38
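For intuition, the grid above expands to 3 x 2 = 6 hyperparameter combinations, which is why 6 Spark executors are launched – one trial per executor. A minimal sketch of that expansion in plain Python; this illustrates only the cartesian product, not the hops experiment API itself:

from itertools import product

args_dict = {'learning_rate': [0.001, 0.005, 0.01],
             'dropout': [0.5, 0.6]}

keys = list(args_dict)
grid = [dict(zip(keys, values))
        for values in product(*(args_dict[k] for k in keys))]

print(len(grid))    # 6 combinations -> 6 parallel trials
for params in grid:
    print(params)   # e.g. {'learning_rate': 0.001, 'dropout': 0.5}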
Model Architecture Search on TensorFlow/Hops
from hops import experiment

def train_cifar10(learning_rate, dropout, num_layers):
    [TensorFlow Code here]

search_dict = {'learning_rate': [0.005, 0.00005],
               'dropout': [0.01, 0.99], 'num_layers': [1, 3]}

experiment.evolutionary_search(spark, train_cifar10, search_dict,
                               direction='max', popsize=10, generations=3,
                               crossover=0.7, mutation=0.5)
29/38
Launch 10 Spark Executors
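The evolutionary search behind the next few slides is differential evolution over the hyperparameter ranges above. A minimal single-generation sketch using only Python's random module – the objective function here is a stand-in for the metric returned by train_cifar10, and num_layers is treated as continuous for simplicity; in the real setup each candidate trains on its own Spark executor:

import random

bounds = {'learning_rate': (0.00005, 0.005), 'dropout': (0.01, 0.99), 'num_layers': (1, 3)}
popsize, F, CR = 10, 0.8, 0.7    # population size, differential weight, crossover rate

def random_candidate():
    return {k: random.uniform(lo, hi) for k, (lo, hi) in bounds.items()}

def evaluate(candidate):
    # Stand-in objective; the real search maximizes the accuracy returned by train_cifar10.
    return -abs(candidate['learning_rate'] - 0.001) - abs(candidate['dropout'] - 0.5)

population = [random_candidate() for _ in range(popsize)]
scores = [evaluate(c) for c in population]

for i, target in enumerate(population):
    a, b, c = random.sample([p for j, p in enumerate(population) if j != i], 3)
    trial = {}
    for k, (lo, hi) in bounds.items():
        mutant = a[k] + F * (b[k] - c[k])                      # differential mutation
        value = mutant if random.random() < CR else target[k]  # crossover
        trial[k] = min(max(value, lo), hi)                     # clip to the search range
    if evaluate(trial) > scores[i]:                            # greedy selection (direction='max')
        population[i], scores[i] = trial, evaluate(trial)

print(max(scores))   # best score after one generation; repeat for generations=3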
Differential Evolution in Tensorboard (1/4)
CIFAR-10
30/38
Differential Evolution in Tensorboard (2/4)
CIFAR-10
31/38
Differential Evolution in Tensorboard (3/4)
CIFAR-10
32/38
Differential Evolution in Tensorboard (4/4)
CIFAR-10
33/38
Distributed Training with Horovod
import horovod.tensorflow as hvd

hvd.init()
opt = hvd.DistributedOptimizer(opt)

if hvd.local_rank() == 0:
    [TensorFlow Code here]
    …..
else:
    [TensorFlow Code here]
    …..
34/38
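A slightly fuller sketch of the same pattern, assuming horovod.tensorflow.keras and TensorFlow 2.x (both newer than this 2018 deck) and a placeholder model and dataset – not the slide's actual training code:

import numpy as np
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()

# Pin each Horovod process to one GPU (one process per GPU).
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    tf.config.experimental.set_visible_devices(gpus[hvd.local_rank()], 'GPU')

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(32,)),
    tf.keras.layers.Dense(10, activation='softmax'),
])

# Scale the learning rate by the number of workers, then wrap the optimizer so
# gradients are averaged across workers with ring-allreduce.
opt = hvd.DistributedOptimizer(tf.keras.optimizers.SGD(0.01 * hvd.size()))
model.compile(loss='sparse_categorical_crossentropy', optimizer=opt)

# Broadcast initial weights so replicas start in sync; only rank 0 writes
# checkpoints, matching the rank-0 branch on the slide.
callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]
if hvd.rank() == 0:
    callbacks.append(tf.keras.callbacks.ModelCheckpoint('./checkpoint-{epoch}.h5'))

x, y = np.random.rand(1024, 32), np.random.randint(0, 10, size=1024)
model.fit(x, y, batch_size=64, epochs=2, callbacks=callbacks,
          verbose=1 if hvd.rank() == 0 else 0)

Launched with something like horovodrun -np 4 python train.py; in a real job each process would read its own shard of the data rather than the random arrays used here.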
Model Serving: Monitor + Retrain
35/38
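The serving slide itself is a screenshot; as a rough illustration of the serving step, here is a minimal sketch of a REST call to a TensorFlow Serving endpoint. The host, model name, and feature vector are made-up assumptions (8501 is TF Serving's default REST port), not details from the slide:

import json
import requests

url = "http://localhost:8501/v1/models/census:predict"
payload = {"instances": [[39, 7, 77516, 13]]}    # hypothetical feature vector

response = requests.post(url, data=json.dumps(payload))
print(response.json())    # e.g. {"predictions": [...]}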
Summary
•The future of Deep Learning is Distributed
 https://www.oreilly.com/ideas/distributed-tensorflow
•Hops is a new Data Platform with first-class support for Python / Deep Learning / ML / Data Governance / GPUs

“It is starting to look like deep learning workflows of the future feature autotuned architectures running with autotuned compute schedules across arbitrary backends.”*
- Andrej Karpathy, Head of AI @ Tesla
*https://twitter.com/karpathy/status/972701240017633281
36/38
The Team
Active: Jim Dowling, Seif Haridi, Gautier Berthou, Salman Niazi, Mahmoud Ismail, Theofilos Kakantousis, Ermias Gebremeskel, Antonios Kouzoupis, Alex Ormenisan, Fabio Buso, Robin Andersson, August Bonds.
Alumni: Vasileios Giannokostas, Johan Svedlund Nordström, Rizvi Hasan, Paul Mälzer, Bram Leenders, Juan Roca, Misganu Dessalegn, K “Sri” Srijeyanthan, Jude D’Souza, Alberto Lorente, Andre Moré, Ali Gholami, Davis Jaunzems, Stig Viaene, Hooman Peiro, Evangelos Savvidis, Steffen Grohsschmiedt, Qi Qi, Gayana Chandrasekara, Nikolaos Stanogias, Daniel Bali, Ioannis Kerkinos, Peter Buechler, Pushparaj Motamari, Hamid Afzali, Wasif Malik, Lalith Suresh, Mariano Valles, Ying Lieu, Fanti Machmount Al Samisti, Braulio Grana, Adam Alpire, Zahin Azher Rashid, ArunaKumari Yedurupaka, Tobias Johansson, Roberto Bampi, Roshan Sedar.
www.hops.io
@hopshadoop


Editor's Notes

  • #3 AI is going distributed. These words come from Rich Sutton, and that is significant because the pre-DeepMind reinforcement learning developed by him and others did not scale with available compute. It was single-agent, single-threaded AI. But that has changed in the last few years, as we can see on this chart.
  • #4 If we take AlexNet as the starting point for the deep learning revolution on hardware accelerators like GPUs, then in the last 5 years we have seen a 300,000-fold increase in the number of compute operations required to train state-of-the-art deep learning models, most recently Neural Machine Translation by Google and AlphaGo Zero by DeepMind. That is a 3.5-month doubling time. Compare that with Moore’s Law, which states that the number of transistors on a chip doubles every 18 months. When Moore’s Law hit the wall in the mid-2000s, it crossed that chasm by going multi-core. The increase in compute in the last 5 years has not happened because Nvidia doubled the number of cores on their GPUs every 3.5 months. Deep Learning also hit a wall, around 2016. We crossed that chasm by going distributed. And yes, AlphaGo would not have beaten the world’s best Go player if it had not been a distributed system.
  • #6 DL years are like dog years – so the last 5 years is half a lifetime. Let’s look at the last 12 months, anthropomorphized by Yann LeCun and Jeff Dean.
  • #9 1 billion images with 1,500 hashtags from Instagram, trained over several weeks on 336 GPUs.
  • #10 Better optimization algorithms – large-batch second-order methods like K-FAC. Better regularization methods – like Dropout. The other three – which were the domain of ML experts – can now be the domain of AI itself: throw compute and storage at them and they will do better than us.
  • #11 The first phase of the Battle begins with who can train DNNs faster
  • #12 Facebook initiated the hostilities….
  • #17 A friend tried to explain Paxos to me – it was painful for both of us.
  • #18 How can it be slower this time? I haven’t changed anything! What do you mean service X has moved? A WHAT caused my system to crash?!? What do you mean the time isn’t the same at all servers? Hey Spark, I just want to collect TBs at the Driver… what’s your problem with that?
  • #19 KubeFlow – Data Scientists write infrastructure (Dockerfiles, Kubernetes YML files) and ML/DL code
  • #20 Code over visual programming. Languages: Python/R. ML frameworks: scikit-learn, SparkML. Notebooks: Jupyter. DL frameworks: Keras-TensorFlow/PyTorch.
  • #23 We have this dataset – the census data from the US. The hdfs module we import will just give us a path to the project – not an HDFS path, just a “/Project/dotai” path.
  • #24 Flip-o-rama
  • #27 Higher-level APIs in TensorFlow: Estimator, Experiment, and Dataset (tensorflow.contrib.learn.python.learn.estimators; Tuner.run_experiment(..)). Spark can launch many TensorFlow jobs in parallel for hyperparameter tuning – one experiment per mapper.
  • #35 Higher-level APIs in TensorFlow: Estimator, Experiment, and Dataset (tensorflow.contrib.learn.python.learn.estimators; Tuner.run_experiment(..)). Spark can launch many TensorFlow jobs in parallel for hyperparameter tuning – one experiment per mapper.
  • #37 In other words, the future of deep learning is “distributed”.
  • #38 HopsFS is a drop-in replacement for HDFS that can scale the Hadoop Filesystem (HDFS) to over 1 million ops/s by migrating the NameNode metadata to an external scale-out in-memory database. This talk will introduce recent improvements in HopsFS: storing small files in the database (both in-memory and on SSD disk tables), a new scalable block-reporting protocol, support for erasure coding with data locality, and work on multi-data-center replication. For small files (under 64-128 KB), HopsFS can reduce read latency to under 10ms, while also improving read throughput by 3-4X and write throughput by >15X. Our new block-reporting protocol reduces block-reporting traffic by up to 99% for large clusters, at the cost of a small increase in metadata, while our solution for erasure coding is implemented at the block level, preserving data locality. Finally, our ongoing work on geographic replication points a way forward for HDFS in the cloud, providing data-center-level high availability without any performance hit. Like Maslow’s hierarchy of needs, the AI hierarchy of needs describes the stepwise requirements that organizations must go through to become data-driven and intelligent in their use of data. First, organizations must reliably ingest and manage their data assets with pipelines – Hadoop being the natural data platform for this first level. Only then can organizations progress to understanding their data in hindsight with analytics tools, scale-out databases, and visualization. From here, organizations need to transition to predictive analytics, labelling data, building traditional machine learning and deep learning models, and managing features and experiments. To reach the level of hyperscale AI companies, organizations need to scale out their neural network training to tens or hundreds of GPUs and automate their machine learning processes. In this talk, we will traverse this AI hierarchy of needs for Hadoop and TensorFlow. We will start with support for GPUs in YARN, and work our way up to distributing TensorFlow (using parameter server (TensorFlowOnSpark) and AllReduce (Horovod) frameworks) across many servers, with a unified feature store based on HDFS, Hive, and Kafka.