Spark and Deep Learning Frameworks at Scale 7.19.18

SPARK AND DEEP LEARNING FRAMEWORKS AT SCALE
Vartika Singh

OBJECTIVE
• Enabling Machine Learning in the field
• Enablement and use case discovery
• Data and ML: what do we focus on?
• Typical data ingest architecture
• Extending Spark
• Deep Learning - how does it fit in?
• Hardware

DATA - MARKET PROPOSITION
Click stream: smart clicks, impressions, and conversions
Videos: fraud, navigation, ad placement
Medical data: tumor detection, patient mortality, anomaly identification
City data: planning, resource distribution
Wafer, oil and gas data: pipeline optimization, fault detection
?? ...

Ref: https://hbr.org/2017/05/whats-your-data-strategy
• Less than half of an organization’s structured data is actively used in making decisions
• Less than 1% of its unstructured data is analyzed or used at all
• More than 70% of employees have access to data they should not
• 80% of analysts’ time is spent simply discovering and preparing data
• Data breaches are common
• Rogue data sets propagate in silos
• Companies’ data technology often is not up to the demands put on it

Use case discovery
Model serving
Hidden feedback loops
Undeclared consumer dependencies
Change in the external world
Ref: Hidden Technical Debt in Machine Learning ... - NIPS Proceedings

An evolving science
We are not very good at anticipating what the next emerging serious flaw will
be.
What we’re missing is an engineering discipline with its principles of analysis
and design.
Keep It Simple Stupid!
https://medium.com/@mijordan3/artificial-intelligence-the-revolution-hasnt-happened-yet-5e1d5812e1e7

Data
Processes
ML
● Deconstruct the problem.
● Democratize
● Paved Pathways

© Cloudera, Inc. All rights reserved.
INTELLIGENT INFRASTRUCTURE!!!

CLOUDERA DATA SCIENCE WORKBENCH

OVERVIEW - PROJECTS

OVERVIEW - GPUS

OVERVIEW - WEBUIS

OVERVIEW - DISTRIBUTED COMPUTING WITH WORKERS

OTHER FEATURES
• Git
• S3/HDFS

EXPERIMENTS
• Create a snapshot of the model code, dependencies, and configuration necessary to train the model.
• Build and execute the training run in an isolated container.
• Track specified model metrics, performance, and model artifacts.
• Inspect, compare, or deploy prior models.

MODELS

• In model parallelism, different machines in
the distributed system are responsible for
the computations in different parts of a
single network - for example, each layer in
the neural network may be assigned to a
different machine.

• In data parallelism, different machines have
a complete copy of the model; each machine
simply gets a different portion of the data, and
results from each are somehow combined.
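
As a minimal sketch of data parallelism, the example below keeps one copy of a one-parameter model, lets each "worker" compute a gradient on its own shard, and combines the results by averaging. The model (y ≈ w·x), the function names, the shard layout, and the learning rate are all illustrative assumptions, and averaging is only one possible way to combine results.

```python
def shard_gradient(w, shard):
    """Gradient of the mean squared error for y ~ w * x on one data shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def data_parallel_step(w, shards, lr):
    """One synchronous step: every worker holds the same w, sees different data."""
    grads = [shard_gradient(w, shard) for shard in shards]  # one gradient per worker
    avg_grad = sum(grads) / len(grads)                      # combine by averaging
    return w - lr * avg_grad


# Toy data generated from y = 3x, split into two equal shards.
data = [(x, 3.0 * x) for x in [1.0, 2.0, 3.0, 4.0]]
shards = [data[:2], data[2:]]

w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards, lr=0.01)
```

With equally sized shards, the averaged gradient matches the full-batch gradient, so this behaves like single-machine gradient descent.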

SPARK AND JNI
• OpenCV
• Tesseract
• Common Implementations using JavaCPP
Ref: https://github.com/bytedeco/javacpp

SPARK/HPC WORKLOADS
Gene Sequencing/ Assembling/ Analysis
• Data parallelism and statistical methods lie at the core of all DNA sequencing
workloads.
• Sequencing - Base calling
• Variant calling
• GATK - Can run on Spark
• Canu - Transform to PySpark workload using Python C extensions
• Analysis - HAIL
Ref: https://software.broadinstitute.org/gatk/
Ref: https://hail.is/
Ref: https://blog.cloudera.com/blog/2017/05/hail-scalable-genomics-analysis-with-spark/

HPC WORKLOADS
• Portions of the Hadoop ecosystem can open your grid to more users.
• PySpark allows a company that is using a legacy C++ grid to re-use their C++ library assets
with very little to no changes. Python to C++ bindings result in minimal performance penalties.
• Cloudera Data Science Workbench (CDSW) allows data scientists to rapidly develop and
visualize models with more involvement from the business.
• In infrastructures with direct-attached storage, Hadoop’s locality-based processing allows for
fast, efficient movement of data between storage and compute.
• Deploying Hadoop on a portion or on all of your grid allows you to use the same tools on the
grid that you would use on a cloud-based Hadoop cluster.
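
The idea of re-using native library assets from PySpark can be sketched with Python's `ctypes`. Here the C math library's `cos` stands in for a legacy C++ routine exposed through a C interface; the helper names are hypothetical, and the pure-Python fallback exists only so the sketch runs where no native math library can be located.

```python
import ctypes
import ctypes.util
import math


def load_native_cos():
    """Return the C math library's cos() as a Python callable, or None."""
    lib_name = ctypes.util.find_library("m")
    if lib_name is None:
        return None
    try:
        lib = ctypes.CDLL(lib_name)
    except OSError:
        return None
    lib.cos.argtypes = [ctypes.c_double]
    lib.cos.restype = ctypes.c_double
    return lib.cos


def process_partition(rows):
    """Apply the native routine to every row of one partition."""
    # Load the library once per partition rather than once per row.
    native_cos = load_native_cos() or math.cos
    for x in rows:
        yield native_cos(x)


# Local stand-in for a single partition of data.
result = list(process_partition([0.0, math.pi]))
```

In a PySpark job the same function could be passed to `rdd.mapPartitions(process_partition)`, assuming the native library is installed on every worker node.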

DEEP LEARNING IN BIG DATA
• A major source of difficulty in many real-
world artificial intelligence applications is
that many of the factors of variation
influence every single piece of data we can
observe.
• Deep learning solves this central problem
via representation learning by introducing
representations that are expressed in terms
of other, simpler representations.

BIOINFORMATICS
• Protein Structure
• Gene Expression Regulation
• Protein Classification
• Anomaly Classification
• Segmentation

BIOINFORMATICS: THE NATURE OF DATA
• Complex and expensive data acquisition processes limit the size of
bioinformatics datasets.
• Significantly unequal class distributions
• In clinical or disease-related cases, there is inevitably less data from treatment groups than
from the normal (control) group.
• Visualization
• Multimodal Deep Learning
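
One common way to compensate for unequal class distributions is to weight each class inversely to its frequency when computing the loss. The sketch below is an assumption about how that might look, not a method taken from the slides; the label names and the 90/10 split are made up for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical clinical dataset: far fewer treatment samples than controls.
labels = ["control"] * 90 + ["treatment"] * 10
weights = balanced_class_weights(labels)
```

Here the rare "treatment" class receives a weight of 5.0, so each of its samples contributes more to the loss than a "control" sample.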

IOT
• A time series is a sequence of regular time-ordered observations
• Example: stock prices, weather readings, smartphone sensor data
• Challenges
• Large scale streaming data
• Heterogeneity
• Time and space correlation
• High noise data
• NRT decision on multimodal data

IOT DEVICES
• Network compression
• Convert to sparse network
• Not general enough
• Factors to consider
• Running time
• Energy consumption
• Architectural considerations
• Feed-forward layers (FFL) are much faster than convolutional layers in CNNs
• Activation functions (ReLU is more time-efficient than tanh, which is more time-efficient than sigmoid)
• CNNs use less storage than DNNs due to fewer stored parameters in convolutional layers
• Accelerators
• Tinymotes
• Fog Computing
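
The storage claim about convolutional layers can be checked with simple parameter counting. This sketch compares a 3x3 convolution with a fully connected layer that maps the same input volume to the same output volume; the layer sizes (a 32x32 RGB input and 16 output channels) are illustrative assumptions.

```python
def conv2d_params(in_channels, out_channels, kernel_size):
    """Weights plus one bias per output channel; weights are shared across positions."""
    return out_channels * (in_channels * kernel_size * kernel_size + 1)


def dense_params(n_inputs, n_outputs):
    """One weight per input-output pair plus one bias per output."""
    return n_outputs * (n_inputs + 1)


# 32x32 RGB input mapped to 16 feature maps of the same spatial size.
conv_count = conv2d_params(in_channels=3, out_channels=16, kernel_size=3)
dense_count = dense_params(n_inputs=32 * 32 * 3, n_outputs=32 * 32 * 16)
```

The convolutional layer stores a few hundred parameters, while the equivalent dense layer stores tens of millions, which matters on memory-constrained devices.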

NLP
• Word Embeddings: GloVe, Word2Vec
• RNN -> LSTMs -> Attention Mechanism
• Applications
• Sentiment analysis
• Gene sequencing
• Natural language generation

DEEP LEARNING - THE HYPERPARAMETERS
• Architecture
• How many layers
• How many nodes/filters
• Which type
• Data
• Batch size
• Size of filters
• Number of steps the memory cells will learn
• Training
• Regularization
• Learning rate
• Gradient expressions
• Init policy
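
To make the size of the search space concrete, the sketch below expands a small hyperparameter grid into individual training configurations. The parameter names and values are illustrative assumptions; each resulting configuration is independent, which is what makes distributing the search across a cluster straightforward.

```python
from itertools import product

# Hypothetical grid covering one hyperparameter from each group above.
grid = {
    "num_layers": [2, 4],
    "learning_rate": [0.1, 0.01, 0.001],
    "batch_size": [32, 128],
}


def expand_grid(grid):
    """Return every combination of values as a list of config dicts."""
    keys = sorted(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


configs = expand_grid(grid)
```

Even this small grid yields 2 x 3 x 2 = 12 training runs, and each added hyperparameter multiplies that count.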

TRANSFER LEARNING
• Deep neural networks trained on natural images exhibit a curious phenomenon
in common:
• In the first layer they learn features similar to Gabor filters and color blobs.
• Such first-layer features appear not to be specific to a particular dataset or task, but general in
that they are applicable to many datasets and tasks.
• Initializing a network with transferred features from almost any number of layers
can produce a boost to generalization that lingers even after fine-tuning to the
target dataset.
• The effectiveness of feature transfer is expected to decline as the base and
target tasks become less similar.

SPARK DEEP LEARNING PIPELINES
• Transfer learning
• Distributed hyperparameter tuning
• Deploying models in SQL

DISTRIBUTED TRAINING - WHEN TO DO IT
• Distributed training isn’t free
• Setup time
• Continue to train your networks on a single machine, until the training time
becomes prohibitive

OPERATIONAL IMPLICATIONS
• Model exploration using small data
• Computational limits
• Irreducible errors
• Predictable

DNN
• Neurons and synapses
• Compute the weighted sum for each layer
• Compute the gradient of the loss relative to the filter inputs
• Compute the gradient of the loss relative to the weights
M. Mohammadi, A. Al-Fuqaha, S. Sorour, and M. Guizani, “Deep Learning for IoT Big Data and Streaming Analytics: A Survey,” arXiv preprint arXiv:1712.04301v1 [cs.NI], 2017.
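
The three computations listed above can be sketched for a single fully connected layer without an activation function. The function names and the toy numbers are assumptions for illustration; a convolutional layer follows the same pattern with shared weights.

```python
def forward(W, x):
    """Weighted sum per output neuron: y_i = sum_j W[i][j] * x[j]."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def backward(W, x, grad_out):
    """Given dL/dy, return (dL/dW, dL/dx) for y = W x."""
    # Gradient relative to the weights: outer product of dL/dy and x.
    grad_W = [[g * x_j for x_j in x] for g in grad_out]
    # Gradient relative to the inputs: W transposed times dL/dy.
    grad_x = [sum(W[i][j] * grad_out[i] for i in range(len(W))) for j in range(len(x))]
    return grad_W, grad_x


W = [[1.0, 2.0], [3.0, 4.0]]
x = [1.0, -1.0]
y = forward(W, x)
grad_W, grad_x = backward(W, x, grad_out=[1.0, 0.5])
```

Note that `backward` needs the original input `x`, which is why training has to keep intermediate values around after the forward pass.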

DEEP LEARNING AT SCALE
• Backpropagation requires intermediate outputs of the network to be preserved
for the backwards computation, thus training has increased storage
requirements.
• Because gradients are used for hill-climbing, the precision requirement for
training is generally higher than for inference.
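
A small numeric sketch of why training is sensitive to precision: a tiny gradient update can vanish entirely when the weight is stored in 16-bit floating point. The weight and update values are illustrative assumptions; Python's `struct` module is used here only to round a value to half and single precision.

```python
import struct


def round_to(fmt, value):
    """Round a Python float to the given IEEE 754 format ('e' = FP16, 'f' = FP32)."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


weight, update = 1.0, 1e-4

after_fp16 = round_to("e", weight + update)  # update is below FP16 resolution near 1.0
after_fp32 = round_to("f", weight + update)  # update survives in FP32
```

In FP16 the spacing between representable values near 1.0 is about 0.001, so the update is rounded away and the weight never moves.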

DEEP LEARNING AT SCALE
• A significant amount of effort has been put into developing deep learning
systems that can scale to very large models and large training sets
• Large models in the literature are now top performers in supervised visual
recognition tasks
• Can even learn to detect objects when trained from unlabeled images alone
• The very largest of these systems are able to train neural networks with over 1
billion trainable parameters

HARDWARE FOR DNN
• Intel Knights Landing CPU features special vector instructions for deep learning
• Nvidia Pascal GP100 GPU features 16-bit floating point (FP16) arithmetic
support to perform two FP16 operations on a single precision core for faster
deep learning computation
• Systems have also been built specifically for DNN processing such as Nvidia
DGX-1 and Facebook’s Big Basin custom DNN server
• DNN inference has also been demonstrated on various embedded System-on-Chips
(SoC) such as Nvidia Tegra and Samsung Exynos as well as FPGAs

GPU SUPPORT IN YARN
• As of now, only Nvidia GPUs are supported by YARN
• YARN node managers have to be pre-installed with Nvidia drivers.
• When Docker is used as the container runtime context, nvidia-docker 1.0 needs to
be installed (the nvidia-docker version currently supported by YARN).
• https://issues.apache.org/jira/browse/YARN-3926
• https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/UsingGpus.html

Vivienne Sze, Yu-Hsin Chen, Tien-Ju Yang, Joel Emer, Efficient Processing of Deep Neural Networks: A Tutorial and Survey

ACCELERATORS FOR TEMPORAL ARCHITECTURES
• The downside for using matrix multiplication for the CONV layers is that there is
redundant data in the input feature map matrix, which can lead to either
inefficiency in storage, or a complex memory access pattern
• There are software libraries designed for CPUs (e.g., OpenBLAS, Intel MKL,
etc.) and GPUs (e.g., cuBLAS, cuDNN, etc.) that optimize for matrix
multiplications
• The matrix multiplications on these platforms can be further sped up by
applying computational transforms to the data to reduce the number of
multiplications
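
The redundancy mentioned in the first bullet can be seen in a small sketch that unrolls an input feature map into a patch matrix so the CONV layer becomes a matrix-vector product. The function name `im2col`, the 3x3 input, and the 2x2 filter are illustrative assumptions; stride 1 and no padding are assumed.

```python
def im2col(image, k):
    """Unroll every k x k patch of a 2-D feature map into one row (stride 1, no padding)."""
    height, width = len(image), len(image[0])
    rows = []
    for i in range(height - k + 1):
        for j in range(width - k + 1):
            rows.append([image[i + di][j + dj] for di in range(k) for dj in range(k)])
    return rows


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [1, 0, 0, -1]  # 2x2 filter flattened row by row

patches = im2col(image, 2)
# The CONV layer is now one dot product per patch row.
output = [sum(p * q for p, q in zip(patch, kernel)) for patch in patches]

stored_values = sum(len(patch) for patch in patches)  # entries in the patch matrix
original_values = len(image) * len(image[0])          # entries in the input
```

The patch matrix holds 16 values for a 9-value input because overlapping pixels are copied into several rows, which is the storage inefficiency the slide refers to.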

ACCELERATORS FOR SPATIAL ARCHITECTURES
• For DNNs, the bottleneck for processing is in the
memory access
• Accelerators, such as spatial architectures,
provide an opportunity to reduce the energy cost
of data movement by introducing several levels
of local memory hierarchy with different energy
costs
• The multiple levels of memory hierarchy help to
improve energy efficiency by providing low-cost
data accesses

1) How do you collect your data?
2) Where do your data scientists play?
3) Let’s talk to the business
