Machine Learning for Big Data Analytics: Scaling In with Containers while Scaling Out on Clusters

www.univa.com
Presenter: Ian Lumb
Watch On Demand Anytime
Note: Includes demos

Agenda
 Introduction
 Use case example
 Scaling …
 Out with Apache Spark via Univa Universal Resource Broker
 Up with NVIDIA GPUs and Univa Grid Engine
 In/Down with Univa container solutions
 Summary

Machine Learning Defined
“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.”
T. M. Mitchell, Machine Learning, WCB/McGraw-Hill, 1997

Deep Learning Defined
“… a modern refinement of ‘machine learning’, in which computers teach themselves tasks by crunching large sets of data.”
http://www.economist.com/news/briefing/21650526-artificialintelligence-scares-peopleexcessively-so-rise-machines

Use Case Example: Earthquakes and Tsunamis

Use Case: Context
http://credit.pvamu.edu/MCBDA2016/Slides/Day2_Lumb_MCBDA1_Twitter_Tsunami.pdf

Use Case: Motivation
 Non-deterministic cause
 Uncertainty inherent in any attempt to predict earthquakes
o In situ measurements may reduce uncertainty
 Lead times
 Availability of actionable observations
 Communication of situation - advisories, warnings, etc.
 Cause-effect relationship
 Energy transfer - inputs ... coupling ... outputs
o ‘Geometry’ - bathymetry and topography
 Other factors - e.g., tides
 Established effect
 Far-field estimates of tsunami propagation (pre-computed) and coastal inundation (real-time) have proven to be extremely accurate, but this requires
– A distributed array of deep-ocean tsunami detection buoys plus a forecasting model

Use Case: Traditional Data Sources
http://www.gitews.org/en/concept/

Use Case: Deep Learning from Twitter?

Use Case: Machine Learning Pipeline
Karau et al., Learning Spark, O’Reilly, 2015

Use Case: Deep Learning from Twitter?
Represent data
 Twitter data manually curated into ‘ham’ and ‘spam’
 In-memory representation via Spark RDDs
Extract features
 Frequency-based usage via Spark MLlib HashingTF ⇒ feature vectors
Develop model object
 Spark MLlib LogisticRegressionWithSGD used for classification
Evaluate model
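The four pipeline stages can be sketched in plain Python. This is only an illustration of the ideas, not the Spark code behind the prototype: a CRC32-based hashing trick stands in for MLlib's HashingTF, a hand-written stochastic gradient descent loop stands in for LogisticRegressionWithSGD, and the tweets, labels (ham = 1.0, spam = 0.0), bucket count, step size, and function names are all invented for the example.

```python
import math
import random
import zlib

NUM_FEATURES = 1 << 20  # illustrative bucket count for the hashing trick


def hashing_tf(text, num_features=NUM_FEATURES):
    """Extract features: sparse term-frequency vector {bucket index: count}."""
    counts = {}
    for token in text.lower().split():
        index = zlib.crc32(token.encode("utf-8")) % num_features
        counts[index] = counts.get(index, 0.0) + 1.0
    return counts


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, bias, features):
    return bias + sum(weights.get(i, 0.0) * x for i, x in features.items())


def train_logistic_sgd(samples, epochs=100, step=0.5, seed=0):
    """Develop model object: logistic regression fitted by plain SGD."""
    rng = random.Random(seed)
    weights, bias = {}, 0.0
    samples = list(samples)
    for _ in range(epochs):
        rng.shuffle(samples)
        for features, label in samples:
            error = sigmoid(score(weights, bias, features)) - label
            bias -= step * error
            for i, x in features.items():
                weights[i] = weights.get(i, 0.0) - step * error * x
    return weights, bias


def predict(model, text):
    """Evaluate model: probability that a text is 'ham'."""
    weights, bias = model
    return sigmoid(score(weights, bias, hashing_tf(text)))


# Represent data: a tiny, made-up, manually labeled corpus.
ham = ["tsunami warning issued for the coast",
       "strong earthquake felt offshore",
       "earthquake triggers tsunami alert"]
spam = ["win a free phone now",
        "cheap tickets click here",
        "free prize click now"]
training = [(hashing_tf(t), 1.0) for t in ham] + [(hashing_tf(t), 0.0) for t in spam]

model = train_logistic_sgd(training)
ham_probability = predict(model, "tsunami warning after earthquake")
spam_probability = predict(model, "free prize click here")
```

In Spark the same stages would run over RDDs spread across the cluster; the sparse dictionaries here play the role of the feature vectors.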

Use Case: Laptop Prototype

Use Case: Next Steps …
http://credit.pvamu.edu/MCBDA2016/Slides/Day2_Lumb_MCBDA1_Twitter_Tsunami.pdf

Next Steps: Scaling …
Scaling directions (diagram): OUT, IN, DOWN, UP

Apache Spark via Univa Universal Resource Broker

Machine Learning via Apache Spark
http://img.deusm.com/informationweek/2015/03/1319660/Spark-2015-Vision.jpg

URB: Product Overview
What is Universal Resource Broker (URB)?
URB extends Univa Grid Engine to handle Service and Custom distributed applications in a Univa Grid Engine Cluster.
An API for developing distributed applications
 Compatible with the Apache Mesos API
 Bindings for Python, Java, and C++
A runtime environment for hosting distributed applications
 Supports frameworks developed against the Mesos API
 Supports frameworks developed against the URB API
 Uses Univa Grid Engine to place and run work

URB: Architecture Overview
Spark Framework Running Thunder

URB: Web User Interface
Copyright © Univa Corporation, 2015. All Rights Reserved

HPC & Spark Workloads Together

URB: Solution Summary
Universal Resource Broker
 For the end user there is no change in application workflow
 For the admins there is increased control and policy capability over compute resources
 The solution provides the ability to share resources across big data and traditional batch workloads
 Single resource allocation policy defined by business goals
 Single accounting repository to track resource consumption
 Full workload lifecycle management for heterogeneous workloads

GPUs for Deep Learning
http://image.slidesharecdn.com/nvidiateslap100-160621104058/95/announcing-the-nvidia-tesla-p100-gpu-for-pcie-servers-9-638.jpg?cb=1466505803

CUDA LOAD SENSOR
Copyright © 2016 Univa Corporation, All Rights Reserved.
 Post installation check:
 qhost -F <hostname>
hl:cuda.verstr=270.41.06
hl:cuda.0.name=GeForce 8400 GS
hl:cuda.0.totalMem=511.312M
hl:cuda.0.freeMem=500.480M
hl:cuda.0.usedMem=10.832M
hl:cuda.0.eccEnabled=0
hl:cuda.0.temperature=44.000000
hl:cuda.1.name=GeForce 8400 GS
hl:cuda.1.totalMem=511.312M
hl:cuda.1.freeMem=406.066M
hl:cuda.1.usedMem=20.274M
hl:cuda.1.eccEnabled=0
hl:cuda.1.temperature=43.000000
hl:cuda.devices=2
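For scripting against the load sensor, the `hl:cuda.*` values can be grouped per device. The sketch below is a hypothetical helper, not part of Univa Grid Engine: it assumes the output has exactly the `hl:cuda.<key>=<value>` and `hl:cuda.<id>.<key>=<value>` shapes shown in the listing, and that memory values always carry an `M` suffix.

```python
import re

# Sample text copied from the qhost -F listing on the slide.
SAMPLE = """\
hl:cuda.verstr=270.41.06
hl:cuda.0.name=GeForce 8400 GS
hl:cuda.0.totalMem=511.312M
hl:cuda.0.freeMem=500.480M
hl:cuda.0.usedMem=10.832M
hl:cuda.0.eccEnabled=0
hl:cuda.0.temperature=44.000000
hl:cuda.1.name=GeForce 8400 GS
hl:cuda.1.totalMem=511.312M
hl:cuda.1.freeMem=406.066M
hl:cuda.1.usedMem=20.274M
hl:cuda.1.eccEnabled=0
hl:cuda.1.temperature=43.000000
hl:cuda.devices=2
"""

_DEVICE_KEY = re.compile(r"^cuda\.(\d+)\.(\w+)$")


def parse_cuda_load_values(text):
    """Split 'hl:cuda...' lines into host-level and per-device values."""
    host, devices = {}, {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("hl:cuda."):
            continue  # ignore anything that is not a CUDA load value
        key, _, value = line[len("hl:"):].partition("=")
        match = _DEVICE_KEY.match(key)
        if match:
            device_id, field = int(match.group(1)), match.group(2)
            devices.setdefault(device_id, {})[field] = value
        else:
            host[key] = value
    return host, devices


def to_megabytes(value):
    """Convert a value such as '511.312M' to a float number of megabytes."""
    if not value.endswith("M"):
        raise ValueError(f"unexpected unit in {value!r}")
    return float(value[:-1])


host, devices = parse_cuda_load_values(SAMPLE)
free_by_device = {i: to_megabytes(d["freeMem"]) for i, d in devices.items()}
```

On a live cluster the text would come from running `qhost -F <hostname>`; here it is pasted in so the example runs without a cluster.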

GPU JOB SUBMISSION
• CUDA complexes can be used for:
• Setting the alarm state of a host based on ECC errors (load_threshold in queue config)
• Sorting hosts (load_formula)
• Job submission
• Requesting a host with GPUs
• qsub -l cuda.devices=2 ...
• Complex can be made consumable (complex configuration) in order to limit the number of CUDA jobs per host
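The effect of making a complex consumable can be pictured as simple bookkeeping: each running job subtracts its request from the host's capacity, and a job whose request no longer fits has to wait. The class below is a toy model of that idea with hypothetical names; it is not how Univa Grid Engine implements consumables.

```python
class ConsumableHost:
    """Toy model of a host whose cuda.devices complex is consumable."""

    def __init__(self, name, cuda_devices):
        self.name = name
        self.capacity = cuda_devices
        self.in_use = {}  # job id -> number of devices held

    @property
    def available(self):
        return self.capacity - sum(self.in_use.values())

    def try_start(self, job_id, requested):
        """Start the job only if its request fits into what is left."""
        if requested > self.available:
            return False
        self.in_use[job_id] = requested
        return True

    def finish(self, job_id):
        """Return the job's devices to the pool."""
        self.in_use.pop(job_id, None)


host = ConsumableHost("hostA", cuda_devices=2)
started = [host.try_start(job_id, requested=1) for job_id in (1, 2, 3)]
host.finish(1)
started_after_finish = host.try_start(3, requested=1)
```

With two devices and one device per job, the third job is held back until an earlier job finishes.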

RESOURCE MAPS
Diagram: a UGE cluster with hosts A, B, ... N, each offering host resources (e.g. GPUs, IDs 0 & 1), plus global resources (e.g. scratch storage A-E); jobs 123 and 124 each hold one of each.
 Two host resources: 0, 1
 Five global resources: A, B, C, D, E
 Job 123 was assigned ID 0 of the GPU resource on host N and resource C of the global resource scratch
 Job 124 was assigned ID 1 of the GPU resource on host B and resource E of the global resource scratch
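A resource map differs from a plain counter in that each job receives a specific ID rather than just a quantity. The sketch below replays the two assignments from the slide; the `ResourceMap` class and its `wanted` argument are hypothetical and exist only to reproduce those picks, since in a real cluster the scheduler chooses the IDs.

```python
class ResourceMap:
    """A pool of named resource IDs; each ID is held by at most one job."""

    def __init__(self, ids):
        self.free = list(ids)
        self.owner = {}  # resource id -> job id

    def assign(self, job_id, wanted=None):
        """Hand out a specific free ID, or the first free one if none is named."""
        if wanted is None:
            if not self.free:
                raise LookupError("no free resource ID left")
            wanted = self.free[0]
        elif wanted not in self.free:
            raise LookupError(f"resource ID {wanted!r} is not free")
        self.free.remove(wanted)
        self.owner[wanted] = job_id
        return wanted

    def release(self, job_id):
        """Return every ID held by the job to the free list."""
        for rid in [r for r, j in self.owner.items() if j == job_id]:
            del self.owner[rid]
            self.free.append(rid)


# Host-level map: every host has GPU IDs 0 and 1.
gpus = {host: ResourceMap([0, 1]) for host in ("A", "B", "N")}
# Global map: one scratch pool shared by the whole cluster.
scratch = ResourceMap("ABCDE")

job_123 = (gpus["N"].assign(123, wanted=0), scratch.assign(123, wanted="C"))
job_124 = (gpus["B"].assign(124, wanted=1), scratch.assign(124, wanted="E"))
```

Because the job knows exactly which ID it holds, it can, for example, pin itself to that GPU or write to that scratch area without colliding with another job.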

Containerized PySpark Example

Univa Grid Engine – Container Edition (1)
 Launch Docker Container on best machine in cluster
 Reduces time wasted (it can be minutes … or longer)
o Attempting to launch on an improperly serviced execution host.
o Waiting for the Docker image to download from the Docker registry.
 Ensures container runs faster, increasing throughput in the cluster.
 Run Docker Containers in a Univa Grid Engine Cluster
 Business Critical containers are prioritized over other containers. Increases efficiency of the overall organization.
 Containers can be orchestrated alongside other critical workloads such as batch jobs and frameworks.
$ qsub -o /home/jdoe -j y -xdv "/home:/home" -l docker,docker_images="*centos:latest*" my_job.sh
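When submissions like the one above are generated from a script, building the argument list programmatically avoids quoting mistakes around the image pattern. The helper below is a hypothetical sketch: it uses only the options that appear in the command on the slide (`-o`, `-j y`, `-xdv`, `-l docker,docker_images=...`), and it only constructs the command, because running it requires a Grid Engine cluster.

```python
import shlex


def docker_qsub_args(script, image_pattern, output_dir, volume):
    """Build the qsub argument list for a Docker job, mirroring the slide."""
    return [
        "qsub",
        "-o", output_dir,   # where the job's output file is written
        "-j", "y",          # merge stderr into stdout
        "-xdv", volume,     # host:container directory mapping
        "-l", f"docker,docker_images={image_pattern}",
        script,
    ]


args = docker_qsub_args(
    script="my_job.sh",
    image_pattern="*centos:latest*",
    output_dir="/home/jdoe",
    volume="/home:/home",
)
# shlex.join (Python 3.8+) quotes the wildcard so a shell will not expand it.
command_line = shlex.join(args)
```

Passing `args` to `subprocess.run` on a submit host would skip the shell entirely, so the `*` characters reach `qsub` unchanged.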

Univa Grid Engine – Container Edition (2)
 Job Control and Limits for Docker Containers
 Provides user and administrator control over containers running on Grid Engine Hosts.
 Accounting for Docker Containers
 Keeps track of containers. Share policies require accounting.
 Data file Management for Docker Containers
 Transparent access to input, output and error files. Simplifies the management of input and output files for Docker Containers and ensures any output or error files are moved to a location where the user can access them.
 Interactive Docker Containers
 Good for debugging when containers don’t work correctly!
 Parallel jobs in Docker Containers
 Message-passing parallel jobs can each run a set of tasks in a container on a machine.

Containerized GPUs
https://github.com/NVIDIA/nvidia-docker

Univa Confidential
Navops by Univa
Easy installation, preconfigured solution including pre-integration with cloud services.
Build a container cluster on premise or in the cloud.
The fastest way to build a container cluster!!
Respond Quickly: Easy to resize, adapt, dynamic provisioning
Orchestrate and Optimize: Best use of resources and keep track of containers
The most advanced container orchestration!!
http://navops.io/

Navops orchestration solution

Summary
 Scaling Machine Learning from prototype to production …
 Out with Apache Spark via Univa Universal Resource Broker
 Up with NVIDIA GPUs via Univa Grid Engine
 In/Down via Univa Container solutions
o Univa Grid Engine – Container Edition
o Navops Launch and Command

THANK YOU
Ian Lumb
Solutions Architect
+1 630 303-9068 ilumb@univa.com
