www.univa.com
Presenter: Ian Lumb
Machine Learning for Big Data Analytics:
Scaling In with Containers while Scaling Out on Clusters
Watch On Demand Anytime
Note: Includes demos
Agenda
• Introduction
• Use case example
• Scaling …
  o Out with Apache Spark via Univa Universal Resource Broker
  o Up with NVIDIA GPUs and Univa Grid Engine
  o In/Down with Univa container solutions
• Summary
Introduction
Machine Learning Defined
“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.”
T. M. Mitchell, Machine Learning, WCB/McGraw-Hill, 1997
Deep Learning Defined
“… a modern refinement of ‘machine learning’, in which computers teach themselves tasks by crunching large sets of data.”
http://www.economist.com/news/briefing/21650526-artificialintelligence-scares-peopleexcessively-so-rise-machines
Use Case Example: Earthquakes and Tsunamis
Use Case: Context
http://credit.pvamu.edu/MCBDA2016/Slides/Day2_Lumb_MCBDA1_Twitter_Tsunami.pdf
Use Case: Motivation
• Non-deterministic cause
  o Uncertainty inherent in any attempt to predict earthquakes
    – In situ measurements may reduce uncertainty
• Lead times
  o Availability of actionable observations
  o Communication of situation - advisories, warnings, etc.
• Cause-effect relationship
  o Energy transfer - inputs ... coupling ... outputs
    – ‘Geometry’ - bathymetry and topography
  o Other factors - e.g., tides
• Established effect
  o Far-field estimates of tsunami propagation (pre-computed) and coastal inundation (real-time) have proven to be extremely accurate ... requires
    – Distributed array of deep-ocean tsunami detection buoys + forecasting model
http://credit.pvamu.edu/MCBDA2016/Slides/Day2_Lumb_MCBDA1_Twitter_Tsunami.pdf
http://www.gitews.org/en/concept/
Use Case: Traditional Data Sources
http://credit.pvamu.edu/MCBDA2016/Slides/Day2_Lumb_MCBDA1_Twitter_Tsunami.pdf
Use Case: Deep Learning from Twitter?
http://credit.pvamu.edu/MCBDA2016/Slides/Day2_Lumb_MCBDA1_Twitter_Tsunami.pdf
Karau et al., Learning Spark, O’Reilly, 2015
Use Case: Machine Learning Pipeline
Use Case: Deep Learning from Twitter?
Represent data
• Twitter data manually curated into ‘ham’ and ‘spam’
• In-memory representation via Spark RDDs
Extract features
• Frequency-based usage via Spark MLlib HashingTF ⇒ feature vectors
Develop model object
• Spark MLlib LogisticRegressionWithSGD used for classification
Evaluate model
http://credit.pvamu.edu/MCBDA2016/Slides/Day2_Lumb_MCBDA1_Twitter_Tsunami.pdf
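The four pipeline steps above can be sketched in plain Python. This is not the prototype's Spark code: the hashing function, the made-up tweets, the feature count, and the training settings are all assumptions chosen for illustration, and the functions only mimic at a conceptual level what Spark MLlib's HashingTF and LogisticRegressionWithSGD are used for in the slides.

```python
import math
import zlib

NUM_FEATURES = 64  # assumption: tiny feature space, chosen only for the sketch


def hashing_tf(text, num_features=NUM_FEATURES):
    """Extract features: term frequencies placed by the hashing trick."""
    vec = [0.0] * num_features
    for token in text.lower().split():
        # crc32 is stable across runs, unlike Python's built-in hash()
        vec[zlib.crc32(token.encode("utf-8")) % num_features] += 1.0
    return vec


def predict_proba(weights, features):
    """Probability that the example is spam under a logistic model."""
    z = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_sgd(examples, epochs=200, step=0.5):
    """Develop model: logistic regression fitted by stochastic gradient descent.

    examples is a list of (feature_vector, label) with 1 = spam, 0 = ham.
    """
    weights = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for features, label in examples:
            error = predict_proba(weights, features) - label
            weights = [w - step * error * x for w, x in zip(weights, features)]
    return weights


def accuracy(weights, examples):
    """Evaluate model: fraction of examples classified correctly."""
    hits = sum(
        1 for features, label in examples
        if (predict_proba(weights, features) >= 0.5) == (label == 1)
    )
    return hits / len(examples)


# Represent data: made-up tweets standing in for the manually curated set.
ham = ["earthquake felt near the coast just now",
       "tsunami warning issued for the coast"]
spam = ["win a free phone click here",
        "free followers click this link now"]

training = [(hashing_tf(t), 0) for t in ham] + [(hashing_tf(t), 1) for t in spam]
weights = train_logistic_sgd(training)
train_accuracy = accuracy(weights, training)
print("training accuracy:", train_accuracy)
```

Accuracy on four training tweets says very little; a meaningful evaluation would score labeled tweets the model has not seen.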
Use Case: Laptop Prototype
http://credit.pvamu.edu/MCBDA2016/Slides/Day2_Lumb_MCBDA1_Twitter_Tsunami.pdf
Use Case: Next Steps …
http://credit.pvamu.edu/MCBDA2016/Slides/Day2_Lumb_MCBDA1_Twitter_Tsunami.pdf
Next Steps: Scaling …
[Diagram: scaling directions OUT, IN, DOWN, UP]
Apache Spark via Univa Universal Resource Broker
Machine Learning via Apache Spark
http://img.deusm.com/informationweek/2015/03/1319660/Spark-2015-Vision.jpg
URB: Product Overview
What is Universal Resource Broker (URB)?
URB extends Univa Grid Engine to handle Service and Custom distributed applications in a Univa Grid Engine Cluster.
An API for developing distributed applications
• Compatible with Apache Mesos API
• Bindings for Python, Java, and C++
A runtime environment for hosting distributed applications
• Supports frameworks developed against the Mesos API
• Supports frameworks developed against the URB API
• Uses Univa Grid Engine to place and run work
URB: Architecture Overview
Spark Framework Running Thunder
Copyright © Univa Corporation, 2015. All Rights Reserved
URB: Web User Interface
HPC & Spark Workloads Together
URB: Solution Summary
Universal Resource Broker
• For the end user there is no change in application workflow
• For the admins there is increased control and policy capability over compute resources
• The solution provides the ability to share resources across big data and traditional batch workloads
• Single resource allocation policy defined by business goals
• Single accounting repository to track resource consumption
• Full workload lifecycle management for heterogeneous workloads
GPUs
GPUs for Deep Learning
http://image.slidesharecdn.com/nvidiateslap100-160621104058/95/announcing-the-nvidia-tesla-p100-gpu-for-pcie-servers-9-638.jpg?cb=1466505803
CUDA LOAD SENSOR
• Post installation check:
  o qhost -F <hostname>
    hl:cuda.verstr=270.41.06
    hl:cuda.0.name=GeForce 8400 GS
    hl:cuda.0.totalMem=511.312M
    hl:cuda.0.freeMem=500.480M
    hl:cuda.0.usedMem=10.832M
    hl:cuda.0.eccEnabled=0
    hl:cuda.0.temperature=44.000000
    hl:cuda.1.name=GeForce 8400 GS
    hl:cuda.1.totalMem=511.312M
    hl:cuda.1.freeMem=406.066M
    hl:cuda.1.usedMem=20.274M
    hl:cuda.1.eccEnabled=0
    hl:cuda.1.temperature=43.000000
    hl:cuda.devices=2
Copyright © 2016 Univa Corporation, All Rights Reserved.
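For scripted checks, the load values reported above can be pulled into a dictionary. A minimal sketch, assuming the output has exactly the hl:cuda.<key>=<value> shape shown on the slide; the function name is hypothetical, and reading the memory values as a number followed by "M" is an assumption based only on this sample.

```python
# Sample copied from the qhost -F output on the slide.
SAMPLE = """\
hl:cuda.verstr=270.41.06
hl:cuda.0.name=GeForce 8400 GS
hl:cuda.0.totalMem=511.312M
hl:cuda.0.freeMem=500.480M
hl:cuda.0.usedMem=10.832M
hl:cuda.0.eccEnabled=0
hl:cuda.0.temperature=44.000000
hl:cuda.1.name=GeForce 8400 GS
hl:cuda.1.totalMem=511.312M
hl:cuda.1.freeMem=406.066M
hl:cuda.1.usedMem=20.274M
hl:cuda.1.eccEnabled=0
hl:cuda.1.temperature=43.000000
hl:cuda.devices=2
"""

PREFIX = "hl:cuda."


def parse_cuda_load_values(text):
    """Split 'hl:cuda.*' lines into host-wide and per-device values."""
    host_values, device_values = {}, {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith(PREFIX):
            continue  # ignore anything that is not a CUDA load value
        key, _, value = line[len(PREFIX):].partition("=")
        index, dot, field = key.partition(".")
        if dot and index.isdigit():
            # e.g. "0.freeMem" -> device 0, field "freeMem"
            device_values.setdefault(int(index), {})[field] = value
        else:
            # e.g. "devices" or "verstr"
            host_values[key] = value
    return host_values, device_values


host_values, device_values = parse_cuda_load_values(SAMPLE)

# Assumption: memory values end in "M", as in the sample above.
freest = max(
    device_values,
    key=lambda i: float(device_values[i]["freeMem"].rstrip("M")),
)
print("devices:", host_values["devices"], "most free memory on device", freest)
```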
GPU JOB SUBMISSION
• CUDA complexes can be used for:
  o Setting alarm state of a host based on ECC errors (load_threshold in queue config)
  o Sorting hosts (load_formula)
  o Job submission
    – Requesting a host with GPUs
    – qsub -l cuda.devices=2 ...
• Complex can be made consumable (complex configuration) in order to limit the number of CUDA jobs per host
RESOURCE MAPS
[Diagram: UGE Cluster with Host A, Host B, ... Host N, each labeled 10; host resources (e.g. GPUs, IDs 0 & 1) and global resources (e.g. scratch storage A-E); Job 123 and Job 124]
• Two host resources: 0, 1
• Five global resources: A, B, C, D, E
• Job 123 got assigned ID 0 of GPU resource on host N and resource C of global resource scratch
• Job 124 got assigned ID 1 of GPU resource on host B and resource E of global resource scratch
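The bookkeeping behind a resource map can be illustrated with a small pool of named IDs. This is a sketch of the idea only, not Grid Engine's scheduler: the class and method names are hypothetical, each job holds at most one ID per map, and because the slide does not say how a specific ID is chosen, the sketch takes the first free ID unless the caller names one.

```python
class ResourceMap:
    """A pool of named resource IDs; each ID is held by at most one job."""

    def __init__(self, ids):
        self.free = list(ids)
        self.assigned = {}  # job id -> resource id

    def assign(self, job_id, preferred=None):
        """Hand out the preferred ID, or the first free one; None if unavailable."""
        if preferred is None:
            if not self.free:
                return None
            resource_id = self.free[0]
        elif preferred in self.free:
            resource_id = preferred
        else:
            return None
        self.free.remove(resource_id)
        self.assigned[job_id] = resource_id
        return resource_id

    def release(self, job_id):
        """Return the job's ID to the pool when the job ends."""
        resource_id = self.assigned.pop(job_id, None)
        if resource_id is not None:
            self.free.append(resource_id)
        return resource_id


# Host resources: every host has its own map of GPU IDs 0 and 1.
gpus = {host: ResourceMap([0, 1]) for host in ("A", "B", "N")}
# Global resource: one cluster-wide map of scratch areas A-E.
scratch = ResourceMap(["A", "B", "C", "D", "E"])

# Reproduce the assignments described on the slide.
gpus["N"].assign(123)
scratch.assign(123, preferred="C")
gpus["B"].assign(124, preferred=1)
scratch.assign(124, preferred="E")

print(gpus["N"].assigned, gpus["B"].assigned, scratch.assigned)
```

Keeping the host maps separate from the global map mirrors the slide: GPU ID 0 can be in use on host N and still be free on host B, while scratch area C is unavailable cluster-wide once Job 123 holds it.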
Containers
Containerized PySpark Example
Univa Grid Engine – Container Edition (1)
• Launch Docker Container on best machine in cluster
  o Reduces time wasted (it can be minutes … or longer)
    – Attempting to launch on an improperly serviced execution host.
    – Waiting for the Docker image to download from the Docker registry.
  o Ensures container runs faster, increasing throughput in the cluster.
• Run Docker Containers in a Univa Grid Engine Cluster
  o Business Critical containers are prioritized over other containers. Increases efficiency of the overall organization.
  o Containers can be orchestrated alongside other critical workloads such as batch jobs and frameworks.
$ qsub -o /home/jdoe -j y -xdv "/home:/home" \
    -l docker,docker_images="*centos:latest*" my_job.sh
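When a submission like this is generated from a script, the quoting is easy to get wrong, since the wildcard image pattern has to reach qsub without being expanded by the shell. A minimal sketch, assuming only the options shown on the slide; the helper name and its parameters are hypothetical, and nothing here actually invokes qsub.

```python
import shlex


def docker_qsub_command(script, output_dir, image_pattern, volume):
    """Build the argument list for the slide's qsub call (it is not executed)."""
    return [
        "qsub",
        "-o", output_dir,   # where the output file goes
        "-j", "y",          # merge the error stream into the output
        "-xdv", volume,     # directory mapping, as on the slide
        "-l", "docker,docker_images=" + image_pattern,
        script,
    ]


argv = docker_qsub_command(
    script="my_job.sh",
    output_dir="/home/jdoe",
    image_pattern="*centos:latest*",
    volume="/home:/home",
)

# shlex.quote adds quotes only where the shell would otherwise interpret
# characters, such as the '*' wildcards in the image pattern.
command_line = " ".join(shlex.quote(arg) for arg in argv)
print(command_line)
```

Passing the list form (argv) to a process launcher avoids the shell entirely; the quoted string is only needed when the command is pasted into a terminal or a job script.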
Univa Grid Engine – Container Edition (2)
• Job Control and Limits for Docker Containers
  o Provides user and administrator control over containers running on Grid Engine Hosts.
• Accounting for Docker Containers
  o Keeps track of containers. Share policies require accounting.
• Data file Management for Docker Containers
  o Transparent access to input, output and error files. Simplifies the management of input and output files for Docker Containers and ensures any output or error files are moved to a location where the user can access them.
• Interactive Docker Containers
  o Good for debugging when containers don’t work correctly!
• Parallel jobs in Docker Containers
  o Message-passing parallel jobs can each run a set of tasks in a container on a machine.
Containerized GPUs
https://github.com/NVIDIA/nvidia-docker
Univa Confidential
Navops by Univa
Easy installation, preconfigured solution including pre-integration with cloud services.
Build a container cluster on premise or in the cloud.
The fastest way to build a container cluster!!
Respond Quickly: Easy to resize, adapt, dynamic provisioning
Orchestrate and Optimize: Best use of resources and keep track of containers
The most advanced container orchestration!!
http://navops.io/
Navops orchestration solution
Summary
• Scaling Machine Learning from prototype to production …
  o Out with Apache Spark via Univa Universal Resource Broker
  o Up with NVIDIA GPUs via Univa Grid Engine
  o In/Down via Univa Container solutions
    – Univa Grid Engine – Container Edition
    – Navops Launch and Command
THANK YOU
Ian Lumb
Solutions Architect
+1 630 303-9068 ilumb@univa.com