Webinar: Cutting Time, Complexity and Costs from Data Science to Production
February 2019
Agenda
• Data science challenges
• Iguazio data science PaaS over Kubernetes
• NVIDIA solutions to accelerate data science with Kubernetes
  o GPU integration, TensorRT, RAPIDS
• Hands-on tutorial
  o End-to-end application: real-time predictive infrastructure monitoring (ingest, explore, hyperparameter training, deploy to production)
  o Serverless and scale-out data science
  o NVIDIA RAPIDS
• Summary
• Q&A
Today: ML Lifecycle is Complex and Siloed
[Diagram: three siloed stages, each owned by a different team.
Data Prep & Analytics (Data Engineers): ETL from data lakes/warehouses into CSVs.
Model Building (Data Scientists): active data (CSV/in-memory) and GPU, producing a model; feedback loops "Need more fresh data" and "Tune model".
Model Deployment (Data Engineers and App Developers): ML model serving, app deployment, interactive app, stream processing, triggers and interactions, database.]
ML Challenges in Real Life
• Re-coding & instrumenting
• AI model "depth" & accuracy vs. performance & costs
• Observability & reproducibility
• Infrastructure and software complexity
• Can we gather (and prep) model features in production?
Solution: Fast & Continuous Data Science Pipeline
• Collect: constantly ingest, clean & tag data via "collectors"
• Develop: "serverless" functions & notebooks
• ML model training: CPU and GPU
• Build & test: CI/CD for code & models
• Deploy to production: intelligent serverless run-time (triggers and interactions) in the cloud, on-prem or at the edge
• Monitor & reiterate
Outcomes: develop and iterate faster, deploy in any cloud or edge, deliver accurate results in real time.
Iguazio: Open & High-Performance Data-Science PaaS
• Real-time structured & unstructured data fabric, including external data
• Managed & hardened open source plus 3rd-party services and apps
• Secure real-time data sharing, enabling collaboration & parallelism
• Self-service experience from A to Z
• Built on a cloud-native architecture, with CPU and GPU compute
Develop Faster, Run Faster, Use Fewer Resources
Managed Jupyter: data science notebooks and online IDE
• Serverless notebooks: self-service, scale to zero when idle
• Simplify, secure and accelerate data access and processing
• Accelerate applications and training using shared GPUs and ML services
• One-click deployment to production (as jobs, real-time functions and dashboards)
[Diagram: notebooks connect to time series, stream, table and object data (historical and real-time, from a variety of sources), shared GPUs, and integrated, 3rd-party or cloud ML services on demand.]
Deploy Faster to Production with Serverless
Nuclio: the leading open-source serverless framework for real-time intelligence
• Minimize software development and maintenance overhead
• Extreme performance (up to 370K events/sec per process, 0.1 ms latency, fast data access)
• Open: supports many event/data sources (HTTP, streaming, messaging, jobs)
• One-click deployment from many sources (code, containers, notebooks, git, templates) to the cloud, on-prem or the edge
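A minimal sketch of what a Nuclio function can look like in Python. The `handler(context, event)` entry point, `event.body` and `context.logger` follow Nuclio's Python runtime conventions; the payload fields (`host`, `latency_ms`), the `LATENCY_THRESHOLD_MS` environment variable and the alert rule are hypothetical, chosen to match the infrastructure-monitoring demo.

```python
import json
import os

# Hypothetical alert threshold, read from the function's environment
# (set it in the function spec); defaults to 200 ms when unset.
THRESHOLD_MS = float(os.environ.get("LATENCY_THRESHOLD_MS", "200"))


def handler(context, event):
    """Nuclio Python entry point: invoked once per incoming event."""
    body = event.body
    # Depending on the trigger and content type, the body may arrive as
    # raw bytes, a string, or an already-decoded dict.
    if isinstance(body, (bytes, bytearray)):
        body = body.decode("utf-8")
    sample = json.loads(body) if isinstance(body, str) else body

    latency = float(sample["latency_ms"])
    result = {
        "host": sample["host"],
        "latency_ms": latency,
        "alert": latency > THRESHOLD_MS,  # hypothetical rule: simple threshold
    }
    context.logger.info(f"scored sample from {result['host']}: alert={result['alert']}")
    # A returned string becomes the response body.
    return json.dumps(result)
```

In a deployment, Nuclio supplies the real `context` and `event`; to try the handler locally, pass stand-in objects that expose `logger` and `body`.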
Kubernetes Helps Simplify the Use of Clusters and GPUs
Think of Kubernetes as an operating system for a cluster: it manages nodes, administers access, and launches containers, jobs and more.
[Diagram: a master server (API server, replication controller, scheduler) managing worker nodes, each running a daemon and containers.]
Infrastructure as code, e.g. a PyTorch training job (pytorch-job.yml):
---
apiVersion: batch/v1
kind: Job
metadata:
  name: pytorch-example
spec:
  backoffLimit: 5
  template:
    spec:
      restartPolicy: Never   # Jobs accept only Never or OnFailure
      imagePullSecrets:
      - name: nvcr.dgxkey
      containers:
      - name: pytorch-container
        image: nvcr.io/nvidia/pytorch:18.06-py3
        command: ["/bin/sh"]
        args: ["-c", "python /examples/mnist/main.py"]
        resources:
          limits:
            nvidia.com/gpu: 1
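Because the job is plain data, the same manifest can be generated programmatically, for example to launch one training Job per learning rate in a hyperparameter sweep. A minimal sketch using only the standard library; it assumes the MNIST script accepts a `--lr` flag (the PyTorch examples script does), and the `pytorch-example-0`, `-1`, ... naming scheme is hypothetical. It sets `restartPolicy: Never` because Kubernetes rejects a Job whose pod template uses the default `Always`.

```python
import json


def pytorch_job(name: str, lr: float, gpus: int = 1) -> dict:
    """Build a batch/v1 Job manifest for one training run."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 5,
            "template": {
                "spec": {
                    "restartPolicy": "Never",  # required for Jobs
                    "imagePullSecrets": [{"name": "nvcr.dgxkey"}],
                    "containers": [{
                        "name": "pytorch-container",
                        "image": "nvcr.io/nvidia/pytorch:18.06-py3",
                        "command": ["/bin/sh"],
                        # Assumption: main.py accepts --lr.
                        "args": ["-c", f"python /examples/mnist/main.py --lr {lr}"],
                        "resources": {"limits": {"nvidia.com/gpu": gpus}},
                    }],
                }
            },
        },
    }


def sweep(base_name: str, lrs: list) -> list:
    """One Job per learning rate; names use the index so they stay DNS-safe."""
    return [pytorch_job(f"{base_name}-{i}", lr) for i, lr in enumerate(lrs)]


if __name__ == "__main__":
    # Wrap the Jobs in a v1 List so a single `kubectl apply -f -` can read them.
    manifest = {"apiVersion": "v1", "kind": "List",
                "items": sweep("pytorch-example", [0.01, 0.1, 1.0])}
    print(json.dumps(manifest, indent=2))
```

Piping the output into `kubectl apply -f -` creates the three Jobs; the scheduler then places each on a node with a free GPU.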
Re-Imagining the Data Science Workflow
Open-source, end-to-end GPU-accelerated workflow built on CUDA:
• Data preparation / wrangling: cuDF
• Optimized ML model training: cuML
• Data visualization libraries: data insights
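cuDF follows the pandas DataFrame API closely, so a typical preparation step (a join, a variable transformation, an aggregation) reads the same on GPU and CPU. A minimal sketch, assuming the `merge`, `groupby` and `sort_index` calls used here behave alike in both libraries; the `metrics`/`hosts` tables and their columns are hypothetical. With real data you would load the tables with `read_csv`, which both libraries provide.

```python
# Prefer cuDF (GPU) and fall back to pandas (CPU) when RAPIDS is not installed.
try:
    import cudf as xdf
except ImportError:
    import pandas as xdf


def prepare(metrics, hosts):
    """Join raw samples with host metadata, derive a column, aggregate per rack."""
    # Join: attach each host's rack to its latency samples.
    df = metrics.merge(hosts, on="host", how="left")
    # Variable transformation: milliseconds -> seconds.
    df["latency_s"] = df["latency_ms"] / 1000.0
    # Aggregate: mean latency per rack, ordered by rack name.
    return df.groupby("rack")["latency_s"].mean().sort_index()


if __name__ == "__main__":
    metrics = xdf.DataFrame({"host": ["a", "a", "b", "c"],
                             "latency_ms": [100, 200, 300, 400]})
    hosts = xdf.DataFrame({"host": ["a", "b", "c"], "rack": ["r1", "r1", "r2"]})
    print(prepare(metrics, hosts))
```

The import switch keeps the notebook portable: the same function runs on a GPU node with RAPIDS and on a laptop with pandas.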
RAPIDS – GPU Accelerated Data Science
[Diagram: the workflow stages (data preparation, model training, visualization) map onto the RAPIDS libraries (cuDF, cuML, cuGraph) and deep learning frameworks (with cuDNN), running on Python, Dask and CUDA, with Apache Arrow as the in-GPU-memory data format.]
Read/write RAPIDS dataframes directly into the Iguazio database & file system.
Faster Speeds, Real World Benefits
Benchmark: 200 GB CSV dataset; data preparation includes joins and variable transformations.
CPU cluster configuration: CPU nodes (61 GiB of memory, 8 vCPUs, 64-bit platform), Apache Spark.
DGX cluster configuration: 5x DGX-1 on an InfiniBand network.
Time in seconds (shorter is better):

Configuration    cuIO/cuDF load & data preparation    cuML XGBoost
20 CPU nodes     2,741                                2,290
30 CPU nodes     1,675                                1,956
50 CPU nodes       715                                1,999
100 CPU nodes      379                                1,948
DGX-2               42                                  169
5x DGX-1            19                                  157

[Chart: End-to-End time (cuIO/cuDF load and data preparation, data conversion, and XGBoost) for the same six configurations.]
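The speedups behind these charts are simple ratios of run times. A short worked example; the values are as read from the benchmark slide, assuming the six values 2,741 to 19 belong to the cuIO/cuDF load-and-preparation chart and 2,290 to 157 to the cuML XGBoost chart.

```python
# Seconds per configuration, as read from the benchmark slide (shorter is better).
LOAD_AND_PREP = {
    "20 CPU nodes": 2741, "30 CPU nodes": 1675, "50 CPU nodes": 715,
    "100 CPU nodes": 379, "DGX-2": 42, "5x DGX-1": 19,
}
XGBOOST = {
    "20 CPU nodes": 2290, "30 CPU nodes": 1956, "50 CPU nodes": 1999,
    "100 CPU nodes": 1948, "DGX-2": 169, "5x DGX-1": 157,
}


def speedup(times: dict, baseline: str, target: str) -> float:
    """How many times faster `target` is than `baseline` (ratio of run times)."""
    return times[baseline] / times[target]


if __name__ == "__main__":
    for name, times in (("Load & data preparation", LOAD_AND_PREP), ("XGBoost", XGBOOST)):
        for target in ("DGX-2", "5x DGX-1"):
            ratio = speedup(times, "100 CPU nodes", target)
            print(f"{name}: {target} vs 100 CPU nodes = {ratio:.1f}x")
```

Against the largest CPU cluster, 5x DGX-1 works out to about 19.9x for load and data preparation (379 s / 19 s) and 12.4x for XGBoost (1,948 s / 157 s). The CPU XGBoost times barely move beyond 30 nodes (1,956 s vs. 1,948 s), so adding CPU nodes does not close that gap.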
TensorRT – GPU-Powered Inference Server
Available with monthly updates
Models supported:
● TensorFlow GraphDef/SavedModel
● TensorFlow and TensorRT GraphDef
● TensorRT Plans
● Caffe2 NetDef (ONNX import)
Features:
● Multi-GPU support
● Concurrent model execution
● HTTP REST and gRPC server APIs
● Python/C++ client libraries
NVIDIA TensorRT Over Kubernetes & Iguazio
[Diagram: a Nuclio function (serverless), the TensorRT inference server and the time series DB, deployed over Kubernetes and Iguazio.]
Details: https://developer.nvidia.com/tensorrt
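Before a serverless function starts sending requests, it helps to check that the inference server is ready. A minimal standard-library sketch, assuming the 2019-era (v1) TensorRT Inference Server HTTP API, which exposed readiness at `/api/health/ready` on port 8000; newer releases (renamed Triton) use `/v2/health/ready` instead, so the path is a parameter. The in-cluster service name in the usage note is hypothetical.

```python
import urllib.error
import urllib.request

# Assumed v1 readiness endpoint; pass path="/v2/health/ready" for newer servers.
DEFAULT_READY_PATH = "/api/health/ready"


def is_ready(base_url: str, path: str = DEFAULT_READY_PATH, timeout: float = 2.0) -> bool:
    """Return True only if the server answers the readiness check with HTTP 200."""
    url = base_url.rstrip("/") + path
    # Ignore proxy settings from the environment: the server is reached in-cluster.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status (HTTPError is a URLError).
        return False
```

For example, `is_ready("http://trtis.default.svc.cluster.local:8000")` (hypothetical service name) returns `False` until the models are loaded. Inside Kubernetes itself, the same endpoint can back a readiness probe so traffic is routed only to ready pods.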
Demo Time!
Summary
Build and Deploy Intelligent Apps Faster:
• Eliminate complexity through pre-integrated managed services
• Leverage parallelism and hardware acceleration to improve ROI
• Consolidate data engineering, data science and app dev platforms
• Focus on the end goal: production deployment of intelligent applications
Q&A
Thank You
info@iguazio.com | www.iguazio.com
Iguazio Smart Unified Real-time DB & File System
• Many APIs and models on the same data
  o SQL, NoSQL, time series, stream, files
  o Custom APIs, streaming, sync and ETLs
• Minimize CPU, memory and ops overhead
• In-memory performance on flash, at 1/30 of the cost and 30x the density
• Real-time time series & data analytics
• Fine-grained security
[Diagram: apps & users, S3, ETL and streams access a smart real-time DB (many standard & open APIs on a unified DB engine, granular security) behind a real-time firewall, connected over a high-speed fabric to 100 TB of direct-attached NVMe flash used as an extension of memory; backup.]
Real-time Intelligent Infrastructure Management
Auto-Healing Network Operations
Singtel uses Iguazio to predict network outages and avoid them in real time.
• Replaced a complex Hadoop-based data pipeline that was never productized
• Cross-correlates real-time data from multiple sources with historical data
• AI-based predictions trigger pre-programmed actions that fix evolving problems in the network
• Implemented within weeks of initial deployment
Singtel's self-healing network is the perfect example of a client shifting from reactive to proactive with Iguazio.
Real-time Intelligent Infrastructure Management
Maintaining Continuous Fast Response for 2nd-Tier Cloud Services
Analyzing and predicting cloud service response times for optimal results
• Real-time data ingestion: from multiple monitoring tools, including Jennifer and Zabbix
• Anomaly detection: accurate anomaly detection with an order of magnitude fewer false positives than the previous Elasticsearch-based platform
• Root cause analysis: real-time root cause analysis across multiple factors, for example correlating simultaneous changes in servers' CPU usage and application response times
• Predictive analytics: predicting response times and sending real-time alerts indicating which factors need to be adjusted to avoid malfunctions
From deployment to completion in less than two weeks!
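To make "anomaly detection on response times" concrete, here is a minimal streaming sketch using a rolling z-score: each new sample is compared with the mean and standard deviation of the previous samples in a sliding window. This is a generic textbook technique chosen for illustration, not the method used in the deployment described above; the window size, threshold and warm-up length are hypothetical defaults.

```python
from collections import deque
from math import sqrt


class RollingZScore:
    """Streaming detector: flags a sample that lies more than `threshold`
    standard deviations from the mean of the previous `window` samples."""

    def __init__(self, window: int = 60, threshold: float = 3.0, min_samples: int = 10):
        self.history = deque(maxlen=window)  # oldest samples drop off automatically
        self.threshold = threshold
        self.min_samples = min_samples  # don't judge until some history exists

    def update(self, x: float) -> bool:
        """Score `x` against the current window, then add it to the window."""
        anomalous = False
        n = len(self.history)
        if n >= self.min_samples:
            mean = sum(self.history) / n
            std = sqrt(sum((v - mean) ** 2 for v in self.history) / n)
            anomalous = std > 0 and abs(x - mean) / std > self.threshold
        self.history.append(x)
        return anomalous
```

Feeding it response-time samples one at a time, e.g. `[det.update(ms) for ms in samples]`, yields one flag per sample. This simple version also adds anomalous samples to the window, which inflates the spread for a while afterwards; a production detector would typically exclude them or use robust statistics.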
Evolve Into an Agile Cloud-Native Architecture
From a legacy, resource-intensive architecture to a simpler, modern approach
[Diagram: both stacks layered as data, orchestration, middleware and your business logic. Legacy: YARN, HDFS, HBase, MapReduce, Pig, Hive, etc. Modern: DBaaS, S3 (object), serverless, data science, big data. Labels: Consume, Innovate.]
