Mike Wendt
@mike_wendt
GPU ACCELERATION WITH GOAI IN PYTHON
2
THE DATA STRUGGLE IS REAL…
3
DATA FORMATS
Avro
XML
JSON
GML
ProtoBuf
HDFS
Pickle
CSV
Parquet
Pandas
Plain Text vs Binary
Compressed vs Uncompressed
CSR
COO
CSC
NumPy
* Not a complete list
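The sparse-matrix entries on this slide (COO, CSR, CSC) hold the same nonzero values under different index layouts, which is one reason data has to be converted when it moves between tools. The following is a minimal sketch in plain Python, assuming a small hypothetical 3x4 matrix. The function name `coo_to_csr` is illustrative and does not come from any library.

```python
def coo_to_csr(n_rows, rows, cols, vals):
    """Convert COO triplets to CSR arrays (indptr, indices, data)."""
    # Count the nonzeros in each row.
    counts = [0] * n_rows
    for r in rows:
        counts[r] += 1
    # Prefix-sum the counts to get each row's start offset.
    indptr = [0] * (n_rows + 1)
    for i in range(n_rows):
        indptr[i + 1] = indptr[i] + counts[i]
    # Scatter each entry into its row's slice, preserving input order.
    next_slot = indptr[:-1].copy()
    indices = [0] * len(vals)
    data = [0] * len(vals)
    for r, c, v in zip(rows, cols, vals):
        pos = next_slot[r]
        indices[pos] = c
        data[pos] = v
        next_slot[r] += 1
    return indptr, indices, data


# Hypothetical 3x4 matrix with four nonzeros, given as COO triplets.
rows = [2, 0, 2, 1]
cols = [3, 1, 0, 2]
vals = [9.0, 5.0, 7.0, 3.0]
indptr, indices, data = coo_to_csr(3, rows, cols, vals)
```

Even for this tiny case the conversion walks every entry and allocates three new arrays, which is the kind of per-hand-off cost a shared format is meant to avoid.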
4
DATA PROCESSING EVOLUTION
Faster Data Access, Less Data Movement
Hadoop Processing, Reading from disk:
HDFS Read -> Query -> HDFS Write -> HDFS Read -> ETL -> HDFS Write -> HDFS Read -> ML Train
5
DATA PROCESSING EVOLUTION
Faster Data Access, Less Data Movement
Hadoop Processing, Reading from disk:
HDFS Read -> Query -> HDFS Write -> HDFS Read -> ETL -> HDFS Write -> HDFS Read -> ML Train
Spark In-Memory Processing:
HDFS Read -> Query -> ETL -> ML Train
25-100x Improvement; Less code; Language flexible; Primarily In-Memory
6
GPUS FTW!
7
DATA PROCESSING EVOLUTION
Faster Data Access, Less Data Movement
Hadoop Processing, Reading from disk:
HDFS Read -> Query -> HDFS Write -> HDFS Read -> ETL -> HDFS Write -> HDFS Read -> ML Train
Spark In-Memory Processing:
HDFS Read -> Query -> ETL -> ML Train
25-100x Improvement; Less code; Language flexible; Primarily In-Memory
GPU/Spark In-Memory Processing:
HDFS Read -> GPU Read -> Query -> CPU Write -> GPU Read -> ETL -> CPU Write -> GPU Read -> ML Train
5-10x Improvement; More code; Language rigid; Substantially on GPU
8
WE CAN DO BETTER!
9
GPU-ACCELERATED TECHNOLOGIES
GRAPH PROCESSING
ANALYTICS
GPU DATABASES
10
GPU-ACCELERATED ARCHITECTURE THEN
Too much data movement and too many different data formats
H2O.ai, Anaconda, Gunrock, Graphistry, BlazingDB, MapD, Kinetica
[Diagram labels: CPU, GPU, Read Data, Load Data, APP A, APP B, GPU Data (one per app), Copy & Convert (three times)]
12
GPU-ACCELERATED ARCHITECTURE NOW
Single data format and shared access to data on GPU
H2O.ai, Anaconda, Gunrock, Graphistry, BlazingDB, MapD, Kinetica
[Diagram labels: CPU, GPU, Read Data, Load Data, GPU MEM, GPU Data Frame]
Powered by: Apache Arrow
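The contrast between "Copy & Convert" and "shared access to data" can be sketched with Python's standard library alone. This is a CPU-memory analogy, not GOAI or GPU code: the `array` buffer stands in for a shared data frame, and the `app_a_*` and `app_b_*` names are hypothetical.

```python
from array import array

# One column of float64 values held in a single contiguous buffer.
column = array("d", [1.0, 2.0, 3.0, 4.0])

# "Then": each application copies and converts the data into its own format.
app_a_copy = list(column)               # new Python list, separate memory
app_b_copy = [str(x) for x in column]   # another copy, another format

# "Now": both applications take views onto the same buffer, with no copy.
app_a_view = memoryview(column)
app_b_view = memoryview(column)

# A write through one view is visible through the other and in the original.
app_a_view[0] = 10.0
shared_ok = app_b_view[0] == 10.0 and column[0] == 10.0

# The earlier copies are unaffected, so they are already stale.
copy_is_stale = app_a_copy[0] == 1.0
```

The views cost no extra memory and never go out of date, whereas every copy has to be rebuilt whenever the source changes.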
13
GPU OPEN ANALYTICS INITIATIVE
github.com/gpuopenanalytics
@gpuoai
GPU Data Frame (GDF)
Apache Arrow
Ingest/Parse
Exploratory Analysis
Feature Engineering
ML/DL Algorithms
Grid Search
Scoring
Model Export
14
EASY TO USE
@gpuoai
16
USE GPUS IN PYTHON
@gpuoai
17
GROWING COMMUNITY SUPPORT
Apache Arrow Apache Parquet
18
APACHE ARROW COMMON DATA LAYER
From Apache Arrow Home Page - https://arrow.apache.org/
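Apache Arrow specifies a columnar in-memory format. The sketch below illustrates only the columnar idea with Python's standard library and does not use the Arrow API. The record fields `user_id` and `score` are hypothetical.

```python
from array import array

# Hypothetical row-oriented records: (user_id, score).
records = [(1, 0.5), (2, 0.75), (3, 0.25)]

# Column-oriented layout: one typed, contiguous buffer per field.
user_id = array("q", (r[0] for r in records))   # signed 64-bit integers
score = array("d", (r[1] for r in records))     # 64-bit floats

# A column scan touches only the values of that column.
total_score = sum(score)

# Each column exposes its raw bytes without any conversion step.
score_bytes = memoryview(score).nbytes
```

Because each column is a plain typed buffer, another consumer that agrees on the layout can read it directly instead of parsing rows.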
19
GPU ACCELERATION ACROSS THE ECOSYSTEM
20
DATA PROCESSING EVOLUTION
Faster Data Access, Less Data Movement
Hadoop Processing, Reading from disk:
HDFS Read -> Query -> HDFS Write -> HDFS Read -> ETL -> HDFS Write -> HDFS Read -> ML Train
Spark In-Memory Processing:
HDFS Read -> Query -> ETL -> ML Train
25-100x Improvement; Less code; Language flexible; Primarily In-Memory
GPU/Spark In-Memory Processing:
HDFS Read -> GPU Read -> Query -> CPU Write -> GPU Read -> ETL -> CPU Write -> GPU Read -> ML Train
5-10x Improvement; More code; Language rigid; Substantially on GPU
End to End GPU Processing (GOAI):
Arrow Read -> Query -> ETL -> ML Train
25-100x Improvement; Same code; Language flexible; Primarily on GPU
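The "Less Data Movement" claim can be checked roughly by counting the read and write stages in each pipeline on this slide. The sketch below assumes the stage order read off the diagram labels. The helper `movement_steps` is hypothetical, and the count ignores how expensive each step is.

```python
# Stage sequences transcribed from the slide, one list per architecture.
pipelines = {
    "Hadoop": ["HDFS Read", "Query", "HDFS Write", "HDFS Read", "ETL",
               "HDFS Write", "HDFS Read", "ML Train"],
    "Spark": ["HDFS Read", "Query", "ETL", "ML Train"],
    "GPU/Spark": ["HDFS Read", "GPU Read", "Query", "CPU Write", "GPU Read",
                  "ETL", "CPU Write", "GPU Read", "ML Train"],
    "GOAI": ["Arrow Read", "Query", "ETL", "ML Train"],
}

# Stages that do actual work; everything else only moves data.
COMPUTE = {"Query", "ETL", "ML Train"}


def movement_steps(stages):
    """Count the stages that only move data (reads and writes)."""
    return sum(1 for s in stages if s not in COMPUTE)


moves = {name: movement_steps(stages) for name, stages in pipelines.items()}
```

Under this count the GPU/Spark pipeline moves data more often than Hadoop does, although between CPU and GPU memory instead of disk, while the GOAI pipeline gets back to a single read.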
21
LET’S COLLABORATE AND SHARE DATA ON GPUS!
@gpuoai
github.com/gpuopenanalytics
Docker Demos with Jupyter Notebooks
pyGDF Library
gpuopenanalytics.com
Google Groups
Public Slack
Wiki
Mike Wendt @mike_wendt
THANK YOU

UPDATED 17-11-27 PyData NY Lightning Talk: GPU Acceleration with GOAI in Python