SlideShare a Scribd company logo
1 of 42
Download to read offline
Keith Kraus 18-10-2018
RAPIDS: GPU-ACCELERATED ETL AND
FEATURE ENGINEERING
2
REALITIES OF DATA
3
Faster Data Access Less Data Movement
DATA PROCESSING EVOLUTION
HDFS
Read
HDFS
Write
HDFS
Read
HDFS
Write
HDFS
Read
Query ETL ML Train
Hadoop Processing, Reading from disk
4
Faster Data Access Less Data Movement
DATA PROCESSING EVOLUTION
HDFS
Read
HDFS
Write
HDFS
Read
HDFS
Write
HDFS
Read
Query ETL ML Train
HDFS
Read
Query ETL ML Train
Hadoop Processing, Reading from disk
25-100x
Improvement
Less code
Language flexible
Primarily In-Memory
Spark In-Memory Processing
5
1980 1990 2000 2010 2020
102
103
104
105
106
107
Single-threaded perf
1.5X per year
1.1X per year
Transistors
(thousands)
NEED MORE SPEED
CPU Performance Has Plateaued
6
WE NEED MORE COMPUTE!
Basic workloads are bottlenecked by the CPU
Source: Mark Litwintschik’s blog: 1.1 Billion Taxi Rides: EC2 versus EMR
• In a simple benchmark consisting of
aggregating data, the CPU is the
bottleneck
• This is after the data is parsed and
cached into memory which is
another common bottleneck
• The CPU bottleneck is even worse
in more complex workloads!
SELECT cab_type, count(*) FROM
trips_orc GROUP BY cab_type;
7
HOW CAN WE DO BETTER?
• Focus on the full Data Science workflow
• Data Loading
• Data Transformation
• Data Analytics
• Python
• Provide as close to a drop-in replacement for existing tools
• Performance - Leverage GPUs
8
1980 1990 2000 2010 2020
102
103
104
105
106
107
Single-threaded perf
1.5X per year
1.1X per year
GPU-Computing perf
1.5X per year 1000X
By 2025
NEW BEGINNINGS
GPU Performance Grows
9
GPU
ADOPTION
BARRIERS
• Too much data movement
• Too many makeshift data
formats
• Writing CUDA C/C++ is hard
• No Python API for data
manipulation
Yes GPUs are fast but …
10
APP A
DATA MOVEMENT AND TRANSFORMATION
The bane of productivity and performance
CPU GPU
APP B
Read Data
Copy & Convert
Copy & Convert
Copy & Convert
Load Data
APP A GPU
Data
APP B
GPU
Data
APP A
APP B
11
APP A
DATA MOVEMENT AND TRANSFORMATION
What if we could keep data on the GPU?
APP B
Copy & Convert
Copy & Convert
Copy & Convert
APP A GPU
Data
APP B
GPU
Data
Read Data
Load Data
APP B
CPU GPU
APP A
12
DATA FORMATS
Avro
XML
JSON
GML
ProtoBuf
HDFS
Pickle
CSV
Parquet
Pandas
Plain Text vs Binary Compressed vs Uncompressed
CSR
COO
CSC
* Not a complete list
Numpy
13
LEARNING FROM APACHE ARROW
From Apache Arrow Home Page - https://arrow.apache.org/
14
cuDF
Analytics
GPU Memory
Data Preparation VisualizationModel Training
cuML
Machine Learning
cuGraph
Graph Analytics
PyTorch & Chainer
Deep Learning
Kepler.GL
Visualization
RAPIDS OPEN SOURCE SOFTWARE
15
Faster Data Access Less Data Movement
DATA PROCESSING EVOLUTION
25-100x Improvement
Less code
Language flexible
Primarily In-Memory
HDFS
Read
HDFS
Write
HDFS
Read
HDFS
Write
HDFS
Read
Query ETL ML Train
HDFS
Read
Query ETL ML Train
HDFS
Read
GPU
Read
Query
CPU
Write
GPU
Read
ETL
CPU
Write
GPU
Read
ML
Train
Arrow
Read
Query ETL
ML
Train
5-10x Improvement
More code
Language rigid
Substantially on GPU
50-100x Improvement
Same code
Language flexible
Primarily on GPU
RAPIDS
GPU/Spark In-Memory Processing
Hadoop Processing, Reading from disk
Spark In-Memory Processing
16
RAPIDS
Rapid Accelerated Platform for Integrating Data Science
APPLICATIONS
SYSTEMS
ALGORITHMS
CUDA
ARCHITECTURE
• Learn what the data science community needs
• Use best practices and standards
• Build scalable systems and algorithms
• Test Applications and workflows
• Iterate
17
RAPIDS
How can I download and use RAPIDS?
• https://ngc.nvidia.com/registry/nvidia-
rapidsai-rapidsai
• https://hub.docker.com/r/rapidsai/rapidsai/
• https://github.com/rapidsai
• WIP: https://anaconda.org/rapidsai/
• WIP:
• https://pypi.org/project/cudf
• https://pypi.org/project/cuml
18
AI LIBRARIES
Accelerating more of the AI ecosystem
Graph Analytics is fundamental to network analysis
Machine Learning is fundamental to prediction,
classification, clustering, anomaly detection and
recommendations.
Both can be accelerated with NVIDIA GPU
8x V100 20-90x faster than dual socket CPU
Decisions Trees
Random Forests
Linear Regressions
Logistics Regressions
K-Means
K-Nearest Neighbor
DBSCAN
Kalman Filtering
Principal Components
Single Value Decomposition
Bayesian Inferencing
PageRank
BFS
Jaccard Similarity
Single Source Shortest Path
Triangle Counting
Louvain Modularity
ARIMA
Holt-Winters
Machine Learning Graph Analytics
Time Series
XGBoost, Criteo Dataset, 90x
3 Hours to 2 mins on 1 DGX-1
cuML & cuGraph
19
CUDF + XGBOOST
DGX-2 vs Scale Out CPU Cluster
• Full end to end pipeline
• Leveraging Dask + PyGDF
• Store each GPU results in sys mem then read back in
• Arrow to Dmatrix (CSR) for XGBoost
20
CUDF + XGBOOST
Scale Out GPU Cluster vs DGX-2
0 50 100 150 200 250 300 350
5xDGX-1
DGX-2
Chart Title
ETL+CSV (s) ML Prep (s) ML (s)
• Full end to end pipeline
• Leveraging Dask for multi-node + PyGDF
• Store each GPU results in sys mem then read back in
• Arrow to Dmatrix (CSR) for XGBoost
21
CUGRAPH
GPU-Accelerated Graph Analytics Library
22
cuDF
Analytics
GPU Memory
Data Preparation VisualizationModel Training
cuML
Machine Learning
cuGraph
Graph Analytics
PyTorch & Chainer
Deep Learning
Kepler.GL
Visualization
CUDF
23
GPU-ACCELERATED ETL
Is GPU-acceleration really needed?
24
GPU-ACCELERATED ETL
The average data scientist spends 90+% of their time in ETL as opposed
to training models
25
CUDF
GPU DataFrame library
• Apache Arrow data format
• Pandas-like API
• Unary and Binary Operations
• Joins / Merges
• GroupBys
• Filters
• User-Defined Functions (UDFs)
• Accelerated file readers
• Etc.
26
CUDF
Today
LibGDF PyGDF
• Low level library containing function
implementations and C/C++ API
• Importing/exporting a GDF using the CUDA IPC
mechanism
• CUDA kernels to perform element-wise math
operations on GPU DataFrame columns
• CUDA sort, join, groupby, and reduction
operations on GPU DataFrames
• A Python library for manipulating GPU
DataFrames
• Python interface to LibGDF library with
additional functionality
• Creating GDFs from Numpy arrays, Pandas
DataFrames, and PyArrow Tables
• JIT compilation of User-Defined Functions
(UDFs) using Numba
27
CUDF
Refactor in Progress
cuDF
• Single repository containing both the low level
implementation and high level wrappers and APIs
• Future high level language bindings based on
community demand, feedback, and contributions
• Moving from CFFI to Cython for Python bindings
to better integrate into the PyData community
PyGDFLibGDF
cuDF
28
PANDAS-LIKE API
Python GPU DataFrame library
29
PANDAS-LIKE API
Pandas ↔ PyGDF
30
PANDAS-LIKE API
Built-In Functions
31
DEMO
32
DASK
What is Dask and why does RAPIDS use it for scaling out?
• Dask is a distributed computation scheduler
built to scale Python workloads from laptops to
supercomputer clusters.
• Extremely modular with scheduling, compute,
data transfer, and out-of-core handling all being
disjointed allowing us to plug in our own
implementations.
• Can easily run multiple Dask workers per node
to allow for an easier development model of
one worker per GPU regardless of single node
or multi node environment.
33
DASK
Scale up and out with cuDF
• Use cuDF primitives underneath in map-reduce style
operations with the same high level API
• Instead of using typical Dask data movement of
pickling objects and sending via TCP sockets, take
advantage of hardware advancements using a
communications framework called OpenUCX:
• For intranode data movement, utilize NVLink
and PCIe peer-to-peer communications
• For internode data movement, utilize GPU
RDMA over Infiniband and RoCE
https://github.com/rapidsai/dask_gdf
34
DASK
Scale up and out with cuML
• Native integration with Dask + cuDF
• Can easily use Dask workers to initialize NCCL for
optimized gather / scatter operations
• Example: this is how the dask-xgboost included
in the container works for multi-GPU and multi-
node, multi-GPU
• Provides easy to use, high level primitives for
synchronization of workers which is needed for many
ML algorithms
35
LOOKING TO THE
FUTURE
36
Next few months
GPU DATAFRAME
• Continue improving performance and functionality
• Single GPU
• Single node, multi GPU
• Multi node, multi GPU
• String Support
• Support for specific “string” dtype with GPU-accelerated functionality similar to Pandas
• Accelerated Data Loading
• File formats: CSV, Parquet, Orc – to start
37
GPU-Accelerated string functions with a Pandas-like API
CUSTRING
• API and functionality is following Pandas:
https://pandas.pydata.org/pandas-
docs/stable/api.html#string-handling
• lower()
• ~22x speedup
• find()
• ~40x speedup
• slice()
• ~100x speedup
0.00
100.00
200.00
300.00
400.00
500.00
600.00
700.00
800.00
lower() find(#) slice(1,15)
milliseconds
Pandas cudastrings
38
CPUs bottleneck data loading in high throughput systems
ACCELERATED DATA LOADING
• CSV Reader
• Follows API of pandas.read_csv
• Current implementation is >10x speed
improvement over pandas
• Parquet Reader
• Work in progress:
https://github.com/gpuopenanalytics/li
bgdf/pull/85
• Will follow API of pandas.read_parquet
• ORC Reader
• Additionally looking towards GPU-accelerating
decompression for common compression
schemes
Source: Apache Crail blog: SQL Performance: Part 1 - Input File Formats
39
PYTHON CUDA ARRAY INTERFACE
Interoperability for Python GPU Array Libraries
• The CUDA array interface is a standard format
that describes a GPU array to allow sharing
GPU arrays between different libraries without
needing to copy or convert data
• Numba, CuPy, and PyTorch are the first
libraries to adopt the interface:
• https://numba.pydata.org/numba-
doc/dev/cuda/cuda_array_interface.html
• https://github.com/cupy/cupy/releases/tag/
v5.0.0b4
• https://github.com/pytorch/pytorch/pull/119
84
40
JOIN THE REVOLUTION
Everyone Can Help!
Integrations, feedback, documentation support, pull requests, new issues, or donations welcomed!
APACHE ARROW GPU Open Analytics
Initiative
https://arrow.apache.org/
@ApacheArrow
http://gpuopenanalytics.com/
@GPUOAI
RAPIDS
https://rapids.ai
@RAPIDSAI
41
WE’RE HIRING
Help us build the future!
• Junior/Mid/Senior Data Scientists
• Junior/Mid/Senior Data Engineers
• CUDA
• Internships
THANK YOU
Keith Kraus @keithjkraus

More Related Content

What's hot

Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Databricks
 
RocksDB Performance and Reliability Practices
RocksDB Performance and Reliability PracticesRocksDB Performance and Reliability Practices
RocksDB Performance and Reliability PracticesYoshinori Matsunobu
 
VictoriaLogs: Open Source Log Management System - Preview
VictoriaLogs: Open Source Log Management System - PreviewVictoriaLogs: Open Source Log Management System - Preview
VictoriaLogs: Open Source Log Management System - PreviewVictoriaMetrics
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Databricks
 
Vectorized Query Execution in Apache Spark at Facebook
Vectorized Query Execution in Apache Spark at FacebookVectorized Query Execution in Apache Spark at Facebook
Vectorized Query Execution in Apache Spark at FacebookDatabricks
 
How We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IOHow We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IODatabricks
 
PySpark Best Practices
PySpark Best PracticesPySpark Best Practices
PySpark Best PracticesCloudera, Inc.
 
The "Big Data" Ecosystem at LinkedIn
The "Big Data" Ecosystem at LinkedInThe "Big Data" Ecosystem at LinkedIn
The "Big Data" Ecosystem at LinkedInSam Shah
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using KafkaKnoldus Inc.
 
Data Federation with Apache Spark
Data Federation with Apache SparkData Federation with Apache Spark
Data Federation with Apache SparkDataWorks Summit
 
Apache Spark Core—Deep Dive—Proper Optimization
Apache Spark Core—Deep Dive—Proper OptimizationApache Spark Core—Deep Dive—Proper Optimization
Apache Spark Core—Deep Dive—Proper OptimizationDatabricks
 
Geospatial data platform at Uber
Geospatial data platform at UberGeospatial data platform at Uber
Geospatial data platform at UberDataWorks Summit
 
Spark performance tuning - Maksud Ibrahimov
Spark performance tuning - Maksud IbrahimovSpark performance tuning - Maksud Ibrahimov
Spark performance tuning - Maksud IbrahimovMaksud Ibrahimov
 
How Adobe Does 2 Million Records Per Second Using Apache Spark!
How Adobe Does 2 Million Records Per Second Using Apache Spark!How Adobe Does 2 Million Records Per Second Using Apache Spark!
How Adobe Does 2 Million Records Per Second Using Apache Spark!Databricks
 
[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...
[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...
[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...OpenStack Korea Community
 
Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2Sematext Group, Inc.
 
Linux 4.x Tracing: Performance Analysis with bcc/BPF
Linux 4.x Tracing: Performance Analysis with bcc/BPFLinux 4.x Tracing: Performance Analysis with bcc/BPF
Linux 4.x Tracing: Performance Analysis with bcc/BPFBrendan Gregg
 
Solving PostgreSQL wicked problems
Solving PostgreSQL wicked problemsSolving PostgreSQL wicked problems
Solving PostgreSQL wicked problemsAlexander Korotkov
 

What's hot (20)

Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
 
RocksDB Performance and Reliability Practices
RocksDB Performance and Reliability PracticesRocksDB Performance and Reliability Practices
RocksDB Performance and Reliability Practices
 
VictoriaLogs: Open Source Log Management System - Preview
VictoriaLogs: Open Source Log Management System - PreviewVictoriaLogs: Open Source Log Management System - Preview
VictoriaLogs: Open Source Log Management System - Preview
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
 
Vectorized Query Execution in Apache Spark at Facebook
Vectorized Query Execution in Apache Spark at FacebookVectorized Query Execution in Apache Spark at Facebook
Vectorized Query Execution in Apache Spark at Facebook
 
How We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IOHow We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IO
 
PySpark Best Practices
PySpark Best PracticesPySpark Best Practices
PySpark Best Practices
 
The "Big Data" Ecosystem at LinkedIn
The "Big Data" Ecosystem at LinkedInThe "Big Data" Ecosystem at LinkedIn
The "Big Data" Ecosystem at LinkedIn
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
 
Data Federation with Apache Spark
Data Federation with Apache SparkData Federation with Apache Spark
Data Federation with Apache Spark
 
Llama-index
Llama-indexLlama-index
Llama-index
 
Apache Spark Core—Deep Dive—Proper Optimization
Apache Spark Core—Deep Dive—Proper OptimizationApache Spark Core—Deep Dive—Proper Optimization
Apache Spark Core—Deep Dive—Proper Optimization
 
Geospatial data platform at Uber
Geospatial data platform at UberGeospatial data platform at Uber
Geospatial data platform at Uber
 
Spark performance tuning - Maksud Ibrahimov
Spark performance tuning - Maksud IbrahimovSpark performance tuning - Maksud Ibrahimov
Spark performance tuning - Maksud Ibrahimov
 
How Adobe Does 2 Million Records Per Second Using Apache Spark!
How Adobe Does 2 Million Records Per Second Using Apache Spark!How Adobe Does 2 Million Records Per Second Using Apache Spark!
How Adobe Does 2 Million Records Per Second Using Apache Spark!
 
[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...
[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...
[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...
 
Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2
 
Dev Ops Training
Dev Ops TrainingDev Ops Training
Dev Ops Training
 
Linux 4.x Tracing: Performance Analysis with bcc/BPF
Linux 4.x Tracing: Performance Analysis with bcc/BPFLinux 4.x Tracing: Performance Analysis with bcc/BPF
Linux 4.x Tracing: Performance Analysis with bcc/BPF
 
Solving PostgreSQL wicked problems
Solving PostgreSQL wicked problemsSolving PostgreSQL wicked problems
Solving PostgreSQL wicked problems
 

Similar to RAPIDS: GPU-Accelerated ETL and Feature Engineering

GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020John Zedlewski
 
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDFGPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDFKeith Kraus
 
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdfS51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdfDLow6
 
Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...
Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...
Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...E-Commerce Brasil
 
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...Databricks
 
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDSAccelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDSDatabricks
 
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and AlluxioAdvancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and AlluxioAlluxio, Inc.
 
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...Matej Misik
 
20201006_PGconf_Online_Large_Data_Processing
20201006_PGconf_Online_Large_Data_Processing20201006_PGconf_Online_Large_Data_Processing
20201006_PGconf_Online_Large_Data_ProcessingKohei KaiGai
 
20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQL20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQLKohei KaiGai
 
Tesla Accelerated Computing Platform
Tesla Accelerated Computing PlatformTesla Accelerated Computing Platform
Tesla Accelerated Computing Platforminside-BigData.com
 
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)Ontico
 
Stream Processing
Stream ProcessingStream Processing
Stream Processingarnamoy10
 
End to End Machine Learning Open Source Solution Presented in Cisco Developer...
End to End Machine Learning Open Source Solution Presented in Cisco Developer...End to End Machine Learning Open Source Solution Presented in Cisco Developer...
End to End Machine Learning Open Source Solution Presented in Cisco Developer...Manish Harsh
 
RAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needsRAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needsConnected Data World
 
IBMHadoopofferingTechline-Systems2015
IBMHadoopofferingTechline-Systems2015IBMHadoopofferingTechline-Systems2015
IBMHadoopofferingTechline-Systems2015Daniela Zuppini
 
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...Debraj GuhaThakurta
 
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...Debraj GuhaThakurta
 

Similar to RAPIDS: GPU-Accelerated ETL and Feature Engineering (20)

GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020
 
Rapids: Data Science on GPUs
Rapids: Data Science on GPUsRapids: Data Science on GPUs
Rapids: Data Science on GPUs
 
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDFGPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
 
RAPIDS Overview
RAPIDS OverviewRAPIDS Overview
RAPIDS Overview
 
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdfS51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
 
Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...
Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...
Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...
 
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
 
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDSAccelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
 
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and AlluxioAdvancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
 
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
 
20201006_PGconf_Online_Large_Data_Processing
20201006_PGconf_Online_Large_Data_Processing20201006_PGconf_Online_Large_Data_Processing
20201006_PGconf_Online_Large_Data_Processing
 
20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQL20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQL
 
Tesla Accelerated Computing Platform
Tesla Accelerated Computing PlatformTesla Accelerated Computing Platform
Tesla Accelerated Computing Platform
 
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
 
Stream Processing
Stream ProcessingStream Processing
Stream Processing
 
End to End Machine Learning Open Source Solution Presented in Cisco Developer...
End to End Machine Learning Open Source Solution Presented in Cisco Developer...End to End Machine Learning Open Source Solution Presented in Cisco Developer...
End to End Machine Learning Open Source Solution Presented in Cisco Developer...
 
RAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needsRAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needs
 
IBMHadoopofferingTechline-Systems2015
IBMHadoopofferingTechline-Systems2015IBMHadoopofferingTechline-Systems2015
IBMHadoopofferingTechline-Systems2015
 
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
 
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
 

Recently uploaded

The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfPower Karaoke
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyFrank van der Linden
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 

Recently uploaded (20)

The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdf
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The Ugly
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 

RAPIDS: GPU-Accelerated ETL and Feature Engineering

  • 1. Keith Kraus 18-10-2018 RAPIDS: GPU-ACCELERATED ETL AND FEATURE ENGINEERING
  • 3. 3 Faster Data Access Less Data Movement DATA PROCESSING EVOLUTION HDFS Read HDFS Write HDFS Read HDFS Write HDFS Read Query ETL ML Train Hadoop Processing, Reading from disk
  • 4. 4 Faster Data Access Less Data Movement DATA PROCESSING EVOLUTION HDFS Read HDFS Write HDFS Read HDFS Write HDFS Read Query ETL ML Train HDFS Read Query ETL ML Train Hadoop Processing, Reading from disk 25-100x Improvement Less code Language flexible Primarily In-Memory Spark In-Memory Processing
  • 5. 5 1980 1990 2000 2010 2020 102 103 104 105 106 107 Single-threaded perf 1.5X per year 1.1X per year Transistors (thousands) NEED MORE SPEED CPU Performance Has Plateaued
  • 6. 6 WE NEED MORE COMPUTE! Basic workloads are bottlenecked by the CPU Source: Mark Litwintschik’s blog: 1.1 Billion Taxi Rides: EC2 versus EMR • In a simple benchmark consisting of aggregating data, the CPU is the bottleneck • This is after the data is parsed and cached into memory which is another common bottleneck • The CPU bottleneck is even worse in more complex workloads! SELECT cab_type, count(*) FROM trips_orc GROUP BY cab_type;
  • 7. 7 HOW CAN WE DO BETTER? • Focus on the full Data Science workflow • Data Loading • Data Transformation • Data Analytics • Python • Provide as close to a drop-in replacement for existing tools • Performance - Leverage GPUs
  • 8. 8 1980 1990 2000 2010 2020 102 103 104 105 106 107 Single-threaded perf 1.5X per year 1.1X per year GPU-Computing perf 1.5X per year 1000X By 2025 NEW BEGINNINGS GPU Performance Grows
  • 9. 9 GPU ADOPTION BARRIERS • Too much data movement • Too many makeshift data formats • Writing CUDA C/C++ is hard • No Python API for data manipulation Yes GPUs are fast but …
  • 10. 10 APP A DATA MOVEMENT AND TRANSFORMATION The bane of productivity and performance CPU GPU APP B Read Data Copy & Convert Copy & Convert Copy & Convert Load Data APP A GPU Data APP B GPU Data APP A APP B
  • 11. 11 APP A DATA MOVEMENT AND TRANSFORMATION What if we could keep data on the GPU? APP B Copy & Convert Copy & Convert Copy & Convert APP A GPU Data APP B GPU Data Read Data Load Data APP B CPU GPU APP A
  • 12. 12 DATA FORMATS Avro XML JSON GML ProtoBuf HDFS Pickle CSV Parquet Pandas Plain Text vs Binary Compressed vs Uncompressed CSR COO CSC * Not a complete list Numpy
  • 13. 13 LEARNING FROM APACHE ARROW From Apache Arrow Home Page - https://arrow.apache.org/
  • 14. 14 cuDF Analytics GPU Memory Data Preparation VisualizationModel Training cuML Machine Learning cuGraph Graph Analytics PyTorch & Chainer Deep Learning Kepler.GL Visualization RAPIDS OPEN SOURCE SOFTWARE
  • 15. 15 Faster Data Access Less Data Movement DATA PROCESSING EVOLUTION 25-100x Improvement Less code Language flexible Primarily In-Memory HDFS Read HDFS Write HDFS Read HDFS Write HDFS Read Query ETL ML Train HDFS Read Query ETL ML Train HDFS Read GPU Read Query CPU Write GPU Read ETL CPU Write GPU Read ML Train Arrow Read Query ETL ML Train 5-10x Improvement More code Language rigid Substantially on GPU 50-100x Improvement Same code Language flexible Primarily on GPU RAPIDS GPU/Spark In-Memory Processing Hadoop Processing, Reading from disk Spark In-Memory Processing
  • 16. 16 RAPIDS Rapid Accelerated Platform for Integrating Data Science APPLICATIONS SYSTEMS ALGORITHMS CUDA ARCHITECTURE • Learn what the data science community needs • Use best practices and standards • Build scalable systems and algorithms • Test Applications and workflows • Iterate
  • 17. 17 RAPIDS How can I download and use RAPIDS? • https://ngc.nvidia.com/registry/nvidia- rapidsai-rapidsai • https://hub.docker.com/r/rapidsai/rapidsai/ • https://github.com/rapidsai • WIP: https://anaconda.org/rapidsai/ • WIP: • https://pypi.org/project/cudf • https://pypi.org/project/cuml
  • 18. 18 AI LIBRARIES Accelerating more of the AI ecosystem Graph Analytics is fundamental to network analysis Machine Learning is fundamental to prediction, classification, clustering, anomaly detection and recommendations. Both can be accelerated with NVIDIA GPU 8x V100 20-90x faster than dual socket CPU Decisions Trees Random Forests Linear Regressions Logistics Regressions K-Means K-Nearest Neighbor DBSCAN Kalman Filtering Principal Components Single Value Decomposition Bayesian Inferencing PageRank BFS Jaccard Similarity Single Source Shortest Path Triangle Counting Louvain Modularity ARIMA Holt-Winters Machine Learning Graph Analytics Time Series XGBoost, Criteo Dataset, 90x 3 Hours to 2 mins on 1 DGX-1 cuML & cuGraph
  • 19. 19 CUDF + XGBOOST DGX-2 vs Scale Out CPU Cluster • Full end to end pipeline • Leveraging Dask + PyGDF • Store each GPU results in sys mem then read back in • Arrow to Dmatrix (CSR) for XGBoost
  • 20. 20 CUDF + XGBOOST Scale Out GPU Cluster vs DGX-2 0 50 100 150 200 250 300 350 5xDGX-1 DGX-2 Chart Title ETL+CSV (s) ML Prep (s) ML (s) • Full end to end pipeline • Leveraging Dask for multi-node + PyGDF • Store each GPU results in sys mem then read back in • Arrow to Dmatrix (CSR) for XGBoost
  • 22. 22 cuDF Analytics GPU Memory Data Preparation VisualizationModel Training cuML Machine Learning cuGraph Graph Analytics PyTorch & Chainer Deep Learning Kepler.GL Visualization CUDF
  • 24. 24 GPU-ACCELERATED ETL The average data scientist spends 90+% of their time in ETL as opposed to training models
  • 25. 25 CUDF GPU DataFrame library • Apache Arrow data format • Pandas-like API • Unary and Binary Operations • Joins / Merges • GroupBys • Filters • User-Defined Functions (UDFs) • Accelerated file readers • Etc.
  • 26. 26 CUDF Today LibGDF PyGDF • Low level library containing function implementations and C/C++ API • Importing/exporting a GDF using the CUDA IPC mechanism • CUDA kernels to perform element-wise math operations on GPU DataFrame columns • CUDA sort, join, groupby, and reduction operations on GPU DataFrames • A Python library for manipulating GPU DataFrames • Python interface to LibGDF library with additional functionality • Creating GDFs from Numpy arrays, Pandas DataFrames, and PyArrow Tables • JIT compilation of User-Defined Functions (UDFs) using Numba
  • 27. 27 CUDF Refactor in Progress cuDF • Single repository containing both the low level implementation and high level wrappers and APIs • Future high level language bindings based on community demand, feedback, and contributions • Moving from CFFI to Cython for Python bindings to better integrate into the PyData community PyGDFLibGDF cuDF
  • 28. 28 PANDAS-LIKE API Python GPU DataFrame library
  • 32. 32 DASK What is Dask and why does RAPIDS use it for scaling out? • Dask is a distributed computation scheduler built to scale Python workloads from laptops to supercomputer clusters. • Extremely modular with scheduling, compute, data transfer, and out-of-core handling all being disjointed allowing us to plug in our own implementations. • Can easily run multiple Dask workers per node to allow for an easier development model of one worker per GPU regardless of single node or multi node environment.
  • 33. 33 DASK Scale up and out with cuDF • Use cuDF primitives underneath in map-reduce style operations with the same high level API • Instead of using typical Dask data movement of pickling objects and sending via TCP sockets, take advantage of hardware advancements using a communications framework called OpenUCX: • For intranode data movement, utilize NVLink and PCIe peer-to-peer communications • For internode data movement, utilize GPU RDMA over Infiniband and RoCE https://github.com/rapidsai/dask_gdf
  • 34. 34 DASK Scale up and out with cuML • Native integration with Dask + cuDF • Can easily use Dask workers to initialize NCCL for optimized gather / scatter operations • Example: this is how the dask-xgboost included in the container works for multi-GPU and multi- node, multi-GPU • Provides easy to use, high level primitives for synchronization of workers which is needed for many ML algorithms
  • 36. 36 Next few months GPU DATAFRAME • Continue improving performance and functionality • Single GPU • Single node, multi GPU • Multi node, multi GPU • String Support • Support for specific “string” dtype with GPU-accelerated functionality similar to Pandas • Accelerated Data Loading • File formats: CSV, Parquet, Orc – to start
  • 37. 37 GPU-Accelerated string functions with a Pandas-like API CUSTRING • API and functionality is following Pandas: https://pandas.pydata.org/pandas- docs/stable/api.html#string-handling • lower() • ~22x speedup • find() • ~40x speedup • slice() • ~100x speedup 0.00 100.00 200.00 300.00 400.00 500.00 600.00 700.00 800.00 lower() find(#) slice(1,15) milliseconds Pandas cudastrings
  • 38. 38 CPUs bottleneck data loading in high throughput systems ACCELERATED DATA LOADING • CSV Reader • Follows API of pandas.read_csv • Current implementation is >10x speed improvement over pandas • Parquet Reader • Work in progress: https://github.com/gpuopenanalytics/li bgdf/pull/85 • Will follow API of pandas.read_parquet • ORC Reader • Additionally looking towards GPU-accelerating decompression for common compression schemes Source: Apache Crail blog: SQL Performance: Part 1 - Input File Formats
  • 39. 39 PYTHON CUDA ARRAY INTERFACE Interoperability for Python GPU Array Libraries • The CUDA array interface is a standard format that describes a GPU array to allow sharing GPU arrays between different libraries without needing to copy or convert data • Numba, CuPy, and PyTorch are the first libraries to adopt the interface: • https://numba.pydata.org/numba- doc/dev/cuda/cuda_array_interface.html • https://github.com/cupy/cupy/releases/tag/ v5.0.0b4 • https://github.com/pytorch/pytorch/pull/119 84
  • 40. 40 JOIN THE REVOLUTION Everyone Can Help! Integrations, feedback, documentation support, pull requests, new issues, or donations welcomed! APACHE ARROW GPU Open Analytics Initiative https://arrow.apache.org/ @ApacheArrow http://gpuopenanalytics.com/ @GPUOAI RAPIDS https://rapids.ai @RAPIDSAI
  • 41. 41 WE’RE HIRING Help us build the future! • Junior/Mid/Senior Data Scientists • Junior/Mid/Senior Data Engineers • CUDA • Internships
  • 42. THANK YOU Keith Kraus @keithjkraus