Accelerate Data Science in
Python with RAPIDS
John Zedlewski, Senior Director, RAPIDS and Data Science @ NVIDIA
Ashwin Srinath, Senior Engineer, RAPIDS @ NVIDIA
GTC 2023
RAPIDS brings GPU acceleration to the open-source data
science and data engineering ecosystem
General Purpose and Domain-Specific Libraries
Data Preparation/ETL
cuDF
● GPU-accelerated ETL functions
● Tracks Pandas and other common PyData APIs
● Dask + UCX integration for scaling
RAPIDS for Apache Spark
● RAPIDS accelerator for Apache Spark
Analytics/ML/Graph
RAPIDS ML
● GPU-native cuML library (scikit-learn-style APIs), XGBoost, RAFT, FIL, HPO, DL interop, and more
cuGraph
● GPU graph analytics, including Louvain, PageRank, and more
● Multi-Node Multi-GPU features
Visualization
cuxfilter
● GPU-accelerated cross-filtering
Viz integration
● pyViz: Plotly Dash, Bokeh, Datashader, HoloViews, hvPlot
● Node-RAPIDS bindings for Node.js
Application and Domain-Specific Frameworks
Morpheus
Cybersecurity application development framework
cuSignal
Signal processing
cuSpatial
Spatial analytics
Merlin
Recommender systems development framework
cuCIM
Computer vision & image processing primitives
NVTabular
Tabular data feature engineering
cuDF
A GPU DataFrame library in Python with a pandas-like API built into the PyData ecosystem
Pandas-like API on the GPU
CPU: pandas
>>> import pandas as pd
>>> df = pd.read_csv("filepath")
>>> df.groupby("col").mean()
>>> df.rolling(window=3).sum()
GPU: cuDF
>>> import cudf
>>> df = cudf.read_csv("filepath")
>>> df.groupby("col").mean()
>>> df.rolling(window=3).sum()
Best-in-Class Performance (Benchmark)
Average Speed-Ups: 10-100x
10 Minutes to cuDF
Groupby Time Series
Strings and Regex
Missing Data
Indexing
Nested Types
Rolling Windows
CuPy Interoperability
UDFs
NVIDIA A100 vs. AMD EPYC 7642 48-Core Processor
cuDF Python vs. Pandas 1.4
Performance maximized on large in-memory datasets
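For readers less familiar with the two operations used in the pandas and cuDF snippets on this slide, the following plain-Python sketch shows what a groupby mean and a trailing rolling sum compute. It is only an illustration of the semantics, not how pandas or cuDF implement them; the helper names `groupby_mean` and `rolling_sum` and the sample rows are made up for this example.

```python
from collections import defaultdict


def groupby_mean(rows, key, value):
    """Mean of `value` for each distinct `key`, similar in spirit to groupby(key).mean()."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        sums[row[key]] += row[value]
        counts[row[key]] += 1
    return {k: sums[k] / counts[k] for k in sums}


def rolling_sum(values, window):
    """Sum over a trailing window; None marks positions where the window is not yet full."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # pandas-style APIs report NaN here
        else:
            out.append(sum(values[i + 1 - window : i + 1]))
    return out


# Hypothetical sample data
rows = [
    {"col": "a", "x": 1.0},
    {"col": "b", "x": 2.0},
    {"col": "a", "x": 3.0},
    {"col": "b", "x": 6.0},
]
means = groupby_mean(rows, "col", "x")
rolled = rolling_sum([1, 2, 3, 4, 5], window=3)
print(means)   # {'a': 2.0, 'b': 4.0}
print(rolled)  # [None, None, 6, 9, 12]
```

Both operations are simple per-group or per-window reductions, which is part of why they map well onto many parallel GPU threads.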
Let’s Code:
Loading and Preparing Data
Get the notebook at
rapids.ai/introgtc2023
cuDF: Advanced Features
I/O and Interoperability
High-performance IO
▸ CUDA-accelerated readers and writers for CSV,
JSON, Parquet, ORC, Avro, and plain text
▸ GPU-direct storage to bypass PCI bottlenecks
▸ On-GPU compression / decompression with nvComp
Support for text data on GPU
▸ Standard Pandas-style string functions and regular
expressions, accelerated in CUDA
▸ Advanced parsers and tokenizers for deep learning
and NLP, such as Byte-Pair Encoding
Complex Datatypes
▸ Struct, List, and Decimal128 columns – often found
in enterprise datasets but not in core Pandas
Interoperability
▸ Zero-copy data passing to CuPy, PyTorch, and more
via DLPack and __cuda_array_interface__
import cudf
import torch
from torch.utils import dlpack

nelem = 10000
df = cudf.DataFrame({
    'a': range(nelem),
    'b': range(500, nelem + 500),
    'c': range(1000, nelem + 1000),
})
# Convert to CuPy
arr_cupy = df.to_cupy()
# Convert from PyTorch (a 1-D tensor becomes a cudf.Series)
data = torch.randn(40000).cuda()
ser = cudf.from_dlpack(dlpack.to_dlpack(data))
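"Zero-copy" means that two libraries hold handles to the same memory rather than duplicating it. The sketch below is a CPU-side analogy only: it uses Python's built-in buffer protocol (`memoryview`) to show the difference between sharing and copying. It does not touch the GPU and does not use DLPack or `__cuda_array_interface__`; those play a comparable role for device buffers.

```python
from array import array

source = array("d", [1.0, 2.0, 3.0])  # owner of the underlying memory
view = memoryview(source)             # second handle to the same memory, no copy
copy = array("d", source)             # an explicit copy, for contrast

source[0] = 99.0                      # write through the owner

print(view[0])  # 99.0: the view observes the write
print(copy[0])  # 1.0: the copy does not

# The view keeps a reference to the object whose memory it exposes.
shares_memory = view.obj is source
```

The practical consequence is the same on the GPU: with a zero-copy hand-off, a write made by one library is visible to the other, so in-place updates need the same care as with any shared buffer.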
cuDF: User-defined functions
▸ RAPIDS leverages Numba to compile a wide range of
your user-defined functions to CUDA code
▸ Numba has explicit CUDA JIT support, but it is
automatically applied and optimized in key RAPIDS
locations – totally transparently
▸ Can be used in:
* apply on Series and DataFrames
* Rolling windows for time series (.rolling)
* apply_grouped for aggregations
* On strings in many cases (newly added!)
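Bringing all your Python code to CUDA
>>> import math
>>> import cudf
>>> from numba import cuda
>>> sr = cudf.Series([-1, 1, 2, 3])
# Explicit Numba: a CUDA kernel cannot return a value, so it writes to an output column
>>> @cuda.jit
... def rectified_linear(x, out):
...     i = cuda.grid(1)
...     if i < x.size:  # boundary guard
...         if x[i] < 0:
...             out[i] = 0
...         elif x[i] < 1:
...             out[i] = x[i]
...         else:
...             out[i] = 1
>>> out = cudf.Series([0, 0, 0, 0])
>>> rectified_linear.forall(len(sr))(sr, out)
# Automatic Numba with a lambda
>>> sr.apply(lambda x: math.log(x + 1))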
Let’s Code:
Working with UDFs and
Feature Engineering
Get the notebook at
rapids.ai/introgtc2023
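When moving a UDF to the GPU, it can help to keep a plain-Python reference to compare results against. The sketch below reuses the branch logic and input values from the `rectified_linear` example on the UDF slide; the list-based harness is an illustration and not part of cuDF or Numba.

```python
import math


def rectified_linear(x):
    # Same branches as the slide's function: values are clamped to the range [0, 1].
    if x < 0:
        return 0
    elif x < 1:
        return x
    else:
        return 1


values = [-1, 1, 2, 3]  # same data as the cudf.Series on the slide
clamped = [rectified_linear(v) for v in values]
print(clamped)  # [0, 1, 1, 1]

# On the CPU, math.log(0) raises ValueError, so this reference skips
# inputs where x + 1 is not positive instead of evaluating the log.
logged = [math.log(v + 1) if v + 1 > 0 else None for v in values]
```

Note that the first input, -1, makes `x + 1` equal to zero, which is worth handling explicitly when comparing CPU and GPU results for the lambda.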
cuML
Accelerated Machine Learning with a scikit-learn API
CPU: scikit-learn
>>> from sklearn.ensemble import RandomForestClassifier
>>> clf = RandomForestClassifier()
>>> clf.fit(x, y)
GPU: cuML
>>> from cuml.ensemble import RandomForestClassifier
>>> clf = RandomForestClassifier()
>>> clf.fit(x, y)
40+ GPU-Accelerated Algorithms & Growing
Time Series Preprocessing
Classification
Tree Models
Cross Validation
Clustering
Explainability
Dimensionality Reduction
Regression
A100 GPU vs. 2x Intel Xeon E5-2698 CPUs (80 logical cores)
cuML 23.02, scikit-learn 1.2, umap-learn 0.5.3
Performance
maximized on large
in-memory datasets
● One line of code change to unlock up to 20x
speedups with GPUs
● Scalable to the world’s largest datasets with
Dask and PySpark
● Built-in SHAP support for model explainability
● Deployable with Triton for lightning-fast inference
in production
● Triton supports LightGBM and Random Forests as
well as XGBoost for inference
Accelerated XGBoost and Inference for Trees
“XGBoost is All You Need” – Bojan Tunguz, 4x Kaggle Grandmaster
CPU: XGBoost
>>> from xgboost import XGBClassifier
>>> clf = XGBClassifier()
>>> clf.fit(x, y)
GPU: XGBoost
>>> from xgboost import XGBClassifier
>>> clf = XGBClassifier(tree_method="gpu_hist")
>>> clf.fit(x, y)
Up to 20x Speedups
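The `gpu_hist` setting selects a histogram-based tree builder: feature values are bucketed into bins, gradient statistics are summed per bin, and candidate splits are scored only at bin boundaries. The sketch below illustrates that idea for a single feature in plain Python. It is a simplified illustration under stated assumptions, not XGBoost's implementation: it uses equal-width bins (XGBoost derives bins from quantiles), a squared-error loss, and a hypothetical helper name `best_histogram_split`.

```python
def best_histogram_split(x, grad, hess, n_bins=4, reg_lambda=1.0):
    """Return (gain, threshold) for the best split of one feature, scored at bin edges."""
    x_min, x_max = min(x), max(x)
    if x_max == x_min:
        return 0.0, None  # a constant feature cannot be split
    width = (x_max - x_min) / n_bins

    # Accumulate gradient and hessian sums per bin.
    g_hist = [0.0] * n_bins
    h_hist = [0.0] * n_bins
    for xi, gi, hi in zip(x, grad, hess):
        b = min(int((xi - x_min) / width), n_bins - 1)
        g_hist[b] += gi
        h_hist[b] += hi
    g_total, h_total = sum(g_hist), sum(h_hist)

    def score(g, h):
        # Structure score of a leaf with gradient sum g and hessian sum h.
        return g * g / (h + reg_lambda)

    best_gain, best_threshold = 0.0, None
    g_left = h_left = 0.0
    for b in range(n_bins - 1):  # candidate split after bin b
        g_left += g_hist[b]
        h_left += h_hist[b]
        gain = 0.5 * (
            score(g_left, h_left)
            + score(g_total - g_left, h_total - h_left)
            - score(g_total, h_total)
        )
        if gain > best_gain:
            best_gain, best_threshold = gain, x_min + (b + 1) * width
    return best_gain, best_threshold


# Hypothetical toy data: labels switch from 0 to 1 at x = 4.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0]
y = [0, 0, 0, 0, 1, 1, 1, 1]
grad = [0.0 - yi for yi in y]  # squared error with an initial prediction of 0
hess = [1.0] * len(y)
gain, threshold = best_histogram_split(x, grad, hess)
print(threshold)  # 4.0
```

Because the split search only touches per-bin sums, the expensive part is building the histograms, which is a highly parallel accumulation and a natural fit for GPUs.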
RAPIDS Visualization
Scalable graphics with user-friendly interfaces
Leverage cuDF speedups to visualize, filter, and analyze
data frames fast with popular library integrations and
the RAPIDS-native cuxfilter and node-RAPIDS
Let’s code:
Intro to ML
(plus a little visualization)
Get the notebook at
rapids.ai/introgtc2023
• cuGraph is a library of graph
algorithms capable of processing the
world’s largest graphs
• Link analysis, community detection,
centrality, linear assignment, property
graphs, and more
• Friendly, consistent C, C++17, and
Python APIs compatible with NetworkX
• World-class performance for every
scale and use case
• Support for trillion+ edge graphs
• Graph neural network library
integration
• Graph database integration
cuGraph
Making Large-Scale Graph Analytics Possible
PageRank on 4.4 trillion edges at 1.5 seconds per iteration
Each node has eight A100 80GB GPUs, InfiniBand for inter-node
communication, and NVLink for intra-node communication
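To make "seconds per iteration" concrete, the sketch below shows what a single PageRank power-iteration step computes: every vertex distributes its current rank evenly over its outgoing edges. This is a plain-Python illustration on a three-vertex toy graph, not cuGraph's implementation or API; the function name `pagerank_iteration`, the damping factor of 0.85, and the graph are assumptions for this example.

```python
def pagerank_iteration(ranks, out_edges, damping=0.85):
    """One power-iteration step over a graph given as {vertex: [out-neighbours]}."""
    n = len(ranks)
    new_ranks = {v: (1.0 - damping) / n for v in ranks}
    dangling = 0.0
    for v, rank in ranks.items():
        targets = out_edges.get(v, [])
        if targets:
            share = damping * rank / len(targets)
            for t in targets:
                new_ranks[t] += share
        else:
            dangling += rank  # vertices without out-edges
    # Spread the rank held by dangling vertices uniformly over all vertices.
    for v in new_ranks:
        new_ranks[v] += damping * dangling / n
    return new_ranks


# Hypothetical toy graph
edges = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = {v: 1.0 / len(edges) for v in edges}
for _ in range(100):
    ranks = pagerank_iteration(ranks, edges)
```

Each iteration visits every edge once, so the cost per iteration grows with the edge count; that is the quantity the benchmark above reports at the multi-trillion-edge scale.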
Scaling with RAPIDS + Dask
▸ Distributed extensions with familiar APIs for
DataFrames and Arrays
▸ Scale from a laptop to a supercomputer, tested on up to 1024 GPUs
▸ Deploy on any cloud service or Kubernetes in minutes
▸ Integrate cuDF, CuPy, cuML, cuGraph, and XGBoost
with a common framework
▸ Easily switch between CPU and GPU backends
Easy Multi-GPU for Python Programmers
import dask
import dask.dataframe as dd

# Start by telling Dask to use the GPU backend
with dask.config.set({"dataframe.backend": "cudf"}):
    ddf_s = dd.read_parquet("stores.parquet")
    ddf_p = dd.read_parquet("purchases.parquet")
ddf_p["total"] = ddf_p.price * ddf_p.quantity
# Combine the two dataframes
ddf_join = ddf_p.merge(ddf_s, on=["id"], how="inner")
ddf_join = ddf_join.set_index("key")
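Conceptually, a distributed inner merge like the one above can be carried out by routing rows with the same join key to the same partition and then joining each pair of partitions independently. The plain-Python sketch below illustrates that idea on made-up rows. It is not how Dask or cuDF implement merges (Dask chooses among several strategies), and the helper names and sample data are hypothetical.

```python
from collections import defaultdict


def hash_partition(rows, key, n_parts):
    """Route each row to a partition chosen by the hash of its join key."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[hash(row[key]) % n_parts].append(row)
    return parts


def join_partition(left, right, key):
    """Inner-join two co-located partitions using a hash table on the right side."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# Hypothetical sample data
purchases = [
    {"id": 1, "price": 2.0, "quantity": 3},
    {"id": 2, "price": 5.0, "quantity": 1},
    {"id": 1, "price": 1.5, "quantity": 2},
    {"id": 3, "price": 9.0, "quantity": 1},  # no matching store, dropped by the inner join
]
stores = [{"id": 1, "city": "Austin"}, {"id": 2, "city": "Boston"}]

for p in purchases:
    p["total"] = p["price"] * p["quantity"]

n_parts = 2
joined = []
for left, right in zip(hash_partition(purchases, "id", n_parts),
                       hash_partition(stores, "id", n_parts)):
    # Each pair of partitions could be joined by a different worker or GPU.
    joined.extend(join_partition(left, right, "id"))

totals = sorted(row["total"] for row in joined)
print(totals)  # [3.0, 5.0, 6.0]
```

Because matching keys always land in the same partition, the per-partition joins need no communication with each other once the rows have been routed.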
RAPIDS Accelerator for Apache Spark
Seamless integration with Apache Spark 3.x
CPU Spark
spark.sql("""
    select
        order,
        count(*) as order_count
    from
        orders
    group by
        order"""
)
GPU Spark
spark.conf.set("spark.plugins",
               "com.nvidia.spark.SQLPlugin")
spark.sql("""
    select
        order,
        count(*) as order_count
    from
        orders
    group by
        order"""
)
Average Speed-Ups: 10x
~5x faster than CPU-based servers
78% cheaper than CPU-based servers
*CPU-only 4-node cluster: 4xn1-standard-32 (32vCPU, 120GB RAM)
*GPU 4-node cluster: 4xn1-standard-32 (32vCPU, 120GB RAM) and 8xT4 NVIDIA GPU
*NDS stands for NVIDIA Decision Support benchmark that is derived from the TPC-DS benchmark
and is used for internal testing. Results from NDS are not comparable to TPC-DS
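Speedup and cost saving are linked by simple arithmetic if one assumes that the cost of a job is the hourly cluster price multiplied by the runtime. The sketch below shows that relationship. The slide does not state its pricing, so the price ratio used here is a hypothetical input and the printed figure is not meant to reproduce the benchmark result.

```python
def cost_saving(speedup, price_ratio):
    """Fraction of job cost saved on the GPU cluster.

    speedup:     CPU runtime divided by GPU runtime
    price_ratio: GPU cluster hourly price divided by CPU cluster hourly price
    Assumes job cost = hourly price * runtime.
    """
    return 1.0 - price_ratio / speedup


# Hypothetical inputs: the GPU cluster costs 1.5x as much per hour and runs 5x faster.
saving = cost_saving(speedup=5.0, price_ratio=1.5)
print(f"{saving:.0%}")  # 70%
```

The formula also shows the break-even point: a GPU cluster only saves money when its speedup exceeds its hourly price ratio.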
350+
RAPIDS contributors on GitHub
Powering Modern Data Teams
Battle tested on the most challenging workloads, integrated with the most
innovative tools, and backed by a huge community
100+
Open-source and commercial
software integrations
25%
of Fortune 100 companies using
RAPIDS
Deploying RAPIDS
Running RAPIDS anywhere (https://docs.rapids.ai/deployment)
RAPIDS Deployment Documentation
More on RAPIDS at GTC 2023
General RAPIDS and Data Science
Accelerate Spark With RAPIDS For Cost Savings [S52202]
Accelerating Your Prototypes with NVIDIA RAPIDS and Friends* [DLIT51679] (DLI)
Learn How to Create Features from Tabular Data and Accelerate your Data Science Pipeline* [DLIT51195]
Accelerating Exploratory Data Analysis at LinkedIn [S51399]
cuSpatial: Integrate High-Performance Spatial Computation with Your Existing Workflow [S51243]
ML and Recommender Systems
Using GPU-Optimized Software to Shorten the Feedback Loop in AML and Fraud Models by an Order of Magnitude [S51632]
Using GNNs in LinkedIn Recommendation Systems [S51400]
Merlin Updates - Build and Deploy Recommender Systems at Any Scale [S51335]
Graph and Operations
Accelerating Huge Graph GNN Training using DGL and PyG with Integrated Containers [S51156]
Batched Graph Community Detection on NVIDIA DGX Platforms [PS51057]
Using GNNs in LinkedIn Recommendation Systems [S51400]
Advances in Operations Optimization [S51717]
How to Get Started with RAPIDS
A Variety of Ways to Get Up & Running
More about RAPIDS
● Learn more at RAPIDS.ai
● Read the API docs
● Check out the RAPIDS blog
● Read the NVIDIA DevBlog
Self-Start Resources
● Get started with RAPIDS
● Deploy on the Cloud today
● Start with Google Colab
● Look at the cheat sheets
Discussion & Support
● Check the RAPIDS GitHub
● Use the NVIDIA Forums
● Reach out on Slack
● Talk to NVIDIA Services
@RAPIDSai
https://github.com/rapidsai https://rapids.ai/slack-invite/ https://rapids.ai
Get Engaged
NVIDIA Launchpad
Instantly experience end-to-end workflows for AI, data science, 3D design collaboration, and more
Get Started with Launchpad
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf

More Related Content

Similar to S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf

Distributed Multi-GPU Computing with Dask, CuPy and RAPIDS
Distributed Multi-GPU Computing with Dask, CuPy and RAPIDSDistributed Multi-GPU Computing with Dask, CuPy and RAPIDS
Distributed Multi-GPU Computing with Dask, CuPy and RAPIDS
PeterAndreasEntschev
 
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
Matej Misik
 
Hardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and MLHardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and ML
inside-BigData.com
 
RAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needsRAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needs
Connected Data World
 
20160407_GTC2016_PgSQL_In_Place
20160407_GTC2016_PgSQL_In_Place20160407_GTC2016_PgSQL_In_Place
20160407_GTC2016_PgSQL_In_Place
Kohei KaiGai
 
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsPL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
Kohei KaiGai
 
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Lablup Inc.
 
pgconfasia2016 plcuda en
pgconfasia2016 plcuda enpgconfasia2016 plcuda en
pgconfasia2016 plcuda en
Kohei KaiGai
 
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Databricks
 
Profiling deep learning network using NVIDIA nsight systems
Profiling deep learning network using NVIDIA nsight systemsProfiling deep learning network using NVIDIA nsight systems
Profiling deep learning network using NVIDIA nsight systems
Jack (Jaegeun) Han
 
Accelerating Data Science With GPUs
Accelerating Data Science With GPUsAccelerating Data Science With GPUs
Accelerating Data Science With GPUs
iguazio
 
How to Run TensorFlow Cheaper in the Cloud Using Elastic GPUs
How to Run TensorFlow Cheaper in the Cloud Using Elastic GPUsHow to Run TensorFlow Cheaper in the Cloud Using Elastic GPUs
How to Run TensorFlow Cheaper in the Cloud Using Elastic GPUs
Altoros
 
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and AlluxioAdvancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Alluxio, Inc.
 
Tesla Accelerated Computing Platform
Tesla Accelerated Computing PlatformTesla Accelerated Computing Platform
Tesla Accelerated Computing Platform
inside-BigData.com
 
20170602_OSSummit_an_intelligent_storage
20170602_OSSummit_an_intelligent_storage20170602_OSSummit_an_intelligent_storage
20170602_OSSummit_an_intelligent_storage
Kohei KaiGai
 
20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQL20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQL
Kohei KaiGai
 
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDFGPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
Keith Kraus
 
Advances in GPU Computing
Advances in GPU ComputingAdvances in GPU Computing
Advances in GPU Computing
Frédéric Parienté
 
BlazingSQL & Graphistry - Netflow Demo
BlazingSQL & Graphistry - Netflow DemoBlazingSQL & Graphistry - Netflow Demo
BlazingSQL & Graphistry - Netflow Demo
Rodrigo Aramburu
 
Application Optimisation using OpenPOWER and Power 9 systems
Application Optimisation using OpenPOWER and Power 9 systemsApplication Optimisation using OpenPOWER and Power 9 systems
Application Optimisation using OpenPOWER and Power 9 systems
Ganesan Narayanasamy
 

Similar to S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf (20)

Distributed Multi-GPU Computing with Dask, CuPy and RAPIDS
Distributed Multi-GPU Computing with Dask, CuPy and RAPIDSDistributed Multi-GPU Computing with Dask, CuPy and RAPIDS
Distributed Multi-GPU Computing with Dask, CuPy and RAPIDS
 
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
 
Hardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and MLHardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and ML
 
RAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needsRAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needs
 
20160407_GTC2016_PgSQL_In_Place
20160407_GTC2016_PgSQL_In_Place20160407_GTC2016_PgSQL_In_Place
20160407_GTC2016_PgSQL_In_Place
 
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsPL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
 
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
 
pgconfasia2016 plcuda en
pgconfasia2016 plcuda enpgconfasia2016 plcuda en
pgconfasia2016 plcuda en
 
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
 
Profiling deep learning network using NVIDIA nsight systems
Profiling deep learning network using NVIDIA nsight systemsProfiling deep learning network using NVIDIA nsight systems
Profiling deep learning network using NVIDIA nsight systems
 
Accelerating Data Science With GPUs
Accelerating Data Science With GPUsAccelerating Data Science With GPUs
Accelerating Data Science With GPUs
 
How to Run TensorFlow Cheaper in the Cloud Using Elastic GPUs
How to Run TensorFlow Cheaper in the Cloud Using Elastic GPUsHow to Run TensorFlow Cheaper in the Cloud Using Elastic GPUs
How to Run TensorFlow Cheaper in the Cloud Using Elastic GPUs
 
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and AlluxioAdvancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
 
Tesla Accelerated Computing Platform
Tesla Accelerated Computing PlatformTesla Accelerated Computing Platform
Tesla Accelerated Computing Platform
 
20170602_OSSummit_an_intelligent_storage
20170602_OSSummit_an_intelligent_storage20170602_OSSummit_an_intelligent_storage
20170602_OSSummit_an_intelligent_storage
 
20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQL20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQL
 
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDFGPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
 
Advances in GPU Computing
Advances in GPU ComputingAdvances in GPU Computing
Advances in GPU Computing
 
BlazingSQL & Graphistry - Netflow Demo
BlazingSQL & Graphistry - Netflow DemoBlazingSQL & Graphistry - Netflow Demo
BlazingSQL & Graphistry - Netflow Demo
 
Application Optimisation using OpenPOWER and Power 9 systems
Application Optimisation using OpenPOWER and Power 9 systemsApplication Optimisation using OpenPOWER and Power 9 systems
Application Optimisation using OpenPOWER and Power 9 systems
 

Recently uploaded

Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
Timothy Spann
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Aggregage
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Social Samosa
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
Sm321
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
v7oacc3l
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
Bill641377
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
AndrzejJarynowski
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
Lars Albertsson
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
74nqk8xf
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
jitskeb
 
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
zsjl4mimo
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
bopyb
 

Recently uploaded (20)

Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
 
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
 

S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf

  • 1. Accelerate Data Science in Python with RAPIDS John Zedlewski, Senior Director, RAPIDS and Data Science @ NVIDIA Ashwin Srinath, Senior Engineer, RAPIDS @NVIDIA GTC 2023
  • 2. RAPIDS brings GPU acceleration to the open-source data science and data engineering ecosystem
  • 3. General Purpose and Domain-Specific Libraries Data Preparation/ETL Visualization Analytics/ML/Graph cuDF ● GPU-accelerated ETL functions ● Tracks Pandas and other common PyData APIs ● Dask + UCX integration for scaling RAPIDS for Apache Spark ● RAPIDS accelerator for Apache Spark RAPIDS ML ● GPU-native cuML library (scikit- learn-style APIs) XGBoost, RAFT, FIL, HPO, DL interop, and more cuGraph ● GPU graph analytics, including Louvain, PageRank, and more ● Multi-Node Multi-GPU features cuxfilter ● GPU-accelerated cross-filtering Viz integration ● pyViz: Plotly Dash, Bokeh, Datashader, HoloViews, hvPlot ● Node-RAPIDS bindings for node.js Morpheus Cybersecurity application development framework cuSignal Signals processing cuSpatial Spatial analytics Merlin Recommender Systems development framework cuCIM Computer vision & image processing primitives NVTabular Tabular data feature engineering Application and Domain-Specific Frameworks
  • 4. cuDF A GPU DataFrame library in Python with a pandas-like API built into the PyData ecosystem Pandas-like API on the GPU Best-in-Class Performance (Benchmark) >>> import pandas as pd >>> df = pd.read_csv("filepath") >>> df.groupby(“col”).mean() >>> df.rolling(window=3).sum() >>> import cudf >>> df = cudf.read_csv("filepath") >>> df.groupby(“col”).mean() >>> df.rolling(window=3).sum() GPU CPU pandas cuDF Average Speed-Ups: 10-100x 10 Minutes to cuDF Groupby Time Series Strings and Regex Missing Data Indexing Nested Types Rolling Windows CuPy Interoperability UDFs NVIDIA A100 vs. AMD EPYC 7642 48-Core Processor cuDF Python vs. Pandas 1.4 Performance maximized on large in- memory datasets
  • 5. Let’s Code: Loading and Preparing Data Get the notebook at rapids.ai/introgtc2023
  • 6. cuDF: Advanced Features I/O and Interoperability High-performance IO ▸ CUDA-accelerated readers and writers for CSV, JSON, Parquet, ORC, Avro, and plain text ▸ GPU-direct storage to bypass PCI bottlenecks ▸ On-GPU compression / decompression with nvComp Support for text data on GPU ▸ Standard Pandas-style string functions and regular expressions, accelerated in CUDA ▸ Advanced parsers and tokenizers for deep learning and NLP, such as Byte-Pair Encoding Complex Datatypes ▸ Struct, List, and Decimal128 columns – often found in enterprise datasets but not in core Pandas Interoperability ▸ Zero-copy data passing to cuPy, Pytorch, and more via dlpack and __cuda_array_interface__ nelem = 10000 df = cudf.DataFrame({ 'a':range(nelem), 'b':range(500, nelem + 500), 'c':range(1000, nelem + 1000)} ) # Convert to cupy arr_cupy = df.to_cupy() # Convert from PyTorch import torch From torch.utils import dlpack data = torch.randn(40000).cuda() Df = cudf.from_dlpack( dlpack.to_dlpack(data))
  • 7. cuDF: User-defined functions ▸ RAPIDS leverages Numba to compile a wide range of your user-defined functions to CUDA code ▸ Numba has explicit CUDA JIT support, but it is automatically applied and optimized in key RAPIDS locations – totally transparently ▸ Can be used in: * apply on Series and DataFrames * Rolling windows for time series (.rolling) * apply_grouped for aggregations * On strings in many cases (newly added!) Bringing all your Python code to CUDA >> sr = cudf.Series([-1, 1, 2, 3]) # Explicit Numba @cuda.jit def rectified_linear(x): if x < 0: return 0 elif x < 1: return x else: return 1 >> rectified_linear(sr) # Automatic Numba with a lambda >> sr.apply(lambda x: math.log(x+1))
  • 8. Let’s Code: Working with UDFs and Feature Engineering Get the notebook at rapids.ai/introgtc2023
  • 9. cuML Accelerated Machine Learning with a scikit-learn API >>> from sklearn.ensemble import RandomForestClassifier >>> clf = RandomForestClassifier() >>> clf.fit(x, y) >>> from cuml.ensemble import RandomForestClassifier >>> clf = RandomForestClassifier() >>> clf.fit(x, y) GPU CPU Scikit-learn cuML 40+ GPU-Accelerated Algorithms & Growing Time Series Preprocessing Classification Tree Models Cross Validation Clustering Explainability Dimensionality Reduction Regression A100 GPU vs. 2x Intel Xeon E5-2698 CPUs (80 logical cores) cuML 23.02, scikit-learn 1.2, umap-learn 0.5.3 Performance maximized on large in-memory datasets
  • 10. Accelerated XGBoost and Inference for Trees
    “XGBoost is All You Need” (Bojan Tunguz, 4x Kaggle Grandmaster)
    ● One line of code change to unlock up to 20x speedups with GPUs
    ● Scalable to the world’s largest datasets with Dask and PySpark
    ● Built-in SHAP support for model explainability
    ● Deployable with Triton for lightning-fast inference in production
    ● Triton supports LightGBM and Random Forests as well as XGBoost for inference
    CPU (XGBoost):
        >>> from xgboost import XGBClassifier
        >>> clf = XGBClassifier()
        >>> clf.fit(x, y)
    GPU (XGBoost):
        >>> from xgboost import XGBClassifier
        >>> clf = XGBClassifier(tree_method="gpu_hist")
        >>> clf.fit(x, y)
    Up to 20x speedups
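    The one-line change can be isolated in a small helper that only builds constructor arguments. This is a sketch, not part of the talk: the helper name and its version argument are hypothetical. The "gpu_hist" value is the one shown on the slide; the device="cuda" form is the spelling used by XGBoost 2.0 and later, where "gpu_hist" is deprecated.

```python
def xgb_tree_params(use_gpu, xgboost_major_version=1):
    """Return keyword arguments selecting the CPU or GPU histogram tree builder.

    Hypothetical helper for illustration; it does not import xgboost.
    """
    if not use_gpu:
        return {"tree_method": "hist"}
    if xgboost_major_version >= 2:
        # XGBoost 2.0+ selects the GPU through the `device` parameter.
        return {"tree_method": "hist", "device": "cuda"}
    # Releases before 2.0 use the value shown on the slide.
    return {"tree_method": "gpu_hist"}


print(xgb_tree_params(use_gpu=True))
print(xgb_tree_params(use_gpu=True, xgboost_major_version=2))
```

    With xgboost installed, the result would be passed as XGBClassifier(**xgb_tree_params(use_gpu=True)), leaving the rest of the training code unchanged.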
  • 11. RAPIDS Visualization: Scalable graphics with user-friendly interfaces
    Leverage cuDF speedups to visualize, filter, and analyze data frames fast with popular library integrations and the RAPIDS-native cuxfilter and Node-RAPIDS
  • 12. Let’s Code: Intro to ML (plus a little visualization)
    Get the notebook at rapids.ai/introgtc2023
  • 13. cuGraph: Making Large-Scale Graph Analytics Possible
    • cuGraph is a library of graph algorithms capable of processing the world’s largest graphs
    • Link analysis, community detection, centrality, linear assignment, property graphs, and more
    • Friendly, consistent C, C++17, and Python APIs compatible with NetworkX
    • World-class performance for every scale and use case
    • Support for trillion+ edge graphs
    • Graph neural network library integration
    • Graph database integration
    Benchmark: PageRank on 4.4 trillion edges at 1.5 seconds per iteration. Each node has eight A100 80GB GPUs, InfiniBand for inter-node communication, and NVLink for intra-node communication
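    To make "per iteration" concrete: one PageRank iteration is a single pass over every edge, which is why iteration time is the natural benchmark unit. The following is a minimal pure-Python sketch of the textbook power iteration on a tiny made-up graph. It is not cuGraph's implementation and says nothing about how cuGraph distributes the work.

```python
def pagerank(edges, num_nodes, damping=0.85, iterations=100):
    """Textbook PageRank by power iteration over a list of (src, dst) edges."""
    out_degree = [0] * num_nodes
    for src, _ in edges:
        out_degree[src] += 1

    rank = [1.0 / num_nodes] * num_nodes
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread uniformly.
        dangling = sum(rank[n] for n in range(num_nodes) if out_degree[n] == 0)
        base = (1.0 - damping) / num_nodes + damping * dangling / num_nodes
        new_rank = [base] * num_nodes
        # One iteration touches each edge exactly once.
        for src, dst in edges:
            new_rank[dst] += damping * rank[src] / out_degree[src]
        rank = new_rank
    return rank


# Hypothetical 4-node graph: a cycle 0 -> 1 -> 2 -> 0 plus an extra link 3 -> 0.
edges = [(0, 1), (1, 2), (2, 0), (3, 0)]
ranks = pagerank(edges, num_nodes=4)
print(ranks)
```

    Node 0 ends up with the highest rank because it receives links from two nodes, while node 3, which nothing links to, keeps only the baseline share.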
  • 14. Scaling with RAPIDS + Dask: Easy Multi-GPU for Python Programmers
    ▸ Distributed extensions with familiar APIs for DataFrames and Arrays
    ▸ Scale from laptop to supercomputer, tested up to 1024 GPUs
    ▸ Deploy on any cloud service or Kubernetes in minutes
    ▸ Integrate cuDF, CuPy, cuML, cuGraph, and XGBoost with a common framework
    ▸ Easily switch between CPU and GPU backends
    Example:
        import dask
        import dask.dataframe as dd
        # Start by telling Dask to use the GPU backend
        with dask.config.set({"dataframe.backend": "cudf"}):
            ddf_s = dd.read_parquet("stores.parquet")
            ddf_p = dd.read_parquet("purchases.parquet")
            ddf_p["total"] = ddf_p.price * ddf_p.quantity
            # Combine the two dataframes
            ddf_join = ddf_p.merge(ddf_s, on=["id"], how="inner")
            ddf_join = ddf_join.set_index("key")
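    Switching between CPU and GPU backends can be reduced to choosing one configuration string. The sketch below is an assumption-laden illustration: the helper name is hypothetical, it only checks whether the cudf package is importable (not whether a GPU is actually present), and the "cudf" backend for Dask is typically provided by the dask-cudf package.

```python
import importlib.util


def pick_dataframe_backend():
    """Return "cudf" when the cudf package is importable, otherwise "pandas".

    Hypothetical helper: importability does not guarantee a usable GPU.
    """
    return "cudf" if importlib.util.find_spec("cudf") else "pandas"


backend = pick_dataframe_backend()

# Only touch Dask's configuration when Dask is installed.
if importlib.util.find_spec("dask"):
    import dask

    dask.config.set({"dataframe.backend": backend})

print(f"Dask DataFrame backend: {backend}")
```

    Because the DataFrame code after this point is identical for both values, the same script can be developed on a laptop without a GPU and run unchanged on a GPU cluster.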
  • 15. RAPIDS Accelerator for Apache Spark: Seamless integration with Apache Spark 3.x
    CPU Spark:
        spark.sql(""" select order count(*) as order_count from orders""" )
    GPU Spark:
        spark.conf.set("spark.plugins", "com.nvidia.spark.SQLPlugin")
        spark.sql(""" select order count(*) as order_count from orders""" )
    ● Average speed-ups: 10x
    ● ~5x faster than CPU-based servers
    ● 78% cheaper than CPU-based servers
    * CPU-only 4-node cluster: 4x n1-standard-32 (32 vCPU, 120 GB RAM)
    * GPU 4-node cluster: 4x n1-standard-32 (32 vCPU, 120 GB RAM) and 8x NVIDIA T4 GPUs
    * NDS stands for NVIDIA Decision Support benchmark, which is derived from the TPC-DS benchmark and is used for internal testing. Results from NDS are not comparable to TPC-DS
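    The two headline figures can be related with a little arithmetic. The sketch below assumes that both numbers describe the same workload and that total cost equals hourly rate times runtime; neither assumption is stated on the slide, and the function name is hypothetical.

```python
def implied_hourly_rate_ratio(speedup, cost_saving):
    """GPU/CPU hourly price ratio implied by a speedup and a total-cost saving.

    Assumes total cost = hourly rate * runtime for both clusters.
    """
    relative_cost = 1.0 - cost_saving   # GPU total cost as a fraction of CPU total cost
    relative_runtime = 1.0 / speedup    # GPU runtime as a fraction of CPU runtime
    return relative_cost / relative_runtime


ratio = implied_hourly_rate_ratio(speedup=5.0, cost_saving=0.78)
print(f"Implied GPU/CPU hourly rate ratio: {ratio:.2f}")
```

    Under those assumptions the GPU cluster would cost roughly 10% more per hour than the CPU cluster, with the saving coming entirely from the shorter runtime.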
  • 16. Powering Modern Data Teams
    Battle tested on the most challenging workloads, integrated with the most innovative tools, and backed by a huge community
    ● 350+ RAPIDS contributors on GitHub
    ● 100+ open-source and commercial software integrations
    ● 25% of Fortune 100 companies using RAPIDS
  • 17. Deploying RAPIDS: Running RAPIDS anywhere
    RAPIDS Deployment Documentation: https://docs.rapids.ai/deployment
  • 18. More on RAPIDS at GTC 2023
    General RAPIDS and Data Science
    ● Accelerate Spark With RAPIDS For Cost Savings [S52202]
    ● Accelerating Your Prototypes with NVIDIA RAPIDS and Friends* [DLIT51679]
    ● (DLI) Learn How to Create Features from Tabular Data and Accelerate your Data Science Pipeline* [DLIT51195]
    ● Accelerating Exploratory Data Analysis at LinkedIn [S51399]
    ● cuSpatial: Integrate High-Performance Spatial Computation with Your Existing Workflow [S51243]
    ML and Recommender Systems
    ● Using GPU-Optimized Software to Shorten the Feedback Loop in AML and Fraud Models by an Order of Magnitude [S51632]
    ● Using GNNs in LinkedIn Recommendation Systems [S51400]
    ● Merlin Updates - Build and Deploy Recommender Systems at Any Scale [S51335]
    Graph and Operations
    ● Accelerating Huge Graph GNN Training using DGL and PyG with Integrated Containers [S51156]
    ● Batched Graph Community Detection on NVIDIA DGX Platforms [PS51057]
    ● Using GNNs in LinkedIn Recommendation Systems [S51400]
    ● Advances in Operations Optimization [S51717]
  • 19. How to Get Started with RAPIDS: A Variety of Ways to Get Up & Running
    More about RAPIDS
    ● Learn more at RAPIDS.ai
    ● Read the API docs
    ● Check out the RAPIDS blog
    ● Read the NVIDIA DevBlog
    Self-Start Resources
    ● Get started with RAPIDS
    ● Deploy on the Cloud today
    ● Start with Google Colab
    ● Look at the cheat sheets
    Discussion & Support
    ● Check the RAPIDS GitHub
    ● Use the NVIDIA Forums
    ● Reach out on Slack
    ● Talk to NVIDIA Services
    Get Engaged
    ● @RAPIDSai
    ● https://github.com/rapidsai
    ● https://rapids.ai/slack-invite/
    ● https://rapids.ai
  • 20. NVIDIA LaunchPad
    Instantly experience end-to-end workflows for AI, data science, 3D design collaboration, and more
    Get Started with LaunchPad