SlideShare a Scribd company logo
1 of 52
Download to read offline
Deep Dive into GPU Support in
Apache Spark 3.x
Robert Evans and Jason Lowe
NVIDIA
Agenda
GPU Features in Apache Spark 3
Accelerated SQL/DataFrame
Accelerated Shuffle
What’s Next
GPU Features in Apache Spark 3
Accelerator-Aware Scheduling
▪ SPARK-24615
▪ Request resources
▪ Executor
▪ Driver
▪ Task
▪ Resource discovery
▪ API to determine assignment
▪ Supported on YARN, Kubernetes, and Standalone
GPUs are now a schedulable resource
GPU Scheduling Example
./bin/spark-shell --master yarn --executor-cores 2 
--conf spark.driver.resource.gpu.amount=1 
--conf spark.driver.resource.gpu.discoveryScript=/opt/spark/getGpuResources.sh 
--conf spark.executor.resource.gpu.amount=2 
--conf spark.executor.resource.gpu.discoveryScript=./getGpuResources.sh 
--conf spark.task.resource.gpu.amount=1 
--files examples/src/main/scripts/getGpusResources.sh
GPU Discovery Script Example
#!/bin/bash
#
# Outputs a JSON formatted string that is expected by the
# spark.{driver/executor}.resource.gpu.discoveryScript config.
#
# Example output: {"name": "gpu", "addresses":["0","1","2","3","4","5","6","7"]}
ADDRS=$(nvidia-smi --query-gpu=index --format=csv,noheader 
| sed -e :a -e N -e'$!ba' -e 's/n/","/g')
echo {"name": "gpu", "addresses":["$ADDRS"]}
GPU Assignments API
// Task API
val context = TaskContext.get()
val resources = context.resources()
val assignedGpuAddrs = resources("gpu").addresses
// Pass assignedGpuAddrs into TensorFlow or other AI code
// Driver API
scala> sc.resources("gpu").addresses
Array[String] = Array(0)
GPU Scheduling UI
Stage Level Scheduling
CPU
NODE
GPU
SPARK ML APPLICATION
ETL Stage ML Stage
CPU
NODE
Stage Level Scheduling
▪ SPARK-27495
▪ Specify resource requirements per RDD operation
▪ Spark dynamically allocates containers to meet resource requirements
▪ Spark schedules tasks on appropriate containers
▪ Coming soon in Spark 3.1
SQL Columnar Processing
▪ SPARK-27396
▪ Catalyst API for columnar processing
▪ Plugins can modify query plan with columnar operations
▪ Plan nodes can exchange RDD[ColumnarBatch] instead of RDD[Row]
▪ Enables efficient processing by vectorized accelerators
▪ SIMD
▪ FPGA
▪ GPU
Spark 3
Spark 3 with Project Hydrogen
Spark 2.x
Data Preparation Model Training
Shared Storage
CPU Powered Cluster GPU Powered Cluster
Data
Sources
Spark
XGBoost | TensorFlow |
PyTorch
Spark Orchestrated Data
Sources
GPU Powered Cluster
Data Preparation Model Training
Spark
XGBoost | TensorFlow |
PyTorch
Spark Orchestrated
Spark 3 with Project Hydrogen
▪ Single pipeline
▪ Ingest
▪ Data preparation
▪ Model Training
▪ Infrastructure is consolidated and
simplified
▪ ETL can be GPU-accelerated
Enabling end-to-end acceleration
Data
Sources
GPU Powered Cluster
Data Preparation Model Training
Spark
XGBoost | TensorFlow |
PyTorch
Spark Orchestrated
Spark 3
Accelerated SQL/DataFrame
Accelerated ETL?
Can a GPU make an elephant fast?
Yes
TPCx-BB Like Benchmark Results (10TB Dataset, Two Node DGX-2 Cluster)*
Query #5 Query #16 Query #21 Query #22
CPU 25.95 6.16 7.13 3.80
GPU 1.31 1.16 0.56 0.14
0.00
5.00
10.00
15.00
20.00
25.00
30.00
Time(mins)
Query Time: GPU vs CPU (Mins)
Environment: Two DGX-2 (96 CPU Cores, 1.5TB Host memory, 16 V100 GPUs, 512 GB GPU Memory)
* Not official or complete TPCx-BB runs (ETL power only).
Deep Learning Recommendation Machines
▪ Anonymized 7-day clickstream (1 TB)
▪ Convert high-cardinality strings to
contiguous integer IDs
▪ DLRM github repo has turnkey scripts
Example use case: Criteo Dataset
DLRM on Criteo Dataset (Past)
144.0
12.1
45.0
0.7
0.0
40.0
80.0
120.0
ETL (1 core CPU)* Spark ETL (96 core CPU) Training (96 core CPU) Training (1 - V100)
Time(Hours)
ETL & Training Run Time for CPU & GPU
Criteo Dataset (1TB)
12.1
2.3
0.5
0.0
2.0
4.0
6.0
8.0
10.0
12.0
14.0
Spark ETL (96 core CPU) Spark ETL (1 - V100) Spark ETL (8 - V100)
Time(Hours)
Spark ETL for CRITEO DATASET (1TB)
DLRM ETL on Criteo Dataset (Present)
DLRM End-to-End on Criteo Dataset (Present)
Original CPU (1 Core for
ETL, 96 Core CPU for
Training)
Spark CPU (96 Core for
ETL & Training)
Spark CPU (96 Core for
ETL) & Spark GPU (1-
V100 Training)
Spark GPU (8-V100 for
ETL & 1-V100 Training)
Training 45.0 45.0 0.7 0.7
ETL 144.0 12.1 12.1 0.5
144.0
12.1 12.1
0.5
45.0
45.0 0.7
0.7
0.0
20.0
40.0
60.0
80.0
100.0
120.0
140.0
160.0
180.0
Time(Hours) Spark ETL + Training for Criteo Dataset (1TB)
Jensen Huang
GPU Technology Conference 2020
"The more you buy, the more you
save."
RAPIDS Accelerator for Apache Spark (Plugin)
RAPIDS Accelerator
for Apache Spark
UCX LibrariesRAPIDS C++ Libraries
JNI bindings
Mapping From Java/Scala to C++
DISTRIBUTED SCALE-OUT SPARK APPLICATIONS
APACHE SPARK CORE
Spark SQL Spark ShuffleDataFrame
if gpu_enabled(op, data_type)
call-out to RAPIDS
else
execute standard Spark op
● Custom Implementation of
Spark Shuffle
● Optimized to use RDMA
and GPU-to-GPU direct
communication
CUDA
JNI bindings
Mapping From Java/Scala to C++
RAPIDS Accelerator for Apache Spark 3.0 Plugin
No Code Changes
Same SQL and DataFrame code
What We Support
and growing…
!
%
&
*
+
-
/
<
<=
<=>
=
==
>
>=
^
abs
acos
and
asin
atan
avg
bigint
boolean
cast
cbrt
ceil
ceiling
coalesce
concat
cos
cosh
cot
count
cube
current_date
current_timestamp
date
datediff
day
dayofmonth
degrees
double
e
exp
explode*
expm1
first
first_value
float
floor
from_unixtime
hour
if
ifnull
in
initcap
input_file_block_le
ngth
input_file_block_st
art
input_file_name
int
isnan
isnotnull
isnull
last
last_value
length
lcase
like
ln
locate
log
log10
log1p
log2
lower
ltrim
max
mean
min
minute
mod
monotonically_inc
reasing_id
month
nanvl
negative
not
now
nullif
nvl
nvl2
or
pi
posexplode*
position
pow
power
radians
rand*
regexp_replace*
replace
rint
rollup
row_number
rtrim
second
shiftleft
shiftright
shiftrightunsigned
sign
signum
sin
sinh
smallint
spark_partition_id
sqrt
string
substr
substring
sum
tan
tanh
timestamp
tinyint
trim
ucase
upper
when
window
year
|
~
CSV Reading*
Orc Reading
Orc Writing
Parquet Reading
Parquet Writing
ANSI casts
TimeSub for time
ranges
startswith
endswith
contains
limit
order by
group by
filter
union
repartition
equi-joins
select
Is This a Silver Bullet?
No
▪ Small amounts of data
▪ Few hundred MB per partition for GPU
▪ Cache coherent processing
▪ Data Movement
▪ Slow I/O (networking, disks, etc.)
▪ Going back and forth to the CPU (UDFs)
▪ Shuffle
▪ Limited GPU Memory
160
550
1250
3500
12288
24576 25600
46080
307200
1048576
Spinning
D
isk
SSD
10G
igE
N
VM
EPC
Ie
gen3
PC
Ie
gen4
D
DR
4-3200
D
IM
M
TypicalPC
R
AM
N
VLink
C
PU
Cache
MB/s(LogScale)
But It Can Be Amazing
▪ High cardinality data
▪ Joins
▪ Aggregates
▪ Sort
▪ Window operations
▪ Especially on large windows
▪ Aggregate with lots of distinct operations
▪ Complicated processing
▪ Transcoding
▪ Encoding and compressing Parquet and ORC is expensive
▪ Parsing CSV is expensive
What the SQL plugin excels at
How Does It Work
Spark SQL & DataFrame Compilation Flow
DataFrame
Logical Plan
Physical Plan
RDD[InternalRow]
bar.groupBy(
col(”product_id”),
col(“ds”))
.agg(
max(col(“price”)) -
min(col(“price”)).alias(“range”))
SELECT product_id, ds,
max(price) – min(price) AS
range FROM bar GROUP BY
product_id, ds
QUERY
CPUPHYSICALPLAN
Spark SQL & DataFrame Compilation Flow
DataFrame
Logical Plan
Physical Plan
RDD[InternalRow]
bar.groupBy(
col(”product_id”),
col(“ds”))
.agg(
max(col(“price”)) -
min(col(“price”)).alias(“range”))
SELECT product_id, ds,
max(price) – min(price) AS
range FROM bar GROUP BY
product_id, ds
QUERY
GPUPHYSICALPLAN
Physical Plan
RDD[ColumnarBatch]
Spark SQL & DataFrame Compilation FlowCPUPHYSICALPLAN
Read Parquet File
First Stage
Aggregate
Shuffle Exchange
Second Stage
Aggregate
Write Parquet File
Combine Shuffle
Data
Read Parquet File
First Stage
Aggregate
Shuffle Exchange
Second Stage
Aggregate
Write Parquet File
Convert to Row
Format
Convert to Row
Format
GPUPHYSICALPLAN
ETL Technology Stack
Dask cuDF
cuDF, Pandas
Python
Cython
cuDF C++
CUDA Libraries
CUDA
Java
JNI bindings
Spark DataFrame,
Scala, PySpark
Demo
Demo Cluster Setup
CPU
Driver:
▪ 1 - r4.xlarge
▪ 30.5GB Memory
▪ 4 Cores
▪ 1 DBU
Workers:
▪ 12 - r4.2xlarge
▪ 61GB Memory
▪ 8 cores
▪ 2 DBU
Databricks (AWS)
GPU
Driver:
▪ 1 - p2.xlarge
▪ 61GB Memory
▪ 4 cores
▪ 1 - K80 (Not needed)
▪ 1.22 DBU
Workers:
▪ 12 – p3.2xlarge
▪ 61GB Memory
▪ 1 - V100
▪ 8 cores
▪ 4.15 DBU
Databricks Demo Results
“The more you buy, the more you save” – Jensen H Huang, CEO NVIDIA
1,736
423
0
350
700
1,050
1,400
1,750
CPU (12 - r4.2xlarge) GPU (12 - p3.2xlarge)
ETL Time (seconds)
4x Speed-up $8.03
$6.81
$0.0
$2.0
$4.0
$6.0
$8.0
$10.0
CPU (12- r4.2xlarge) GPU (12 - p3.2xlarge)
ETL Cost (AWS+DBU)
18% Cost Savings*
* Costs based on Databricks Standard edition
T4 Cluster Setup
EC2
V100 is optimized for ML/DL training
T4 fits better with SQL processing
Driver (Ran on one of the worker nodes)
Workers:
▪ 12 – g4dn.2xlarge
▪ 32GB Memory
▪ 1 - T4
▪ 8 cores
Coming Soon….T4 GPUs on Databricks
Same speed-up as V100 but more savings
1,736
457
0
350
700
1,050
1,400
1,750
CPU (12 - r4.2xlarge) GPU (12 - g4dn.2xlarge)
ETL Time (seconds)
3.8x Speed-up
$8.03
$3.76
$0.0
$2.0
$4.0
$6.0
$8.0
$10.0
CPU (12- r4.2xlarge) GPU (12 - g4dn.2xlarge)
ETL Cost (AWS+DBU)
50% Cost Savings*
* Costs based on AWS T4 GPU instance market price & V100 GPU price on Databricks Standard edition
RAPIDS Accelerator on AWS
▪ ~3.5x Speed-up
▪ ~40% Cost Savings
Based on TPCx-BB like Queries #5 & #22 with 1TB scale factor input
221
82.68
61
26.83
0
50
100
150
200
250
Q5 Q22
ETL Time (Seconds)
CPU: 12 - m5dn.2xlarge
(8-core 32GB)
GPU: 12 - g4dn.2xlarge
(8-core 32GB 1xT4 GPU)
Accelerated Shuffle
Spark Shuffle
Data exchange between stages
Task 1Task 0 Task 2
Task 1Task 0
Stage 1
Stage 2
Spark Shuffle
CPU-centric data movement
PCI-e Bus
Local
Storage
NetworkGPU 1
CPU
GPU 0
Accelerated Shuffle
GPU-centric data movement
PCI-e Bus
Local
Storage
NetworkGPU 1
CPU
GPU 0
NVLink
RDMA
GPU Direct
Storage
Accelerated Shuffle
Shuffling Spilled Data
PCI-e Bus
Local
Storage
NetworkGPU 1 GPU 0
RDMA
CPU
Host
Memory
UCX Library
Unified Communication X
▪ Abstracts communication transports
▪ Selects best route(s)
▪ TCP
▪ RDMA
▪ Shared Memory
▪ CUDA IPC
▪ Zero-copy GPU transfers over RDMA
▪ RDMA requires network support
▪ Infiniband
▪ RoCE
▪ http://openucx.org
Accelerated Shuffle Results
Inventory pricing query
228
45
8.4
0
50
100
150
200
250
CPU GPU GPU+UCX
QueryDurationInSeconds
Accelerated Shuffle Results
ETL for logistical regression model
1556
172
79
0
400
800
1200
1600
CPU GPU GPU+UCX
QueryDurationinSeconds
What’s Next?
What’s Next
▪ Open Source (DONE)
▪ https://github.com/NVIDIA/spark-rapids
▪ https://nvidia.github.io/spark-rapids/
▪ Nested types
▪ Arrays
▪ Structs
▪ Maps
▪ Decimal type
▪ More operators
▪ GPU Direct Storage
▪ Time zone support for timestamps
▪ Only UTC supported now
▪ Higher order functions
▪ UDFs
Further OutComing Soon
Where to Get More Information
▪ https://NVIDIA.com/Spark
▪ Please use the “Contact Us” link to get in touch with NVIDIA’s Spark team
▪ https://github.com/NVIDIA/spark-rapids
▪ https://nvidia.github.io/spark-rapids/
▪ Listen to Adobe’s Email Marketing Intelligent Services Use-Case
▪ Free e-book at NVIDIA.com/Spark-Book
Feedback
Your feedback is important to us.
Don’t forget to rate and
review the sessions.
Deep Dive into GPU Support in Apache Spark 3.x

More Related Content

What's hot

What's hot (20)

Accelerating Apache Spark Shuffle for Data Analytics on the Cloud with Remote...
Accelerating Apache Spark Shuffle for Data Analytics on the Cloud with Remote...Accelerating Apache Spark Shuffle for Data Analytics on the Cloud with Remote...
Accelerating Apache Spark Shuffle for Data Analytics on the Cloud with Remote...
 
Dive into PySpark
Dive into PySparkDive into PySpark
Dive into PySpark
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...
The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...
The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIs
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
 
Apache Arrow Flight Overview
Apache Arrow Flight OverviewApache Arrow Flight Overview
Apache Arrow Flight Overview
 
Performance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsPerformance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark Metrics
 
Spark shuffle introduction
Spark shuffle introductionSpark shuffle introduction
Spark shuffle introduction
 
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL Joins
 
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in SparkSpark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
 
Presto on Apache Spark: A Tale of Two Computation Engines
Presto on Apache Spark: A Tale of Two Computation EnginesPresto on Apache Spark: A Tale of Two Computation Engines
Presto on Apache Spark: A Tale of Two Computation Engines
 
Memory Management in Apache Spark
Memory Management in Apache SparkMemory Management in Apache Spark
Memory Management in Apache Spark
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
 
SQL Performance Improvements at a Glance in Apache Spark 3.0
SQL Performance Improvements at a Glance in Apache Spark 3.0SQL Performance Improvements at a Glance in Apache Spark 3.0
SQL Performance Improvements at a Glance in Apache Spark 3.0
 
BigtopでHadoopをビルドする(Open Source Conference 2021 Online/Spring 発表資料)
BigtopでHadoopをビルドする(Open Source Conference 2021 Online/Spring 発表資料)BigtopでHadoopをビルドする(Open Source Conference 2021 Online/Spring 発表資料)
BigtopでHadoopをビルドする(Open Source Conference 2021 Online/Spring 発表資料)
 
How to Extend Apache Spark with Customized Optimizations
How to Extend Apache Spark with Customized OptimizationsHow to Extend Apache Spark with Customized Optimizations
How to Extend Apache Spark with Customized Optimizations
 

Similar to Deep Dive into GPU Support in Apache Spark 3.x

Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhus...
Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhus...Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhus...
Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhus...
Databricks
 
Accelerating Real Time Analytics with Spark Streaming and FPGAaaS with Prabha...
Accelerating Real Time Analytics with Spark Streaming and FPGAaaS with Prabha...Accelerating Real Time Analytics with Spark Streaming and FPGAaaS with Prabha...
Accelerating Real Time Analytics with Spark Streaming and FPGAaaS with Prabha...
Databricks
 

Similar to Deep Dive into GPU Support in Apache Spark 3.x (20)

Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and AlluxioAdvancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
 
SFBigAnalytics_SparkRapid_20220622.pdf
SFBigAnalytics_SparkRapid_20220622.pdfSFBigAnalytics_SparkRapid_20220622.pdf
SFBigAnalytics_SparkRapid_20220622.pdf
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Scalable Acceleration of XGBoost Training on Apache Spark GPU Clusters
Scalable Acceleration of XGBoost Training on Apache Spark GPU ClustersScalable Acceleration of XGBoost Training on Apache Spark GPU Clusters
Scalable Acceleration of XGBoost Training on Apache Spark GPU Clusters
 
Spark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca CanaliSpark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca Canali
 
Accelerating Spark SQL Workloads to 50X Performance with Apache Arrow-Based F...
Accelerating Spark SQL Workloads to 50X Performance with Apache Arrow-Based F...Accelerating Spark SQL Workloads to 50X Performance with Apache Arrow-Based F...
Accelerating Spark SQL Workloads to 50X Performance with Apache Arrow-Based F...
 
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
 
GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)
 
20180920_DBTS_PGStrom_EN
20180920_DBTS_PGStrom_EN20180920_DBTS_PGStrom_EN
20180920_DBTS_PGStrom_EN
 
Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhus...
Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhus...Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhus...
Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhus...
 
Exploiting GPUs in Spark
Exploiting GPUs in SparkExploiting GPUs in Spark
Exploiting GPUs in Spark
 
Accelerating Real Time Analytics with Spark Streaming and FPGAaaS with Prabha...
Accelerating Real Time Analytics with Spark Streaming and FPGAaaS with Prabha...Accelerating Real Time Analytics with Spark Streaming and FPGAaaS with Prabha...
Accelerating Real Time Analytics with Spark Streaming and FPGAaaS with Prabha...
 
Build Large-Scale Data Analytics and AI Pipeline Using RayDP
Build Large-Scale Data Analytics and AI Pipeline Using RayDPBuild Large-Scale Data Analytics and AI Pipeline Using RayDP
Build Large-Scale Data Analytics and AI Pipeline Using RayDP
 
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-PremiseTackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
 
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
 
Resource-Efficient Deep Learning Model Selection on Apache Spark
Resource-Efficient Deep Learning Model Selection on Apache SparkResource-Efficient Deep Learning Model Selection on Apache Spark
Resource-Efficient Deep Learning Model Selection on Apache Spark
 
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
 
Exploiting GPUs in Spark
Exploiting GPUs in SparkExploiting GPUs in Spark
Exploiting GPUs in Spark
 
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
 
SQream-GPU가속 초거대 정형데이타 분석용 SQL DB-제품소개-박문기@메가존클라우드
SQream-GPU가속 초거대 정형데이타 분석용 SQL DB-제품소개-박문기@메가존클라우드SQream-GPU가속 초거대 정형데이타 분석용 SQL DB-제품소개-박문기@메가존클라우드
SQream-GPU가속 초거대 정형데이타 분석용 SQL DB-제품소개-박문기@메가존클라우드
 

More from Databricks

Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack Detection
 

Recently uploaded

Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
amitlee9823
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
amitlee9823
 

Recently uploaded (20)

Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
 

Deep Dive into GPU Support in Apache Spark 3.x

  • 1.
  • 2. Deep Dive into GPU Support in Apache Spark 3.x Robert Evans and Jason Lowe NVIDIA
  • 3. Agenda GPU Features in Apache Spark 3 Accelerated SQL/DataFrame Accelerated Shuffle What’s Next
  • 4. GPU Features in Apache Spark 3
  • 5. Accelerator-Aware Scheduling ▪ SPARK-24615 ▪ Request resources ▪ Executor ▪ Driver ▪ Task ▪ Resource discovery ▪ API to determine assignment ▪ Supported on YARN, Kubernetes, and Standalone GPUs are now a schedulable resource
  • 6. GPU Scheduling Example ./bin/spark-shell --master yarn --executor-cores 2 --conf spark.driver.resource.gpu.amount=1 --conf spark.driver.resource.gpu.discoveryScript=/opt/spark/getGpuResources.sh --conf spark.executor.resource.gpu.amount=2 --conf spark.executor.resource.gpu.discoveryScript=./getGpuResources.sh --conf spark.task.resource.gpu.amount=1 --files examples/src/main/scripts/getGpusResources.sh
  • 7. GPU Discovery Script Example #!/bin/bash # # Outputs a JSON formatted string that is expected by the # spark.{driver/executor}.resource.gpu.discoveryScript config. # # Example output: {"name": "gpu", "addresses":["0","1","2","3","4","5","6","7"]} ADDRS=$(nvidia-smi --query-gpu=index --format=csv,noheader | sed -e :a -e N -e'$!ba' -e 's/n/","/g') echo {"name": "gpu", "addresses":["$ADDRS"]}
  • 8. GPU Assignments API // Task API val context = TaskContext.get() val resources = context.resources() val assignedGpuAddrs = resources("gpu").addresses // Pass assignedGpuAddrs into TensorFlow or other AI code // Driver API scala> sc.resources("gpu").addresses Array[String] = Array(0)
  • 10. Stage Level Scheduling CPU NODE GPU SPARK ML APPLICATION ETL Stage ML Stage CPU NODE
  • 11. Stage Level Scheduling ▪ SPARK-27495 ▪ Specify resource requirements per RDD operation ▪ Spark dynamically allocates containers to meet resource requirements ▪ Spark schedules tasks on appropriate containers ▪ Coming soon in Spark 3.1
  • 12. SQL Columnar Processing ▪ SPARK-27396 ▪ Catalyst API for columnar processing ▪ Plugins can modify query plan with columnar operations ▪ Plan nodes can exchange RDD[ColumnarBatch] instead of RDD[Row] ▪ Enables efficient processing by vectorized accelerators ▪ SIMD ▪ FPGA ▪ GPU
  • 13. Spark 3 Spark 3 with Project Hydrogen Spark 2.x Data Preparation Model Training Shared Storage CPU Powered Cluster GPU Powered Cluster Data Sources Spark XGBoost | TensorFlow | PyTorch Spark Orchestrated Data Sources GPU Powered Cluster Data Preparation Model Training Spark XGBoost | TensorFlow | PyTorch Spark Orchestrated
  • 14. Spark 3 with Project Hydrogen ▪ Single pipeline ▪ Ingest ▪ Data preparation ▪ Model Training ▪ Infrastructure is consolidated and simplified ▪ ETL can be GPU-accelerated Enabling end-to-end acceleration Data Sources GPU Powered Cluster Data Preparation Model Training Spark XGBoost | TensorFlow | PyTorch Spark Orchestrated Spark 3
  • 16. Accelerated ETL? Can a GPU make an elephant fast?
  • 17. Yes TPCx-BB Like Benchmark Results (10TB Dataset, Two Node DGX-2 Cluster)* Query #5 Query #16 Query #21 Query #22 CPU 25.95 6.16 7.13 3.80 GPU 1.31 1.16 0.56 0.14 0.00 5.00 10.00 15.00 20.00 25.00 30.00 Time(mins) Query Time: GPU vs CPU (Mins) Environment: Two DGX-2 (96 CPU Cores, 1.5TB Host memory, 16 V100 GPUs, 512 GB GPU Memory) * Not official or complete TPCx-BB runs (ETL power only).
  • 18. Deep Learning Recommendation Machines ▪ Anonymized 7-day clickstream (1 TB) ▪ Convert high-cardinality strings to contiguous integer IDs ▪ DLRM github repo has turnkey scripts Example use case: Criteo Dataset
  • 19. DLRM on Criteo Dataset (Past) 144.0 12.1 45.0 0.7 0.0 40.0 80.0 120.0 ETL (1 core CPU)* Spark ETL (96 core CPU) Training (96 core CPU) Training (1 - V100) Time(Hours) ETL & Training Run Time for CPU & GPU Criteo Dataset (1TB)
  • 20. 12.1 2.3 0.5 0.0 2.0 4.0 6.0 8.0 10.0 12.0 14.0 Spark ETL (96 core CPU) Spark ETL (1 - V100) Spark ETL (8 - V100) Time(Hours) Spark ETL for CRITEO DATASET (1TB) DLRM ETL on Criteo Dataset (Present)
  • 21. DLRM End-to-End on Criteo Dataset (Present) Original CPU (1 Core for ETL, 96 Core CPU for Training) Spark CPU (96 Core for ETL & Training) Spark CPU (96 Core for ETL) & Spark GPU (1- V100 Training) Spark GPU (8-V100 for ETL & 1-V100 Training) Training 45.0 45.0 0.7 0.7 ETL 144.0 12.1 12.1 0.5 144.0 12.1 12.1 0.5 45.0 45.0 0.7 0.7 0.0 20.0 40.0 60.0 80.0 100.0 120.0 140.0 160.0 180.0 Time(Hours) Spark ETL + Training for Criteo Dataset (1TB)
  • 22. Jensen Huang GPU Technology Conference 2020 "The more you buy, the more you save."
  • 23. RAPIDS Accelerator for Apache Spark (Plugin) RAPIDS Accelerator for Apache Spark UCX LibrariesRAPIDS C++ Libraries JNI bindings Mapping From Java/Scala to C++ DISTRIBUTED SCALE-OUT SPARK APPLICATIONS APACHE SPARK CORE Spark SQL Spark ShuffleDataFrame if gpu_enabled(op, data_type) call-out to RAPIDS else execute standard Spark op ● Custom Implementation of Spark Shuffle ● Optimized to use RDMA and GPU-to-GPU direct communication CUDA JNI bindings Mapping From Java/Scala to C++
  • 24. RAPIDS Accelerator for Apache Spark 3.0 Plugin
  • 25. No Code Changes Same SQL and DataFrame code
  • 26. What We Support and growing… ! % & * + - / < <= <=> = == > >= ^ abs acos and asin atan avg bigint boolean cast cbrt ceil ceiling coalesce concat cos cosh cot count cube current_date current_timestamp date datediff day dayofmonth degrees double e exp explode* expm1 first first_value float floor from_unixtime hour if ifnull in initcap input_file_block_le ngth input_file_block_st art input_file_name int isnan isnotnull isnull last last_value length lcase like ln locate log log10 log1p log2 lower ltrim max mean min minute mod monotonically_inc reasing_id month nanvl negative not now nullif nvl nvl2 or pi posexplode* position pow power radians rand* regexp_replace* replace rint rollup row_number rtrim second shiftleft shiftright shiftrightunsigned sign signum sin sinh smallint spark_partition_id sqrt string substr substring sum tan tanh timestamp tinyint trim ucase upper when window year | ~ CSV Reading* Orc Reading Orc Writing Parquet Reading Parquet Writing ANSI casts TimeSub for time ranges startswith endswith contains limit order by group by filter union repartition equi-joins select
  • 27. Is This a Silver Bullet? No ▪ Small amounts of data ▪ Few hundred MB per partition for GPU ▪ Cache coherent processing ▪ Data Movement ▪ Slow I/O (networking, disks, etc.) ▪ Going back and forth to the CPU (UDFs) ▪ Shuffle ▪ Limited GPU Memory 160 550 1250 3500 12288 24576 25600 46080 307200 1048576 Spinning D isk SSD 10G igE N VM EPC Ie gen3 PC Ie gen4 D DR 4-3200 D IM M TypicalPC R AM N VLink C PU Cache MB/s(LogScale)
  • 28. But It Can Be Amazing ▪ High cardinality data ▪ Joins ▪ Aggregates ▪ Sort ▪ Window operations ▪ Especially on large windows ▪ Aggregate with lots of distinct operations ▪ Complicated processing ▪ Transcoding ▪ Encoding and compressing Parquet and ORC is expensive ▪ Parsing CSV is expensive What the SQL plugin excels at
  • 29. How Does It Work
  • 30. Spark SQL & DataFrame Compilation Flow DataFrame Logical Plan Physical Plan RDD[InternalRow] bar.groupBy( col(”product_id”), col(“ds”)) .agg( max(col(“price”)) - min(col(“price”)).alias(“range”)) SELECT product_id, ds, max(price) – min(price) AS range FROM bar GROUP BY product_id, ds QUERY CPUPHYSICALPLAN
  • 31. Spark SQL & DataFrame Compilation Flow DataFrame Logical Plan Physical Plan RDD[InternalRow] bar.groupBy( col(”product_id”), col(“ds”)) .agg( max(col(“price”)) - min(col(“price”)).alias(“range”)) SELECT product_id, ds, max(price) – min(price) AS range FROM bar GROUP BY product_id, ds QUERY GPUPHYSICALPLAN Physical Plan RDD[ColumnarBatch]
  • 32. Spark SQL & DataFrame Compilation FlowCPUPHYSICALPLAN Read Parquet File First Stage Aggregate Shuffle Exchange Second Stage Aggregate Write Parquet File Combine Shuffle Data Read Parquet File First Stage Aggregate Shuffle Exchange Second Stage Aggregate Write Parquet File Convert to Row Format Convert to Row Format GPUPHYSICALPLAN
  • 33. ETL Technology Stack Dask cuDF cuDF, Pandas Python Cython cuDF C++ CUDA Libraries CUDA Java JNI bindings Spark DataFrame, Scala, PySpark
  • 34. Demo
  • 35. Demo Cluster Setup CPU Driver: ▪ 1 - r4.xlarge ▪ 30.5GB Memory ▪ 4 Cores ▪ 1 DBU Workers: ▪ 12 - r4.2xlarge ▪ 61GB Memory ▪ 8 cores ▪ 2 DBU Databricks (AWS) GPU Driver: ▪ 1 - p2.xlarge ▪ 61GB Memory ▪ 4 cores ▪ 1 - K80 (Not needed) ▪ 1.22 DBU Workers: ▪ 12 – p3.2xlarge ▪ 61GB Memory ▪ 1 - V100 ▪ 8 cores ▪ 4.15 DBU
  • 36. Databricks Demo Results “The more you buy, the more you save” – Jensen H Huang, CEO NVIDIA 1,736 423 0 350 700 1,050 1,400 1,750 CPU (12 - r4.2xlarge) GPU (12 - p3.2xlarge) ETL Time (seconds) 4x Speed-up $8.03 $6.81 $0.0 $2.0 $4.0 $6.0 $8.0 $10.0 CPU (12- r4.2xlarge) GPU (12 - p3.2xlarge) ETL Cost (AWS+DBU) 18% Cost Savings* * Costs based on Databricks Standard edition
  • 37. T4 Cluster Setup EC2 V100 is optimized for ML/DL training T4 fits better with SQL processing Driver (Ran on one of the worker nodes) Workers: ▪ 12 – g4dn.2xlarge ▪ 32GB Memory ▪ 1 - T4 ▪ 8 cores
  • 38. Coming Soon….T4 GPUs on Databricks Same speed-up as V100 but more savings 1,736 457 0 350 700 1,050 1,400 1,750 CPU (12 - r4.2xlarge) GPU (12 - g4dn.2xlarge) ETL Time (seconds) 3.8x Speed-up $8.03 $3.76 $0.0 $2.0 $4.0 $6.0 $8.0 $10.0 CPU (12- r4.2xlarge) GPU (12 - g4dn.2xlarge) ETL Cost (AWS+DBU) 50% Cost Savings* * Costs based on AWS T4 GPU instance market price & V100 GPU price on Databricks Standard edition
  • 39. RAPIDS Accelerator on AWS ▪ ~3.5x Speed-up ▪ ~40% Cost Savings Based on TPCx-BB like Queries #5 & #22 with 1TB scale factor input 221 82.68 61 26.83 0 50 100 150 200 250 Q5 Q22 ETL Time (Seconds) CPU: 12 - m5dn.2xlarge (8-core 32GB) GPU: 12 - g4dn.2xlarge (8-core 32GB 1xT4 GPU)
  • 41. Spark Shuffle Data exchange between stages Task 1Task 0 Task 2 Task 1Task 0 Stage 1 Stage 2
  • 42. Spark Shuffle CPU-centric data movement PCI-e Bus Local Storage NetworkGPU 1 CPU GPU 0
  • 43. Accelerated Shuffle GPU-centric data movement PCI-e Bus Local Storage NetworkGPU 1 CPU GPU 0 NVLink RDMA GPU Direct Storage
  • 44. Accelerated Shuffle Shuffling Spilled Data PCI-e Bus Local Storage NetworkGPU 1 GPU 0 RDMA CPU Host Memory
  • 45. UCX Library Unified Communication X ▪ Abstracts communication transports ▪ Selects best route(s) ▪ TCP ▪ RDMA ▪ Shared Memory ▪ CUDA IPC ▪ Zero-copy GPU transfers over RDMA ▪ RDMA requires network support ▪ Infiniband ▪ RoCE ▪ http://openucx.org
  • 46. Accelerated Shuffle Results Inventory pricing query 228 45 8.4 0 50 100 150 200 250 CPU GPU GPU+UCX QueryDurationInSeconds
  • 47. Accelerated Shuffle Results ETL for logistical regression model 1556 172 79 0 400 800 1200 1600 CPU GPU GPU+UCX QueryDurationinSeconds
  • 49. What’s Next ▪ Open Source (DONE) ▪ https://github.com/NVIDIA/spark-rapids ▪ https://nvidia.github.io/spark-rapids/ ▪ Nested types ▪ Arrays ▪ Structs ▪ Maps ▪ Decimal type ▪ More operators ▪ GPU Direct Storage ▪ Time zone support for timestamps ▪ Only UTC supported now ▪ Higher order functions ▪ UDFs Further OutComing Soon
  • 50. Where to Get More Information ▪ https://NVIDIA.com/Spark ▪ Please use the “Contact Us” link to get in touch with NVIDIA’s Spark team ▪ https://github.com/NVIDIA/spark-rapids ▪ https://nvidia.github.io/spark-rapids/ ▪ Listen to Adobe’s Email Marketing Intelligent Services Use-Case ▪ Free e-book at NVIDIA.com/Spark-Book
  • 51. Feedback Your feedback is important to us. Don’t forget to rate and review the sessions.