SlideShare a Scribd company logo
Kazuaki Ishizaki, IBM Research
Madhusudanan Kandasamy, IBM Systems
Transparent GPU Exploitation
on Apache Spark
#Res9SAIS
About Me – Madhusudanan Kandasamy
• STSM(Principal Engineer) at IBM Systems
• Working for IBM Power Systems over 15 years
– AIX & Linux OS Development
– Apache Spark Optimization for Power Systems
– Distributed ML/DL Framework with GPU & NVLink
• IBM Master Inventor (20+ Patents, 18 disclosure publications)
• Committer of GPUEnabler
– Apache Spark Plug-in to execute GPU code on Spark
• https://github.com/IBMSparkGPU/GPUEnabler
• Github: https://github.com/kmadhugit
• E-mail: madhusudanan@in.ibm.com
2Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS
What You Will Learn from This Talk
• Why accelerate your workloads using GPU on Spark
– GPU/CUDA Overview
– Spark + GPU for ML workloads
• How to program GPUs in Spark
– Invoke Hand-tuned GPU program in CUDA
– Translate DataFrame program to GPU code automatically
• What are key factors to accelerate program
– Parallelism in a program
– Data format on memory
3Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS
GPU/CUDA Overview
• GPGPU - Throughput
• CUDA, which is famous,
requires programmers
to explicitly write
operations for
– allocate/deallocate
device memories
– copying data between
CPU and GPU
– execute GPU kernel
4Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS
// code for GPU
__global__ void GPUMultiplyBy2(
float* d_a, float* d_b, int n) {
int i = threadIdx.x;
if (n <= i) return;
d_b[i] = d_a[i] * 2.0;
}
void fooCUDA(N, float *A, float *B, int N) {
int sizeN = N * sizeof(float);
cudaMalloc(&d_A, sizeN); cudaMalloc(&d_B, sizeN);
cudaMemcpy(d_A, A, sizeN, HostToDevice);
GPUMultiplyBy2<<<N, 1>>>(d_A, d_B, N);
cudaMemcpy(B, d_B, sizeN, DeviceToHost);
cudaFree(d_B); cudaFree(d_A);
}
Spark + GPU for ML workloads
• Spark provides efficient ways to parallelize jobs across
cluster of nodes
• GPUs provide thousands of cores for efficient way to
parallelize job in a node.
• GPUs provide up to 100x processing over CPU *
• Combining Spark + GPU for lightning fast processing
– We will talk about two approaches
5Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS
* https://blogs.nvidia.com/blog/2009/12/16/whats-the-difference-between-a-cpu-and-a-gpu/
Outline
• Why accelerate your workloads using GPU on Spark
• How to program GPUs in Spark
– Invoke Hand-tuned GPU program in CUDA
– Translate DataFrame program to GPU code automatically
• Toward faster GPU code
• How two frameworks work?
6Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS
Invoke Hand-tuned GPU program in CUDA
• GPUEnabler to simplify development
• Implemented as Spark package
– Can be drop-in into your version of Spark
• Easily Launch hand coded GPU kernels from map() or
reduce() parallel function in RDD, Dataset
• manages GPU memory, copy data between GPU and CPU,
and convert data format
• Available at https://github.com/IBMSparkGPU/GPUEnabler
7Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS
Example - hand tuned CUDA kernel in Spark
8Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS
CUDA is a programming language
for GPU defined by NVIDIA
PTX is an assembly language file
that can be generated by a CUDA file
Step 1: Write CUDA kernels (without memory management and data copy)
Step 2: Write Spark program
Step 3: Compile and submit
Object SparkExample {
val mapFunction = new CUDAFunction("multiplyBy2", Seq(“value”), “example.ptx”)
val output = sc.parallelize(1 to 65536, 24).cache
.mapExtFunc(x => x*2, mapFunction).show }
__global__ void multiplyBy2(int *in, int *out, long size) {
long i = threadIdx.x + blockIdx.x * blockDim.x;
if (size <= i) return;
out[i] = in[i] * 2;
}
$ nvcc example.cu -ptx
$ mvn package
$ bin/spark-submit --class SparkExample SparkExample.jar
--packages com.ibm:gpu-enabler_2.11:1.0.0
Performance Improvements of GPU
program over Parallel CPU
• Achieve 3.6x for CUDA-based mini-batch logistic regression
using one P100 card over POWER8 160 SMT cores
9Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS
Relative execution time over GPU version
IBM Power System S822LC for High Performance Computing “Minsky”, at 4 GHz with 512GB memory, one P100 card, Fedora 7.3, CUDA 8.0,
IBM Java pxl6480sr4fp2-20170322_01(SR4 FP2), 128GB heap, Apache Spark 2.0.1, master=“local[160]”, GPU Enabler as 2017/5/1, N=112000,
features=8500, iterations=15, mini-batch size=10, parallelism(GPU)=8, parallelism(CPU)=320
Shorter is better for GPU
Outline
• Why accelerate your workloads using GPU on Spark
• How to program GPUs in Spark
– Invoke Hand-tuned GPU program in CUDA (GPUEnabler)
– Translate DataFrame program to GPU code automatically
• Toward faster GPU code
• How two frameworks work?
10Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS
“Transparent” GPU Exploitation
• Enhanced Spark by modifying Spark source code
• Accept expression in select(), selectExpr(), and
reduce() in DataFrame
• Automatically generate CUDA code from DataFrame
program
• Automatically manage GPU memory and copy data
between GPU and CPU
• No data format conversion is required
11Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS
Example - “Transparent” GPU Exploitation
• Write Spark program in DataFrame
• Compile and submit them
12Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS
$ mvn package
$ bin/spark-submit --class SparkExample SparkExample.jar
Object SparkExample {
val output = sc.parallelize(1 to 65536, 24).toDF(“value”).cache
.select($”value” * 2).cache.show }
Performance Improvements of Spark
DataFrame program over Parallel CPU
• Achieve 1.7x for Spark vector multiplication using one P100
card over POWER8 160 SMT cores
13Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS
Relative execution time over GPU version
IBM Power System S822LC for High Performance Computing “Minsky”, at 4 GHz with 512GB memory, one P100 card, Fedora 7.3, CUDA 8.0,
IBM Java pxl6480sr4fp2-20170322_01(SR4 FP2), 128GB heap, Based on Apache Spark master (id:657cb9), master=“local[160]”, N=480, vector
length=1600, parallelism(GPU)=8, parallelism(CPU)=320
Shorter is better for GPU
Outline
• Why accelerate your workloads using GPU on Spark
• How to program GPUs in Spark
– Invoke Hand-tuned GPU program in CUDA (GPUEnabler)
– Translate DataFrame program to GPU code automatically
• Toward faster GPU code
• How two frameworks work?
14Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS
About Me – Kazuaki Ishizaki
• Researcher at IBM Research in compiler optimization
• Working for IBM Java virtual machine over 20 years
– In particular, just-in-time compiler
• Active Contributor of Spark since 2016
– 98 commits, #37 in the world (25 commits, #8 in 2018)
• Committer of GPUEnabler
• Homepage: http://ibm.biz/ishizaki
• Github: https://github.com/kiszk, Twitter: @kiszk
15Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS
Toward Faster GPU code
• Assign a lot of parallel computations into GPU cores
• Reduce # of memory transactions to GPU memory
16Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS
Assign A Lot of Parallel Computations into
GPU Cores
• Achieve high utilization of GPU
17Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS
Achieve high performanceAchieve low performance
Busy idle Busy
Reduce # of Memory Transactions
• Depends on memory layout, # of memory transactions are
different in a program
18Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS
x1 y1 2 61 5x3 y3
Assumption: 4 consecutive data elements
can be coalesced by GPU hardware
4 v.s. 2
memory transactions to
GPU device memory
Row-oriented layout Column-oriented layout
Pt(x: Int, y: Int)
Load Pt.x1
Load Pt.x2
Load Pt.x3
Load Pt.x4
Load Pt.y1
Load Pt.y2
Load Pt.y3
Load Pt.y4
x2 y2 x4 y4 43 87
x1 x2 x3 x4
GPU
cores
Load Pt.x Load Pt.yLoad Pt.x Load Pt.y
1 231 2 4
y1 y2 y3 y4x1 x2 x3 x4 y1 y2 y3 y4
Memory
access
GPU
cores
Memory
transaction
Memory
access
Memory
transaction
Memory
access
Toward Faster GPU Code
• Assign a lot of parallel computations into GPU cores
– Spark program has been already written by a set of parallel
operations
• e.g. map, join, …
• Reduce # of memory transactions
– Column-oriented layout achieves better performance
• The paper* reports 3.3x performance improvement of GPU kernel
execution of kmeans over row-oriented layout
19Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS
* Che, Shuai and Sheaffer, Jeremy W. and Skadron, Kevin.
“Dymaxion: Optimizing Memory Access Patterns for Heterogeneous Systems”, SC’11
Questions
• How can we write a parallel program for GPU on Spark?
• How can we use column-oriented storage for GPU in
Spark?
20Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS
Questions
• How can we write a parallel program for GPU on Spark?
– Thanks to Spark programming model!!
• How can we use column-oriented storage for GPU in
Spark?
21Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS
Outline
• Why accelerate your workloads using GPU on Spark
• How to program GPUs in Apache Spark
– Invoke hand-tuned GPU program in CUDA (GPUEnabler)
– Translate DataFrame program to GPU code automatically
• Toward faster GPU code
• How two frameworks work?
22Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS
Data Movement with GPUEnabler
• Data in RDD is moved into off-heap as column-oriented
storage
23Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS
Java Heap
GPU
Off-heap
CPU
Data copy
map
Data copy
Format
conversion
Format
conversion
RDD1 RDD3val rdd1.cache
val rdd2=rdd1.mapExtFunc(…)
val rdd3=rdd2.mapExtFunc(…)
Row-
oriented
storage
Column-
oriented
storage
Row-
oriented
storage
map
Row-
oriented
storage
Row-
oriented
storage
Column-
oriented
storage
Data Movement with GPUEnabler
• Data in RDD is moved into off-heap as column-oriented
storage
24Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS
Java Heap
GPU
Off-heap
CPU
Data copy
map
Data copy
Format
conversion
Format
conversion
RDD1 RDD3val rdd1.cache
val rdd2=rdd1.mapExtFunc(…)
val rdd3=rdd2.mapExtFunc(…)
Row-
oriented
storage
Column-
oriented
storage
Column-
oriented
storage
map
Column-
oriented
storage
Row-
oriented
storage
Column-
oriented
storage
Conversions and copies
are automatically done
How To Write GPU Code in GPUEnabler
• Write a GPU kernel corresponds to map()
25Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS
val rdd2 = rdd1
.mapExtFunc(p => Point(p.x*2, p.y*2), mapFunction)
__global__ void multiplyBy2(int *inx, int *iny,
int *outx, int *outy, long size) {
long i = threadIdx.x + blockIdx.x * blockDim.x;
if (size <= i) return;
outx[i] = inx[i] * 2; outy[i] = iny[i] * 2;
}
GPU Kernel
Spark Program
Data Movement with DataFrame
• Cache in DataFrame already uses column-oriented storage
• We enhanced to create cache for DataFrame in off-heap
– Reduce conversion and data copy between Java heap and Off-heap
26Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS
Java Heap
GPU
Off-heap
CPU
Data copy Data copy
DataFrame’s cache DataFrame’s cache
val df1.cache
val df2=df1.map(…)
val df3=df2.map(…).cache
Column-
oriented
storage
Column-
oriented
storage
map + map
Column-
oriented
storage
Column-
oriented
storage
Data Movement with DataFrame
• Cache in DataFrame already uses column-oriented storage
• We enhanced to create cache for DataFrame in off-heap
– Reduce conversion and data copy between Java heap and Off-heap
27Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS
Java Heap
GPU
Off-heap
CPU
Data copy Data copy
DataFrame’s cache DataFrame’s cache
val df1.cache
val df2=df1.map(…)
val df3=df2.map(…).cache
Column-
oriented
storage
Column-
oriented
storage
map + map
Column-
oriented
storage
Column-
oriented
storage
copy is
automatically done
How To Generate GPU Code from
DataFrame
• Added a new path to generate CUDA code into Catalyst
28Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS
Derived Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Michael Armbrust
DataFrame
program
Unresolved
logical plan
Logical
Plan
Optimized
Logical Plan
Physical
Plan
Java
CUDA
for kernel
Analysis Optimizations Planning
Code
generation
Newly
added
Catalyst
Takeaway
• Why accelerate your workloads using GPU on Spark
– Achieved up to 3.6x over 160-CPU-thread parallel execution
• How to use GPUs on Spark
– Invoke Hand-tuned GPU program in CUDA (GPUEnabler)
– Translate DataFrame program to GPU code automatically
• How two approaches execute program on GPUs
– Address easy programming for many non-experts, not the
state-of-the-art performance by small numbers of top-notch
programmers
29Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS

More Related Content

What's hot

Deploying Accelerators At Datacenter Scale Using Spark
Deploying Accelerators At Datacenter Scale Using SparkDeploying Accelerators At Datacenter Scale Using Spark
Deploying Accelerators At Datacenter Scale Using Spark
Jen Aman
 
Deep Learning with Apache Spark and GPUs with Pierce Spitler
Deep Learning with Apache Spark and GPUs with Pierce SpitlerDeep Learning with Apache Spark and GPUs with Pierce Spitler
Deep Learning with Apache Spark and GPUs with Pierce Spitler
Databricks
 
Apache Spark Performance: Past, Future and Present
Apache Spark Performance: Past, Future and PresentApache Spark Performance: Past, Future and Present
Apache Spark Performance: Past, Future and Present
Databricks
 
Performance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsPerformance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark Metrics
Databricks
 
Exploiting GPUs in Spark
Exploiting GPUs in SparkExploiting GPUs in Spark
Exploiting GPUs in Spark
Kazuaki Ishizaki
 
GPU Computing With Apache Spark And Python
GPU Computing With Apache Spark And PythonGPU Computing With Apache Spark And Python
GPU Computing With Apache Spark And Python
Jen Aman
 
Low Latency Execution For Apache Spark
Low Latency Execution For Apache SparkLow Latency Execution For Apache Spark
Low Latency Execution For Apache Spark
Jen Aman
 
Apache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the CloudApache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the Cloud
Databricks
 
What's New in Upcoming Apache Spark 2.3
What's New in Upcoming Apache Spark 2.3What's New in Upcoming Apache Spark 2.3
What's New in Upcoming Apache Spark 2.3
Databricks
 
Spark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca CanaliSpark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca Canali
Spark Summit
 
Structured-Streaming-as-a-Service with Kafka, YARN, and Tooling with Jim Dowling
Structured-Streaming-as-a-Service with Kafka, YARN, and Tooling with Jim DowlingStructured-Streaming-as-a-Service with Kafka, YARN, and Tooling with Jim Dowling
Structured-Streaming-as-a-Service with Kafka, YARN, and Tooling with Jim Dowling
Databricks
 
High Performance Python on Apache Spark
High Performance Python on Apache SparkHigh Performance Python on Apache Spark
High Performance Python on Apache Spark
Wes McKinney
 
Spark on YARN
Spark on YARNSpark on YARN
Spark on YARN
Adarsh Pannu
 
700 Updatable Queries Per Second: Spark as a Real-Time Web Service
700 Updatable Queries Per Second: Spark as a Real-Time Web Service700 Updatable Queries Per Second: Spark as a Real-Time Web Service
700 Updatable Queries Per Second: Spark as a Real-Time Web Service
Evan Chan
 
Emr spark tuning demystified
Emr spark tuning demystifiedEmr spark tuning demystified
Emr spark tuning demystified
Omid Vahdaty
 
Spark Summit 2016: Connecting Python to the Spark Ecosystem
Spark Summit 2016: Connecting Python to the Spark EcosystemSpark Summit 2016: Connecting Python to the Spark Ecosystem
Spark Summit 2016: Connecting Python to the Spark Ecosystem
Daniel Rodriguez
 
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Databricks
 
Fine Tuning and Enhancing Performance of Apache Spark Jobs
Fine Tuning and Enhancing Performance of Apache Spark JobsFine Tuning and Enhancing Performance of Apache Spark Jobs
Fine Tuning and Enhancing Performance of Apache Spark Jobs
Databricks
 
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Databricks
 
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best PracticesDeep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Jen Aman
 

What's hot (20)

Deploying Accelerators At Datacenter Scale Using Spark
Deploying Accelerators At Datacenter Scale Using SparkDeploying Accelerators At Datacenter Scale Using Spark
Deploying Accelerators At Datacenter Scale Using Spark
 
Deep Learning with Apache Spark and GPUs with Pierce Spitler
Deep Learning with Apache Spark and GPUs with Pierce SpitlerDeep Learning with Apache Spark and GPUs with Pierce Spitler
Deep Learning with Apache Spark and GPUs with Pierce Spitler
 
Apache Spark Performance: Past, Future and Present
Apache Spark Performance: Past, Future and PresentApache Spark Performance: Past, Future and Present
Apache Spark Performance: Past, Future and Present
 
Performance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsPerformance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark Metrics
 
Exploiting GPUs in Spark
Exploiting GPUs in SparkExploiting GPUs in Spark
Exploiting GPUs in Spark
 
GPU Computing With Apache Spark And Python
GPU Computing With Apache Spark And PythonGPU Computing With Apache Spark And Python
GPU Computing With Apache Spark And Python
 
Low Latency Execution For Apache Spark
Low Latency Execution For Apache SparkLow Latency Execution For Apache Spark
Low Latency Execution For Apache Spark
 
Apache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the CloudApache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the Cloud
 
What's New in Upcoming Apache Spark 2.3
What's New in Upcoming Apache Spark 2.3What's New in Upcoming Apache Spark 2.3
What's New in Upcoming Apache Spark 2.3
 
Spark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca CanaliSpark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca Canali
 
Structured-Streaming-as-a-Service with Kafka, YARN, and Tooling with Jim Dowling
Structured-Streaming-as-a-Service with Kafka, YARN, and Tooling with Jim DowlingStructured-Streaming-as-a-Service with Kafka, YARN, and Tooling with Jim Dowling
Structured-Streaming-as-a-Service with Kafka, YARN, and Tooling with Jim Dowling
 
High Performance Python on Apache Spark
High Performance Python on Apache SparkHigh Performance Python on Apache Spark
High Performance Python on Apache Spark
 
Spark on YARN
Spark on YARNSpark on YARN
Spark on YARN
 
700 Updatable Queries Per Second: Spark as a Real-Time Web Service
700 Updatable Queries Per Second: Spark as a Real-Time Web Service700 Updatable Queries Per Second: Spark as a Real-Time Web Service
700 Updatable Queries Per Second: Spark as a Real-Time Web Service
 
Emr spark tuning demystified
Emr spark tuning demystifiedEmr spark tuning demystified
Emr spark tuning demystified
 
Spark Summit 2016: Connecting Python to the Spark Ecosystem
Spark Summit 2016: Connecting Python to the Spark EcosystemSpark Summit 2016: Connecting Python to the Spark Ecosystem
Spark Summit 2016: Connecting Python to the Spark Ecosystem
 
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
 
Fine Tuning and Enhancing Performance of Apache Spark Jobs
Fine Tuning and Enhancing Performance of Apache Spark JobsFine Tuning and Enhancing Performance of Apache Spark Jobs
Fine Tuning and Enhancing Performance of Apache Spark Jobs
 
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
 
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best PracticesDeep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best Practices
 

Similar to Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhusudanan Kandasamy

Exploiting GPUs in Spark
Exploiting GPUs in SparkExploiting GPUs in Spark
Exploiting GPUs in Spark
Kazuaki Ishizaki
 
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-PremiseTackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
Databricks
 
Deep Dive into GPU Support in Apache Spark 3.x
Deep Dive into GPU Support in Apache Spark 3.xDeep Dive into GPU Support in Apache Spark 3.x
Deep Dive into GPU Support in Apache Spark 3.x
Databricks
 
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and AlluxioAdvancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Alluxio, Inc.
 
Spark performance tuning - Maksud Ibrahimov
Spark performance tuning - Maksud IbrahimovSpark performance tuning - Maksud Ibrahimov
Spark performance tuning - Maksud Ibrahimov
Maksud Ibrahimov
 
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Databricks
 
lecture11_GPUArchCUDA01.pptx
lecture11_GPUArchCUDA01.pptxlecture11_GPUArchCUDA01.pptx
lecture11_GPUArchCUDA01.pptx
ssuser413a98
 
SFBigAnalytics_SparkRapid_20220622.pdf
SFBigAnalytics_SparkRapid_20220622.pdfSFBigAnalytics_SparkRapid_20220622.pdf
SFBigAnalytics_SparkRapid_20220622.pdf
Chester Chen
 
Transparent GPU Exploitation for Java
Transparent GPU Exploitation for JavaTransparent GPU Exploitation for Java
Transparent GPU Exploitation for Java
Kazuaki Ishizaki
 
Mixing Analytic Workloads with Greenplum and Apache Spark
Mixing Analytic Workloads with Greenplum and Apache SparkMixing Analytic Workloads with Greenplum and Apache Spark
Mixing Analytic Workloads with Greenplum and Apache Spark
VMware Tanzu
 
Distributed deep learning with spark on AWS - Vincent Van Steenbergen @ PAPIs...
Distributed deep learning with spark on AWS - Vincent Van Steenbergen @ PAPIs...Distributed deep learning with spark on AWS - Vincent Van Steenbergen @ PAPIs...
Distributed deep learning with spark on AWS - Vincent Van Steenbergen @ PAPIs...
PAPIs.io
 
CAPI and OpenCAPI Hardware acceleration enablement
CAPI and OpenCAPI Hardware acceleration enablementCAPI and OpenCAPI Hardware acceleration enablement
CAPI and OpenCAPI Hardware acceleration enablement
Ganesan Narayanasamy
 
Apache Spark Performance Observations
Apache Spark Performance ObservationsApache Spark Performance Observations
Apache Spark Performance Observations
Adam Roberts
 
Near Data Computing Architectures: Opportunities and Challenges for Apache Spark
Near Data Computing Architectures: Opportunities and Challenges for Apache SparkNear Data Computing Architectures: Opportunities and Challenges for Apache Spark
Near Data Computing Architectures: Opportunities and Challenges for Apache Spark
Ahsan Javed Awan
 
Near Data Computing Architectures for Apache Spark: Challenges and Opportunit...
Near Data Computing Architectures for Apache Spark: Challenges and Opportunit...Near Data Computing Architectures for Apache Spark: Challenges and Opportunit...
Near Data Computing Architectures for Apache Spark: Challenges and Opportunit...
Spark Summit
 
Choose Your Weapon: Comparing Spark on FPGAs vs GPUs
Choose Your Weapon: Comparing Spark on FPGAs vs GPUsChoose Your Weapon: Comparing Spark on FPGAs vs GPUs
Choose Your Weapon: Comparing Spark on FPGAs vs GPUs
Databricks
 
Making the most out of Heterogeneous Chips with CPU, GPU and FPGA
Making the most out of Heterogeneous Chips with CPU, GPU and FPGAMaking the most out of Heterogeneous Chips with CPU, GPU and FPGA
Making the most out of Heterogeneous Chips with CPU, GPU and FPGA
Facultad de Informática UCM
 
Scalable Acceleration of XGBoost Training on Apache Spark GPU Clusters
Scalable Acceleration of XGBoost Training on Apache Spark GPU ClustersScalable Acceleration of XGBoost Training on Apache Spark GPU Clusters
Scalable Acceleration of XGBoost Training on Apache Spark GPU Clusters
Databricks
 
PGI Compilers & Tools Update- March 2018
PGI Compilers & Tools Update- March 2018PGI Compilers & Tools Update- March 2018
PGI Compilers & Tools Update- March 2018
NVIDIA
 
Introduction to DPDK
Introduction to DPDKIntroduction to DPDK
Introduction to DPDK
Kernel TLV
 

Similar to Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhusudanan Kandasamy (20)

Exploiting GPUs in Spark
Exploiting GPUs in SparkExploiting GPUs in Spark
Exploiting GPUs in Spark
 
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-PremiseTackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
 
Deep Dive into GPU Support in Apache Spark 3.x
Deep Dive into GPU Support in Apache Spark 3.xDeep Dive into GPU Support in Apache Spark 3.x
Deep Dive into GPU Support in Apache Spark 3.x
 
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and AlluxioAdvancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
 
Spark performance tuning - Maksud Ibrahimov
Spark performance tuning - Maksud IbrahimovSpark performance tuning - Maksud Ibrahimov
Spark performance tuning - Maksud Ibrahimov
 
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
 
lecture11_GPUArchCUDA01.pptx
lecture11_GPUArchCUDA01.pptxlecture11_GPUArchCUDA01.pptx
lecture11_GPUArchCUDA01.pptx
 
SFBigAnalytics_SparkRapid_20220622.pdf
SFBigAnalytics_SparkRapid_20220622.pdfSFBigAnalytics_SparkRapid_20220622.pdf
SFBigAnalytics_SparkRapid_20220622.pdf
 
Transparent GPU Exploitation for Java
Transparent GPU Exploitation for JavaTransparent GPU Exploitation for Java
Transparent GPU Exploitation for Java
 
Mixing Analytic Workloads with Greenplum and Apache Spark
Mixing Analytic Workloads with Greenplum and Apache SparkMixing Analytic Workloads with Greenplum and Apache Spark
Mixing Analytic Workloads with Greenplum and Apache Spark
 
Distributed deep learning with spark on AWS - Vincent Van Steenbergen @ PAPIs...
Distributed deep learning with spark on AWS - Vincent Van Steenbergen @ PAPIs...Distributed deep learning with spark on AWS - Vincent Van Steenbergen @ PAPIs...
Distributed deep learning with spark on AWS - Vincent Van Steenbergen @ PAPIs...
 
CAPI and OpenCAPI Hardware acceleration enablement
CAPI and OpenCAPI Hardware acceleration enablementCAPI and OpenCAPI Hardware acceleration enablement
CAPI and OpenCAPI Hardware acceleration enablement
 
Apache Spark Performance Observations
Apache Spark Performance ObservationsApache Spark Performance Observations
Apache Spark Performance Observations
 
Near Data Computing Architectures: Opportunities and Challenges for Apache Spark
Near Data Computing Architectures: Opportunities and Challenges for Apache SparkNear Data Computing Architectures: Opportunities and Challenges for Apache Spark
Near Data Computing Architectures: Opportunities and Challenges for Apache Spark
 
Near Data Computing Architectures for Apache Spark: Challenges and Opportunit...
Near Data Computing Architectures for Apache Spark: Challenges and Opportunit...Near Data Computing Architectures for Apache Spark: Challenges and Opportunit...
Near Data Computing Architectures for Apache Spark: Challenges and Opportunit...
 
Choose Your Weapon: Comparing Spark on FPGAs vs GPUs
Choose Your Weapon: Comparing Spark on FPGAs vs GPUsChoose Your Weapon: Comparing Spark on FPGAs vs GPUs
Choose Your Weapon: Comparing Spark on FPGAs vs GPUs
 
Making the most out of Heterogeneous Chips with CPU, GPU and FPGA
Making the most out of Heterogeneous Chips with CPU, GPU and FPGAMaking the most out of Heterogeneous Chips with CPU, GPU and FPGA
Making the most out of Heterogeneous Chips with CPU, GPU and FPGA
 
Scalable Acceleration of XGBoost Training on Apache Spark GPU Clusters
Scalable Acceleration of XGBoost Training on Apache Spark GPU ClustersScalable Acceleration of XGBoost Training on Apache Spark GPU Clusters
Scalable Acceleration of XGBoost Training on Apache Spark GPU Clusters
 
PGI Compilers & Tools Update- March 2018
PGI Compilers & Tools Update- March 2018PGI Compilers & Tools Update- March 2018
PGI Compilers & Tools Update- March 2018
 
Introduction to DPDK
Introduction to DPDKIntroduction to DPDK
Introduction to DPDK
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Boston Institute of Analytics
 

Recently uploaded (20)

社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 

Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhusudanan Kandasamy

  • 1. Kazuaki Ishizaki, IBM Research Madhusudanan Kandasamy, IBM Systems Transparent GPU Exploitation on Apache Spark #Res9SAIS
  • 2. About Me – Madhusudanan Kandasamy • STSM(Principal Engineer) at IBM Systems • Working for IBM Power Systems over 15 years – AIX & Linux OS Development – Apache Spark Optimization for Power Systems – Distributed ML/DL Framework with GPU & NVLink • IBM Master Inventor (20+ Patents, 18 disclosure publications) • Committer of GPUEnabler – Apache Spark Plug-in to execute GPU code on Spark • https://github.com/IBMSparkGPU/GPUEnabler • Github: https://github.com/kmadhugit • E-mail: madhusudanan@in.ibm.com 2Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS
  • 3. What You Will Learn from This Talk • Why accelerate your workloads using GPU on Spark – GPU/CUDA Overview – Spark + GPU for ML workloads • How to program GPUs in Spark – Invoke Hand-tuned GPU program in CUDA – Translate DataFrame program to GPU code automatically • What are key factors to accelerate program – Parallelism in a program – Data format on memory 3Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS
  • 4. GPU/CUDA Overview • GPGPU - Throughput • CUDA, which is famous, requires programmers to explicitly write operations for – allocate/deallocate device memories – copying data between CPU and GPU – execute GPU kernel 4Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS // code for GPU __global__ void GPUMultiplyBy2( float* d_a, float* d_b, int n) { int i = threadIdx.x; if (n <= i) return; d_b[i] = d_a[i] * 2.0; } void fooCUDA(N, float *A, float *B, int N) { int sizeN = N * sizeof(float); cudaMalloc(&d_A, sizeN); cudaMalloc(&d_B, sizeN); cudaMemcpy(d_A, A, sizeN, HostToDevice); GPUMultiplyBy2<<<N, 1>>>(d_A, d_B, N); cudaMemcpy(B, d_B, sizeN, DeviceToHost); cudaFree(d_B); cudaFree(d_A); }
  • 5. Spark + GPU for ML workloads • Spark provides efficient ways to parallelize jobs across cluster of nodes • GPUs provide thousands of cores for efficient way to parallelize job in a node. • GPUs provide up to 100x processing over CPU * • Combining Spark + GPU for lightning fast processing – We will talk about two approaches 5Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS * https://blogs.nvidia.com/blog/2009/12/16/whats-the-difference-between-a-cpu-and-a-gpu/
  • 6. Outline • Why accelerate your workloads using GPU on Spark • How to program GPUs in Spark – Invoke Hand-tuned GPU program in CUDA – Translate DataFrame program to GPU code automatically • Toward faster GPU code • How two frameworks work? 6Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS
  • 7. Invoke Hand-tuned GPU program in CUDA • GPUEnabler to simplify development • Implemented as Spark package – Can be drop-in into your version of Spark • Easily Launch hand coded GPU kernels from map() or reduce() parallel function in RDD, Dataset • manages GPU memory, copy data between GPU and CPU, and convert data format • Available at https://github.com/IBMSparkGPU/GPUEnabler 7Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS
  • 8. Example - hand tuned CUDA kernel in Spark 8Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS CUDA is a programming language for GPU defined by NVIDIA PTX is an assembly language file that can be generated by a CUDA file Step 1: Write CUDA kernels (without memory management and data copy) Step 2: Write Spark program Step 3: Compile and submit Object SparkExample { val mapFunction = new CUDAFunction("multiplyBy2", Seq(“value”), “example.ptx”) val output = sc.parallelize(1 to 65536, 24).cache .mapExtFunc(x => x*2, mapFunction).show } __global__ void multiplyBy2(int *in, int *out, long size) { long i = threadIdx.x + blockIdx.x * blockDim.x; if (size <= i) return; out[i] = in[i] * 2; } $ nvcc example.cu -ptx $ mvn package $ bin/spark-submit --class SparkExample SparkExample.jar --packages com.ibm:gpu-enabler_2.11:1.0.0
  • 9. Performance Improvements of GPU program over Parallel CPU • Achieve 3.6x for CUDA-based mini-batch logistic regression using one P100 card over POWER8 160 SMT cores 9Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS Relative execution time over GPU version IBM Power System S822LC for High Performance Computing “Minsky”, at 4 GHz with 512GB memory, one P100 card, Fedora 7.3, CUDA 8.0, IBM Java pxl6480sr4fp2-20170322_01(SR4 FP2), 128GB heap, Apache Spark 2.0.1, master=“local[160]”, GPU Enabler as 2017/5/1, N=112000, features=8500, iterations=15, mini-batch size=10, parallelism(GPU)=8, parallelism(CPU)=320 Shorter is better for GPU
  • 10. Outline • Why accelerate your workloads using GPU on Spark • How to program GPUs in Spark – Invoke Hand-tuned GPU program in CUDA (GPUEnabler) – Translate DataFrame program to GPU code automatically • Toward faster GPU code • How two frameworks work? 10Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS
  • 11. “Transparent” GPU Exploitation • Enhanced Spark by modifying Spark source code • Accept expression in select(), selectExpr(), and reduce() in DataFrame • Automatically generate CUDA code from DataFrame program • Automatically manage GPU memory and copy data between GPU and CPU • No data format conversion is required 11Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS
  • 12. Example - “Transparent” GPU Exploitation • Write Spark program in DataFrame • Compile and submit them 12Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS $ mvn package $ bin/spark-submit --class SparkExample SparkExample.jar Object SparkExample { val output = sc.parallelize(1 to 65536, 24).toDF(“value”).cache .select($”value” * 2).cache.show }
  • 13. Performance Improvements of Spark DataFrame program over Parallel CPU • Achieve 1.7x for Spark vector multiplication using one P100 card over POWER8 160 SMT cores 13Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS Relative execution time over GPU version IBM Power System S822LC for High Performance Computing “Minsky”, at 4 GHz with 512GB memory, one P100 card, Fedora 7.3, CUDA 8.0, IBM Java pxl6480sr4fp2-20170322_01(SR4 FP2), 128GB heap, Based on Apache Spark master (id:657cb9), master=“local[160]”, N=480, vector length=1600, parallelism(GPU)=8, parallelism(CPU)=320 Shorter is better for GPU
  • 14. Outline • Why accelerate your workloads using GPU on Spark • How to program GPUs in Spark – Invoke Hand-tuned GPU program in CUDA (GPUEnabler) – Translate DataFrame program to GPU code automatically • Toward faster GPU code • How two frameworks work? 14Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS
  • 15. About Me – Kazuaki Ishizaki • Researcher at IBM Research in compiler optimization • Working for IBM Java virtual machine over 20 years – In particular, just-in-time compiler • Active Contributor of Spark since 2016 – 98 commits, #37 in the world (25 commits, #8 in 2018) • Committer of GPUEnabler • Homepage: http://ibm.biz/ishizaki • Github: https://github.com/kiszk, Twitter: @kiszk 15Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS
  • 16. Toward Faster GPU code • Assign a lot of parallel computations into GPU cores • Reduce # of memory transactions to GPU memory 16Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS
  • 17. Assign A Lot of Parallel Computations into GPU Cores • Achieve high utilization of GPU 17Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS Achieve high performanceAchieve low performance Busy idle Busy
  • 18. Reduce # of Memory Transactions • Depends on memory layout, # of memory transactions are different in a program 18Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS x1 y1 2 61 5x3 y3 Assumption: 4 consecutive data elements can be coalesced by GPU hardware 4 v.s. 2 memory transactions to GPU device memory Row-oriented layout Column-oriented layout Pt(x: Int, y: Int) Load Pt.x1 Load Pt.x2 Load Pt.x3 Load Pt.x4 Load Pt.y1 Load Pt.y2 Load Pt.y3 Load Pt.y4 x2 y2 x4 y4 43 87 x1 x2 x3 x4 GPU cores Load Pt.x Load Pt.yLoad Pt.x Load Pt.y 1 231 2 4 y1 y2 y3 y4x1 x2 x3 x4 y1 y2 y3 y4 Memory access GPU cores Memory transaction Memory access Memory transaction Memory access
  • 19. Toward Faster GPU Code • Assign a lot of parallel computations into GPU cores – Spark program has been already written by a set of parallel operations • e.g. map, join, … • Reduce # of memory transactions – Column-oriented layout achieves better performance • The paper* reports 3.3x performance improvement of GPU kernel execution of kmeans over row-oriented layout 19Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS * Che, Shuai and Sheaffer, Jeremy W. and Skadron, Kevin. “Dymaxion: Optimizing Memory Access Patterns for Heterogeneous Systems”, SC’11
  • 20. Questions • How can we write a parallel program for GPU on Spark? • How can we use column-oriented storage for GPU in Spark? 20Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS
  • 21. Questions • How can we write a parallel program for GPU on Spark? – Thanks to Spark programming model!! • How can we use column-oriented storage for GPU in Spark? 21Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS
  • 22. Outline • Why accelerate your workloads using GPU on Spark • How to program GPUs in Apache Spark – Invoke hand-tuned GPU program in CUDA (GPUEnabler) – Translate DataFrame program to GPU code automatically • Toward faster GPU code • How two frameworks work? 22Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS
  • 23. Data Movement with GPUEnabler • Data in RDD is moved into off-heap as column-oriented storage 23Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS Java Heap GPU Off-heap CPU Data copy map Data copy Format conversion Format conversion RDD1 RDD3val rdd1.cache val rdd2=rdd1.mapExtFunc(…) val rdd3=rdd2.mapExtFunc(…) Row- oriented storage Column- oriented storage Row- oriented storage map Row- oriented storage Row- oriented storage Column- oriented storage
  • 24. Data Movement with GPUEnabler • Data in RDD is moved into off-heap as column-oriented storage 24Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS Java Heap GPU Off-heap CPU Data copy map Data copy Format conversion Format conversion RDD1 RDD3val rdd1.cache val rdd2=rdd1.mapExtFunc(…) val rdd3=rdd2.mapExtFunc(…) Row- oriented storage Column- oriented storage Column- oriented storage map Column- oriented storage Row- oriented storage Column- oriented storage Conversions and copies are automatically done
  • 25. How To Write GPU Code in GPUEnabler • Write a GPU kernel corresponds to map() 25Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS val rdd2 = rdd1 .mapExtFunc(p => Point(p.x*2, p.y*2), mapFunction) __global__ void multiplyBy2(int *inx, int *iny, int *outx, int *outy, long size) { long i = threadIdx.x + blockIdx.x * blockDim.x; if (size <= i) return; outx[i] = inx[i] * 2; outy[i] = iny[i] * 2; } GPU Kernel Spark Program
  • 26. Data Movement with DataFrame • Cache in DataFrame already uses column-oriented storage • We enhanced to create cache for DataFrame in off-heap – Reduce conversion and data copy between Java heap and Off-heap 26Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS Java Heap GPU Off-heap CPU Data copy Data copy DataFrame’s cache DataFrame’s cache val df1.cache val df2=df1.map(…) val df3=df2.map(…).cache Column- oriented storage Column- oriented storage map + map Column- oriented storage Column- oriented storage
  • 27. Data Movement with DataFrame • Cache in DataFrame already uses column-oriented storage • We enhanced to create cache for DataFrame in off-heap – Reduce conversion and data copy between Java heap and Off-heap 27Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS Java Heap GPU Off-heap CPU Data copy Data copy DataFrame’s cache DataFrame’s cache val df1.cache val df2=df1.map(…) val df3=df2.map(…).cache Column- oriented storage Column- oriented storage map + map Column- oriented storage Column- oriented storage copy is automatically done
  • 28. How To Generate GPU Code from DataFrame • Added a new path to generate CUDA code into Catalyst 28Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS Derived Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Michael Armbrust DataFrame program Unresolved logical plan Logical Plan Optimized Logical Plan Physical Plan Java CUDA for kernel Analysis Optimizations Planning Code generation Newly added Catalyst
  • 29. Takeaway • Why accelerate your workloads using GPU on Spark – Achieved up to 3.6x over 160-CPU-thread parallel execution • How to use GPUs on Spark – Invoke Hand-tuned GPU program in CUDA (GPUEnabler) – Translate DataFrame program to GPU code automatically • How two approaches execute program on GPUs – Address easy programming for many non-experts, not the state-of-the-art performance by small numbers of top-notch programmers 29Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS