IITB Poster, 2016
Benchmarking GPU-based Acceleration of Spark in ML Workload using OpenPOWER
Vimalkumar Kumaresan, R. Baskaran
Introduction
•In recent decades, demand for large-scale data processing (a.k.a. Big Data) has grown enormously.
•These computing needs are addressed by both horizontal and vertical scalability.
•Vertical scaling is limited by the number of processing units.
•Horizontal scaling increases maintenance cost and energy consumption.
•Distributed data-processing frameworks such as Apache Hadoop and Spark [1] have been proposed.
Figure 1: GPU-Aware Spark
Motivation
•The proposed Big Data frameworks do not effectively utilize internal resources such as GPUs and other hardware enhancements.
•There is a need for a hybrid approach that leverages the best of both worlds.
•Data communication becomes the bottleneck in conventional GPU-based processing.
•The GPU-Aware Spark project [2] was proposed to optimize the internal data communication between CPU and GPU.
•In this work, preliminary benchmark experiments are conducted on GPU-enabled Spark by running a machine-learning workload, and its benefits are evaluated.
Architecture and Methods
•The architecture of GPU-Aware Spark [2] is shown in Figure 2.
•Two newly designed components are proposed: the binary columnar RDD and the GPUEnabler.
•The binary columnar RDD uses a column-oriented, binary, off-heap layout, so data can be copied to GPU device memory directly.
•The GPUEnabler is mainly used to launch GPU kernels.
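The advantage of the binary columnar layout can be sketched in plain Python (this is an illustration of the idea only, not the actual GPU-Aware Spark code): a row-oriented collection of boxed objects must be serialized element by element, while a columnar binary buffer is already one contiguous block that could be handed to a GPU driver in a single bulk transfer.

```python
# Illustrative sketch of row-oriented vs. columnar binary layout.
from array import array

# Row-oriented: a list of (x1, x2) tuples -- boxed objects, scattered in memory.
rows = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0)]

# Columnar: one contiguous binary array per column.
col_x1 = array("d", (r[0] for r in rows))
col_x2 = array("d", (r[1] for r in rows))

# Each column exposes a single contiguous byte buffer; copying it to device
# memory would be one bulk memcpy per column instead of per-element marshaling.
buf = memoryview(col_x1).cast("B")
print(len(buf))  # 3 doubles * 8 bytes = 24 contiguous bytes
```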
Figure 2: GPU-Aware Spark Architecture
Results
•The experiments were conducted on an IBM Power8 S8247-22L system [3].
•ML workload: a naive implementation of Logistic Regression trained with Stochastic Gradient Descent.
•Evaluation parameters: N = number of data points, D = number of dimensions (features), I = number of iterations, and NSlice = number of slices.
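A minimal, plain-Python sketch of this workload follows: naive logistic regression trained by iterative gradient descent, in the style of Spark's classic LR example. All names here are illustrative; this is not the code that ran on the benchmarked system.

```python
# Naive logistic regression via gradient descent (illustrative sketch).
import math

def train_logreg(points, labels, iterations, lr=1.0):
    """points: N rows of D features; labels in {+1, -1}; returns weights w."""
    D = len(points[0])
    w = [0.0] * D
    for _ in range(iterations):          # I: number of iterations
        grad = [0.0] * D
        for x, y in zip(points, labels): # N: number of data points
            margin = sum(wi * xi for wi, xi in zip(w, x))
            # gradient of the logistic loss for one point
            coeff = (1.0 / (1.0 + math.exp(-y * margin)) - 1.0) * y
            for j in range(D):           # D: number of dimensions
                grad[j] += coeff * x[j]
        for j in range(D):
            w[j] -= lr * grad[j] / len(points)
    return w

# Tiny usage example on linearly separable data:
w = train_logreg([[1.0, 2.0], [-1.0, -2.0]], [1, -1], iterations=100)
print(w[0] > 0 and w[1] > 0)  # True: weights point toward the +1 class
```

The inner loop over all N points at every iteration is what makes this workload a good fit for GPU offload: the per-point gradient computations are independent and data-parallel.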
Figure 3: Elapsed time (log10 msecs) vs. number of iterations, CPU vs. GPU; D = 200 attributes, No. Slices = 3 (N = 100000)
•GPU performance increases linearly over I, by approximately 0.5x at each iteration (Figure 3).
•When D increases, GPU performance improves by 10X over the CPU (refer to Figure 4).
Figure 4: Elapsed time (log10 msecs) vs. number of iterations, CPU vs. GPU; D = 400 attributes, No. Slices = 2 (N = 100000)
Figure 5: Elapsed time (log10 msecs) vs. number of iterations, CPU vs. GPU; N = 1 million data points, D = 200 attributes (No. Slices = 3)
•Similarly, when N increases, GPU performance improves by 10X over the CPU (refer to Figure 5).
Tables
This section presents the experimental results in Table 1, which are plotted in Figures 3, 4, and 5.
Elapsed Time (msecs)   numSlices=3; D=200     numSlices=2; D=400
                       CPU        GPU         CPU        GPU
N = 100000
  Iter#10              21454      3658        -          -
  Iter#50              36855      8189        118990     11609
  Iter#100             75486      14148       188540     20181
  Iter#200             169990     27878       375763     36705
  Iter#500             430510     62951       1061890    90000
N = 1000000
  Iter#10              74565      10565       -          -
  Iter#50              332925     43000       -          -
  Iter#100             837271     80910       -          -
  Iter#200             1509462    159083      -          -
  Iter#500             5030544    378983      -          -
Table 1: Experimental Results Summary
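The speedups behind the ~10x claims can be checked directly from Table 1 by dividing CPU elapsed time by GPU elapsed time (values copied from the table above):

```python
# Sanity-check of the speedups implied by Table 1 (CPU time / GPU time).
def speedup(cpu_ms, gpu_ms):
    return cpu_ms / gpu_ms

# N = 100000, Iter#500
print(round(speedup(430510, 62951), 1))    # numSlices=3, D=200 -> 6.8x
print(round(speedup(1061890, 90000), 1))   # numSlices=2, D=400 -> 11.8x
# N = 1000000, Iter#500, numSlices=3, D=200
print(round(speedup(5030544, 378983), 1))  # -> 13.3x
```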
Conclusions
In this poster, we conducted preliminary benchmark experiments using GPU-Aware Spark. Based on the results, GPU performance improves with the number of data points (N), features (D), and iterations (I), showing speedups over the CPU of roughly 10x, 10x, and 0.5x per iteration, respectively.
In future work, we plan to run more ML benchmark experiments and validate the proposed GPU-Aware framework.
Acknowledgments
Special thanks to IBM for sponsoring our research. Thanks to Ganesan Narayanasamy and Imran Basha for assisting us in performing the benchmark experiments, and to Madhusudanan Kandasamy (from IBM) for providing valuable feedback.
References
1. Apache Spark, "Apache Spark", http://spark.apache.org/.
2. Kazuaki Ishizaki et al., "Exploiting GPU in Spark", Spark Conference Japan, 2016.
3. IBM, "Power8 S8247-22L system", https://www.ibm.com/sup22L/p8hdx/824722lpdffiles.htm