Benchmarking GPU-based Acceleration of Spark in ML Workload
using Openpower
Vimalkumar Kumaresan, R.Baskaran
Introduction
•In recent decades, demand for large-scale data processing (aka Big Data) has grown enormously.
•Computing needs are addressed by both horizontal and vertical scaling.
•Vertical scaling is limited by the number of processing units available in a single machine.
•Horizontal scaling increases maintenance cost and energy consumption.
•Distributed data processing frameworks such as Apache Hadoop and Spark[1] have been proposed.
Figure 1: GPU-Aware Spark
Motivation
•Existing Big Data frameworks do not effectively utilize internal resources such as GPUs and other hardware enhancements.
•A hybrid approach is needed that leverages the best of both worlds.
•Data communication becomes a bottleneck in conventional GPU-based processing.
•The GPU-Aware Spark project[2] was proposed to optimize the internal data communication between CPU and GPU.
•In this work, preliminary benchmark experiments are conducted on GPU-enabled Spark by running a machine learning workload, and its benefits are evaluated.
Architecture and Methods
•The architecture of GPU-Aware Spark[2] is shown in Figure 2.
•Two newly designed components are proposed: the binary columnar RDD and GPUEnabler.
•The binary columnar RDD uses a column-oriented layout and a binary representation, and is stored off-heap, so its data can simply be copied to GPU device memory.
•GPUEnabler is mainly used to launch the GPU kernels.
Figure 2: GPU-Aware Spark Architecture
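The column-oriented binary layout described above can be illustrated with a small sketch. This is plain Python, not the GPU-Aware Spark implementation: the function name `to_columnar` and the sample rows are hypothetical, and the standard-library `array` type stands in for an off-heap buffer. The point it shows is that each column ends up as one contiguous block of raw bytes, which is the shape of data a single host-to-device copy expects.

```python
from array import array

def to_columnar(rows):
    """Pack row-oriented records into one contiguous float64 buffer per column.

    Hypothetical helper for illustration only.
    """
    if not rows:
        return []
    num_cols = len(rows[0])
    columns = [array("d") for _ in range(num_cols)]
    for row in rows:
        if len(row) != num_cols:
            raise ValueError("all rows must have the same number of fields")
        for col, value in zip(columns, row):
            col.append(value)
    return columns

# Three records with two numeric fields each (made-up values).
rows = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0)]
columns = to_columnar(rows)

# Each column is now a contiguous run of bytes with no per-row objects in
# between, so it could be handed to a device copy in a single call.
raw_bytes = [col.tobytes() for col in columns]
```

A row-oriented layout would instead interleave the fields of every record, so a kernel that reads a single feature would have to skip over the others.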
Results
•The experiments were conducted on an IBM Power8 S8247 22L system[3].
•ML workload: a naive implementation of Logistic Regression using Stochastic Gradient Descent.
•The evaluation parameters are N: number of data points, D: number of dimensions, I: number of iterations, and NSlice: number of slices.
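To make the roles of N, D, I and NSlice concrete, here is a minimal CPU-only sketch of a naive logistic regression. It is not the benchmark code used in the experiments. For simplicity it swaps in a full-batch gradient step per iteration, computed as per-slice partial gradients that are then summed, rather than the stochastic variant named above. The synthetic data, the learning rate and all function names are assumptions.

```python
import math
import random

def make_data(n, d, seed=0):
    """Generate n labelled points with d features (synthetic, for illustration)."""
    rng = random.Random(seed)
    true_w = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        y = 1.0 if sum(wi * xi for wi, xi in zip(true_w, x)) > 0 else 0.0
        data.append((x, y))
    return data

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def slice_gradient(points, w):
    """Partial gradient of the log-loss over one slice of the data."""
    grad = [0.0] * len(w)
    for x, y in points:
        err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
        for j, xj in enumerate(x):
            grad[j] += err * xj
    return grad

def train(data, d, iterations, num_slices, lr=0.5):
    # Split the N points into num_slices chunks, mimicking data partitions.
    slices = [data[k::num_slices] for k in range(num_slices)]
    w = [0.0] * d
    for _ in range(iterations):
        partials = [slice_gradient(s, w) for s in slices]  # "map" step
        grad = [sum(g) for g in zip(*partials)]            # "reduce" step
        w = [wi - lr * gi / len(data) for wi, gi in zip(w, grad)]
    return w

def accuracy(data, w):
    """Fraction of points whose predicted class matches the label."""
    hits = 0
    for x, y in data:
        predicted = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) >= 0.5
        hits += predicted == (y == 1.0)
    return hits / len(data)

# Small sizes so the sketch runs quickly: N=500, D=5, I=50, NSlice=3.
data = make_data(n=500, d=5)
w = train(data, d=5, iterations=50, num_slices=3)
```

The work per iteration is proportional to N times D, and the per-slice gradient is the part that a GPU kernel would typically take over.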
[Plot: elapsed time, log10(msecs), versus number of iterations (0 to 500), CPU and GPU]
Figure 3: D: 200 Attributes with No. Slices: 3 (N = 100000)
•GPU performance improves roughly linearly with I, by approximately 0.5x at each iteration step (Figure 3).
•When D increases, GPU performance reaches roughly 10x that of the CPU (refer to Figure 4).
[Plot: elapsed time, log10(msecs), versus number of iterations (0 to 500), CPU and GPU]
Figure 4: D: 400 Attributes with No. Slices: 2 (N = 100000)
[Plot: elapsed time, log10(msecs), versus number of iterations (0 to 500), CPU and GPU]
Figure 5: 1 Million Datapoints with 200 attributes (No. Slices: 3)
Similarly, when N increases, GPU performance reaches roughly 10x that of the CPU (refer to Figure 5).
Tables
This section presents the experimental results in Table 1, which are plotted in Figures 3, 4 and 5.
Elapsed Time (msecs)
                 numSlices=3; D=200        numSlices=2; D=400
                 CPU         GPU           CPU         GPU
N = 100000
  Iter#10        21454       3658          -           -
  Iter#50        36855       8189          118990      11609
  Iter#100       75486       14148         188540      20181
  Iter#200       169990      27878         375763      36705
  Iter#500       430510      62951         1061890     90000
N = 1000000
  Iter#10        74565       10565         -           -
  Iter#50        332925      43000         -           -
  Iter#100       837271      80910         -           -
  Iter#200       1509462     159083        -           -
  Iter#500       5030544     378983        -           -
Table 1: Experimental Results Summary
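The speedups discussed in the Results can be recomputed from Table 1 as the ratio of CPU to GPU elapsed time. The sketch below is a small helper written for this purpose, not part of the original experiments. The timings are copied from Table 1. The dictionary layout, the function names, and the assignment of rows that list only two values to the numSlices=3; D=200 configuration are assumptions.

```python
# Elapsed times in msecs copied from Table 1: {iterations: (cpu, gpu)}.
# Assumption: rows that list only two values belong to the
# numSlices=3; D=200 configuration.
TABLE_1 = {
    "N=100000, D=200, numSlices=3": {
        10: (21454, 3658),
        50: (36855, 8189),
        100: (75486, 14148),
        200: (169990, 27878),
        500: (430510, 62951),
    },
    "N=100000, D=400, numSlices=2": {
        50: (118990, 11609),
        100: (188540, 20181),
        200: (375763, 36705),
        500: (1061890, 90000),
    },
    "N=1000000, D=200, numSlices=3": {
        10: (74565, 10565),
        50: (332925, 43000),
        100: (837271, 80910),
        200: (1509462, 159083),
        500: (5030544, 378983),
    },
}

def speedup(cpu_ms, gpu_ms):
    """How many times faster the GPU run is than the CPU run."""
    return cpu_ms / gpu_ms

def speedups(table):
    """Map every configuration and iteration count to its CPU/GPU ratio."""
    return {
        config: {it: speedup(cpu, gpu) for it, (cpu, gpu) in runs.items()}
        for config, runs in table.items()
    }

for config, by_iter in speedups(TABLE_1).items():
    for iterations, ratio in by_iter.items():
        print(f"{config}  I={iterations:<3d}  CPU/GPU = {ratio:.2f}x")
```

With the figures in Table 1, the ratios range from about 4.5x (N = 100000, D = 200, 50 iterations) to about 13.3x (N = 1000000, D = 200, 500 iterations).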
Conclusions
In this poster, we conducted preliminary benchmark experiments using GPU-Aware Spark. Based on the results, GPU performance improves with the number of data points (N), the number of features (D), and the number of iterations (I), showing speedups over the CPU of about 10x, 10x and 0.5x (per iteration) respectively.
In future, we plan to run more ML benchmark experiments to validate the proposed GPU-Aware framework.
Acknowledgments
Special thanks to IBM for sponsoring our research. Thanks to Ganesan Narayanasamy and Imran Basha for assisting us in performing the benchmark experiments, and to Madhusudanan Kandasamy (from IBM) for providing valuable feedback.
References
1. Apache Spark, "Apache Spark", http://spark.apache.org/.
2. Kazuaki Ishizaki et al., "Exploiting GPU in Spark", Spark Conference, Japan, 2016.
3. IBM, "Power8 S8247 22L system", https://www.ibm.com/sup 22L/p8hdx/824722lpd ff iles.htm

IITB Poster. Benchmarking GPU-based Acceleration of Spark in ML Workload using Openpower.2016
