Potential of NDP for Apache Spark

Identifying the Potential of Near
Data Processing for Apache Spark
Ahsan J. Awan (KTH), Eduard Ayguade (BSC), Mats Brorsson (KTH), Moriyoshi Ohara (IBM),
Kazuaki Ishizaki (IBM), and Vladimir Vlassov (KTH)
KTH Royal Institute of Technology, Sweden
BSC Barcelona Supercomputing Center, Spain
IBM Research Tokyo, Japan
MEMSYS 2017, Oct 2-5, 2017, Washington DC, VA

Motivation / "Big Picture"
Identifying The Potential Of Near Data Processing For Apache Spark, MEMSYS 2017, Washington DC, VA, USA

Previous Work / Further Reading
•  Performance characterization of in-memory data analytics on a
modern cloud server, 5th IEEE Conference on Big Data and Cloud
Computing, 2015 (Best Paper Award).
•  How Data Volume Affects Spark Based Data Analytics on a Scale-up
Server, 6th Workshop on Big Data Benchmarks, Performance
Optimization and Emerging Hardware (BPOE), held in conjunction with
VLDB 2015, Hawaii, USA.
•  Micro-architectural Characterization of Apache Spark on Batch and
Stream Processing Workloads, 6th IEEE Conference on Big Data and
Cloud Computing, 2016.
•  Node Architecture Implications for In-Memory Data Analytics in
Scale-in Clusters, 3rd IEEE/ACM Conference on Big Data Computing,
Applications and Technologies, 2016.

Spark
A fast and general engine for large-scale
data processing (https://spark.apache.org/).
Resilient Distributed Datasets (RDDs)
•  immutable collections of objects spread
across a cluster
DataFrames
Higher-order user-defined functions
•  Transformations (RDD → RDD)
•  Actions (RDD → non-RDD)
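
The split between transformations and actions on the Spark slide can be illustrated with a toy model in plain Python. This is only a sketch: `ToyRDD` is a hypothetical class invented for illustration and is not Spark's API. It records transformations lazily and runs them only when an action is called, which is the behaviour the slide summarizes.

```python
from functools import reduce


class ToyRDD:
    """Hypothetical stand-in for an RDD: immutable and lazily evaluated."""

    def __init__(self, source, ops=()):
        self._source = tuple(source)  # immutable backing data
        self._ops = tuple(ops)        # recorded steps, not yet executed

    # Transformations: return a new ToyRDD and execute nothing.
    def map(self, fn):
        return ToyRDD(self._source, self._ops + (("map", fn),))

    def filter(self, pred):
        return ToyRDD(self._source, self._ops + (("filter", pred),))

    def _iterate(self):
        # Replay the recorded steps over the source data.
        items = iter(self._source)
        for kind, fn in self._ops:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return items

    # Actions: trigger evaluation and return a non-RDD value.
    def collect(self):
        return list(self._iterate())

    def count(self):
        return sum(1 for _ in self._iterate())

    def reduce(self, fn):
        return reduce(fn, self._iterate())


lines = ToyRDD(["spark ndp", "pim", "isp spark"])
matches = lines.filter(lambda s: "spark" in s)  # transformation (RDD -> RDD)
lengths = matches.map(len)                      # transformation (RDD -> RDD)
print(lengths.collect())                        # action (RDD -> list)
```

Each transformation returns a new object and leaves the original untouched, so `lines` can be reused for other pipelines.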

Exploiting Near Data Processing (NDP)
1.  Processing-In-Memory (PIM)
2.  In-Storage Processing (ISP)
Improve the performance by reducing costly data movements
back and forth between the CPUs and memories
G. Loh, N. Jayasena, M. Oskin, M. Nutter, D. Roberts, M. Meswani, D. Zhang, and M. Ignatowski. A processing in memory taxonomy and a case
for studying fixed-function PIM. In Workshop on Near-Data Processing (WoNDP), 2013.

3D-stacked PIM for Data Analytics
for Map-Reduce
•  perform Map operations on simple processing cores in the logic
layer of 3D-stacked memory devices
for Machine Learning
•  offload atomic operations onto logic layers in 3D stacked
memories
for Graph Analytics
•  offload the graph property calculations to HMC
for SQL queries
•  Joins can benefit from 3D-stacked PIM

Expected benefits of NDP for big-data analytics
•  PIM for DRAM-bound applications, e.g., map-reduce,
graph- and stream-processing, ML
•  ISP for I/O-bound (non-iterative) applications, e.g., SQL
•  PIM + ISP for phasic applications, both memory- and I/O-bound,
e.g., clustering (k-means), some graph processing

Can Spark workloads benefit from NDP?

Methodology
Identifying the potential of NDP to boost the performance of Spark workloads
by matching the characteristics of the workloads to different forms of NDP
(2D integrated PIM, 3D Stacked PIM, ISP)
Representative Spark workloads (most from BigDataBench)
•  Batch, SQL, stream-, graph-processing, ML
•  should cover a diverse set of Spark transformations and actions
•  should be common among available big-data benchmark suites
•  have been used in evaluation of MR frameworks.

Workloads (1/2)

Workloads (2/2)

System Configuration
•  Hyper-Threading and Turbo Boost are disabled
•  Spark in local mode: driver and
executor are in same JVM
•  HotSpot JDK version 7u71 in
server mode

Measurement Tools and Metrics
•  iotop to measure the total disk bandwidth
•  top to measure %usr and %io
•  Intel VTune Amplifier to collect hardware performance counters
Metrics for Top-Down Analysis of Workloads
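
As a rough illustration of how %usr and %io could be gathered from `top`, the sketch below averages the user and I/O-wait fields of the CPU summary lines in captured batch output (for example from `top -b`). It assumes that %usr corresponds to the `us` field and %io to the `wa` field, which the slides do not state; the function name `mean_usr_and_iowait` is hypothetical and the sample lines are made up.

```python
import re

# Match the number in front of the "us" (user) and "wa" (I/O wait) fields.
# The optional "%" covers older top versions that print "40.0%us".
_US = re.compile(r"([\d.]+)\s*%?\s*us\b")
_WA = re.compile(r"([\d.]+)\s*%?\s*wa\b")


def mean_usr_and_iowait(top_output):
    """Return (mean %usr, mean %io) over all CPU summary lines."""
    usr, wait = [], []
    for line in top_output.splitlines():
        if "Cpu(s)" not in line:
            continue
        us, wa = _US.search(line), _WA.search(line)
        if us and wa:
            usr.append(float(us.group(1)))
            wait.append(float(wa.group(1)))
    if not usr:
        raise ValueError("no CPU summary lines found")
    return sum(usr) / len(usr), sum(wait) / len(wait)


# Made-up sample, not measured data.
SAMPLE = """\
%Cpu(s): 40.0 us,  5.0 sy,  0.0 ni, 35.0 id, 20.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s): 60.0 us,  5.0 sy,  0.0 ni, 25.0 id, 10.0 wa,  0.0 hi,  0.0 si,  0.0 st
"""

usr_pct, io_pct = mean_usr_and_iowait(SAMPLE)
print(usr_pct, io_pct)
```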

The case for Processing-In-Memory:
2D Integrated PIM instead of 3D Stacked PIM
M. Radulovic et al. Another Trip to the Wall: How Much Will Stacked DRAM Benefit
HPC? In MEMSYS '15.

The case for In-Storage-Processing
[Figure panels: Grep (Gp), K-means (Km), Windowed Word Count (Wwc)]

A refined hypothesis based on workload
characterization
•  Non-iterative Spark workloads with a high ratio of I/O wait time / CPU time, e.g., join,
aggregation, filter, word count and sort, are ideal candidates for ISP.
•  Spark workloads with a low ratio of I/O wait time / CPU time, e.g., stream processing and
iterative graph processing, are bound by the latency of frequent accesses to DRAM and are
ideal candidates for 2D integrated PIM.
•  Iterative Spark workloads with a moderate ratio of I/O wait time / CPU time, e.g., K-means,
have both I/O-bound and memory-bound phases and hence will benefit from hybrid 2D
integrated PIM and ISP.
•  In order to satisfy the varying compute demands of Spark workloads, we envision an
NDC architecture with programmable-logic-based hybrid ISP and 2D integrated PIM.
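
The matching rule in the refined hypothesis can be written as a small decision function. This is a sketch under stated assumptions: the slides only speak of a "high", "moderate" and "low" ratio of I/O wait time to CPU time, so the numeric cut-offs `LOW_RATIO` and `HIGH_RATIO`, the `WorkloadProfile` type and the example numbers are all hypothetical placeholders, not values from the study.

```python
from dataclasses import dataclass

# Hypothetical cut-offs; the slides give no numeric thresholds.
LOW_RATIO = 0.1
HIGH_RATIO = 1.0


@dataclass(frozen=True)
class WorkloadProfile:
    name: str
    io_wait_time: float  # any unit, as long as it matches cpu_time
    cpu_time: float
    iterative: bool


def io_to_cpu_ratio(profile):
    """Ratio of I/O wait time to CPU time for one workload."""
    if profile.cpu_time <= 0:
        raise ValueError("cpu_time must be positive")
    return profile.io_wait_time / profile.cpu_time


def suggest_ndp(profile, low=LOW_RATIO, high=HIGH_RATIO):
    """Map a workload profile to the form of NDP the hypothesis suggests."""
    ratio = io_to_cpu_ratio(profile)
    if ratio <= low:
        return "2D integrated PIM"  # low ratio: DRAM-latency bound
    if ratio >= high:
        # High ratio: the hypothesis only covers non-iterative workloads.
        return "ISP" if not profile.iterative else "unclassified"
    # Moderate ratio: the hypothesis only covers iterative workloads.
    return "hybrid 2D integrated PIM + ISP" if profile.iterative else "unclassified"


# Invented numbers, chosen only to exercise each branch.
examples = [
    WorkloadProfile("sort-like", io_wait_time=30.0, cpu_time=10.0, iterative=False),
    WorkloadProfile("stream-like", io_wait_time=0.5, cpu_time=10.0, iterative=False),
    WorkloadProfile("kmeans-like", io_wait_time=4.0, cpu_time=10.0, iterative=True),
]
for p in examples:
    print(p.name, "->", suggest_ndp(p))
```

Combinations the hypothesis does not mention, such as an iterative workload with a high ratio, are reported as "unclassified" rather than guessed.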

Contact
Ahsan Javed Awan
KTH Royal Institute of Technology, Stockholm, Sweden
Email: ajawn@kth.se
Profile: www.kth.se/profile/ajawan/
