Intel Architecture, Graphics, and Software (IAGS)

3. Apache Spark: A Unified Analytics Engine
• A fast, general-purpose cluster computing system with a rich set of high-level tools, including Spark SQL for SQL processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming for stream processing.
• Provides high-level APIs in Java, Scala, Python, and R.
• Supports multiple resource managers (Apache YARN, Apache Mesos, Kubernetes, standalone).
• Supports diverse data sources, including HDFS, Apache Cassandra, Apache HBase, etc.
4. Enable and Accelerate Spark with DAOS
• Spark Input/Output Storage: From HDFS to DAOS
• Spark Shuffle Data Storage: From Local FS to DAOS
[Diagram] Colocated cluster: each node in a combined compute and storage cluster runs Spark, with input/output data stored in HDFS and shuffle data on the local filesystem. Disaggregated cluster: Spark runs on a compute cluster, while both input/output data and shuffle data are stored in DAOS on a separate storage cluster.
5. DAOS Hadoop Filesystem Interface
[Diagram] Spark applications sit on top of the Hadoop FileSystem abstract class, which has pluggable implementations:
• HDFS, backed by local disks
• S3A FileSystem, backed by the AWS Java SDK and Amazon S3
• DAOS Hadoop FileSystem, backed by the DAOS API Java wrapper, the DAOS library, and DAOS storage
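The plug-in pattern above can be sketched in a few lines of Python (a hypothetical analogue of Hadoop's FileSystem abstract class, not the real API): each backend implements the same abstract interface, so application code stays unchanged when switching from local disks or S3 to DAOS.

```python
from abc import ABC, abstractmethod

class FileSystem(ABC):
    """Toy stand-in for Hadoop's FileSystem abstract class (hypothetical)."""
    @abstractmethod
    def read(self, path: str) -> bytes: ...

class LocalFileSystem(FileSystem):
    def read(self, path: str) -> bytes:
        with open(path, "rb") as f:   # backed by local disks
            return f.read()

class DaosFileSystem(FileSystem):
    """Would delegate to the DAOS API Java wrapper / DAOS library;
    here a dict stands in for DAOS storage."""
    def __init__(self, store: dict):
        self.store = store
    def read(self, path: str) -> bytes:
        return self.store[path]

def load(fs: FileSystem, path: str) -> bytes:
    # Application code depends only on the abstract interface,
    # never on the concrete backend.
    return fs.read(path)
```

Swapping backends then requires only constructing a different `FileSystem` instance, which is the property the DAOS Hadoop FileSystem exploits to slot under Spark unmodified.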
6. DAOS Hadoop Filesystem Throughput
[Charts] Read throughput and write throughput (MB/s) vs. number of clients, comparing IOR DFS and DFSIO.

• The DAOS Hadoop Filesystem delivers the same read/write throughput as the DAOS DFS API given enough parallelism.

Test configuration: 2 DAOS servers, each with 1 P4500 NVMe SSD; 3 client nodes; network: 100 Gb Omni-Path.
7. Using DAOS Hadoop Filesystem in Spark

▪ Add the DAOS Hadoop FS jar file to the classpath of the Spark executor and driver:

spark.executor.extraClassPath /path/to/DAOS Hadoop FS jar
spark.driver.extraClassPath /path/to/DAOS Hadoop FS jar

▪ For a DAOS UNS path, read data using the following URI in Spark:

df = spark.read.json("daos:///UNS/path/root/dir/file")

▪ For a non-UNS path, add the pool and container UUIDs to the configuration file daos-site.xml and use the following URI in Spark:

df = spark.read.json("daos:///root/dir/file")
9. DAOS Hadoop FS based Shuffle
[Diagram] Map Task 1 and Map Task 2 each write an index file and a data file of shuffle blocks into a DAOS POSIX container; Reduce Tasks 1–3 read shuffle blocks from every data file.
1. Each map task writes its shuffle output file to DAOS using the DAOS Hadoop filesystem API. The shuffle data file is sorted by partition ID.
2. Each reduce task reads its shuffle partitions from every shuffle data file.
3. Every reduce task needs to open all shuffle data files; file handles can only be cached at the Spark executor level.
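The file-based flow in steps 1–3 can be simulated with plain Python dictionaries standing in for files in the DAOS POSIX container (a sketch of the data movement only, not the Spark implementation): each map task emits one data file whose blocks are ordered by partition ID, and each reduce task must touch every map's file.

```python
def map_write(map_id, records, num_reducers):
    """Step 1: one shuffle 'file' per map task, blocks sorted by partition ID."""
    parts = {r: [] for r in range(num_reducers)}
    for key, value in records:
        parts[hash(key) % num_reducers].append((key, value))
    # File contents: shuffle blocks laid out in partition (reduce) ID order.
    return {"map_id": map_id, "blocks": [parts[r] for r in range(num_reducers)]}

def reduce_read(reduce_id, shuffle_files):
    """Steps 2-3: every reduce task opens every map's file, so with M maps
    and R reducers there are R*M lookup/open operations in total."""
    out = []
    for f in shuffle_files:
        out.extend(f["blocks"][reduce_id])
    return out
```

The R*M open pattern in `reduce_read` is exactly the overhead the next slides identify as the bottleneck for small shuffle blocks.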
10. DAOS Hadoop FS based Shuffle Performance
Challenge: frequent file lookup/open overhead during shuffle read, especially when the shuffle block size is small.
11. DAOS Object based Spark Shuffle
[Diagram] Map Task 1 and Map Task 2 write into a single DAOS object; Reduce Tasks 1–3 read from it. The object is keyed as:
• dkey ReduceID 1 → akey MapID 1, akey MapID 2
• dkey ReduceID 2 → akey MapID 1, akey MapID 2
• dkey ReduceID 3 → akey MapID 1, akey MapID 2
1. Each map task divides its data into R partitions, where R equals the number of reducers. Every small block is stored in a DAOS object under a specific dkey and akey, where the dkey equals the reduce ID and the akey equals the map attempt ID.
2. For every reduce task, its shuffle data is stored under the same dkey. During shuffle read, each reduce task simply reads the values of all successful akeys under its dkey.
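The dkey/akey layout above can likewise be sketched with a dict keyed by (dkey, akey) pairs standing in for a DAOS object (a toy model; the real shuffle manager uses the DAOS object API): a reduce task then fetches everything under its single dkey instead of opening one file per map task.

```python
def map_write(obj, map_attempt_id, records, num_reducers):
    """Map side: store each partition under dkey=reduce_id, akey=map_attempt_id."""
    parts = {}
    for key, value in records:
        parts.setdefault(hash(key) % num_reducers, []).append((key, value))
    for reduce_id, block in parts.items():
        obj[(reduce_id, map_attempt_id)] = block  # one (dkey, akey) cell

def reduce_read(obj, reduce_id):
    """Reduce side: read all akeys' values under one dkey --
    no per-map-file lookup/open, regardless of the number of map tasks."""
    out = []
    for (dkey, _akey), block in sorted(obj.items()):
        if dkey == reduce_id:
            out.extend(block)
    return out
```

Because all of a reducer's input lives under one dkey, the small-block open overhead of the file-based design disappears, which matches the shuffle-read speedup reported on the next slide.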
12. DAOS Object based Spark Shuffle Performance
[Chart] Shuffle throughput (GB/s) for a 200 GB repartition, by shuffle block size:

Shuffle block size: 200K  400K  800K  1600K
Shuffle Read:       5.83  5.68  5.68  5.68
Shuffle Write:      1.94  1.94  2.06  2.06
• Shuffle read throughput is improved: 1.5x for a 200 KB shuffle block.
• Shuffle write throughput does not change, as it already hits the NVMe limit.
13. Using DAOS Object based Shuffle in Spark

▪ Add the remote shuffle jar file to the classpath of the Spark executor and driver:

spark.executor.extraClassPath /path/to/DAOS remote shuffle jar
spark.driver.extraClassPath /path/to/DAOS remote shuffle jar

▪ Enable DAOS remote shuffle:

spark.shuffle.manager org.apache.spark.shuffle.daos.DaosShuffleManager
spark.shuffle.daos.pool.uuid <Pool UUID>
spark.shuffle.daos.container.uuid <Container UUID>
14. Future Plan

▪ Performance:
• Switch to the DAOS async API in the DAOS Spark shuffle manager.
▪ User Experience:
• Simplify user configuration and set better default values.
• Test and benchmark to cover more use cases.