SlideShare a Scribd company logo
Patrick Stuedi, IBM Research
Running Spark on a High-
Performance Cluster
using RDMA Networking
and NVMe Flash
Hardware Trends
community
target
20172010
1Gbps
50us
10Gbps
20us
100 MB/s
100ms
1000 MB/s
200us
Hardware Trends
2017
10 GB/s
50us
100Gbps
1us
community
target
our
target
20172010
1Gbps
50us
10Gbps
20us
100 MB/s
100ms
1000 MB/s
200us
User-Level APIs
Storage App
HDD
FileSystem
iSCSI
Block Layer
OS Kernel
Network App
NIC
Sockets
Ethernet
TCP/IP
OS Kernel
Storage App
SSD, NIC
SPDK
Network App
NIC
OS KernelRDMA, DPDK
Kernel
Kernel
Traditional / Kernel-based User-Level / Kernel-Bypass
Needed to
achieve
1us RTT!
Remote Data Access
512 B 4 KB 64 KB 1 MB
0
200
400
600
800
1000
1200
NVMe over Fabrics
iSCSI
data size
time[usec]
DRAM NVMe Flash
512 B 4 KB 64 KB 1 MB
0
50
100
150
200
250
RDMA
Sockets
data size
time[usec]
User-level APIs offer 2-10x better performance than
traditional kernel-based APIs
Let's Use it!
Spark
NVMe Flash
DRAM
High-Performance RDMA network
SQL Streaming MLlib GraphX
Performance!
Realtime!
Disaggregation!
Case Study: Sorting in Spark
MapCores ReduceInput Output
read & classify shuffle store
Net Ser
Net Ser
Net Ser
Sort
Sort
Sort
Experiment Setup
●
Total data size: 12.8 TB
●
Cluster size: 128 nodes
●
Cluster hardware:
– DRAM: 512 GB DDR 4
– Storage: 4x 1.2 TB NVMe SSD
– Network: 100GbE Mellanox RDMA
Flash bandwidth
per node matches
network bandwidth
How is the Network Used?
Hardware limit
0
20
40
60
80
100
0 100 200 300 400 500
Throughput[Gbit/s]
Elapsed time (seconds)
What is the Problem?
JVM
Netty
TeraSort
Sorter
Serializer
sockets
Spark
● Spark uses legacy networking and storage APIs: no kernel-bypass
● Spark itself introduces additional I/O layers: Netty, serializer, sorter, etc.
TCP/IP
Ethernet
NIC
filesystem
block layer
iSCSI
SSD
Example: Shuffle (Map)
Buffer
Buffer
Buffer
Sorter
Kryo
BufferedOutputStream
BufferJNI backend
BufferFile System Buffer Cache
Object
Example: Shuffle (Map)
Buffer
Buffer
Buffer
Sorter
Kryo
BufferJNI
BufferFS
Object
Stream
Example: Shuffle (Map+Reduce)
Buffer
Buffer
Buffer
Sorter
Kryo
BufferJNI
BufferFS
Object
Stream
Buffer
Buffer Buffer SocketBuffer
Netty
Buffer
Buffer
Kryo
Object
network
Sorter
Example: Shuffle (Map+Reduce)
Buffer
Buffer
Buffer
Sorter
Kryo
BufferJNI
BufferFS
Object
Stream
Buffer
Buffer Buffer SocketBuffer
Netty
Buffer
Buffer
Kryo
Object
network
Sorter
Difficult at
100 Gbps!
How can we fix this...
● Not just for shuffle
– Also for broadcast, RDD transport, inter-job sharing, etc.
● Not just for RDMA and NVMe hardware
– But for any possible future high-performance I/O hardware
● Not just for co-located compute/storage
– Also for resource disaggregation, heterogeneous resource distribution, etc.
● Not just improve things
– Make it perform at the hardware limit
The CRAIL Approach
Example: Crail Shuffle (Map)
Buffer
Crail Serializer
CrailOutputStream
Object
local
DRAM
remote
DRAM
local
Flash
remote
Flash
memcpy RDMA NVMe NVMeF
No
buffering
Example: Crail Shuffle (Reduce)
Buffer
Crail Serializer
CrailInputStream
local
DRAM
remote
DRAM
local
Flash
remote
Flash
memcpy RDMA NVMe NVMeF
No
buffering
Object
Example: Crail Shuffle (Reduce)
Buffer
Crail Serializer
CrailInputStream
local
DRAM
remote
DRAM
local
Flash
remote
Flash
memcpy RDMA NVMe NVMeF
No
buffering
Object
Similar for
broadcast,
Input/output,
RDD, etc.
Performance: Configuration
●
Experiments
– Memory-only: Broadcast, GroupBy, Sorting, SQL
– Memory/Flash: disaggregation, tiering
●
Cluster size: 8 nodes, except Sorting: 128 nodes
●
Cluster hardware:
– DRAM: 512 GB DDR 4
– Storage: 4x 1.2 TB NVMe SSD
– Network: 100GbE Mellanox RDMA
●
Spark 2.1.0, Alluxio 1.4, Crail 1.0
●
Hadoop.tmp.dir: RamFS for microbenchmarks, flash SSD for workloads
Spark Broadcast
val bcVar = sc.Broadcast(new Array[Byte](128))
sc.parallelize(1 to tasks, tasks).map(_ => {
bcVar.value.length
}).count
Spark
driver
Spark
Executor
/spark/broadcast/broadcast_0 Crail
Store
Broadcast writing
RPC+2xRDMA+RPC
CDF
Broadcast reading
RPC+RDMA1
2
3
4
Spark GroupBy (80M keys, 4K)
val pairs = sc.parallelize(1 to tasks, tasks).flatmap(_ => {
var values = new array[(Long,Array[Byte])](numKeys)
values = initValues(values)
}).cache().groupByKey().count()
Spark
Exec0
/shuffle/1/ f0 f1 fN
/shuffle/2/ f0 f1 fN
/shuffle/3 f0 f1 fN
Spark
Exec1
Spark
ExecN
Spark
Exec0
Spark
Exec1
Spark
ExecN
hash hash hash1
2
map: Hash
Append (file)
reduce: fetch
Bucket (dir)
Crail
Spark/
Vanilla
5x
2.5x
2x
0
20
40
60
80
100
0 10 20 30 40 50 60 70 80 90 100 110 120
Throughput(Gbit/s)
Elapsed time (seconds)
1 core
4 cores
8 cores
0
20
40
60
80
100
0 10 20 30 40 50 60 70 80 90 100 110 120
Throughput(Gbit/s)
Elapsed time (seconds)
1 core
4 cores
8 cores
Spark/
Crail
Sorting 12.8 TB on 128 nodes
Spark Spark/Crail
0
200
400
600
reduce
map
Spark/Crail Network UsageSorting Runtime
Task ID
Throughput(Gbit/s)
6x
time(sec)
How fast is this?
Spark/Crail Winner 2014 Winner 2016
Size (TB) 12.8 100 100
Time (sec) 98 1406 134
Total cores 2560 6592 10240
Network HW (Gbit/s) 100 10 100
Rate/core (GB/min) 3.13 0.66 4.4
Native C
distributed
sorting
benchmark
Spark
www.sortingbenchmark.org
Sorting rate of
Crail/Spark only 27%
slower than rate of
Winner 2016
Spark SQL JoinSpark
Exec0
/shuffle/1/ f0 f1 fN
/shuffle/2/ f0 f1 fN
/shuffle/3 f0 f1 fN
Spark
Exec1
Spark
ExecN
hash hash hash
1
2
map: Hash
Append (file)
val ds1 = sparkSession.read.parquet(“...”) //64GB dataset
val ds2 = sparkSession.read.parquet(“...”) //64GB dataset
val resultDS = ds1.joinWith(ds2, ds1(“key”) === ds2(“key”)) //~100GB
resultDS.write.format(“parquet”).mode(SaveMode.Overwrite).save(“...”)
parquet parquet parquet
read from
Hdfs/crail/alluxio
deser
sort
join
write
sort
join
write
sort
join
write
deser deser
3 Sort
Merge join
hdfs/crail/alluxio
4
CrailStore
0
10
20
30
40
50
output
join
hash
input
Time(sec)
Spark/
Alluxio
Spark/
Crail
Storage Tiering: DRAM & NVMe
Spark
Exec0
Spark
Exec1
Spark
ExecN
Memory Tier
Flash
Tier
Network
Input/Output/
Shuffle/
Spark
With Crail, replacing memory with NVMe flash
for storing input/output/shuffle reduces the 200GB sorting runtime by only 48%
0
20
40
60
80
100
120
Reduce
MapVanilla
Spark
(100%
in-memory)
memory/flash ratio
Time(sec)
sorting runtime
DRAM & Flash Disaggregation
Spark
Exec1
Spark
ExecN
Using Crail, a Spark 200GB sorting workload can be run with
memory and flash disaggregated at no extra cost
Spark
Exec0
storage
nodes
compute
nodes
0
40
80
120
160
200
Reduce
Map
Crail/
DRAM
Crail/
NVMe
Vanilla/
Alluxio
Input/Output/
Shuffle
Time(sec)
sorting runtime
Conclusions
●
Effectively using high-performance I/O hardware in
Spark is challenging
●
Crail is an attempt to re-think how data processing
frameworks (not only Spark) should interact with
network and storage hardware
– User-level I/O, storage disaggregation, memory/flash convergence
●
Spark's modular architecture allows Crail to be used
seamlessly
Crail for Spark is Open Source
●
www.crail.io
●
github.com/zrlio/spark-io
●
github.com/zrlio/crail
●
github.com/zrlio/parquetgenerator
●
github.com/zrlio/crail-terasort
Thank You.
Contributors:
Animesh Trivedi, Jonas Pfefferle, Michael Kaufmann, Bernard
Metzler, Radu Stoica, Ioannis Koltsidas, Nikolos Ioannou,
Christoph Hagleitner, Peter Hofstee, Evangelos Eleftheriou, ...

More Related Content

What's hot

Flink vs. Spark
Flink vs. SparkFlink vs. Spark
Flink vs. Spark
Slim Baltagi
 
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in SparkSpark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Bo Yang
 
Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
Flink Forward
 
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBaseHBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon
 
Best Practices for Using Apache Spark on AWS
Best Practices for Using Apache Spark on AWSBest Practices for Using Apache Spark on AWS
Best Practices for Using Apache Spark on AWS
Amazon Web Services
 
RocksDB Performance and Reliability Practices
RocksDB Performance and Reliability PracticesRocksDB Performance and Reliability Practices
RocksDB Performance and Reliability Practices
Yoshinori Matsunobu
 
Apache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the CloudApache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the Cloud
Databricks
 
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark ApplicationsTop 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Cloudera, Inc.
 
CDC Stream Processing with Apache Flink
CDC Stream Processing with Apache FlinkCDC Stream Processing with Apache Flink
CDC Stream Processing with Apache Flink
Timo Walther
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Flink Forward
 
Dynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache SparkDynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache Spark
Databricks
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka Streams
Guozhang Wang
 
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...
Databricks
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Databricks
 
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsRunning Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Databricks
 
Near real-time statistical modeling and anomaly detection using Flink!
Near real-time statistical modeling and anomaly detection using Flink!Near real-time statistical modeling and anomaly detection using Flink!
Near real-time statistical modeling and anomaly detection using Flink!
Flink Forward
 
Performance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State StoresPerformance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State Stores
confluent
 
Monitor Apache Spark 3 on Kubernetes using Metrics and Plugins
Monitor Apache Spark 3 on Kubernetes using Metrics and PluginsMonitor Apache Spark 3 on Kubernetes using Metrics and Plugins
Monitor Apache Spark 3 on Kubernetes using Metrics and Plugins
Databricks
 
Spark tuning
Spark tuningSpark tuning
Oracle Active Data Guard: Best Practices and New Features Deep Dive
Oracle Active Data Guard: Best Practices and New Features Deep Dive Oracle Active Data Guard: Best Practices and New Features Deep Dive
Oracle Active Data Guard: Best Practices and New Features Deep Dive
Glen Hawkins
 

What's hot (20)

Flink vs. Spark
Flink vs. SparkFlink vs. Spark
Flink vs. Spark
 
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in SparkSpark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
 
Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBaseHBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
 
Best Practices for Using Apache Spark on AWS
Best Practices for Using Apache Spark on AWSBest Practices for Using Apache Spark on AWS
Best Practices for Using Apache Spark on AWS
 
RocksDB Performance and Reliability Practices
RocksDB Performance and Reliability PracticesRocksDB Performance and Reliability Practices
RocksDB Performance and Reliability Practices
 
Apache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the CloudApache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the Cloud
 
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark ApplicationsTop 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
 
CDC Stream Processing with Apache Flink
CDC Stream Processing with Apache FlinkCDC Stream Processing with Apache Flink
CDC Stream Processing with Apache Flink
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
 
Dynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache SparkDynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache Spark
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka Streams
 
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
 
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsRunning Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
 
Near real-time statistical modeling and anomaly detection using Flink!
Near real-time statistical modeling and anomaly detection using Flink!Near real-time statistical modeling and anomaly detection using Flink!
Near real-time statistical modeling and anomaly detection using Flink!
 
Performance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State StoresPerformance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State Stores
 
Monitor Apache Spark 3 on Kubernetes using Metrics and Plugins
Monitor Apache Spark 3 on Kubernetes using Metrics and PluginsMonitor Apache Spark 3 on Kubernetes using Metrics and Plugins
Monitor Apache Spark 3 on Kubernetes using Metrics and Plugins
 
Spark tuning
Spark tuningSpark tuning
Spark tuning
 
Oracle Active Data Guard: Best Practices and New Features Deep Dive
Oracle Active Data Guard: Best Practices and New Features Deep Dive Oracle Active Data Guard: Best Practices and New Features Deep Dive
Oracle Active Data Guard: Best Practices and New Features Deep Dive
 

Similar to Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash with Patrick Stuedi

Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Виталий Стародубцев
 
Red Hat Storage Day New York - New Reference Architectures
Red Hat Storage Day New York - New Reference ArchitecturesRed Hat Storage Day New York - New Reference Architectures
Red Hat Storage Day New York - New Reference Architectures
Red_Hat_Storage
 
Spark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production usersSpark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production users
Databricks
 
Elastify Cloud-Native Spark Application with Persistent Memory
Elastify Cloud-Native Spark Application with Persistent MemoryElastify Cloud-Native Spark Application with Persistent Memory
Elastify Cloud-Native Spark Application with Persistent Memory
Databricks
 
Experiences with Oracle SPARC S7-2 Server
Experiences with Oracle SPARC S7-2 ServerExperiences with Oracle SPARC S7-2 Server
Experiences with Oracle SPARC S7-2 Server
JomaSoft
 
Advanced Apache Spark Meetup: How Spark Beat Hadoop @ 100 TB Daytona GraySor...
Advanced Apache Spark Meetup:  How Spark Beat Hadoop @ 100 TB Daytona GraySor...Advanced Apache Spark Meetup:  How Spark Beat Hadoop @ 100 TB Daytona GraySor...
Advanced Apache Spark Meetup: How Spark Beat Hadoop @ 100 TB Daytona GraySor...
Chris Fregly
 
Accelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cacheAccelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cache
David Grier
 
Open Source Data Deduplication
Open Source Data DeduplicationOpen Source Data Deduplication
Open Source Data Deduplication
RedWireServices
 
Optimizing Performance and Computing Resource Efficiency of In-Memory Big Dat...
Optimizing Performance and Computing Resource Efficiency of In-Memory Big Dat...Optimizing Performance and Computing Resource Efficiency of In-Memory Big Dat...
Optimizing Performance and Computing Resource Efficiency of In-Memory Big Dat...
Databricks
 
Offloading for Databases - Deep Dive
Offloading for Databases - Deep DiveOffloading for Databases - Deep Dive
Offloading for Databases - Deep Dive
UniFabric
 
SOUG_SDM_OracleDB_V3
SOUG_SDM_OracleDB_V3SOUG_SDM_OracleDB_V3
SOUG_SDM_OracleDB_V3UniFabric
 
Storage and performance, Whiptail
Storage and performance, Whiptail Storage and performance, Whiptail
Storage and performance, Whiptail Internet World
 
What we unlearned_and_learned_by_moving_from_m9000_to_ssc_ukoug2014
What we unlearned_and_learned_by_moving_from_m9000_to_ssc_ukoug2014What we unlearned_and_learned_by_moving_from_m9000_to_ssc_ukoug2014
What we unlearned_and_learned_by_moving_from_m9000_to_ssc_ukoug2014
Philippe Fierens
 
Orcl siebel-sun-s282213-oow2006
Orcl siebel-sun-s282213-oow2006Orcl siebel-sun-s282213-oow2006
Orcl siebel-sun-s282213-oow2006
Sal Marcus
 
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Databricks
 
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Databricks
 
Day 2 General Session Presentations RedisConf
Day 2 General Session Presentations RedisConfDay 2 General Session Presentations RedisConf
Day 2 General Session Presentations RedisConf
Redis Labs
 
Meetup Oracle Database MAD_BCN: 4 Saborea Exadata
Meetup Oracle Database MAD_BCN: 4 Saborea ExadataMeetup Oracle Database MAD_BCN: 4 Saborea Exadata
Meetup Oracle Database MAD_BCN: 4 Saborea Exadata
avanttic Consultoría Tecnológica
 
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-PremiseTackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
Databricks
 
Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?
Uwe Printz
 

Similar to Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash with Patrick Stuedi (20)

Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
 
Red Hat Storage Day New York - New Reference Architectures
Red Hat Storage Day New York - New Reference ArchitecturesRed Hat Storage Day New York - New Reference Architectures
Red Hat Storage Day New York - New Reference Architectures
 
Spark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production usersSpark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production users
 
Elastify Cloud-Native Spark Application with Persistent Memory
Elastify Cloud-Native Spark Application with Persistent MemoryElastify Cloud-Native Spark Application with Persistent Memory
Elastify Cloud-Native Spark Application with Persistent Memory
 
Experiences with Oracle SPARC S7-2 Server
Experiences with Oracle SPARC S7-2 ServerExperiences with Oracle SPARC S7-2 Server
Experiences with Oracle SPARC S7-2 Server
 
Advanced Apache Spark Meetup: How Spark Beat Hadoop @ 100 TB Daytona GraySor...
Advanced Apache Spark Meetup:  How Spark Beat Hadoop @ 100 TB Daytona GraySor...Advanced Apache Spark Meetup:  How Spark Beat Hadoop @ 100 TB Daytona GraySor...
Advanced Apache Spark Meetup: How Spark Beat Hadoop @ 100 TB Daytona GraySor...
 
Accelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cacheAccelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cache
 
Open Source Data Deduplication
Open Source Data DeduplicationOpen Source Data Deduplication
Open Source Data Deduplication
 
Optimizing Performance and Computing Resource Efficiency of In-Memory Big Dat...
Optimizing Performance and Computing Resource Efficiency of In-Memory Big Dat...Optimizing Performance and Computing Resource Efficiency of In-Memory Big Dat...
Optimizing Performance and Computing Resource Efficiency of In-Memory Big Dat...
 
Offloading for Databases - Deep Dive
Offloading for Databases - Deep DiveOffloading for Databases - Deep Dive
Offloading for Databases - Deep Dive
 
SOUG_SDM_OracleDB_V3
SOUG_SDM_OracleDB_V3SOUG_SDM_OracleDB_V3
SOUG_SDM_OracleDB_V3
 
Storage and performance, Whiptail
Storage and performance, Whiptail Storage and performance, Whiptail
Storage and performance, Whiptail
 
What we unlearned_and_learned_by_moving_from_m9000_to_ssc_ukoug2014
What we unlearned_and_learned_by_moving_from_m9000_to_ssc_ukoug2014What we unlearned_and_learned_by_moving_from_m9000_to_ssc_ukoug2014
What we unlearned_and_learned_by_moving_from_m9000_to_ssc_ukoug2014
 
Orcl siebel-sun-s282213-oow2006
Orcl siebel-sun-s282213-oow2006Orcl siebel-sun-s282213-oow2006
Orcl siebel-sun-s282213-oow2006
 
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
 
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
 
Day 2 General Session Presentations RedisConf
Day 2 General Session Presentations RedisConfDay 2 General Session Presentations RedisConf
Day 2 General Session Presentations RedisConf
 
Meetup Oracle Database MAD_BCN: 4 Saborea Exadata
Meetup Oracle Database MAD_BCN: 4 Saborea ExadataMeetup Oracle Database MAD_BCN: 4 Saborea Exadata
Meetup Oracle Database MAD_BCN: 4 Saborea Exadata
 
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-PremiseTackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
 
Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
James Polillo
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
StarCompliance.io
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 

Recently uploaded (20)

Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 

Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash with Patrick Stuedi

  • 1. Patrick Stuedi, IBM Research Running Spark on a High- Performance Cluster using RDMA Networking and NVMe Flash
  • 4. User-Level APIs Storage App HDD FileSystem iSCSI Block Layer OS Kernel Network App NIC Sockets Ethernet TCP/IP OS Kernel Storage App SSD, NIC SPDK Network App NIC OS KernelRDMA, DPDK Kernel Kernel Traditional / Kernel-based User-Level / Kernel-Bypass Needed to achieve 1us RTT!
  • 5. Remote Data Access 512 B 4 KB 64 KB 1 MB 0 200 400 600 800 1000 1200 NVMe over Fabrics iSCSI data size time[usec] DRAM NVMe Flash 512 B 4 KB 64 KB 1 MB 0 50 100 150 200 250 RDMA Sockets data size time[usec] User-level APIs offer 2-10x better performance than traditional kernel-based APIs
  • 6. Let's Use it! Spark NVMe Flash DRAM High-Performance RDMA network SQL Streaming MLlib GraphX Performance! Realtime! Disaggregation!
  • 7. Case Study: Sorting in Spark MapCores ReduceInput Output read & classify shuffle store Net Ser Net Ser Net Ser Sort Sort Sort
  • 8. Experiment Setup ● Total data size: 12.8 TB ● Cluster size: 128 nodes ● Cluster hardware: – DRAM: 512 GB DDR 4 – Storage: 4x 1.2 TB NVMe SSD – Network: 100GbE Mellanox RDMA Flash bandwidth per node matches network bandwidth
  • 9. How is the Network Used? Hardware limit 0 20 40 60 80 100 0 100 200 300 400 500 Throughput[Gbit/s] Elapsed time (seconds)
  • 10. What is the Problem? JVM Netty TeraSort Sorter Serializer sockets Spark ● Spark uses legacy networking and storage APIs: no kernel-bypass ● Spark itself introduces additional I/O layers: Netty, serializer, sorter, etc. TCP/IP Ethernet NIC filesystem block layer iSCSI SSD
  • 14. Example: Shuffle (Map+Reduce) Buffer Buffer Buffer Sorter Kryo BufferJNI BufferFS Object Stream Buffer Buffer Buffer SocketBuffer Netty Buffer Buffer Kryo Object network Sorter Difficult at 100 Gbps!
  • 15. How can we fix this... ● Not just for shuffle – Also for broadcast, RDD transport, inter-job sharing, etc. ● Not just for RDMA and NVMe hardware – But for any possible future high-performance I/O hardware ● Not just for co-located compute/storage – Also for resource disaggregation, heterogeneous resource distribution, etc. ● Not just improve things – Make it perform at the hardware limit
  • 17. Example: Crail Shuffle (Map) Buffer Crail Serializer CrailOutputStream Object local DRAM remote DRAM local Flash remote Flash memcpy RDMA NVMe NVMeF No buffering
  • 18. Example: Crail Shuffle (Reduce) Buffer Crail Serializer CrailInputStream local DRAM remote DRAM local Flash remote Flash memcpy RDMA NVMe NVMeF No buffering Object
  • 19. Example: Crail Shuffle (Reduce) Buffer Crail Serializer CrailInputStream local DRAM remote DRAM local Flash remote Flash memcpy RDMA NVMe NVMeF No buffering Object Similar for broadcast, Input/output, RDD, etc.
  • 20. Performance: Configuration ● Experiments – Memory-only: Broadcast, GroupBy, Sorting, SQL – Memory/Flash: disaggregation, tiering ● Cluster size: 8 nodes, except Sorting: 128 nodes ● Cluster hardware: – DRAM: 512 GB DDR 4 – Storage: 4x 1.2 TB NVMe SSD – Network: 100GbE Mellanox RDMA ● Spark 2.1.0, Alluxio 1.4, Crail 1.0 ● Hadoop.tmp.dir: RamFS for microbenchmarks, flash SSD for workloads
  • 21. Spark Broadcast val bcVar = sc.Broadcast(new Array[Byte](128)) sc.parallelize(1 to tasks, tasks).map(_ => { bcVar.value.length }).count Spark driver Spark Executor /spark/broadcast/broadcast_0 Crail Store Broadcast writing RPC+2xRDMA+RPC CDF Broadcast reading RPC+RDMA1 2 3 4
  • 22. Spark GroupBy (80M keys, 4K) val pairs = sc.parallelize(1 to tasks, tasks).flatmap(_ => { var values = new array[(Long,Array[Byte])](numKeys) values = initValues(values) }).cache().groupByKey().count() Spark Exec0 /shuffle/1/ f0 f1 fN /shuffle/2/ f0 f1 fN /shuffle/3 f0 f1 fN Spark Exec1 Spark ExecN Spark Exec0 Spark Exec1 Spark ExecN hash hash hash1 2 map: Hash Append (file) reduce: fetch Bucket (dir) Crail Spark/ Vanilla 5x 2.5x 2x 0 20 40 60 80 100 0 10 20 30 40 50 60 70 80 90 100 110 120 Throughput(Gbit/s) Elapsed time (seconds) 1 core 4 cores 8 cores 0 20 40 60 80 100 0 10 20 30 40 50 60 70 80 90 100 110 120 Throughput(Gbit/s) Elapsed time (seconds) 1 core 4 cores 8 cores Spark/ Crail
  • 23. Sorting 12.8 TB on 128 nodes Spark Spark/Crail 0 200 400 600 reduce map Spark/Crail Network UsageSorting Runtime Task ID Throughput(Gbit/s) 6x time(sec)
  • 24. How fast is this? Spark/Crail Winner 2014 Winner 2016 Size (TB) 12.8 100 100 Time (sec) 98 1406 134 Total cores 2560 6592 10240 Network HW (Gbit/s) 100 10 100 Rate/core (GB/min) 3.13 0.66 4.4 Native C distributed sorting benchmark Spark www.sortingbenchmark.org Sorting rate of Crail/Spark only 27% slower than rate of Winner 2016
  • 25. Spark SQL JoinSpark Exec0 /shuffle/1/ f0 f1 fN /shuffle/2/ f0 f1 fN /shuffle/3 f0 f1 fN Spark Exec1 Spark ExecN hash hash hash 1 2 map: Hash Append (file) val ds1 = sparkSession.read.parquet(“...”) //64GB dataset val ds2 = sparkSession.read.parquet(“...”) //64GB dataset val resultDS = ds1.joinWith(ds2, ds1(“key”) === ds2(“key”)) //~100GB resultDS.write.format(“parquet”).mode(SaveMode.Overwrite).save(“...”) parquet parquet parquet read from Hdfs/crail/alluxio deser sort join write sort join write sort join write deser deser 3 Sort Merge join hdfs/crail/alluxio 4 CrailStore 0 10 20 30 40 50 output join hash input Time(sec) Spark/ Alluxio Spark/ Crail
  • 26. Storage Tiering: DRAM & NVMe Spark Exec0 Spark Exec1 Spark ExecN Memory Tier Flash Tier Network Input/Output/ Shuffle/ Spark With Crail, replacing memory with NVMe flash for storing input/output/shuffle reduces the 200GB sorting runtime by only 48% 0 20 40 60 80 100 120 Reduce MapVanilla Spark (100% in-memory) memory/flash ratio Time(sec) sorting runtime
  • 27. DRAM & Flash Disaggregation Spark Exec1 Spark ExecN Using Crail, a Spark 200GB sorting workload can be run with memory and flash disaggregated at no extra cost Spark Exec0 storage nodes compute nodes 0 40 80 120 160 200 Reduce Map Crail/ DRAM Crail/ NVMe Vanilla/ Alluxio Input/Output/ Shuffle Time(sec) sorting runtime
  • 28. Conclusions ● Effectively using high-performance I/O hardware in Spark is challenging ● Crail is an attempt to re-think how data processing frameworks (not only Spark) should interact with network and storage hardware – User-level I/O, storage disaggregation, memory/flash convergence ● Spark's modular architecture allows Crail to be used seamlessly
  • 29. Crail for Spark is Open Source ● www.crail.io ● github.com/zrlio/spark-io ● github.com/zrlio/crail ● github.com/zrlio/parquetgenerator ● github.com/zrlio/crail-terasort
  • 30. Thank You. Contributors: Animesh Trivedi, Jonas Pfefferle, Michael Kaufmann, Bernard Metzler, Radu Stoica, Ioannis Koltsidas, Nikolos Ioannou, Christoph Hagleitner, Peter Hofstee, Evangelos Eleftheriou, ...