Costin Iancu, Khaled Ibrahim – LBNL
Nicholas Chaimov – U. Oregon
Spark on Supercomputers:
A Tale of the Storage Hierarchy
Apache Spark
• Developed for cloud environments
• Specialized runtime provides for
– Performance, elastic parallelism, resilience
• Programming productivity through
– HLL front-ends (Scala, R, SQL), multiple domain-specific libraries:
Streaming, SparkSQL, SparkR, GraphX, Splash, MLLib, Velox
• We have huge datasets but little penetration in HPC
Apache Spark
• In-memory Map-Reduce framework
• Central abstraction is the Resilient Distributed Dataset.
• Data movement is important
– Lazy, on-demand
– Horizontal (node-to-node) – shuffle/Reduce
– Vertical (node-to-storage) - Map/Reduce
[Figure: JOB 0 split into two stages over partitions p1, p2, p3. STAGE 0: textFile, flatMap, map, reduceByKey (local). STAGE 1: reduceByKey (global).]
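The two-stage job in the figure can be sketched in plain Python. This is only an illustration of the data flow, not Spark code: the function names `stage0` and `stage1` and the sample partitions are hypothetical, and the global reduce stands in for the node-to-node shuffle.

```python
from collections import Counter
from itertools import chain

def stage0(partition):
    """Per-partition work: flatMap -> map -> local reduceByKey."""
    words = chain.from_iterable(line.split() for line in partition)  # flatMap
    pairs = ((word, 1) for word in words)                            # map
    local = Counter()
    for word, count in pairs:                                        # reduceByKey (local)
        local[word] += count
    return local

def stage1(local_results):
    """Global reduceByKey: merge the per-partition counts."""
    total = Counter()
    for local in local_results:
        total.update(local)
    return total

# Three hypothetical partitions standing in for p1, p2, p3 of a text file.
partitions = [
    ["spark on hpc", "spark on cloud"],
    ["hpc storage"],
    ["cloud storage storage"],
]

word_counts = stage1(stage0(p) for p in partitions)
print(dict(word_counts))
```

Stage 0 touches only partition-local data; only the merge in stage 1 needs data from every partition, which is where horizontal data movement appears.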
Data Centers/Clouds
• Node-local storage, assumes all disk operations are equal
• Disk I/O optimized for latency
• Network optimized for bandwidth
HPC
• Global file system, asymmetry expected
• Disk I/O optimized for bandwidth
• Network optimized for latency
[Figure: three node architectures built from CPU, memory, NIC, and HDD/SSD blocks.]
• Cloud: commodity CPU, memory, HDD/SSD, NIC
• Data appliance: server CPU, large fast memory, fast SSD
• HPC: server CPU, fast memory, combo of fast and slower storage (intermediate storage and backend storage)
[Figure: node architecture with CPU, memory, NIC, and HDD/SSD blocks, plus intermediate and backend storage.]
• Comet (DELL): 2.5 GHz Intel Haswell, 24 cores; 128GB/1.5TB DDR4; 320 GB of local SSD; 56 Gbps FDR InfiniBand
• Cori (Cray XC40): 2.3 GHz Intel Haswell, 32 cores; 128GB DDR4; Cray Aries; Cray DataWarp, 1.8PB at 1.7TB/s; Sonexion Lustre, 30PB
Scaling Spark on Cray XC40
(It’s all about file system metadata)
Not ALL I/O is Created Equal
[Figure: GroupByTest, I/O components, Cori. Time per operation (microseconds) vs. nodes (1, 2, 4, 8, 16). Series: Lustre Open, BB Striped Open, BB Private Open, Lustre Read, BB Striped Read, BB Private Read, Lustre Write, BB Striped Write.]
# Shuffle opens = # Shuffle reads: O(cores²)
Time per open increases with scale, unlike read/write
Opens at 1, 2, 4, 8, 16 nodes: 9,216; 36,864; 147,456; 589,824; 2,359,296
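The open counts on the slide are consistent with quadratic growth. The sketch below assumes the five counts correspond to 1, 2, 4, 8, and 16 nodes, and that each node runs 96 shuffle tasks; the value 96 is not stated in the talk and is inferred only from 9,216 = 96². The model (every reducer opens one file per mapper) is likewise a simplifying assumption.

```python
# Assumption: 96 shuffle tasks per node, inferred from 9,216 = 96**2 on the slide.
TASKS_PER_NODE = 96

def shuffle_opens(nodes, tasks_per_node=TASKS_PER_NODE):
    """All-to-all shuffle model: every reducer opens one file per mapper."""
    tasks = nodes * tasks_per_node
    return tasks * tasks

for nodes in (1, 2, 4, 8, 16):
    print(f"{nodes:2d} nodes: {shuffle_opens(nodes):>9,} opens")
```

Doubling the node count quadruples the number of opens, so any per-open cost that also grows with scale is amplified quickly.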
I/O Variability is HIGH
fopen is a problem:
• Mean time is 23X larger than SSD
• Variability is 14,000X
[Figure panels: READ, fopen]
Improving I/O Performance
Eliminate file metadata operations
1. Keep files open (cache fopen)
• Surprising 10%-20% improvement on data appliance
• Argues for user level file systems, gets rid of serialized system calls
2. Use file system backed by single Lustre file for shuffle
• This should also help on systems with local SSDs
3. Use containers
• Speeds up startup, up to 20% end-to-end performance improvement
• Solutions need to be used in conjunction
– E.g. fopen from Parquet reader
Plenty of details in “Scaling Spark on HPC Systems”. HPDC 2016
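The first item, keeping files open, can be sketched as a small handle cache. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `FileHandleCache`, its capacity, the LRU eviction policy, and the file name `shuffle_0.data` are all hypothetical.

```python
import os
import tempfile
from collections import OrderedDict

class FileHandleCache:
    """Keeps up to `capacity` files open and evicts the least recently used."""

    def __init__(self, capacity=128, mode="rb"):
        self.capacity = capacity
        self.mode = mode
        self._handles = OrderedDict()
        self.opens = 0  # number of real open() calls issued

    def get(self, path):
        handle = self._handles.get(path)
        if handle is None:
            handle = open(path, self.mode)   # the expensive metadata operation
            self.opens += 1
            self._handles[path] = handle
            if len(self._handles) > self.capacity:
                _, evicted = self._handles.popitem(last=False)
                evicted.close()
        else:
            self._handles.move_to_end(path)  # mark as most recently used
        return handle

    def read_at(self, path, offset, length):
        handle = self.get(path)
        handle.seek(offset)
        return handle.read(length)

    def close(self):
        for handle in self._handles.values():
            handle.close()
        self._handles.clear()

# Usage: two reads of the same file cost a single open().
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "shuffle_0.data")
    with open(path, "wb") as f:
        f.write(b"0123456789")
    cache = FileHandleCache(capacity=4)
    first = cache.read_at(path, 0, 4)
    second = cache.read_at(path, 4, 6)
    cache.close()
```

The capacity bound matters in practice because the number of simultaneously open descriptors per process is limited.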
Scalability
[Figure: Cori, GroupBy, weak scaling, time to job completion. Time (s) vs. cores (32, 160, 320, 640, 1280, 2560, 5120, 10240). Series: Ramdisk, Mounted File, Lustre. Chart annotations: 6x, 12x, 14x, 19x, 33x, 61x.]
• At 10,240 cores, only 1.6x slower than RAMdisk (in-memory execution)
• We scaled Spark from O(100) up to O(10,000) cores
File-Backed Filesystems
• NERSC Shifter (container infrastructure for HPC)
– Compatible with Docker images
– Integrated with Slurm scheduler
– Can control mounting of filesystems within container
• Per-Node Cache
– File-backed filesystem mounted within each node’s container instance at common path (/mnt)
– --volume=$SCRATCH/backingFile:/mnt:perNodeCache=size=100G
– File for each node is created and stored on backend Lustre filesystem
– Single file open — intermediate data file opens are kept local
Now the fun part :)
Architectural Performance Considerations
Cori vs. Comet: The Supercomputer vs. The Data Appliance
CPU, Memory, Network, Disk?
• Multiple extensions to Blocked Time Analysis (Ousterhout, 2015)
• BTA indicated that CPU dominates
– Network 2%, disk 19%
• Concentrate on scaling out, weak scaling studies
– Spark-perf, BigDataBenchmark, TPC-DS, TeraSort
• Interested in determining right ratio, machine balance for
– CPU, memory, network, disk …
• Spark 2.0.2 & Spark-RDMA 0.9.4 from Ohio State University,
Hadoop 2.6
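The idea behind blocked time analysis can be sketched as follows: estimate how much faster a job would finish if tasks never blocked on a given resource. This is a heavily simplified illustration, not the published methodology or the extensions used in the talk. The greedy scheduler, the function names, and the per-task timings are all hypothetical.

```python
import heapq

def makespan(durations, slots):
    """Greedy schedule: each task starts on the slot that frees up first."""
    free_at = [0.0] * slots
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + d)
    return max(free_at)

def blocked_time_bound(tasks, slots, resource):
    """Fractional reduction in job time if tasks never blocked on `resource`."""
    original = makespan([t["total"] for t in tasks], slots)
    without = makespan([t["total"] - t[resource] for t in tasks], slots)
    return (original - without) / original

# Hypothetical per-task timings in seconds (not measurements from the talk).
tasks = [
    {"total": 10.0, "disk": 2.0, "network": 0.5},
    {"total": 10.0, "disk": 2.0, "network": 0.5},
    {"total": 10.0, "disk": 2.0, "network": 0.5},
    {"total": 10.0, "disk": 2.0, "network": 0.5},
]

disk_bound = blocked_time_bound(tasks, slots=2, resource="disk")
network_bound = blocked_time_bound(tasks, slots=2, resource="network")
print(f"disk: {disk_bound:.0%}, network: {network_bound:.0%}")
```

The result is an upper bound on the benefit of a faster resource. It says nothing about time that is attributed to the CPU but actually spent waiting.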
Storage hierarchy and performance
Global Storage Matches Local Storage
[Figure: Time (ms) at 1, 5, and 20 nodes (32 cores) for Lustre, Mount+Pool, and SSD+IB. Components: App, JVM, RW Input, RW Shuffle, Open Input, Open Shuffle.]
• Variability matters more than advertised latency and bandwidth numbers
• Storage performance obscured/mitigated by network due to client/server in BlockManager
• Small scale: local is slightly faster
• Large scale: global is faster
Chart annotations: Disk+Network Latency/BW, Metadata Overhead
Cray XC40 – TeraSort (100GB/node)
Global Storage Matches Local Storage
[Figure: average across MLLib benchmarks. x-axis: 1 and 16, for Comet RDMA Singularity 24 Cores and Cori Shifter 24 Cores. Components: App, Fetch, JVM. Annotations: 11.8% Fetch, 12.5% Fetch.]
Intermediate Storage Hurts Performance
[Figure: TPC-DS, weak scaling. Time (s), x-axis: 1, 2, 4, 8, 16, 32, 64, for Cori Shifter Lustre, Cori Shifter BB Striped, and Cori Shifter BB Private. Components: App, Fetch, JVM. Annotations: 19.4% slower on average, 86.8% slower on average.]
(Without our optimizations, intermediate storage scaled better)
Networking performance
[Figure: Singular Value Decomposition. Time (s), x-axis: 1, 2, 4, 8, 16, 32, 64, for Comet Singularity, Comet RDMA Singularity, and Cori Shifter 24 cores. Components: App, Fetch, JVM.]
Latency or Bandwidth?
• 10X differences in bandwidth or latency matter
• 2X differences can be hidden
• Average message size for spark-perf is 43B
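A back-of-the-envelope cost model shows why latency dominates for 43-byte messages. The model (fixed latency plus size divided by bandwidth) and the 1 microsecond latency are assumptions chosen for illustration. Only the 43B message size and the 56 Gbps FDR InfiniBand figure come from the slides.

```python
def transfer_time_us(size_bytes, latency_us, bandwidth_gbps):
    """Simple cost model: fixed latency plus size divided by bandwidth."""
    bits = size_bytes * 8
    bits_per_us = bandwidth_gbps * 1e3  # 1 Gbps = 1,000 bits per microsecond
    return latency_us + bits / bits_per_us

MESSAGE_BYTES = 43          # average spark-perf message size from the slide
BANDWIDTH_GBPS = 56         # FDR InfiniBand figure quoted for Comet
ASSUMED_LATENCY_US = 1.0    # hypothetical one-way latency, not from the talk

wire_us = transfer_time_us(MESSAGE_BYTES, 0.0, BANDWIDTH_GBPS)
total_us = transfer_time_us(MESSAGE_BYTES, ASSUMED_LATENCY_US, BANDWIDTH_GBPS)
latency_share = ASSUMED_LATENCY_US / total_us

print(f"time on the wire: {wire_us:.4f} us")
print(f"latency share of total: {latency_share:.1%}")
```

Under these assumptions the time on the wire is a small fraction of a microsecond, so a faster link changes little for messages this small while a lower latency changes almost everything.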
Network Matters at Scale
[Figure: average across benchmarks. Time (s), x-axis: 1, 2, 4, 8, 16, 32, 64, for Cori Shifter 24 Cores and Cori Shifter 32 Cores. Components: App, Fetch, JVM. Annotation: 44%.]
CPU
More cores or better memory?
• Need more cores to hide disk and network latency at scale
• Preliminary experiences with Intel KNL are bad
– Too much concurrency
– Not enough integer throughput
• Execution does not seem to be memory bandwidth limited
[Figure: average across benchmarks. Time (s), x-axis: 1, 2, 4, 8, 16, 32, 64, for Cori Shifter 24 Cores and Cori Shifter 32 Cores. Components: App, Fetch, JVM.]
Summary/Conclusions
• Latency and bandwidth are important, but not dominant
– Variability more important than marketing numbers
• Network time dominates at scale
– Network and disk time is misattributed as CPU
• Comet matches Cori up to 512 cores, Cori twice as fast at 2048 cores
– Spark can run well on global storage
• Global storage opens the possibility of global name space, no more client-server
Acknowledgement
Work partially supported by
Intel Parallel Computing Center: Big Data Support for HPC
Thank You.
Questions, collaborations, free software
cciancu@lbl.gov
kzibrahim@lbl.gov
nchaimov@uoregon.edu
Burst Buffer
Setup
• Cray XC30 at NERSC (Edison): 2.4 GHz IvyBridge - Global
• Cray XC40 at NERSC (Cori): 2.3 GHz Haswell + Cray DataWarp
• Comet at SDSC: 2.5GHz Haswell, InfiniBand FDR, 320 GB SSD, 1.5TB memory - LOCAL

Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

PLE-statistics document for primary schs
PLE-statistics document for primary schsPLE-statistics document for primary schs
PLE-statistics document for primary schscnajjemba
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.pptibrahimabdi22
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...gajnagarg
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样wsppdmt
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...Bertram Ludäscher
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...gajnagarg
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNKTimothy Spann
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1ranjankumarbehera14
 
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling ManjurJual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjurptikerjasaptiker
 
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptxThe-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptxVivek487417
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...gajnagarg
 
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制vexqp
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...nirzagarg
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxchadhar227
 
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATIONCapstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATIONLakpaYanziSherpa
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Klinik kandungan
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 

Recently uploaded (20)

Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
PLE-statistics document for primary schs
PLE-statistics document for primary schsPLE-statistics document for primary schs
PLE-statistics document for primary schs
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling ManjurJual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
 
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptxThe-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATIONCapstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 

Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin Iancu and Nicholas Chaimov

  • 1. Costin Iancu, Khaled Ibrahim – LBNL Nicholas Chaimov – U. Oregon Spark on Supercomputers: A Tale of the Storage Hierarchy
  • 2. Apache Spark • Developed for cloud environments • Specialized runtime provides for – Performance, elastic parallelism, resilience • Programming productivity through – HLL front-ends (Scala, R, SQL), multiple domain-specific libraries: Streaming, SparkSQL, SparkR, GraphX, Splash, MLlib, Velox • We have huge datasets but little penetration in HPC
  • 3. Apache Spark • In-memory Map-Reduce framework • Central abstraction is the Resilient Distributed Dataset • Data movement is important – Lazy, on-demand – Horizontal (node-to-node): shuffle/Reduce – Vertical (node-to-storage): Map/Reduce [Figure: Job 0 over partitions p1, p2, p3. Stage 0: textFile, flatMap, map, reduceByKey (local). Stage 1: reduceByKey (global).]
  • 4. Data Centers/Clouds: node-local storage, assumes all disk operations are equal; disk I/O optimized for latency; network optimized for bandwidth. HPC: global file system, asymmetry expected; disk I/O optimized for bandwidth; network optimized for latency.
  • 5. [Figure: node architectures built from CPU, memory, NIC and HDD/SSD, with intermediate storage and backend storage.] Cloud: commodity CPU, memory, HDD/SSD, NIC. Data appliance: server CPU, large fast memory, fast SSD. HPC: server CPU, fast memory, combo of fast and slower storage.
  • 6. [Figure: the two test systems.] Comet (DELL): 2.5 GHz Intel Haswell, 24 cores; 128GB/1.5TB DDR4; 320 GB of local SSD; 56 Gbps FDR InfiniBand. Cori (Cray XC40): 2.3 GHz Intel Haswell, 32 cores; 128GB DDR4; Cray Aries; intermediate storage: Cray DataWarp, 1.8PB at 1.7TB/s; backend storage: Sonexion Lustre, 30PB.
  • 7. Scaling Spark on Cray XC40 (It’s all about file system metadata)
  • 8. Not ALL I/O is Created Equal [Figure: GroupByTest, I/O components on Cori. Time per operation (microseconds, 0 to 12,000) vs. nodes (1, 2, 4, 8, 16); series: open and read on Lustre, BB Striped and BB Private; write on Lustre and BB Striped.] • # Shuffle opens = # Shuffle reads, O(cores²) • Time per open increases with scale, unlike read/write • Opens, one value per node count: 9,216; 36,864; 147,456; 589,824; 2,359,296
  • 9. I/O Variability is HIGH • fopen is a problem: – Mean time is 23X larger than SSD – Variability is 14,000X [Figure: two panels, READ and fopen.]
  • 10. Improving I/O Performance Eliminate file metadata operations 1. Keep files open (cache fopen) • Surprising 10%-20% improvement on data appliance • Argues for user level file systems, gets rid of serialized system calls 2. Use file system backed by single Lustre file for shuffle • This should also help on systems with local SSDs 3. Use containers • Speeds up startup, up to 20% end-to-end performance improvement • Solutions need to be used in conjunction – E.g. fopen from Parquet reader Plenty of details in “Scaling Spark on HPC Systems”. HPDC 2016
  • 11. Scalability [Figure: Cori, GroupBy, weak scaling, time to job completion. Time (s, 0 to 700) vs. cores (32, 160, 320, 640, 1280, 2560, 5120, 10240); series: Ramdisk, Mounted File, Lustre; annotations: 6x, 12x, 14x, 19x, 33x, 61x.] • At 10,240 cores only 1.6x slower than RAMdisk (in-memory execution) • We scaled Spark from O(100) up to O(10,000) cores
  • 12. File-Backed Filesystems • NERSC Shifter (container infrastructure for HPC) – Compatible with Docker images – Integrated with Slurm scheduler – Can control mounting of filesystems within container • Per-Node Cache – File-backed filesystem mounted within each node’s container instance at common path (/mnt) – --volume=$SCRATCH/backingFile:/mnt:perNodeCache=size=100G – File for each node is created and stored on backend Lustre filesystem – Single file open: intermediate data file opens are kept local
  • 13. Now the fun part: Architectural Performance Considerations • Cori vs. Comet: The Supercomputer vs. The Data Appliance
  • 14. [Figure: the two test systems.] Comet (DELL): 2.5 GHz Intel Haswell, 24 cores; 128GB/1.5TB DDR4; 320 GB of local SSD; 56 Gbps FDR InfiniBand. Cori (Cray XC40): 2.3 GHz Intel Haswell, 32 cores; 128GB DDR4; Cray Aries; intermediate storage: Cray DataWarp, 1.8PB at 1.7TB/s; backend storage: Sonexion Lustre, 30PB.
  • 15. CPU, Memory, Network, Disk? • Multiple extensions to Blocked Time Analysis (Ousterhout, 2015) • BTA indicated that CPU dominates – Network 2%, disk 19% • Concentrate on scaling out, weak scaling studies – Spark-perf, BigDataBenchmark, TPC-DS, TeraSort • Interested in determining right ratio, machine balance for – CPU, memory, network, disk … • Spark 2.0.2 & Spark-RDMA 0.9.4 from Ohio State University, Hadoop 2.6
  • 16. Storage hierarchy and performance
  • 17. Global Storage Matches Local Storage [Figure: Cray XC40, TeraSort (100GB/node). Time (ms, 0 to 200,000) for Lustre, Mount+Pool and SSD+IB at 1, 5 and 20 nodes (32 cores); components: App, JVM, RW Input, RW Shuffle, Open Input, Open Shuffle; annotations: Disk+Network Latency/BW, Metadata Overhead.] • Variability matters more than advertised latency and bandwidth numbers • Storage performance obscured/mitigated by network due to client/server in BlockManager • At small scale, local is slightly faster • At large scale, global is faster
  • 18. Global Storage Matches Local Storage [Figure: average across MLlib benchmarks. Y-axis 0 to 1.2; x-axis 1 and 16; Comet RDMA Singularity 24 cores vs. Cori Shifter 24 cores; components: App, Fetch, JVM; annotations: 11.8% Fetch, 12.5% Fetch.]
  • 19. Intermediate Storage Hurts Performance [Figure: TPC-DS, weak scaling. Time (s, 0 to 14,000); x-axis 1, 2, 4, 8, 16, 32, 64; Cori Shifter Lustre, Cori Shifter BB Striped and Cori Shifter BB Private; components: App, Fetch, JVM; annotations: 19.4% slower on average, 86.8% slower on average.] (Without our optimizations, intermediate storage scaled better)
  • 21. Latency or Bandwidth? [Figure: Singular Value Decomposition. Time (s, 0 to 350); x-axis 1, 2, 4, 8, 16, 32, 64; Comet Singularity, Comet RDMA Singularity and Cori Shifter 24 cores; components: App, Fetch, JVM.] • 10X in bandwidth, latency differences matter • Can hide 2X differences • Average message size for spark-perf is 43B
  • 22. Network Matters at Scale [Figure: average across benchmarks. Time (s, 0 to 250); x-axis 1, 2, 4, 8, 16, 32, 64; Cori Shifter 24 Cores vs. Cori Shifter 32 Cores; components: App, Fetch, JVM; annotation: 44%.]
  • 23. CPU
  • 24. More cores or better memory? • Need more cores to hide disk and network latency at scale • Preliminary experiences with Intel KNL are bad • Too much concurrency • Not enough integer throughput • Execution does not seem to be memory bandwidth limited [Figure: average across benchmarks. Time (s, 0 to 250); x-axis 1, 2, 4, 8, 16, 32, 64; Cori Shifter 24 Cores vs. Cori Shifter 32 Cores; components: App, Fetch, JVM.]
  • 25. Summary/Conclusions • Latency and bandwidth are important, but not dominant – Variability more important than marketing numbers • Network time dominates at scale – Network, disk is mis-attributed as CPU • Comet matches Cori up to 512 cores, Cori twice as fast at 2048 cores – Spark can run well on global storage • Global storage opens the possibility of global name space, no more client-server
  • 26. Acknowledgement: Work partially supported by Intel Parallel Computing Center: Big Data Support for HPC
  • 27. Thank You. Questions, collaborations, free software cciancu@lbl.gov kzibrahim@lbl.gov nchaimov@uoregon.edu
  • 28. Burst Buffer Setup • Cray XC30 at NERSC (Edison): 2.4 GHz IvyBridge - Global • Cray XC40 at NERSC (Cori): 2.3 GHz Haswell + Cray DataWarp • Comet at SDSC: 2.5GHz Haswell, InfiniBand FDR, 320 GB SSD, 1.5TB memory - LOCAL
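
Slide 8 states that the number of shuffle file opens grows as O(cores²) and lists five open counts. The sketch below is a rough check of that claim, not the authors' code. It assumes the five counts correspond to the 1, 2, 4, 8 and 16 nodes on that slide's x-axis, and it fits a hypothetical quadratic model `shuffle_opens` whose constant (9,216 opens on one node) is taken from the first listed value.

```python
# Open counts as listed on slide 8. Pairing them with the node counts
# 1, 2, 4, 8, 16 is an assumption based on the plot's x-axis.
REPORTED_OPENS = {1: 9_216, 2: 36_864, 4: 147_456, 8: 589_824, 16: 2_359_296}


def shuffle_opens(nodes, opens_on_one_node=9_216):
    """Hypothetical quadratic model: opens scale with the square of the node count."""
    return opens_on_one_node * nodes ** 2


def growth_factors(opens_by_nodes):
    """Ratio between consecutive open counts, in order of increasing node count."""
    counts = [opens_by_nodes[n] for n in sorted(opens_by_nodes)]
    return [later // earlier for earlier, later in zip(counts, counts[1:])]


if __name__ == "__main__":
    for nodes, reported in sorted(REPORTED_OPENS.items()):
        print(nodes, reported, shuffle_opens(nodes))
    print(growth_factors(REPORTED_OPENS))
```

Under these assumptions each doubling of the node count multiplies the listed open count by four, which is what quadratic growth predicts and is consistent with the slide's point that file metadata operations, rather than reads and writes, limit scaling.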