ACIC:
AUTOMATIC CLOUD I/O CONFIGURATOR
FOR HPC APPLICATIONS
Mingliang Liu*, Ye Jin^, Jidong Zhai*, Yan Zhai*,
Qianqian Shi*, Xiaosong Ma^, Wenguang Chen*
*Tsinghua University
^North Carolina State University
SuperComputing 2013, 6/28/2017
Background
• HPC in Cloud
• Dedicated to high-end cloud computing in science
• Trend to migrate HPC applications to cloud
HPC in Cloud – pros and cons
• Local Clusters
+ Dedicated IB network
+ Runs on physical machines
- Fixed node types/numbers
- Shared OS / file system / libraries
- Gap between I/O and computation
- Fixed device types/numbers
- One-size-fits-all configuration
- Per-platform configuration options
• HPC in Cloud [Yan’11]
- Shared 10Gb Ethernet
- Virtualization overhead
+ Online instance acquisition
+ Fully controlled virtual machines
- I/O overhead from virtualization
+ Multiple device/QoS choices
+ Application-specific configuration
+ Cloud options shared by all users
Key Idea: Help users find desired I/O system configurations
Does I/O Configuration Matter?
• Configurations differ in performance and cost [Mingliang’11]
• No single I/O system configuration beats all
• The optimal configurations for performance and for cost conflict
BTIO application with 6 I/O configurations. Lower is better.
Outline
• Motivation
• Challenges
• Methodology
• Evaluation
• Conclusion
[Diagram labels: compute instances, 10 Gb Ethernet, NFS server, PVFS servers, EBS and ephemeral devices]
What Can We Configure?
• File System
  • File system internal parameters (stripe size: 64KB/4MB)
  • File system (NFS vs. PVFS2)
• I/O Server
  • I/O server number (1/2/4)
  • I/O server placement (dedicated vs. part-time)
• Storage Device
  • Software RAID (RAID 0 vs. no RAID)
  • Device number (1/2)
  • Cloud storage device type (EBS vs. ephemeral vs. SSD)
What Do Configurations Depend On?
• Target (performance or cost)
• Workload I/O characteristics
Name                      Value
Number of all processes   {32, 64, 128, 256}
Number of I/O processes   {32, 64, 128, 256}
I/O interface             {POSIX, MPIIO}
I/O iteration count       {1, 10, 100}
Data size                 {1, 4, 16, 32, 128, 512} MB
Request size              {256KB, 4MB, 16MB, 128MB}
Read and/or write         {read, write}
Collective                {yes, no}
File sharing              {share, individual}
How to Configure Optimally?
• Configure I/O system by hand [Heshan’11]: hard
  • Configuration burden to scientific users
  • Obvious gaps between manual configurations and optimal ones
• Try all configurations for one application: expensive
  • Time- and money-consuming
Our Approach
• Automatically predict and select optimal I/O configurations
• Map workload I/O characteristics to configurations
I/O System Configuration Options
Name                      Value
Disk device               {EBS, ephemeral}
File system               {NFS, PVFS2}
Instance type             {cc1.4xlarge, cc2.8xlarge}
I/O server number         {1, 2, 4}
Placement                 {part-time, dedicated}
Stripe size               {64KB, 4MB}
Workload I/O Characteristics
Name                      Value
Number of all processes   {32, 64, 128, 256}
Number of I/O processes   {32, 64, 128, 256}
I/O interface             {POSIX, MPIIO}
I/O iteration count       {1, 10, 100}
Data size                 {1, 4, 16, 32, 128, 512} MB
Request size              {256KB, 4MB, 16MB, 128MB}
Read and/or write         {read, write}
Collective                {yes, no}
File sharing              {share, individual}
15 dimensions, more than 1M combinations
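As a rough check of the "> 1M" figure, the sketch below multiplies the number of listed values per dimension. It assumes the raw Cartesian product of the two tables, with no combinations excluded; the dictionary and variable names are illustrative and not taken from ACIC.

```python
from math import prod

# Number of listed values per dimension, copied from the two tables.
config_options = {
    "disk_device": 2,       # EBS, ephemeral
    "file_system": 2,       # NFS, PVFS2
    "instance_type": 2,     # cc1.4xlarge, cc2.8xlarge
    "io_server_number": 3,  # 1, 2, 4
    "placement": 2,         # part-time, dedicated
    "stripe_size": 2,       # 64KB, 4MB
}
workload_characteristics = {
    "all_processes": 4,     # 32, 64, 128, 256
    "io_processes": 4,      # 32, 64, 128, 256
    "io_interface": 2,      # POSIX, MPIIO
    "iteration_count": 3,   # 1, 10, 100
    "data_size": 6,         # 1, 4, 16, 32, 128, 512 MB
    "request_size": 4,      # 256KB, 4MB, 16MB, 128MB
    "read_write": 2,        # read, write
    "collective": 2,        # yes, no
    "file_sharing": 2,      # share, individual
}

# Total number of dimensions and size of the unconstrained product space.
dimensions = len(config_options) + len(workload_characteristics)
space_size = prod(config_options.values()) * prod(workload_characteristics.values())
print(dimensions, space_size)
```

Under that assumption the product is 96 × 18,432 = 1,769,472 points across 15 dimensions, which is consistent with the slide's "> 1M".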
Overview
[Diagram: ACIC workflow. The 15-dimension exploration space passes through the dimension reducer to give reduced exploration sets. IOR is configured and run on the target cloud with these sets, and the resulting pairs of I/O characteristic and I/O configuration are inserted into the training database. The database trains the prediction model (CART). The application's I/O characteristics are the input query conditions, and the query result is the recommended I/O configuration for the cloud system.]
Dimension Reducer
• Identify the relative importance of parameters (Plackett-Burman (PB) matrix [Plackett’46])
Row      A    B    C    D    E    Perf. Value
1        1    1    1   -1    1    19
2       -1    1    1    1   -1    21
3       -1   -1    1    1    1    2
4        1   -1   -1    1    1    11
5       -1    1   -1   -1    1    72
6        1   -1    1   -1   -1    100
7        1    1   -1    1   -1    8
8       -1   -1   -1   -1   -1    3
Effect   40   4    48   152  28
Rank     3    5    2    1    4
Sample PB design with N = 5 parameters and N’ = 8 rows
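The effect values in the sample table are consistent with the usual Plackett-Burman calculation: for each parameter, sum the performance values weighted by that parameter's +1/-1 setting in each row, then take the absolute value. A minimal sketch using the slide's numbers (variable names are illustrative):

```python
# Sample PB design from the slide: the +1/-1 settings of parameters A..E in
# each row, followed by the performance value measured for that row.
design = [
    ([ 1,  1,  1, -1,  1],  19),
    ([-1,  1,  1,  1, -1],  21),
    ([-1, -1,  1,  1,  1],   2),
    ([ 1, -1, -1,  1,  1],  11),
    ([-1,  1, -1, -1,  1],  72),
    ([ 1, -1,  1, -1, -1], 100),
    ([ 1,  1, -1,  1, -1],   8),
    ([-1, -1, -1, -1, -1],   3),
]
names = ["A", "B", "C", "D", "E"]

# Effect of parameter j = |sum over rows of (setting of j) * (perf. value)|.
effects = [
    abs(sum(signs[j] * perf for signs, perf in design))
    for j in range(len(names))
]

# Rank 1 goes to the parameter with the largest effect.
order = sorted(range(len(names)), key=lambda j: effects[j], reverse=True)
ranks = [0] * len(names)
for position, j in enumerate(order, start=1):
    ranks[j] = position

for name, effect, rank in zip(names, effects, ranks):
    print(name, effect, rank)
```

This reproduces the Effect row (40, 4, 48, 152, 28) and the Rank row (3, 5, 2, 1, 4) of the table, so D is the most influential parameter in the sample and B the least.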
Parameter Ranks
Rank  Name                      Value
1     Data size                 {1, 4, 16, 32, 128, 512} MB
2     Read and/or write         {read, write}
3     I/O server number         {1, 2, 4}
4     Number of I/O processes   {32, 64, 128, 256}
5     File system               {NFS, PVFS2}
6     Stripe size               {64KB, 4MB}
7     Placement                 {part-time, dedicated}
8     Request size              {256KB, 4MB, 16MB, 128MB}
9     I/O interface             {POSIX, MPIIO}
10    Disk device               {EBS, ephemeral}
11    Collective                {yes, no}
12    Instance type             {cc1.4xlarge, cc2.8xlarge}
13    I/O iteration count       {1, 10, 100}
14    Number of all processes   {32, 64, 128, 256}
15    File sharing              {share, individual}
Overview
[Diagram: the same ACIC workflow with added annotations. An IO profiler extracts the I/O characteristics of the target HPC application as input to ACIC; the prediction model is CART [Olshen’84]; training runs use IOR [Shan’08]; the training database is filled by crowd-sourcing.]
CART Example
…
REQUEST <= 34MB: FILE SYSTEM (STD = 0.147, AVG = 1.9)
    PVFS2: DATA_SIZE (STD = 0.069, AVG = 2.2)
        <= 24MB: DEVICE (STD = 0.041, AVG = 2.1)
            ephemeral: STD = 0.006, AVG = 2.2
            EBS: STD = 0.001, AVG = 2.0
        > 24MB: STD = 0.066, AVG = 2.4
    NFS: DATA_SIZE (STD = 0.202, AVG = 1.3)
        <= 24576 KB: STD = 0.03, AVG = 1.6
        > 24576 KB: STD = 0.014, AVG = 0.8
…
Query: (…, request_size = 4MB, data_size = 16MB, …, file_system = PVFS2)
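To make the lookup concrete, here is a minimal sketch of walking the tree fragment for the slide's query. The pairing of branch labels with nodes is an assumption read off the flattened figure (it is consistent with each node's AVG lying between its children's), the elided parts of the tree are not modeled, and the dictionary layout and `predict` helper are hypothetical rather than ACIC's implementation.

```python
KB_PER_MB = 1024

# Tree fragment below the "REQUEST <= 34MB" branch, as read from the slide.
# Internal nodes carry a "split" attribute; leaves carry only std/avg.
tree = {
    "split": "file_system", "std": 0.147, "avg": 1.9,
    "children": {
        "PVFS2": {
            "split": "data_size_kb", "threshold": 24 * KB_PER_MB,
            "std": 0.069, "avg": 2.2,
            "le": {
                "split": "device", "std": 0.041, "avg": 2.1,
                "children": {
                    "ephemeral": {"std": 0.006, "avg": 2.2},
                    "EBS": {"std": 0.001, "avg": 2.0},
                },
            },
            "gt": {"std": 0.066, "avg": 2.4},
        },
        "NFS": {
            "split": "data_size_kb", "threshold": 24576,
            "std": 0.202, "avg": 1.3,
            "le": {"std": 0.03, "avg": 1.6},
            "gt": {"std": 0.014, "avg": 0.8},
        },
    },
}


def predict(node, point):
    """Walk from `node` down to a leaf and return that leaf's (avg, std)."""
    while "split" in node:
        value = point[node["split"]]
        if "threshold" in node:   # numeric split: <= threshold or > threshold
            node = node["le"] if value <= node["threshold"] else node["gt"]
        else:                     # categorical split: follow the named branch
            node = node["children"][value]
    return node["avg"], node["std"]


# Query from the slide: request_size = 4MB satisfies REQUEST <= 34MB, which is
# the branch leading to this fragment. The device is assumed to be left open
# in the query, so both candidate devices are evaluated.
query = {"file_system": "PVFS2", "data_size_kb": 16 * KB_PER_MB}
predictions = {
    device: predict(tree, {**query, "device": device})
    for device in ("ephemeral", "EBS")
}
print(predictions)
```

With this reading, the query reaches the DEVICE node and the two candidate devices end in the leaves with AVG = 2.2 (ephemeral) and AVG = 2.0 (EBS).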
Outline
• Motivation
• Challenges
• Methodology
• Evaluation
  • Experiment Setup
  • Effectiveness
  • Training Cost
• Conclusion
Evaluation - Platform
• Amazon Cluster Compute Instances
  • 2 × 8-core Intel Xeon CPUs, 60.5GB RAM
  • 10 Gigabit Ethernet
  • Amazon Linux AMI, Intel compiler & MPI runtime
• Storage Device
  • Ephemeral
  • EBS (Elastic Block Store)
  • Software RAID
• Baseline Configuration
  • NFS, dedicated, 1 EBS device
Evaluation - Applications
• Selected HPC Workloads
Name        Domain        CPU     Network  Read/Write    API
BTIO        Physics       High    High     Write         MPIIO
FLASHIO     Astrophysics  Low     Low      Write         MPIIO
mpiBLAST    Biology       Medium  Medium   Read          POSIX
MADbench2   Cosmology     Low     Medium   Read & Write  MPIIO
Evaluation - No One Excels All
• Optimal Performance Configurations
App.        Proc.  Device  P/D  FS     IO Servers  Stripe Size
BTIO        64     EBS     P    NFS    1           N/A
BTIO        256    eph.    P    PVFS2  4           4MB
FLASHIO     64     eph.    D    NFS    1           N/A
FLASHIO     256    eph.    P    NFS    1           N/A
mpiBLAST    32     eph.    P    PVFS2  4           64KB
mpiBLAST    64     eph.    D    PVFS2  4           4MB
mpiBLAST    128    eph.    D    PVFS2  4           4MB
MADbench2   64     eph.    D    PVFS2  4           4MB
MADbench2   256    EBS     D    PVFS2  4           4MB
(eph. = ephemeral; P/D = part-time/dedicated placement)
• 9 test cases, 7 unique configurations: the optimal one is difficult to guess, even within this 5-D space
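The "7 unique configurations out of 9 test cases" count can be checked directly from the table of optimal configurations. The sketch below simply transcribes the rows and counts distinct tuples; the names are illustrative.

```python
# (application, processes) -> (device, placement, file system, I/O servers,
# stripe size), transcribed from the optimal-configuration table.
# Stripe size is None where the table says N/A (NFS).
optimal = {
    ("BTIO", 64):       ("EBS",       "P", "NFS",   1, None),
    ("BTIO", 256):      ("ephemeral", "P", "PVFS2", 4, "4MB"),
    ("FLASHIO", 64):    ("ephemeral", "D", "NFS",   1, None),
    ("FLASHIO", 256):   ("ephemeral", "P", "NFS",   1, None),
    ("mpiBLAST", 32):   ("ephemeral", "P", "PVFS2", 4, "64KB"),
    ("mpiBLAST", 64):   ("ephemeral", "D", "PVFS2", 4, "4MB"),
    ("mpiBLAST", 128):  ("ephemeral", "D", "PVFS2", 4, "4MB"),
    ("MADbench2", 64):  ("ephemeral", "D", "PVFS2", 4, "4MB"),
    ("MADbench2", 256): ("EBS",       "D", "PVFS2", 4, "4MB"),
}

# Distinct optimal configurations across all test cases.
unique_configs = set(optimal.values())
print(len(optimal), len(unique_configs))
```

Only the mpiBLAST-64, mpiBLAST-128, and MADbench2-64 cases share a configuration; every other test case has its own optimum.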
Effectiveness of Exec. Time Optimization
[Figure legend: Median, ACIC, Baseline]
• Large performance range under different configurations
• ACIC predicts near-optimal configurations
Effectiveness of Total Cost Saving
• ACIC achieves even better results in total cost saving
Training More Data
[Figure 8: Impact on prediction performance using different numbers of top-ranking model parameters. X axis: number of model parameters (7 to 15). Y axes: cost saving under baseline (%) and training cost (K$, log scale). Series: training cost, BTIO-64, FLASHIO-256, mpiBLAST-128, MADbench2-256.]
• More training data points, higher prediction accuracy
• The gain is heavily application-dependent
• Training cost increases exponentially (slide annotations: 1000$ × c, 100,000$ × c)
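One way to see why the training cost grows so quickly is to count how many combinations the top-k ranked parameters span. The sketch below assumes exhaustive sampling of every combination of the top-k parameters, which the slides do not state; it only illustrates the multiplicative growth, using the value counts from the Parameter Ranks slide.

```python
from itertools import accumulate
from operator import mul

# Number of listed values for each parameter, in rank order (rank 1 first):
# data size, read/write, I/O server number, I/O processes, file system,
# stripe size, placement, request size, I/O interface, disk device,
# collective, instance type, iteration count, all processes, file sharing.
values_by_rank = [6, 2, 3, 4, 2, 2, 2, 4, 2, 2, 2, 2, 3, 4, 2]

# points[k - 1] = size of the full Cartesian product of the top-k parameters.
points = list(accumulate(values_by_rank, mul))

# Same range of k as the figure's x axis (7 to 15 model parameters).
for k in range(7, 16):
    print(k, points[k - 1])
```

Under this assumption, going from the top 7 to all 15 parameters raises the number of combinations from 1,152 to 1,769,472, a factor of 1,536.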
Conclusion
• I/O configuration is crucial to HPC in the cloud
• Manual configuration is error-prone even for experts
• Automatic I/O configurator is helpful
• Building a prediction model is challenging
• Reduce the high-dimensional space before sampling training data
• Reuse training data in a crowd-sourcing way to amortize cost
http://hpc.cs.tsinghua.edu.cn/ACIC
• Thanks to Heshan Lin and Ruini Xue for joining the user study
• Thanks to the anonymous reviewers for their useful comments
• Supported in China: 863 NO.2012AA01A302, NSFC 61133006 and 61103021
• Supported in U.S.: NSF awards (CNS-0546301, CNS-0915861, and CCF-0937908)
References
• [Yan’11] Y. Zhai, M. Liu, J. Zhai, X. Ma, and W. Chen. Cloud Versus In-house Cluster: Evaluating Amazon Cluster Compute Instances for Running MPI Applications. In SC. ACM, 2011.
• [Plackett’46] R. Plackett and J. Burman. The Design of Optimum Multifactorial Experiments. Biometrika, 1946.
• [Olshen’84] L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth International Group, 1984.
• [Mesnier’07] M. Mesnier, M. Wachs, R. Sambasivan, A. Zheng, and G. Ganger. Modeling the Relative Fitness of Storage. In SIGMETRICS. ACM, 2007.
• [Mingliang’11] M. Liu, J. Zhai, Y. Zhai, X. Ma, and W. Chen. One Optimized I/O Configuration per HPC Application: Leveraging The Configurability of Cloud. In APSys. ACM, 2011.
• [Heshan’11] H. Lin, X. Ma, W. Feng, and N. Samatova. Coordinating Computation and I/O in Massively Parallel Sequence Search. IEEE Transactions on Parallel and Distributed Systems, 2011.
• [Shan’08] H. Shan, K. Antypas, and J. Shalf. Characterizing and Predicting the I/O Performance of HPC Applications Using a Parameterized Synthetic Benchmark. In SC. IEEE, 2008.

Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
XfilesPro
 
A Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdfA Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdf
kalichargn70th171
 
Why React Native as a Strategic Advantage for Startup Innovation.pdf
Why React Native as a Strategic Advantage for Startup Innovation.pdfWhy React Native as a Strategic Advantage for Startup Innovation.pdf
Why React Native as a Strategic Advantage for Startup Innovation.pdf
ayushiqss
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Globus
 
Software Testing Exam imp Ques Notes.pdf
Software Testing Exam imp Ques Notes.pdfSoftware Testing Exam imp Ques Notes.pdf
Software Testing Exam imp Ques Notes.pdf
MayankTawar1
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
Globus
 
Strategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptxStrategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptx
varshanayak241
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
Georgi Kodinov
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
wottaspaceseo
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
Max Andersen
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
Globus
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
AMB-Review
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus
 

Recently uploaded (20)

Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
 
A Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdfA Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdf
 
Why React Native as a Strategic Advantage for Startup Innovation.pdf
Why React Native as a Strategic Advantage for Startup Innovation.pdfWhy React Native as a Strategic Advantage for Startup Innovation.pdf
Why React Native as a Strategic Advantage for Startup Innovation.pdf
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
 
Software Testing Exam imp Ques Notes.pdf
Software Testing Exam imp Ques Notes.pdfSoftware Testing Exam imp Ques Notes.pdf
Software Testing Exam imp Ques Notes.pdf
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
 
Strategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptxStrategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptx
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
 

ACIC: Automatic Cloud I/O Configurator for HPC Applications

  • 1. ACIC: Automatic Cloud I/O Configurator for HPC Applications. Mingliang Liu*, Ye Jin^, Jidong Zhai*, Yan Zhai*, Qianqian Shi*, Xiaosong Ma^, Wenguang Chen*. *Tsinghua University, ^North Carolina State University. SuperComputing 2013, 6/28/2017
  • 2. Background
      • HPC in Cloud
      • Dedicated instances for high-end cloud computing in science
      • Trend to migrate HPC applications to cloud
  • 3. HPC in Cloud – pros and cons
      Local Clusters:
        + Dedicated IB network
        + Run at physical machine
        - Fixed nodes types/numbers
        - Shared OS / file system / libraries
        - Gap between I/O and computation
        - Fixed device types/numbers
        - One-size-fits-all configuration
        - Per-platform configuring options
      HPC in Cloud [Yan’11]:
        - Shared 10Gb Ethernet
        - Virtualization overhead
        + Online instance acquisition
        + Fully controlled virtual machines
        - I/O overhead by virtualization
        + Multiple device/QoS choice
        + Application specific configuration
        + Shared cloud options by all users
      Key Idea: Help users find desired I/O system configurations
  • 4. Does I/O Configuration Matter?
      • Configurations differ in performance and cost [Mingliang’11]
      • No single I/O system configuration beats all
      • Optimal configurations for performance and cost contradict
      [Figure: BTIO application with 6 I/O configurations; the lower the better]
  • 5. Outline
      • Motivation
      • Challenges
      • Methodology
      • Evaluation
      • Conclusion
  • 6. [Figure: compute instances on 10 Gb Ethernet, with an NFS server and PVFS servers backed by EBS and ephemeral storage devices]
  • 7. What Can We Configure?
      File System:
        • File system (NFS vs. PVFS2)
        • File system internal parameters (Stripe Size: 64KB/4MB)
      I/O Server:
        • I/O server number (1/2/4)
        • I/O server placement (Dedicated vs. Part-time)
      Storage Device:
        • Cloud storage device type (EBS vs. Ephemeral vs. SSD)
        • Device number (1/2)
        • Software RAID (RAID 0 vs. No RAID)
  • 8. What Do Configurations Depend On?
      • Target (performance, or cost)
      • Workload I/O Characteristics
        Name                      Value
        Number of all processes   {32, 64, 128, 256}
        Number of I/O processes   {32, 64, 128, 256}
        I/O interface             {POSIX, MPIIO}
        I/O iteration count       {1, 10, 100}
        Data size                 {1, 4, 16, 32, 128, 512} MB
        Request size              {256KB, 4MB, 16MB, 128MB}
        Read and/or write         {read, write}
        Collective                {yes, no}
        File sharing              {share, individual}
  • 9. How to Configure Optimally?
      • Configure I/O system by hand [Heshan’11]: Hard
        • Configuration burden to scientific users
        • Obvious gaps between manual configurations and optimal ones
      • Try all configurations for one application: Expensive
        • Time- and money-consuming
  • 10. Our Approach
      • Automatically predict and select optimal I/O configurations
      • Map workload I/O characteristics to configurations
      I/O System Configuration Options:
        Name                      Value
        Disk device               {EBS, ephemeral}
        File system               {NFS, PVFS2}
        Instance type             {cc1.4xlarge, cc2.8xlarge}
        I/O server number         {1, 2, 4}
        Placement                 {part-time, dedicated}
        Stripe size               {64KB, 4MB}
      Workload I/O Characteristics:
        Name                      Value
        Number of all processes   {32, 64, 128, 256}
        Number of I/O processes   {32, 64, 128, 256}
        I/O interface             {POSIX, MPIIO}
        I/O iteration count       {1, 10, 100}
        Data size                 {1, 4, 16, 32, 128, 512} MB
        Request size              {256KB, 4MB, 16MB, 128MB}
        Read and/or write         {read, write}
        Collective                {yes, no}
        File sharing              {share, individual}
      15 dimensions, exploration space > 1M
  • 11. Outline
      • Motivation
      • Challenges
      • Methodology
      • Evaluation
      • Conclusion
  • 12. Overview
      [Figure: ACIC workflow diagram. Components: 15-dimension exploration space, dimension reducer, reduced exploration sets, IOR runs on the configured cloud I/O system, training database, prediction model (CART), query with the application’s I/O characteristics and target, recommended I/O configuration]
  • 13. Dimension Reducer
      • Identify relative importance of parameters (PB Matrix [Plackett’46])
        Row       A     B     C     D     E    Perf. Value
        1         1     1     1    -1     1    19
        2        -1     1     1     1    -1    21
        3        -1    -1     1     1     1    2
        4         1    -1    -1     1     1    11
        5        -1     1    -1    -1     1    72
        6         1    -1     1    -1    -1    100
        7         1     1    -1     1    -1    8
        8        -1    -1    -1    -1    -1    3
        Effect   40     4    48   152    28
        Rank      3     5     2     1     4
      Sample PB design working with N = 5 and N’ = 8
  • 14. Parameter Ranks
        Rank  Name                      Value
        1     Data size                 {1, 4, 16, 32, 128, 512} MB
        2     Read and/or write         {read, write}
        3     I/O server number         {1, 2, 4}
        4     Number of I/O processes   {32, 64, 128, 256}
        5     File system               {NFS, PVFS2}
        6     Stripe size               {64KB, 4MB}
        7     Placement                 {part-time, dedicated}
        8     Request size              {256KB, 4MB, 16MB, 128MB}
        9     I/O interface             {POSIX, MPIIO}
        10    Disk device               {EBS, ephemeral}
        11    Collective                {yes, no}
        12    Instance type             {cc1.4xlarge, cc2.8xlarge}
        13    I/O iteration count       {1, 10, 100}
        14    Number of all processes   {32, 64, 128, 256}
        15    File sharing              {share, individual}
  • 15. Overview
      [Figure: the ACIC workflow diagram again, annotated with: target HPC application, IO profiler, ACIC, [Olshen’84], [Shan’08], crowd-sourcing]
  • 16. CART Example
      …
      REQUEST <= 34MB: STD = 0.147, AVG = 1.9, split on FILE SYSTEM
        PVFS2: STD = 0.069, AVG = 2.2, split on DATA_SIZE
          <= 24MB: STD = 0.041, AVG = 2.1, split on DEVICE
            ephemeral: STD = 0.006, AVG = 2.2
            EBS: STD = 0.001, AVG = 2.0
          > 24MB: STD = 0.066, AVG = 2.4
        NFS: STD = 0.202, AVG = 1.3, split on DATA_SIZE
          <= 24576 KB: STD = 0.03, AVG = 1.6
          > 24576 KB: STD = 0.014, AVG = 0.8
      …
      Example input: (…, request_size = 4MB, data_size = 16MB, …, file_system = PVFS2)
  • 17. Overview
      [Figure: the ACIC workflow diagram again, with ACIC highlighted]
  • 18. Outline
      • Motivation
      • Challenges
      • Methodology
      • Evaluation
        • Experiment Setup
        • Effectiveness
        • Training Cost
      • Conclusion
  • 19. Evaluation - Platform
      • Amazon Cluster Computing Instance
        • 2 * 8-core Intel Xeon CPU, 60.5GB RAM
        • 10 Gigabit Ethernet
        • Amazon Linux AMI, Intel compiler & MPI runtime
      • Storage Device
        • Ephemeral
        • EBS (Elastic Block Store)
        • Software RAID
      • Baseline Configuration
        • NFS, dedicated, 1 EBS device
  • 20. Evaluation - Applications
      • Selected HPC Workloads
        Name        Domain        CPU     Network  Read/Write    API
        BTIO        Physics       High    High     Write         MPIIO
        FLASHIO     Astrophysics  Low     Low      Write         MPIIO
        mpiBLAST    Biology       Medium  Medium   Read          POSIX
        MADbench2   Cosmology     Low     Medium   Read & Write  MPIIO
  • 21. Evaluation - No One Excels All
      • Optimal Performance Configurations (P = part-time, D = dedicated, eph. = ephemeral)
        App.        Proc.  Device  P/D  FS     IO Servers  Stripe Size
        BTIO        64     EBS     P    NFS    1           N/A
        BTIO        256    eph.    P    PVFS2  4           4MB
        FLASHIO     64     eph.    D    NFS    1           N/A
        FLASHIO     256    eph.    P    NFS    1           N/A
        mpiBLAST    32     eph.    P    PVFS2  4           64KB
        mpiBLAST    64     eph.    D    PVFS2  4           4MB
        mpiBLAST    128    eph.    D    PVFS2  4           4MB
        MADbench2   64     eph.    D    PVFS2  4           4MB
        MADbench2   256    EBS     D    PVFS2  4           4MB
      • 9 test cases, 7 unique configs (7/9): it’s difficult to guess the optimal one even within the 5-D space
  • 22. Effectiveness of Exec. Time Optimization
      [Figure legend: Median, ACIC, Baseline]
      • Large performance range under different configurations
      • Near optimal configurations predicted by ACIC
  • 23. Effectiveness of Total Cost Saving
      • Even better results in total cost saving by ACIC
  • 24. Training More Data
      [Figure 8: Impact on prediction performance using different numbers of top ranking model parameters. x axis: number of model parameters (7 to 15); left y axis: cost saving under baseline (%); right y axis: training cost (K$). Series: training cost, BTIO-64, FLASHIO-256, mpiBLAST-128, MADbench2-256]
      • More training data points, higher prediction accuracy
      • The gain is heavily application-dependent
      • Training cost increases exponentially (slide annotations: 1000$ × c, 100,000$ × c)
  • 25. Outline
      • Motivation
      • Challenges
      • Methodology
      • Evaluation
      • Conclusion
  • 26. Conclusion
      • I/O configuration is crucial to HPC in cloud
        • Manual configuration is error-prone even for experts
        • Automatic I/O configurator is helpful
      • Building a prediction model is challenging
        • Reduce high dimensional space to sample training data
        • Reuse training data in crowd-sourcing way to amortize cost
  • 27. http://hpc.cs.tsinghua.edu.cn/ACIC
      • Thanks to Heshan Lin and Ruini Xue for joining the user study
      • Thanks to the anonymous reviewers for their useful comments
      • Supported in China: 863 NO.2012AA01A302, NSFC 61133006 and 61103021
      • Supported in U.S.: NSF awards (CNS-0546301, CNS-0915861, and CCF-0937908)
  • 28. References
      • [Yan’11] Y. Zhai, M. Liu, J. Zhai, X. Ma, and W. Chen. Cloud Versus In-house Cluster: Evaluating Amazon Cluster Compute Instances for Running MPI Applications. In SC. ACM, 2011.
      • [Plackett’46] R. Plackett and J. Burman. The Design of Optimum Multifactorial Experiments. Biometrika, 1946.
      • [Olshen’84] L. Olshen and C. Stone. Classification and Regression Trees. Wadsworth International Group, 1984.
      • [Mesnier’07] M. Mesnier, M. Wachs, R. Sambasivan, A. Zheng, and G. Ganger. Modeling the Relative Fitness of Storage. In SIGMETRICS. ACM, 2007.
      • [Mingliang’11] M. Liu, J. Zhai, Y. Zhai, X. Ma, and W. Chen. One Optimized I/O Configuration per HPC Application: Leveraging The Configurability of Cloud. In APSys. ACM, 2011.
      • [Heshan’11] H. Lin, X. Ma, W. Feng, and N. Samatova. Coordinating Computation and I/O in Massively Parallel Sequence Search. IEEE Transactions on Parallel and Distributed Systems, 2011.
      • [Shan’08] H. Shan, K. Antypas, and J. Shalf. Characterizing and Predicting the I/O Performance of HPC Applications Using a Parameterized Synthetic Benchmark. In SC. IEEE, 2008.
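The "15 dimensions, > 1M" figure on slide 10 can be checked with a short sketch. It assumes the exploration space is the full cross product of the value sets listed on that slide; combinations that may not be valid in practice (for example, a stripe size for NFS) are not filtered out. The dictionary keys are illustrative names, not identifiers from ACIC.

```python
from math import prod

# Value sets copied from the two tables on slide 10.
# The key names are illustrative, not ACIC identifiers.
config_options = {
    "disk_device": ["EBS", "ephemeral"],
    "file_system": ["NFS", "PVFS2"],
    "instance_type": ["cc1.4xlarge", "cc2.8xlarge"],
    "io_server_number": [1, 2, 4],
    "placement": ["part-time", "dedicated"],
    "stripe_size": ["64KB", "4MB"],
}
workload_characteristics = {
    "num_processes": [32, 64, 128, 256],
    "num_io_processes": [32, 64, 128, 256],
    "io_interface": ["POSIX", "MPIIO"],
    "io_iterations": [1, 10, 100],
    "data_size_mb": [1, 4, 16, 32, 128, 512],
    "request_size": ["256KB", "4MB", "16MB", "128MB"],
    "operation": ["read", "write"],
    "collective": ["yes", "no"],
    "file_sharing": ["share", "individual"],
}


def space_size(dimensions):
    """Number of points in the full cross product of the value sets."""
    return prod(len(values) for values in dimensions.values())


n_dims = len(config_options) + len(workload_characteristics)
n_config = space_size(config_options)              # 96
n_workload = space_size(workload_characteristics)  # 18432
total = n_config * n_workload                      # 1769472

print(n_dims, "dimensions,", total, "points")
```

Under that assumption the space has 15 dimensions and about 1.77 million points, which agrees with the "> 1M" on the slide and shows why exhaustive training is impractical.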

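The Effect and Rank rows of the sample PB design on slide 13 can be reproduced with a small sketch. It takes the effect of a parameter to be the absolute value of the dot product between that parameter's column and the performance column, which is what the numbers on the slide match; the function names are illustrative.

```python
# Sample PB design from slide 13: one row per run, columns A..E hold the
# +1 (high) / -1 (low) setting of each parameter.
params = ["A", "B", "C", "D", "E"]
design = [
    [ 1,  1,  1, -1,  1],
    [-1,  1,  1,  1, -1],
    [-1, -1,  1,  1,  1],
    [ 1, -1, -1,  1,  1],
    [-1,  1, -1, -1,  1],
    [ 1, -1,  1, -1, -1],
    [ 1,  1, -1,  1, -1],
    [-1, -1, -1, -1, -1],
]
perf = [19, 21, 2, 11, 72, 100, 8, 3]  # "Perf. Value" column


def pb_effects(design, perf):
    """Effect of each parameter: |dot product of its column with perf|."""
    n_params = len(design[0])
    return [
        abs(sum(row[j] * y for row, y in zip(design, perf)))
        for j in range(n_params)
    ]


def ranks(effects):
    """Rank 1 goes to the parameter with the largest effect."""
    order = sorted(range(len(effects)), key=lambda j: -effects[j])
    rank = [0] * len(effects)
    for position, j in enumerate(order, start=1):
        rank[j] = position
    return rank


effects = pb_effects(design, perf)
print(dict(zip(params, effects)))         # {'A': 40, 'B': 4, 'C': 48, 'D': 152, 'E': 28}
print(dict(zip(params, ranks(effects))))  # {'A': 3, 'B': 5, 'C': 2, 'D': 1, 'E': 4}
```

Eight runs are enough to rank five parameters here, which is the point of the design: each trial in the cloud costs time and money.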
Editor's Notes

  1. As cloud computing becomes increasingly popular, cloud providers have begun to support dedicated instances for high-end cloud computing in science. As a result, HPC users are migrating their applications from traditional HPC resources to the cloud.
  2. HPC in the cloud has not won everyone over, though. We compared the cloud platform with local clusters and listed the pros and cons. HPC in the cloud has disadvantages, such as the shared 10 Gb Ethernet and virtualization overhead, but it has advantages as well. For example, local clusters have fixed types and numbers of nodes, whereas in the cloud we can acquire more instances online and are charged on a pay-as-you-go basis. However, the I/O gap seen in local clusters is enlarged in the cloud. Fortunately, the cloud has further potential that may make it more competitive. For example, it provides multiple device, instance, and QoS choices, so we can configure it according to our application’s needs. The configuration options are shared by all users of the same cloud, which makes it possible to reuse configuration efforts and amortize the cost.
  3. Before we move on, one question: does I/O configuration matter? Here are our preliminary results. We ran the BT-IO benchmark from NPB with 6 I/O configurations, varying the file system type (PVFS2 vs. NFS), the number of I/O servers (1, 2, or 4), and their placement strategy (dedicated vs. part-time). Each line in the figures shows the result of one configuration. The y axis is the total execution time or the cost of one run. The x axis is the number of processes. The figures illustrate points 1, 2, and 3 on the slide.
  4. Here is the outline of this talk. After introducing the motivation, we define the problem and then propose our tool to address it. We will show some interesting results and conclude briefly.
  5. This figure shows the configuration stack of the Amazon EC2 platform. There are three categories: the first is the storage device configuration, the second is the file system and I/O server configuration, and the third is the file system internal parameters. We also list sample values of the configuration options.
  6. Among all these configurations, which one is optimal? It depends on our application’s I/O needs and on our target. The target can be minimizing overall execution time or minimizing total cost. This table lists the important application I/O characteristics we should consider in order to find the optimal configuration.
  7. Confident users may try to do this by hand. We invited an experienced user and a developer to configure the I/O system for the mpiBLAST application from 32 configuration candidates, and compared the total run time and cost of their configurations with those of the optimal one. The black bars are the performance improvement of the user-selected configurations, the dotted bars are the performance improvement of the developer-selected configurations, and the white bars are the optimal ones among all candidates. Conservative users may prefer to try all configurations for their applications and select the optimal one for future runs. For example, performance variance has to be considered even for a single trial.
  8. Here is the outline of this talk. After introducing the motivation, we define the problem and then propose our tool to address it. We will show some interesting results and conclude briefly.
  9. To sample the training data from the exploration space, we need a smarter way than choosing randomly. We realized that the parameters differ from each other in importance. So we can reduce the exploration space by choosing the top parameters and training on all their sampled combinations to bootstrap, and then add more parameters incrementally.
  10. We use a technique called the PB (Plackett-Burman) matrix to evaluate the importance of the parameters, so that we can select the most important ones from the huge exploration space. The PB matrix was proposed for agricultural crop experiment design and quality control in manufacturing. It can evaluate the parameters’ importance using only a few experiment trials, which matters because in the cloud it takes time and money to run the trials. There are five parameters in this example table: A, B, C, D, and E. We use the PB matrix recipe, which gives 8 rows for this sample. For each run, the value of each parameter is set according to one row of the PB matrix, whose elements are assigned binary values (either “+1” or “-1”) based on pre-specified PB design rules. For example, in the first row, we use the high value for all parameters except D, which uses the low value. The “high” and “low” values are selected to be at the two ends of the parameter value range. After the runs are completed, the importance of each parameter is calculated as the absolute value of the dot product of the parameter column and the result column. In this example, parameter D is considered most important and parameter B is considered least important.
  11. This table lists the rank of all 15 parameters, as well as the sampled values. We use the top 10 parameters to bootstrap ACIC. More parameters can be added later using this rank as guidance.
  12. Training: run the synthetic benchmark named IOR; vary its parameters to mimic different I/O behaviors; set up the I/O system with all configuration candidates; collect results for the different targets (performance or cost). With continuous, crowd-sourced training, ACIC can deal with cloud hardware/software upgrades using common data aging methods. Why CART? The parameters differ obviously in importance, and CART is simple, flexible, and interpretable. We can tolerate absolute prediction error as long as the predicted rank of a configuration is close to the real one.
  13. Here is a CART example. There are two kinds of nodes in a decision tree: internal nodes and leaf nodes. Each internal node has a predictor that splits the values into two sub-groups, the left child and the right child. To build the tree, an internal node is split if the variance of its values is large enough. Each leaf node holds the final sub-group value, indicated by the AVG field. For each input, we get the final value by traversing from the root node to a leaf node.
  14. We have now introduced the three parts of ACIC.
  15. Let’s see some interesting data.
  16. We choose this baseline because it’s simple and popular.
  17. The selected applications differ from each other in many characteristics, including scientific area, CPU/network usage, read/write behavior, and APIs.
  18. There are 9 test cases, each a combination of an application and a scale.
  19. We exhaustively tested all candidate configurations sampled before by running the 4 applications at different scales. The total run time with each configuration is indicated by a gray dot. The vertical span of gray dots depicts the range of measured total execution time over the entire configuration space. The lowest dot in each figure is the measured optimal configuration. The black points highlight the total run time under the ACIC-recommended I/O configuration. The solid red line marks the median performance among all configuration candidates, while the dashed black line marks the performance of the baseline (B) I/O configuration. Speedup ratios achieved by ACIC over the median and the baseline are shown at the top of each figure. First, these figures clearly demonstrate the potentially large difference in overall execution time caused by different I/O system configurations. Second, ACIC is able to identify near-optimal I/O configurations in almost all situations, as the black points are located near the bottom of the gray “spectrum”.
  20. The cost is calculated from the execution time, the number of instances, and the price per instance per hour.
  21. This figure presents the results of parameter sensitivity using four sample runs, one for each application. The x axis indicates the number of top-ranking parameters used in model training, as ordered by the PB matrix. For each parameter count, the y axis on the left measures the performance of the ACIC top recommendation in terms of cost saving over the baseline, while the y axis on the right measures the cost of training data collection. When using 10 parameters, the total training data collection cost is around $1K.
  22. Here is the outline of this talk. After introducing the motivation, we define the problem and then propose our tool to address it. We will show some interesting results and conclude briefly.
  23. We have released ACIC to the HPC community. Users can download the training database and build the CART model to predict the optimal I/O system configurations for their applications. New contributions are very welcome. Please scan this bar code and visit the ACIC homepage. That’s all, thank you!
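The tree building and traversal described in note 13 can be sketched in a few lines of plain Python. This is a minimal illustration, not ACIC's implementation: the stopping threshold `min_std`, the numeric encoding of the file system, and the four training points are all hypothetical, and thresholds are taken from observed values rather than midpoints.

```python
from statistics import mean, pstdev


def build_tree(rows, targets, min_std=0.15):
    """Grow a regression tree; stop when the targets are already close together."""
    if len(rows) < 2 or pstdev(targets) <= min_std:
        return {"avg": mean(targets)}              # leaf node
    best = None
    for feature in rows[0]:
        values = sorted({row[feature] for row in rows})
        for threshold in values[:-1]:              # keeps both sides non-empty
            left = [t for r, t in zip(rows, targets) if r[feature] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[feature] > threshold]
            # Size-weighted variance of the two sub-groups; lower is better.
            score = len(left) * pstdev(left) ** 2 + len(right) * pstdev(right) ** 2
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    if best is None:                               # no feature separates the rows
        return {"avg": mean(targets)}
    _, feature, threshold = best
    go_left = [r[feature] <= threshold for r in rows]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree([r for r, g in zip(rows, go_left) if g],
                           [t for t, g in zip(targets, go_left) if g], min_std),
        "right": build_tree([r for r, g in zip(rows, go_left) if not g],
                            [t for t, g in zip(targets, go_left) if not g], min_std),
    }


def predict(tree, row):
    """Traverse from the root to a leaf and return its AVG value."""
    while "avg" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["avg"]


# Hypothetical training points, not measurements from the talk.
# file_system: 0 = NFS, 1 = PVFS2; each target is a made-up performance value.
rows = [
    {"file_system": 0, "data_size_mb": 4},
    {"file_system": 0, "data_size_mb": 128},
    {"file_system": 1, "data_size_mb": 4},
    {"file_system": 1, "data_size_mb": 128},
]
targets = [1.0, 1.2, 2.0, 2.6]

tree = build_tree(rows, targets)
print(tree["feature"])                                          # first split
print(predict(tree, {"file_system": 1, "data_size_mb": 128}))   # leaf AVG
```

With this toy data the root splits on the file system, the NFS side becomes a leaf because its values are already close, and the PVFS2 side is split once more on data size. To recommend a configuration, one would predict a value for every candidate and rank the candidates, which is why, as note 12 says, some absolute prediction error is tolerable as long as the ranking is preserved.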