Jaliya Ekanayake and Geoffrey Fox
           School of Informatics and Computing
               Indiana University Bloomington
Cloud Computing and Software Services: Theory and Techniques
                         July, 2010
                                                     Presented by:
                                                   Inderjeet Singh
   Introduction
   Problem
   Data Analysis Applications
   Evaluations and Analysis
   Performance of MPI on Clouds
   Benchmarks and Results
   Conclusions and Future Work
   Critique
   Apache Hadoop (open-source implementation of Google MapReduce)
   DryadLINQ (Microsoft API for Dryad)
   CGL-MapReduce (Iterative version of MapReduce)

Cloud technologies/Parallel Runtimes/Cloud Runtimes
   On demand provisioning of resources
   Customizable Virtual Machines (VM)
   Root privileges
   Provisioning is very fast (within minutes)
   You pay only for what you use
   Better resource utilization
Cloud Technologies
 Moving computation to data
 Better Quality of Service (QoS)
 Simple communication topologies
 Distributed file systems (HDFS, GFS)


Most HPC applications are based upon MPI
 Many fine-grained communication topologies
 Use of fast networks
Software framework to support distributed computing
    on large datasets on clusters of computers

   Map step - The master node takes the input, partitions it
    into smaller sub-problems, and distributes them to
    worker nodes. A worker node may do this again in
    turn, leading to a multi-level tree structure. The worker
    node processes the smaller problem and passes the
    answer back to its master node

   Reduce step - The master node collects the answers to all
    the sub-problems and combines them in some way to
    form the output or answer
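A minimal, single-process Python sketch of these two steps, using word counting
as the sub-problem (the function names and sample input are illustrative only;
a real framework such as Hadoop distributes the map calls across worker nodes
and shuffles the intermediate pairs by key):

    from collections import defaultdict

    def map_step(document):
        # Each worker emits (word, 1) pairs for its sub-problem.
        return [(word, 1) for word in document.split()]

    def reduce_step(pairs):
        # Combine the per-key answers into the final output.
        counts = defaultdict(int)
        for word, n in pairs:
            counts[word] += n
        return dict(counts)

    documents = ["the cat sat", "the dog sat"]
    intermediate = [pair for doc in documents for pair in map_step(doc)]
    print(reduce_step(intermediate))  # {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1}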
Large data/compute-intensive applications
Traditional approach
 Execution on clusters/grids/supercomputers
 Moving both the application and the data to the available
  computational power
 Efficiency decreases with large datasets


Better approach
 Execution with Cloud technologies
 Moving computations to data to perform processing
 A more data-centric approach
Comparisons of features supported by different
cloud technologies and MPI
   What applications are best handled by cloud
    technologies?
   What overheads do they introduce?
   Can traditional parallel runtimes such as MPI
    be used in the cloud?
   If so, what overheads do they have?
Types of Applications (Based upon
    communication)

   Map only (Cap3)
   Map Reduce (HEP)
   Iterative/Complex style (Matrix Multiplication and
    K-Means Clustering)
   Cap3 - Sequence assembly program that operates
    on a collection of gene sequence files to produce
    several outputs

   HEP - High Energy Physics data analysis application

   K-Means clustering - Iteratively refines a set of clusters

   Matrix Multiplication – Cannon’s algorithm
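For reference, here is an illustrative serial NumPy simulation of Cannon's
algorithm (a sketch only, not the paper's MPI implementation): A and B are
split into a q x q grid of blocks, the blocks are skewed, and each of the q
steps multiplies the local blocks, then shifts A-blocks left and B-blocks up.

    import numpy as np

    def cannon_matmul(A, B, q):
        # q x q "process" grid, simulated serially; assumes q divides the matrix size.
        n = A.shape[0]
        b = n // q
        Ab = [[A[i*b:(i+1)*b, j*b:(j+1)*b] for j in range(q)] for i in range(q)]
        Bb = [[B[i*b:(i+1)*b, j*b:(j+1)*b] for j in range(q)] for i in range(q)]
        Cb = [[np.zeros((b, b)) for _ in range(q)] for _ in range(q)]
        # Initial skew: row i of A shifted left by i, column j of B shifted up by j.
        Ab = [[Ab[i][(j + i) % q] for j in range(q)] for i in range(q)]
        Bb = [[Bb[(i + j) % q][j] for j in range(q)] for i in range(q)]
        for _ in range(q):
            for i in range(q):
                for j in range(q):
                    Cb[i][j] += Ab[i][j] @ Bb[i][j]
            # After each step, shift A-blocks one position left and B-blocks one up.
            Ab = [[Ab[i][(j + 1) % q] for j in range(q)] for i in range(q)]
            Bb = [[Bb[(i + 1) % q][j] for j in range(q)] for i in range(q)]
        return np.block(Cb)

    A, B = np.random.rand(6, 6), np.random.rand(6, 6)
    assert np.allclose(cannon_matmul(A, B, 3), A @ B)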
   MapReduce does not support iterative/complex style
    applications, so [Fox] built CGL-MapReduce
   CGL-MapReduce – Supports long-running tasks and retains
    static data in memory across invocations
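A minimal Python sketch of that idea for K-means (not the actual CGL-MapReduce
API): the data partitions are loaded once and kept in memory, and each
iteration reruns map and reduce with only the small centroid list changing.
One-dimensional points keep the sketch short.

    import random

    def map_phase(partition, centroids):
        # Assign each point in one in-memory partition to its nearest centroid
        # and emit partial (sum, count) pairs per centroid index.
        out = {}
        for x in partition:
            i = min(range(len(centroids)), key=lambda k: abs(x - centroids[k]))
            s, n = out.get(i, (0.0, 0))
            out[i] = (s + x, n + 1)
        return out

    def reduce_phase(partials, k):
        # Combine the partial sums from all partitions into new centroids.
        sums, counts = [0.0] * k, [0] * k
        for part in partials:
            for i, (s, n) in part.items():
                sums[i] += s
                counts[i] += n
        return [sums[i] / counts[i] if counts[i] else 0.0 for i in range(k)]

    # Static data: loaded once, retained in memory across all iterations.
    partitions = [[random.uniform(0, 10) for _ in range(1000)] for _ in range(4)]
    centroids = [1.0, 5.0, 9.0]
    for _ in range(10):  # a fixed iteration count keeps the sketch simple
        centroids = reduce_phase([map_phase(p, centroids) for p in partitions],
                                 len(centroids))
    print(centroids)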
   Performance (average running time)
   Overhead = [P * T(P) – T(1)] / T(1)
    P = number of processes, T(P) = running time with P processes,
    T(1) = sequential (single-process) running time
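In code, the overhead definition reads as follows (the timings are made-up
values for illustration):

    def overhead(p, t_p, t_1):
        # [P * T(P) - T(1)] / T(1): the fraction of total CPU time not spent
        # on useful (sequential-equivalent) work.
        return (p * t_p - t_1) / t_1

    # e.g. 8 processes, 150 s parallel time, 1000 s sequential time
    print(overhead(8, 150.0, 1000.0))  # 0.2, i.e. 20% overhead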




Figure: Benchmark results for DryadLINQ, Hadoop, CGL-MapReduce, and MPI
   CAP3 (map-only) and HEP (MapReduce) perform well
    with cloud runtimes
   K-means clustering (iterative) and matrix
    multiplication (iterative) show high overheads with
    cloud runtimes compared to the MPI runtime
   CGL-MapReduce also gives lower overhead for large
    datasets
Goals
   Overhead of virtual machines (VMs) on parallel
    MPI applications
   How do applications with different
    communication/computation (c/c) ratios perform on
    the cloud?
   Effect of different strategies for assigning CPU cores
    to VMs on the performance of these MPI applications
Three MPI applications with different c/c
    ratio requirements

   Matrix multiplication (Cannon’s algorithm)
   K-Means clustering
   Concurrent wave solver
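For the concurrent wave solver, a serial sketch of the 1-D finite-difference
update it performs (an assumption about the solver's exact form; the MPI
version distributes the points across processes and exchanges only boundary
values each step, which is why its c/c ratio falls as O(1/n)):

    import math

    def wave_step(u_prev, u_curr, c2=0.25):
        # One time step of the discretized 1-D wave equation; the endpoints
        # are held fixed as boundary conditions.
        u_next = u_curr[:]
        for i in range(1, len(u_curr) - 1):
            u_next[i] = (2 * u_curr[i] - u_prev[i]
                         + c2 * (u_curr[i + 1] - 2 * u_curr[i] + u_curr[i - 1]))
        return u_next

    n = 16
    u0 = [math.sin(2 * math.pi * i / (n - 1)) for i in range(n)]
    u1 = u0[:]                      # zero initial velocity
    for _ in range(100):
        u0, u1 = u1, wave_step(u0, u1)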
Table: Computation and communication complexities of the
different MPI applications used
   Eucalyptus and Xen based cloud infrastructure
      16 nodes, each with 2 quad-core Intel Xeon processors and
       32 GB of memory
      Nodes connected with a 1 Gigabit Ethernet connection
   Same software configuration for both bare-metal
    nodes and VMs
     • OS - Red Hat Enterprise Linux Server release 5.2
     • OpenMPI version 1.3.2
Different strategies for assigning CPU cores to virtual machines


Invariant used to select the number of MPI processes:
 Number of MPI processes = Number of CPU cores used
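A tiny check of this invariant, assuming mpi4py is available (illustrative
only; launch with e.g. mpirun -np 8 python check.py so that one MPI process
runs per core on an 8-core node):

    from mpi4py import MPI
    import os

    comm = MPI.COMM_WORLD
    if comm.Get_rank() == 0:
        # The benchmarks keep: number of MPI processes == number of CPU cores used.
        print("MPI processes:", comm.Get_size(), "| cores on this node:", os.cpu_count())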
Figure: Matrix multiplication performance on 64 CPU cores; speedup for a fixed
matrix size (5184 × 5184)

 ◦ Speedup decreases by 34% between bare metal and 8 VMs per node
   at 81 MPI processes
 ◦ Caused by the exchange of large messages and increased communication
Figure: K-means clustering performance on 128 CPU cores; total overhead with
128 MPI processes
    ◦ Communication is much smaller than computation
    ◦ Communication here depends on the number of clusters formed
    ◦ Overhead is large for small data sizes, so lower speedup is
      observed
Figure: Concurrent wave solver performance on 128 CPU cores; total overhead
with 128 MPI processes

   ◦ The amount of communication is fixed; data transfer rates are lower
   ◦ The low c/c ratio of O(1/n) makes the application sensitive to latency,
     lowering performance on VMs
   ◦ 8 VMs per node show 7% more overhead than bare-metal nodes
Figure: Communication between dom0 and domUs when 1 VM per node is deployed
(top) and when 8 VMs per node are deployed (bottom)

◦ In multi-VM configurations, I/O operations of the
  DomUs (user domains) are scheduled via Dom0 (the
  privileged domain)
Figure: LAM vs. OpenMPI in different VM configurations


   When using multiple VMs on multi-core CPUs, it is better to
    use runtimes that support in-node communication
    (OpenMPI vs. LAM-MPI)
   Cloud runtimes work well for pleasingly parallel (map-only
    and MapReduce) applications with large
    datasets
   Overheads of cloud runtimes are high for parallel
    applications that require iterative/complex
    communication patterns (MPI-based applications)
   Work is needed on finding cloud-friendly algorithms
    for these applications
   CGL-MapReduce is efficient for iterative-style
    MapReduce applications (e.g., K-means)
   Overheads for MPI applications increase as the number
    of VMs per node increases (22-50% performance degradation)
   In-node communication is important
   MapReduce applications (not susceptible to
    latencies) may perform well on VMs deployed on
    clouds
   Integration of MapReduce and MPI (biological DNA
    sequencing application)
   No results for MPI implementations of the pleasingly
    parallel applications (Cap3, HEP), so MPI vs. cloud-runtime
    timing comparisons are missing
   No evaluation of the HPC applications implemented
    with cloud runtimes on the private cloud, which is
    critical to show the effect of multi-VM/multi-core
    configurations on the performance of these
    applications
   Different memory sizes (16 GB vs. 32 GB) for the clusters
    running different operating systems; this could bias the results
   Jaliya Ekanayake and Geoffrey Fox, High Performance Parallel
    Computing with Clouds and Cloud Technologies, Lecture Notes of
    the Institute for Computer Sciences, Social Informatics and
    Telecommunications Engineering, Volume 34, 2010, 20 pages

   High Performance Parallel Computing with Clouds and Cloud Technologies.
    http://www.slideshare.net/jaliyae/high-performance-parallel-computing-with-clouds-and-cloud-technologies

   Map Reduce, Wikipedia: http://en.wikipedia.org/wiki/MapReduce