HPC with Clouds and Cloud Technologies



  1. Jaliya Ekanayake and Geoffrey Fox, School of Informatics and Computing, Indiana University Bloomington. Cloud Computing and Software Services: Theory and Techniques, July 2010. Presented by: Inderjeet Singh
  2. • Introduction
     • Problem
     • Data Analysis Applications
     • Evaluations and Analysis
     • Performance of MPI on Clouds
     • Benchmarks and Results
     • Conclusions and Future Work
     • Critique
  3. Cloud technologies / parallel runtimes / cloud runtimes
     • Apache Hadoop (open-source version of Google MapReduce)
     • DryadLINQ (Microsoft API for Dryad)
     • CGL-MapReduce (iterative version of MapReduce)
  4. • On-demand provisioning of resources
     • Customizable virtual machines (VMs)
     • Root privileges
     • Provisioning is very fast (within minutes)
     • You pay only for what you use
     • Better resource utilization
  5. Cloud technologies:
     • Moving computation to data
     • Better Quality of Service (QoS)
     • Simple communication topologies
     • Distributed file systems (HDFS, GFS)
     Most HPC applications are based upon MPI:
     • Many fine-grained communication topologies
     • Use of fast networks
  6. MapReduce: a software framework to support distributed computing on large datasets across clusters of computers.
     • Map step: the master node partitions the input into smaller sub-problems and distributes them to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. Each worker processes its smaller problem and passes the answer back to its master node.
     • Reduce step: the master node collects the answers to all the sub-problems and combines them in some way to form the output.
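     The two steps above can be sketched as a toy single-process word count (the function names here are illustrative, not Hadoop's actual API; a real run would distribute the map and reduce calls across worker nodes):

     ```python
     from collections import defaultdict

     def map_step(document):
         # Map: emit (word, 1) pairs for one input partition.
         return [(word, 1) for word in document.split()]

     def reduce_step(key, values):
         # Reduce: combine all values emitted for one key.
         return key, sum(values)

     def map_reduce(documents):
         # The master partitions the input; each partition goes through map_step.
         intermediate = defaultdict(list)
         for doc in documents:
             for key, value in map_step(doc):
                 intermediate[key].append(value)
         # Group by key, then run reduce_step once per key.
         return dict(reduce_step(k, v) for k, v in intermediate.items())

     print(map_reduce(["hello cloud", "hello hpc"]))
     # {'hello': 2, 'cloud': 1, 'hpc': 1}
     ```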
  7. Large data / compute-intensive applications
     Traditional approach:
     • Execution on clusters/grids/supercomputers
     • Moving both application and data to the available computational power
     • Efficiency decreases with large datasets
     Better approach:
     • Execution with cloud technologies
     • Moving computation to the data to perform processing
     • A more data-centric approach
  8. Comparison of features supported by different cloud technologies and MPI
  9. • What applications are best handled by cloud technologies?
     • What overheads do they introduce?
     • Can traditional parallel runtimes such as MPI be used in clouds? If so, what overheads do they have?
  10. Types of applications (based upon communication pattern):
     • Map only (Cap3)
     • MapReduce (HEP)
     • Iterative/complex style (matrix multiplication and K-means clustering)
  11. • Cap3 – sequence assembly program that operates on a collection of gene sequence files to produce several outputs
     • HEP – High Energy Physics data analysis application
     • K-means clustering – iteratively refines the computed clusters
     • Matrix multiplication – Cannon's algorithm
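     To see why K-means is the iterative case, here is a minimal one-dimensional sketch (illustrative only, not the benchmark code from the paper): every iteration re-scans the same static point set, which is the access pattern that motivates keeping data in memory across invocations.

     ```python
     def kmeans(points, centers, iterations):
         # Each iteration re-reads the same static point set: assign every
         # point to its nearest center, then move each center to the mean
         # of its assigned points. In plain MapReduce this means reloading
         # the points for every iteration.
         for _ in range(iterations):
             clusters = [[] for _ in centers]
             for p in points:
                 nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
                 clusters[nearest].append(p)
             centers = [sum(c) / len(c) if c else centers[i]
                        for i, c in enumerate(clusters)]
         return centers

     print(kmeans([1.0, 2.0, 10.0, 11.0], [0.0, 5.0], 5))
     # [1.5, 10.5]
     ```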
  12. MapReduce does not support iterative/complex-style applications, so [Fox] built CGL-MapReduce.
     • CGL-MapReduce supports long-running tasks and retains static data in memory across invocations.
  13. Performance metric: average running time.
     Overhead = [P × T(P) − T(1)] / T(1), where P is the number of processes, T(P) the parallel running time on P processes, and T(1) the sequential running time.
     Runtimes compared: DryadLINQ, Hadoop, CGL-MapReduce, MPI
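     The overhead formula above can be computed directly (the timings below are hypothetical, chosen only to illustrate the calculation):

     ```python
     def parallel_overhead(p, t_p, t_1):
         # Overhead = [P * T(P) - T(1)] / T(1): the fraction of total
         # CPU time (P * T(P)) spent beyond the sequential work T(1).
         return (p * t_p - t_1) / t_1

     # Hypothetical run: a 100 s sequential job takes 15 s on 8 processes.
     print(parallel_overhead(8, 15.0, 100.0))  # 0.2 -> 20% overhead
     # A perfectly scaling run (T(P) = T(1)/P) has zero overhead:
     print(parallel_overhead(8, 12.5, 100.0))  # 0.0
     ```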
  14. • Cap3 (map only) and HEP (MapReduce) perform well with cloud runtimes
     • K-means clustering and matrix multiplication (both iterative) show high overheads with cloud runtimes compared to the MPI runtime
     • CGL-MapReduce also gives lower overhead for large datasets
  15. Goals:
     • Measure the overhead of virtual machines (VMs) on parallel MPI applications
     • How do applications with different communication/computation (c/c) ratios perform on clouds?
     • Effect of different CPU core assignment strategies for VMs when running MPI applications on those VMs
  16. Three MPI applications with different c/c ratio requirements:
     • Matrix multiplication (Cannon's algorithm)
     • K-means clustering
     • Concurrent wave solver
  17. Computation and communication complexities of the different MPI applications used
  18. • Eucalyptus and Xen based cloud infrastructure
     • 16 nodes, each with 2 quad-core Intel Xeon processors and 32 GB of memory
     • Nodes connected with a 1-gigabit Ethernet connection
     • Same software configuration for both bare-metal nodes and VMs:
       ◦ OS: Red Hat Enterprise Linux Server release 5.2
       ◦ OpenMPI version 1.3.2
  19. Different CPU core / virtual machine assignment strategies.
     Invariant used to select the number of MPI processes: Number of MPI processes = Number of CPU cores used
  20. Performance – 64 CPU cores
     • Speedup for fixed matrix size (5184 × 5184)
       ◦ Speedup decreases by 34% between bare metal and 8 VMs/node at 81 processes
       ◦ Cause: exchange of large messages and more communication
  21. Performance – 128 CPU cores
     • Total overhead (number of MPI processes = 128)
       ◦ Communication is much smaller than computation
       ◦ Communication here depends on the number of clusters formed
       ◦ Overhead is large for small data sizes, so less speedup is observed
  22. Performance – 128 CPU cores
     • Total overhead (number of MPI processes = 128)
       ◦ The amount of communication is fixed, with lower data transfer rates
       ◦ The lower c/c ratio of O(1/n) leads to more latency sensitivity and lower performance on VMs
       ◦ 8 VMs per node incur 7% more overhead than a bare-metal node
  23. Communication between dom0 and domUs when 1 VM per node is deployed (top) and when 8 VMs per node are deployed (bottom)
     ◦ In multi-VM configurations, scheduling of I/O operations of domUs (user domains) happens via dom0 (the privileged OS)
  24. Figure: LAM vs. OpenMPI in different VM configurations
     • When using multiple VMs on multi-core CPUs, it is better to use runtimes that support in-node communication (OpenMPI vs. LAM-MPI)
  25. • Cloud runtimes work well for pleasingly parallel (map-only and MapReduce) applications with large datasets
     • Overheads of cloud runtimes are high for parallel applications that require iterative/complex communication patterns (MPI-based applications)
     • Work is needed on finding cloud-friendly algorithms for these applications
     • CGL-MapReduce is efficient for iterative-style MapReduce applications (K-means)
  26. • Overheads for MPI applications increase as the number of VMs per node increases (22–50% degradation)
     • In-node communication is important; MapReduce applications (not susceptible to latencies) may perform well on VMs deployed on clouds
     • Future work: integration of MapReduce and MPI (biological DNA sequencing application)
  27. • No results for implementations of the pleasingly parallel applications (Cap3, HEP) with MPI; time comparisons between MPI and cloud runtimes are missing
     • Missing evaluation of HPC applications implemented with cloud runtimes on a private cloud, which is needed to show the effect of multi-VM/multi-core configurations on the performance of these applications
     • Memory sizes differ (16/32 GB) between clusters running different OSes, which could bias the results
  28. • Jaliya Ekanayake and Geoffrey Fox, "High Performance Parallel Computing with Clouds and Cloud Technologies," Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (2010), Volume 34, 20 pages
     • High Performance Parallel Computing with Clouds and Cloud Technologies. http://www.slideshare.net/jaliyae/high-performance-parallel-computing-with-clouds-and-cloud-technologies
     • MapReduce, Wikipedia: http://en.wikipedia.org/wiki/MapReduce