PERFORMANCE ANALYSIS OF HIGH PERFORMANCE COMPUTING APPLICATIONS ON THE AMAZON WEB SERVICES CLOUD
Keith R. Jackson, Lavanya Ramakrishnan, Krishna Muriki, Shane Canon, Shreyas Cholia, Harvey J. Wasserman, Nicholas J. Wright
Lawrence Berkeley National Lab

Presentation by Abhishek Gupta, CS 598 Cloud Computing

GOALS
• Examine the performance of existing cloud computing infrastructures and create a mechanism for their quantitative evaluation
• Build upon previous studies by using the NERSC benchmarking framework to evaluate the performance of real scientific workloads on EC2
• As part of the DOE Magellan project, evaluate the ability of cloud computing to meet DOE's computing needs

CONTRIBUTIONS
• Broadest evaluation to date of application performance on virtualized cloud computing platforms
• Experience running on Amazon EC2, including the performance and availability variations encountered
• Analysis of the impact of virtualization based on the communication characteristics of each application
• Quantification of the impact of virtualization through a simple, well-documented aggregate measure that expresses the useful potential of the systems considered

METHODS – MACHINES
• Carver
  - Quad-core, dual-socket Linux / Nehalem / QDR IB cluster
  - Medium-sized cluster for jobs scaling to hundreds of processors; 3,200 total cores
• Franklin
  - Cray XT4
  - Linux environment / quad-core AMD Opteron / SeaStar interconnect, Lustre parallel filesystem
  - Integrated HPC system for jobs scaling to tens of thousands of processors; 38,640 total cores

METHODS – MACHINES
• Lawrencium
  - Quad-core, dual-socket Linux / Harpertown / DDR IB cluster
  - Designed for jobs scaling to tens or hundreds of processors; 1,584 total cores
• Amazon EC2
  - m1.large instance type: four EC2 Compute Units (two virtual cores with two EC2 Compute Units each) and 7.5 GB of memory
  - Heterogeneous processor types

METHODS – VIRTUAL CLUSTER ARCHITECTURE ON EC2

METHODS – APPLICATIONS AND BENCHMARKS USED
• High Performance Computing Challenge (HPCC) benchmark suite
  - Consists of seven synthetic benchmarks
  - Targeted synthetics: DGEMM, STREAM, and two measures of network latency and bandwidth
  - Complex synthetics: HPL, FFTE, PTRANS, and RandomAccess
• NERSC-6 benchmarks
  - Set of applications representative of the NERSC workload
  - Cover the science domains, parallelization schemes, and concurrencies of that workload, as well as machine-based characteristics that influence performance, such as message size, memory access pattern, and working set size

METHODS – NERSC APPLICATIONS
• CAM: Community Atmosphere Model
  - Low computational intensity
  - Large point-to-point and collective MPI messages
• GAMESS: General Atomic and Molecular Electronic Structure System
  - Memory access
  - No collectives, very little communication
• GTC: Gyrokinetic Turbulence Code
  - High computational intensity
  - Bandwidth-bound nearest-neighbor communication plus collectives with small data payloads

METHODS – NERSC APPLICATIONS
• IMPACT-T: Integrated Map and Particle Accelerator Tracking Time
  - Memory bandwidth and moderate computational intensity
  - Collective performance with small to moderate message sizes
• MAESTRO: A Low Mach Number Stellar Hydrodynamics Code
  - Low computational intensity
  - Irregular communication patterns
• MILC: MIMD Lattice Computation (lattice QCD)
  - High computational intensity
  - Global communication with small messages
• PARATEC: PARAllel Total Energy Code
  - Global communication with small messages

RESULTS: HPCC PERFORMANCE
• Measured at 64 cores
• Poor network performance on EC2

RESULTS: APPLICATION PERFORMANCE
• Franklin and Lawrencium are 1.4× to 2.6× slower than Carver.
• EC2
  - Best case (GAMESS): EC2 is only 2.7× slower than Carver.
  - Worst case (PARATEC): EC2 is more than 50× slower than Carver.
  - The large performance spread is caused by the applications' differing demands on the network.
• More detailed analysis is required.

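The slowdowns above are ratios of each system's wall-clock runtime to Carver's runtime for the same code. A minimal sketch of that arithmetic in Python; the runtimes are hypothetical placeholders chosen only to illustrate the calculation, not the paper's measurements:

```python
def slowdown(runtimes: dict[str, float], reference: str = "Carver") -> dict[str, float]:
    """Return runtime(system) / runtime(reference) for every system."""
    ref_time = runtimes[reference]
    return {system: t / ref_time for system, t in runtimes.items()}


# Hypothetical wall-clock times in seconds for one application (not measured data).
runtimes = {"Carver": 100.0, "Franklin": 140.0, "Lawrencium": 260.0, "EC2": 270.0}

ratios = slowdown(runtimes)
for system, ratio in ratios.items():
    if system != "Carver":
        print(f"{system}: {ratio:.1f}x slower than Carver")
```

A ratio above 1 means the system is slower than the reference. Computing the ratio per application, rather than comparing raw times, is what makes the spread between codes such as GAMESS and PARATEC visible.
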
RESULTS: PERFORMANCE ANALYSIS USING IPM
• Integrated Performance Monitoring (IPM) framework
  - Uses the MPI profiling interface
  - Used to examine the relative amount of time an application spends computing versus communicating, and the types of MPI calls it makes

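IPM intercepts MPI calls through the profiling interface and records how long each call takes. A simplified, single-rank sketch of turning such records into a compute/communication split and a per-call breakdown; the record format and timings below are hypothetical and are not IPM's actual output format:

```python
from collections import defaultdict

# Hypothetical per-call profile for one rank: (MPI call name, seconds spent in it).
mpi_calls = [
    ("MPI_Allreduce", 12.0),
    ("MPI_Alltoall", 30.0),
    ("MPI_Send", 5.0),
    ("MPI_Recv", 3.0),
]
wall_time = 200.0  # hypothetical total wall-clock time in seconds


def comm_summary(calls: list[tuple[str, float]], wall: float) -> tuple[float, dict[str, float]]:
    """Return (fraction of wall time in MPI, each call's share of total MPI time)."""
    per_call: dict[str, float] = defaultdict(float)
    for name, seconds in calls:
        per_call[name] += seconds  # accumulate repeated calls of the same type
    mpi_total = sum(per_call.values())
    shares = {name: t / mpi_total for name, t in per_call.items()}
    return mpi_total / wall, shares


comm_fraction, shares = comm_summary(mpi_calls, wall_time)
print(f"communication: {comm_fraction:.0%}, computation: {1 - comm_fraction:.0%}")
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"  {name}: {share:.0%} of MPI time")
```

A real IPM profile aggregates timings across all MPI ranks. The quantities worth watching are the ones the conclusions tie to EC2 slowdown: the overall communication fraction and how much of it goes to global collectives such as MPI_Alltoall.
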
RESULTS: SUSTAINED SYSTEM PERFORMANCE
• SSP: aggregate measure of the workload-specific, delivered performance of a computing system
• For each code, measure:
  - FLOP counts on a reference system
  - Wall-clock run time on the various systems
• N, the core count used to scale SSP, chosen to be 3,200
• Problem sets were drastically reduced

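A minimal SSP sketch, assuming the standard NERSC formulation: each code's per-core rate is its reference FLOP count divided by (cores used × wall-clock time) on the system under test, and SSP is N times the geometric mean of those rates. The two codes and their numbers below are hypothetical:

```python
import math


def ssp(codes: list[tuple[float, int, float]], n_cores: int) -> float:
    """Sustained System Performance in FLOP/s.

    codes: one (reference_flop_count, cores_used, wall_seconds) tuple per application.
    n_cores: the N that scales the geometric-mean per-core rate.
    """
    # Per-core delivered rate for each code on the system under test.
    rates = [flops / (cores * seconds) for flops, cores, seconds in codes]
    # Geometric mean of the per-core rates, computed via logarithms.
    geo_mean = math.exp(sum(math.log(r) for r in rates) / len(rates))
    return n_cores * geo_mean


# Hypothetical measurements giving per-core rates of 1e9 and 4e9 FLOP/s.
codes = [(6.4e13, 64, 1000.0), (2.56e14, 64, 1000.0)]
result = ssp(codes, n_cores=3200)
print(f"SSP = {result / 1e12:.2f} TFLOP/s")  # hypothetical result: 6.40 TFLOP/s
```

Because of the geometric mean, one code running 50× slower lowers SSP by a factor of 50^(1/k) for k codes rather than by 50×, so the aggregate reflects the whole workload instead of its single worst or best member.
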
RESULTS: VARIABILITY
• Sources of performance variability across runs:
  - The non-homogeneous nature of the systems allocated
  - Network sharing and contention
  - Sharing of the un-virtualized hardware

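One simple way to quantify run-to-run variability is the coefficient of variation (sample standard deviation divided by the mean) together with the max/min runtime ratio. A sketch using Python's standard statistics module; the runtimes are hypothetical, chosen to include one slow outlier run:

```python
import statistics


def variability(runtimes: list[float]) -> dict[str, float]:
    """Summarize the spread across repeated runs of the same job."""
    mean = statistics.mean(runtimes)
    stdev = statistics.stdev(runtimes)  # sample standard deviation (needs >= 2 runs)
    return {
        "mean_s": mean,
        "cv": stdev / mean,
        "max_over_min": max(runtimes) / min(runtimes),
    }


# Hypothetical wall-clock times (seconds) for five repeated runs; not measured data.
runs = [100.0, 104.0, 98.0, 150.0, 101.0]
summary = variability(runs)
print(f"mean {summary['mean_s']:.1f} s, CV {summary['cv']:.1%}, "
      f"max/min {summary['max_over_min']:.2f}")
```

Reporting max/min alongside the CV highlights a single run that lands on slower or contended hardware, which an average alone would hide.
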
RESULTS: SCALING

CONCLUSIONS
• EC2 performance degrades significantly as applications spend more time communicating.
• Applications with global, all-to-all communication perform worse than those that mostly use point-to-point communication.
• The amount of variability in EC2 performance can be significant.

DISCUSSION QUESTIONS
• This paper focused on performance alone. What are the performance/cost tradeoffs for different platforms?
• How does this tradeoff differ with application characteristics such as granularity and communication sensitivity?
• What is the primary source of performance variability on Amazon EC2?

More Related Content

What's hot

Dynamic resource allocation using virtual machines for cloud computing enviro...
Dynamic resource allocation using virtual machines for cloud computing enviro...Dynamic resource allocation using virtual machines for cloud computing enviro...
Dynamic resource allocation using virtual machines for cloud computing enviro...IEEEFINALYEARPROJECTS
 
Cloud Computing - Availability Issues and Controls
Cloud Computing - Availability Issues and ControlsCloud Computing - Availability Issues and Controls
Cloud Computing - Availability Issues and Controlslylcheng88
 
Load Balancing in Cloud Computing Environment: A Comparative Study of Service...
Load Balancing in Cloud Computing Environment: A Comparative Study of Service...Load Balancing in Cloud Computing Environment: A Comparative Study of Service...
Load Balancing in Cloud Computing Environment: A Comparative Study of Service...Eswar Publications
 
The Cloud Cube
The Cloud CubeThe Cloud Cube
The Cloud CubeAdrius42
 
Networking in cloud computing
Networking in cloud computingNetworking in cloud computing
Networking in cloud computingBarani Tharan
 
The Economic Benefits of Cloud Computing
The Economic Benefits of Cloud ComputingThe Economic Benefits of Cloud Computing
The Economic Benefits of Cloud ComputingSean Teague
 
Scalability and fault tolerance
Scalability and fault toleranceScalability and fault tolerance
Scalability and fault tolerancegaurav jain
 
How Cloud Computing Works?
How Cloud Computing Works?How Cloud Computing Works?
How Cloud Computing Works?icloud9
 
Introduction of Cloud computing
Introduction of Cloud computingIntroduction of Cloud computing
Introduction of Cloud computingRkrishna Mishra
 
Network virtualization seminar report
Network virtualization seminar reportNetwork virtualization seminar report
Network virtualization seminar reportSKS
 
All about Clod computing
All about Clod computingAll about Clod computing
All about Clod computingakanksha9597
 
How cloud computing work
How cloud computing workHow cloud computing work
How cloud computing workicloud9
 
Dynamic resource allocation using virtual machines for cloud computing enviro...
Dynamic resource allocation using virtual machines for cloud computing enviro...Dynamic resource allocation using virtual machines for cloud computing enviro...
Dynamic resource allocation using virtual machines for cloud computing enviro...IEEEFINALYEARPROJECTS
 
Cloud Computing & Cloud Architecture
Cloud Computing & Cloud ArchitectureCloud Computing & Cloud Architecture
Cloud Computing & Cloud Architecturenotnip
 

What's hot (19)

Dynamic resource allocation using virtual machines for cloud computing enviro...
Dynamic resource allocation using virtual machines for cloud computing enviro...Dynamic resource allocation using virtual machines for cloud computing enviro...
Dynamic resource allocation using virtual machines for cloud computing enviro...
 
Cloud Computing - Availability Issues and Controls
Cloud Computing - Availability Issues and ControlsCloud Computing - Availability Issues and Controls
Cloud Computing - Availability Issues and Controls
 
Cloud computing architectures
Cloud computing architecturesCloud computing architectures
Cloud computing architectures
 
Load Balancing in Cloud Computing Environment: A Comparative Study of Service...
Load Balancing in Cloud Computing Environment: A Comparative Study of Service...Load Balancing in Cloud Computing Environment: A Comparative Study of Service...
Load Balancing in Cloud Computing Environment: A Comparative Study of Service...
 
The Cloud Cube
The Cloud CubeThe Cloud Cube
The Cloud Cube
 
Networking in cloud computing
Networking in cloud computingNetworking in cloud computing
Networking in cloud computing
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
 
The Economic Benefits of Cloud Computing
The Economic Benefits of Cloud ComputingThe Economic Benefits of Cloud Computing
The Economic Benefits of Cloud Computing
 
Scalability and fault tolerance
Scalability and fault toleranceScalability and fault tolerance
Scalability and fault tolerance
 
How Cloud Computing Works?
How Cloud Computing Works?How Cloud Computing Works?
How Cloud Computing Works?
 
Introduction of Cloud computing
Introduction of Cloud computingIntroduction of Cloud computing
Introduction of Cloud computing
 
Network virtualization seminar report
Network virtualization seminar reportNetwork virtualization seminar report
Network virtualization seminar report
 
All about Clod computing
All about Clod computingAll about Clod computing
All about Clod computing
 
Scheduling in CCE
Scheduling in CCEScheduling in CCE
Scheduling in CCE
 
How cloud computing work
How cloud computing workHow cloud computing work
How cloud computing work
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
 
Cloud computing presentation(ppt)
Cloud  computing presentation(ppt)Cloud  computing presentation(ppt)
Cloud computing presentation(ppt)
 
Dynamic resource allocation using virtual machines for cloud computing enviro...
Dynamic resource allocation using virtual machines for cloud computing enviro...Dynamic resource allocation using virtual machines for cloud computing enviro...
Dynamic resource allocation using virtual machines for cloud computing enviro...
 
Cloud Computing & Cloud Architecture
Cloud Computing & Cloud ArchitectureCloud Computing & Cloud Architecture
Cloud Computing & Cloud Architecture
 

Similar to Paper

[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xe...
[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xe...[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xe...
[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xe...Matteo Ferroni
 
Performance and Energy evaluation
Performance and Energy evaluationPerformance and Energy evaluation
Performance and Energy evaluationGIORGOS STAMELOS
 
OpenACC Monthly Highlights: September 2021
OpenACC Monthly Highlights: September 2021OpenACC Monthly Highlights: September 2021
OpenACC Monthly Highlights: September 2021OpenACC
 
HPC with Clouds and Cloud Technologies
HPC with Clouds and Cloud TechnologiesHPC with Clouds and Cloud Technologies
HPC with Clouds and Cloud TechnologiesInderjeet Singh
 
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsImproving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsHPCC Systems
 
Applying Cloud Techniques to Address Complexity in HPC System Integrations
Applying Cloud Techniques to Address Complexity in HPC System IntegrationsApplying Cloud Techniques to Address Complexity in HPC System Integrations
Applying Cloud Techniques to Address Complexity in HPC System Integrationsinside-BigData.com
 
AWS re:Invent 2016: High Performance Computing on AWS (CMP207)
AWS re:Invent 2016: High Performance Computing on AWS (CMP207)AWS re:Invent 2016: High Performance Computing on AWS (CMP207)
AWS re:Invent 2016: High Performance Computing on AWS (CMP207)Amazon Web Services
 
Parallelization of Coupled Cluster Code with OpenMP
Parallelization of Coupled Cluster Code with OpenMPParallelization of Coupled Cluster Code with OpenMP
Parallelization of Coupled Cluster Code with OpenMPAnil Bohare
 
Co-Design Architecture for Exascale
Co-Design Architecture for ExascaleCo-Design Architecture for Exascale
Co-Design Architecture for Exascaleinside-BigData.com
 
Scalable analytics for iaas cloud availability
Scalable analytics for iaas cloud availabilityScalable analytics for iaas cloud availability
Scalable analytics for iaas cloud availabilityPapitha Velumani
 
Application Profiling at the HPCAC High Performance Center
Application Profiling at the HPCAC High Performance CenterApplication Profiling at the HPCAC High Performance Center
Application Profiling at the HPCAC High Performance Centerinside-BigData.com
 
DEEP-mon: Dynamic and Energy Efficient Power monitoring for container-based i...
DEEP-mon: Dynamic and Energy Efficient Power monitoring for container-based i...DEEP-mon: Dynamic and Energy Efficient Power monitoring for container-based i...
DEEP-mon: Dynamic and Energy Efficient Power monitoring for container-based i...NECST Lab @ Politecnico di Milano
 
Scalable analytics for iaas cloud availability
Scalable analytics for iaas cloud availabilityScalable analytics for iaas cloud availability
Scalable analytics for iaas cloud availabilityPapitha Velumani
 
Exploring emerging technologies in the HPC co-design space
Exploring emerging technologies in the HPC co-design spaceExploring emerging technologies in the HPC co-design space
Exploring emerging technologies in the HPC co-design spacejsvetter
 
Qos aware data replication for data-intensive applications in cloud computing...
Qos aware data replication for data-intensive applications in cloud computing...Qos aware data replication for data-intensive applications in cloud computing...
Qos aware data replication for data-intensive applications in cloud computing...Papitha Velumani
 
Amazon Elastic Fabric Adapter: Anatomy, Capabilities, and the Road Ahead
Amazon Elastic Fabric Adapter: Anatomy, Capabilities, and the Road AheadAmazon Elastic Fabric Adapter: Anatomy, Capabilities, and the Road Ahead
Amazon Elastic Fabric Adapter: Anatomy, Capabilities, and the Road Aheadinside-BigData.com
 
Performance Characterization and Optimization of In-Memory Data Analytics on ...
Performance Characterization and Optimization of In-Memory Data Analytics on ...Performance Characterization and Optimization of In-Memory Data Analytics on ...
Performance Characterization and Optimization of In-Memory Data Analytics on ...Ahsan Javed Awan
 

Similar to Paper (20)

Gupta_Keynote_VTDC-3
Gupta_Keynote_VTDC-3Gupta_Keynote_VTDC-3
Gupta_Keynote_VTDC-3
 
[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xe...
[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xe...[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xe...
[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xe...
 
Performance and Energy evaluation
Performance and Energy evaluationPerformance and Energy evaluation
Performance and Energy evaluation
 
OpenACC Monthly Highlights: September 2021
OpenACC Monthly Highlights: September 2021OpenACC Monthly Highlights: September 2021
OpenACC Monthly Highlights: September 2021
 
HPC with Clouds and Cloud Technologies
HPC with Clouds and Cloud TechnologiesHPC with Clouds and Cloud Technologies
HPC with Clouds and Cloud Technologies
 
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsImproving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
 
Applying Cloud Techniques to Address Complexity in HPC System Integrations
Applying Cloud Techniques to Address Complexity in HPC System IntegrationsApplying Cloud Techniques to Address Complexity in HPC System Integrations
Applying Cloud Techniques to Address Complexity in HPC System Integrations
 
AWS re:Invent 2016: High Performance Computing on AWS (CMP207)
AWS re:Invent 2016: High Performance Computing on AWS (CMP207)AWS re:Invent 2016: High Performance Computing on AWS (CMP207)
AWS re:Invent 2016: High Performance Computing on AWS (CMP207)
 
Parallelization of Coupled Cluster Code with OpenMP
Parallelization of Coupled Cluster Code with OpenMPParallelization of Coupled Cluster Code with OpenMP
Parallelization of Coupled Cluster Code with OpenMP
 
Co-Design Architecture for Exascale
Co-Design Architecture for ExascaleCo-Design Architecture for Exascale
Co-Design Architecture for Exascale
 
Scalable analytics for iaas cloud availability
Scalable analytics for iaas cloud availabilityScalable analytics for iaas cloud availability
Scalable analytics for iaas cloud availability
 
Application Profiling at the HPCAC High Performance Center
Application Profiling at the HPCAC High Performance CenterApplication Profiling at the HPCAC High Performance Center
Application Profiling at the HPCAC High Performance Center
 
DEEP-mon: Dynamic and Energy Efficient Power monitoring for container-based i...
DEEP-mon: Dynamic and Energy Efficient Power monitoring for container-based i...DEEP-mon: Dynamic and Energy Efficient Power monitoring for container-based i...
DEEP-mon: Dynamic and Energy Efficient Power monitoring for container-based i...
 
Scalable analytics for iaas cloud availability
Scalable analytics for iaas cloud availabilityScalable analytics for iaas cloud availability
Scalable analytics for iaas cloud availability
 
Super Computer
Super ComputerSuper Computer
Super Computer
 
Interconnect your future
Interconnect your futureInterconnect your future
Interconnect your future
 
Exploring emerging technologies in the HPC co-design space
Exploring emerging technologies in the HPC co-design spaceExploring emerging technologies in the HPC co-design space
Exploring emerging technologies in the HPC co-design space
 
Qos aware data replication for data-intensive applications in cloud computing...
Qos aware data replication for data-intensive applications in cloud computing...Qos aware data replication for data-intensive applications in cloud computing...
Qos aware data replication for data-intensive applications in cloud computing...
 
Amazon Elastic Fabric Adapter: Anatomy, Capabilities, and the Road Ahead
Amazon Elastic Fabric Adapter: Anatomy, Capabilities, and the Road AheadAmazon Elastic Fabric Adapter: Anatomy, Capabilities, and the Road Ahead
Amazon Elastic Fabric Adapter: Anatomy, Capabilities, and the Road Ahead
 
Performance Characterization and Optimization of In-Memory Data Analytics on ...
Performance Characterization and Optimization of In-Memory Data Analytics on ...Performance Characterization and Optimization of In-Memory Data Analytics on ...
Performance Characterization and Optimization of In-Memory Data Analytics on ...
 

Recently uploaded

Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONHumphrey A Beña
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxAshokKarra1
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptxSherlyMaeNeri
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomnelietumpap1
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxChelloAnnAsuncion2
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4MiaBumagat1
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxCarlos105
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
Q4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptxQ4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptxnelietumpap1
 

Recently uploaded (20)

FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptx
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptx
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choom
 
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptxYOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
Q4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptxQ4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptx
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 

Paper

  • 1. PERFORMANCE ANALYSIS OF HIGH PERFORMANCE COMPUTING APPLICATIONS ON THE AMAZON WEB SERVICES CLOUD Keith R. Jackson, Lavanya Ramakrishnan, Krishna Muriki, Shane Canon, Shreyas Cholia, Harvey J. Wasserman, Nicholas J. Wright Lawrence Berkeley National Lab Presentation by Abhishek Gupta, 1 CS 598 Cloud Computing
  • 2. GOALS  Examine the performance of existing cloud computing infrastructures and create a mechanism for their quantitative evaluation  Build upon previous studies by using the NERSC benchmarking framework to evaluate the performance of real scientific workloads on EC2  Under DOE Magellan project - evaluate the ability of cloud computing to meet DOE’s computing needs 2
  • 3. CONTRIBUTIONS  Broadest evaluation to date of application performance on virtualized cloud computing platforms  Experiences with running on Amazon EC2 and the encountered performance and availability variations.  Analysis of the impact of virtualization based on the communication characteristics of the application  Impact of virtualization through a simple, well- documented aggregate measure that expresses the useful potential of the systems considered 3
  • 4. METHODS - MACHINES  Carver:  Quad-core, dual-socket Linux / Nehalem / QDR IB cluster  Medium-sized cluster for jobs scaling to hundreds of processors; 3,200 total cores  Franklin:  Cray XT4  Linux environment / Quad-core, AMD Opteron / Seastar interconnect, Lustre parallel filesystem  Integrated HPC system for jobs scaling to tens of thousands of processors; 38,640 total cores 4
  • 5. METHODS - MACHINES  Lawrencium  Quad-core, dual-socket Linux / Harpertown / DDR IB cluster  Designed for jobs scaling to tens-hundreds of processors; 1,584 total cores  Amazon EC2  m1.large instance type: four EC2 Compute Units, two virtual cores with two EC2 Compute Units each, and 7.5 GB of memory  Heterogeneous processor types 5
  • 6. METHODS – VIRTUAL CLUSTER ARCHITECTURE ON EC2 6
  • 7. METHODS – APPLICATIONS AND BENCHMARKS USED  High Performance Computing Challenge (HPCC) benchmark suite  Consists of seven synthetic benchmarks  Targeted synthetics : DGEMM, STREAM, and two measures of network latency and bandwidth.  Complex synthetics :HPL, FFTE, PTRANS, and RandomAccess.  NERSC 6 Benchmarks  Set of applications representative of the NERSC workload  Covers the science domains, parallelization schemes, and concurrencies, as well as machine-based characteristics that influence performance such as message size, memory 7 access pattern, and working set sizes
  • 8. METHODS – NERSC APPLICATIONS  CAM: The Community Atmospheric Model  Lower computational intensity  Large point-to-point & collective MPI messages  GAMESS: General Atomic and Molecular Electronic Structure System  Memory access  No collectives, very little communication  GTC: GyrokineticTurbulence Code  High computational intensity  Bandwidth-bound nearest-neighbor communication plus collectives with small data payload 8
  • 9. METHODS – NERSC APPLICATIONS  IMPACT-T: Integrated Map and Particle Accelerator Tracking Time  Memory bandwidth & moderate computational intensity  Collective performance with small to moderate message sizes  MAESTRO: A Low Mach Number Stellar Hydrodynamics Code  Low computational intensity  Irregular communication patterns  MILC: QCD  High computation intensity  Global communication with small messages 9  PARATEC: PARAllel Total Energy Code  Global communication with small messages
  • 10. RESULTS: HPCC PERFORMANCE
    64 cores.
    Poor network performance on EC2.
  • 11. RESULTS: APPLICATION PERFORMANCE
    Franklin and Lawrencium are 1.4 to 2.6 times slower than Carver.
    EC2:
      Best case, GAMESS: EC2 is only 2.7 times slower than Carver.
      Worst case, PARATEC: EC2 is more than 50 times slower than Carver.
      The large performance spread is caused by the applications' different demands on the network; a more detailed analysis is required.
  • 12. RESULTS: PERFORMANCE ANALYSIS USING IPM
    Integrated Performance Monitoring (IPM) framework:
      Uses the MPI profiling interface.
      Examines the relative amounts of time an application spends computing and communicating, and the types of MPI calls it makes.
  • 13. RESULTS: SUSTAINED SYSTEM PERFORMANCE
    SSP: an aggregate measure of the workload-specific, delivered performance of a computing system.
    For each code, measure:
      FLOP counts on a reference system;
      wall-clock run time on the various systems.
    N chosen to be 3,200.
    Problem sets drastically reduced.
  • 14. RESULTS: VARIABILITY
    Performance variability across runs, attributed to:
      the non-homogeneous nature of the systems allocated;
      network sharing and contention;
      sharing of the un-virtualized hardware.
  • 16. CONCLUSIONS
    EC2 performance degrades significantly as applications spend more time communicating.
    Applications with global, all-to-all communication perform worse than those that mostly use point-to-point communication.
    The amount of variability in EC2 performance can be significant.
  • 17. DISCUSSION QUESTIONS
    This paper focused on performance alone. What are the performance/cost tradeoffs for different platforms?
    How does that tradeoff differ with application characteristics such as granularity and communication sensitivity?
    What is the primary source of performance variability on Amazon EC2?
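Slide 7 lists STREAM among the targeted synthetics. As a minimal sketch of what that kernel measures (this Python is an illustration, not the HPCC code, and the function names are hypothetical), the triad computes a[i] = b[i] + scalar * c[i] and reports bandwidth as the bytes it is credited with moving divided by elapsed time:

```python
import time

# STREAM credits the triad with two 8-byte reads (b, c) and one
# 8-byte write (a) per element.
TRIAD_BYTES_PER_ELEMENT = 3 * 8


def triad(b: list, c: list, scalar: float = 3.0) -> list:
    """One pass of the triad kernel: a[i] = b[i] + scalar * c[i]."""
    return [bi + scalar * ci for bi, ci in zip(b, c)]


def triad_rate_gb_per_s(n: int, scalar: float = 3.0) -> float:
    """Credited bytes divided by the time of the timed triad pass, in GB/s.

    Arrays are allocated before the timer starts. In pure Python this
    mostly measures interpreter overhead, not memory bandwidth.
    """
    b = [1.0] * n
    c = [2.0] * n
    start = time.perf_counter()
    triad(b, c, scalar)
    elapsed = time.perf_counter() - start
    return TRIAD_BYTES_PER_ELEMENT * n / elapsed / 1e9


if __name__ == "__main__":
    print(f"{triad_rate_gb_per_s(1_000_000):.3f} GB/s (illustrative only)")
```

STREAM itself is compiled code run on arrays much larger than cache, so the rate printed here should not be compared with the EC2, Carver, or Lawrencium STREAM results.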
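Slide 12 notes that IPM works through the MPI profiling interface, where a tool intercepts each MPI call, records it, and forwards it to the underlying implementation. The sketch below is a toy Python analogue of that interposition idea, not IPM's implementation; CallProfiler and the wrapped stand-in function are hypothetical:

```python
import time
from collections import defaultdict
from functools import wraps


class CallProfiler:
    """Toy analogue of PMPI interposition: wrapped calls are timed and
    counted by name, then their result is returned to the caller."""

    def __init__(self) -> None:
        self.seconds = defaultdict(float)  # total time per call name
        self.calls = defaultdict(int)      # number of calls per call name

    def wrap(self, name, fn):
        """Return a function that runs fn and records its time under name."""
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                self.seconds[name] += time.perf_counter() - start
                self.calls[name] += 1
        return wrapper

    def comm_fraction(self, wall_seconds: float) -> float:
        """Share of wall-clock time spent inside wrapped calls."""
        return sum(self.seconds.values()) / wall_seconds


if __name__ == "__main__":
    prof = CallProfiler()
    # Hypothetical stand-in for a communication call.
    fake_allreduce = prof.wrap("MPI_Allreduce", lambda values: sum(values))
    t0 = time.perf_counter()
    for _ in range(1000):
        fake_allreduce([1, 2, 3])
    wall = time.perf_counter() - t0
    print(dict(prof.calls), f"{prof.comm_fraction(wall):.2f}")
```

The real profiling interface works at link time: the tool defines MPI_Send and friends and calls the PMPI_ versions, so the application needs no source changes.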
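Slide 13 lists the SSP inputs but not the formula. Assuming the usual NERSC formulation, each code i gets a per-core rate P_i = F_i / (m_i × T_i), where F_i is the FLOP count measured on the reference system, m_i the core count, and T_i the wall-clock time on the system under test; SSP is then N times the geometric mean of the P_i, with N = 3,200 here. A minimal sketch under that assumption (function names are hypothetical):

```python
import math


def per_core_rate(ref_flops: float, cores: int, wall_seconds: float) -> float:
    """FLOP/s per core: FLOPs counted on the reference system, divided by
    the core count and the wall-clock time on the system under test."""
    return ref_flops / (cores * wall_seconds)


def ssp(rates: list, n_cores: int = 3200) -> float:
    """N times the geometric mean of the per-core rates."""
    # Average the logs to avoid overflow from multiplying many rates.
    log_mean = sum(math.log(r) for r in rates) / len(rates)
    return n_cores * math.exp(log_mean)
```

The geometric mean keeps any single code from dominating: halving one of I per-core rates lowers SSP by the same factor, 2^(1/I), whichever code it is.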
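Slide 14 reports run-to-run variability on EC2. One simple way to quantify it, assumed here rather than taken from the paper, is the coefficient of variation and the max/min ratio over repeated runtimes of the same job (run_variability is a hypothetical helper):

```python
import statistics


def run_variability(runtimes: list) -> dict:
    """Summarize the spread of repeated wall-clock runtimes of one job."""
    mean = statistics.mean(runtimes)
    stdev = statistics.stdev(runtimes)  # sample standard deviation; needs >= 2 runs
    return {
        "mean": mean,
        "stdev": stdev,
        "cv": stdev / mean,  # coefficient of variation
        "max_over_min": max(runtimes) / min(runtimes),
    }
```

Neither statistic identifies the cause; separating heterogeneous processors from network contention would need per-run hardware and communication data.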

Editor's Notes

  1. Carver has quad-core Intel Nehalem processors running at 2.67 GHz, with dual-socket nodes and a single Quad Data Rate (QDR) IB link per node to a network that is locally a fat tree with a global 2D mesh.
    Franklin: each XT4 compute node contains a single quad-core 2.3 GHz AMD Opteron "Budapest" processor, which is tightly integrated to the XT4 interconnect via a Cray SeaStar-2 ASIC through a 6.4 GB/s bidirectional HyperTransport interface.
  2. Lawrencium: each compute node is a Dell PowerEdge 1950 server equipped with two Intel Xeon quad-core 64-bit, 2.66 GHz Harpertown processors, connected to a Dual Data Rate (DDR) InfiniBand network configured as a fat tree.
    Amazon EC2 is a virtual computing environment that provides a web services API for launching and managing virtual machine instances. Amazon provides a number of different instance types that have varying performance characteristics. CPU capacity is defined in terms of an abstract Amazon EC2 Compute Unit; one EC2 Compute Unit is approximately equivalent to a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor. For our tests we used the m1.large instance type, which has four EC2 Compute Units (two virtual cores with two EC2 Compute Units each) and 7.5 GB of memory. The nodes are connected with gigabit Ethernet.
  3. There are major differences between the Amazon Web Services environment and that at a typical supercomputing center. For example, almost all HPC applications assume the presence of a shared parallel filesystem between compute nodes, and a head node that can submit MPI jobs to all of the worker nodes. In the virtual cluster built on EC2, the head node could submit MPI jobs to all of the worker nodes, and the file server provided a shared filesystem between the nodes.
  4. Targeted synthetics: microkernels that quantify basic system parameters which separately characterize computation and communication performance. Proxy apps.
  5. Point-to-point vs. all-to-all communication; communication vs. computation vs. memory; small messages vs. large messages.
  6. The DGEMM results are as one would expect based on the properties of the CPUs. The STREAM results show that EC2 is significantly faster than Lawrencium for this benchmark; we believe this is because of the particular processor distribution we received for our EC2 nodes for this test.
    The network latency and bandwidth results clearly show the difference between the interconnects on the tested systems. The ping-pong results show the latency and the bandwidth with no self-induced contention, while the randomly ordered ring tests show the performance degradation with self-contention. The uncontended latency and bandwidth measurements of the EC2 gigabit Ethernet interconnect are more than 20 times worse than those of the slowest other machine. For EC2, the less capable network clearly inhibits overall HPL performance, by a factor of six or more.
    The FFTE benchmark measures the floating-point rate of execution of a double-precision complex one-dimensional discrete Fourier transform, and the PTRANS benchmark measures the time to transpose a large matrix. The performance of both benchmarks depends upon memory and network bandwidth, so they show similar trends: EC2 is approximately 20 times slower than Carver and four times slower than Lawrencium in both cases. The RandomAccess benchmark measures the rate of random updates of memory, and its performance depends on memory and network latency. In this case EC2 is approximately 10 times slower than Carver and three times slower than Lawrencium.
  7. GAMESS (2.7 times slower than Carver), for this benchmark problem, places relatively little demand upon the network and is therefore hardly slowed down at all on EC2. PARATEC shows the worst performance on EC2, 52 times slower than Carver. It performs 3-D FFTs, and the global (i.e., all-to-all) data transposes within these FFT operations can incur a large communications overhead. Qualitatively, it seems that the applications that perform the most collective communication with the most messages are those that perform the worst on EC2.
  8. The plot shows the relative runtime on EC2 compared to Lawrencium against the percentage of time spent communicating for each application, as measured on Lawrencium. The overall trend is clear: the greater the fraction of its runtime an application spends communicating, the worse its performance is on EC2. To determine these characteristics we classified the MPI calls of the applications into four categories: small and large messages (latency- vs. bandwidth-limited) crossed with point-to-point vs. collective. (Note that for the purposes of this work we classified all messages < 4 KB as latency-bound; the overall conclusions shown here do not depend significantly on this choice.) From this analysis it is clear why fvCAM behaves anomalously: it is the only one of the applications that performs most of its communication via large messages, both point-to-point and collective.
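Note 2 defines CPU capacity in EC2 Compute Units. A rough arithmetic sketch of what the m1.large figures imply, using the note's 1.0-1.2 GHz 2007 Opteron/Xeon equivalence (the helper name is hypothetical):

```python
def ecu_equivalent_ghz(ecus: float, low: float = 1.0, high: float = 1.2) -> tuple:
    """Range of 2007-era Opteron/Xeon clock equivalent to `ecus` EC2 Compute Units."""
    return (ecus * low, ecus * high)


M1_LARGE_ECUS = 4
M1_LARGE_VCORES = 2
M1_LARGE_MEM_GB = 7.5

# Two ECUs per virtual core: roughly a 2.0 to 2.4 GHz 2007-era core.
per_vcore = ecu_equivalent_ghz(M1_LARGE_ECUS / M1_LARGE_VCORES)
# Whole instance: roughly 4.0 to 4.8 GHz of 2007-era capacity.
per_instance = ecu_equivalent_ghz(M1_LARGE_ECUS)
# Memory per virtual core: 3.75 GB.
mem_per_vcore_gb = M1_LARGE_MEM_GB / M1_LARGE_VCORES
```

This is only a nominal throughput equivalence, not a clock rate, so it should not be compared directly with Carver's 2.67 GHz Nehalem or Lawrencium's 2.66 GHz Harpertown cores.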
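Note 3 describes a virtual cluster in which a head node launches MPI jobs on the worker instances. The paper does not say which MPI implementation or launcher was used; assuming Open MPI, the head node could describe the m1.large workers (two virtual cores each) with a hostfile like the one this sketch generates. The IP addresses and helper names are hypothetical:

```python
def instances_needed(total_ranks: int, slots_per_instance: int = 2) -> int:
    """Ceiling division: instances required to host total_ranks MPI ranks."""
    return -(-total_ranks // slots_per_instance)


def make_hostfile(worker_hosts: list, slots_per_instance: int = 2) -> str:
    """Open MPI-style hostfile text: one 'host slots=N' line per worker."""
    return "".join(f"{host} slots={slots_per_instance}\n" for host in worker_hosts)


if __name__ == "__main__":
    n = instances_needed(64)
    # Hypothetical private addresses for the worker instances.
    workers = [f"10.0.0.{i}" for i in range(10, 10 + n)]
    print(n)
    print(make_hostfile(workers[:2]), end="")
```

The head node would then pass the file to Open MPI's `mpirun --hostfile`. At the 64-core size used for the HPCC runs on slide 10, this works out to 32 m1.large instances.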
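Note 6 separates ping-pong latency from bandwidth. A common first-order way to see why a high-latency network hurts small messages most is the Hockney model, t(m) = α + m/β. The model and the parameter values below are assumptions for illustration, not measurements from the paper:

```python
def transfer_time(msg_bytes: float, latency_s: float, bandwidth_bps: float) -> float:
    """Hockney model: fixed startup latency plus size over asymptotic bandwidth."""
    return latency_s + msg_bytes / bandwidth_bps


def effective_bandwidth(msg_bytes: float, latency_s: float, bandwidth_bps: float) -> float:
    """Bytes per second actually achieved for one message of this size."""
    return msg_bytes / transfer_time(msg_bytes, latency_s, bandwidth_bps)


def half_bandwidth_size(latency_s: float, bandwidth_bps: float) -> float:
    """Message size n_1/2 = alpha * beta at which half of beta is reached."""
    return latency_s * bandwidth_bps


if __name__ == "__main__":
    # Illustrative values only: gigabit line rate (125 MB/s) and an
    # assumed 50 microsecond latency; neither is a measured value.
    lat, bw = 50e-6, 125e6
    for size in (8, 4096, 1 << 20):
        print(size, f"{effective_bandwidth(size, lat, bw) / 1e6:.1f} MB/s")
```

With these assumed values, effective bandwidth reaches half the line rate only at α·β = 6,250 bytes, so messages of a few kilobytes or less are dominated by latency, which fits note 8's choice of a 4 KB latency-bound cutoff.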
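Note 8 classifies MPI calls by size (latency- vs. bandwidth-bound, with a 4 KB cutoff) and by type (point-to-point vs. collective). A minimal sketch of that bookkeeping, assuming call records of (name, bytes, seconds) such as a profiler might produce; the collective list is partial, 4 KB is taken as 4,096 bytes, and the helper names are hypothetical:

```python
from collections import Counter

# Assumed, partial list of MPI collectives; every other call is
# treated as point-to-point in this sketch.
COLLECTIVES = {
    "MPI_Allreduce", "MPI_Alltoall", "MPI_Alltoallv", "MPI_Allgather",
    "MPI_Bcast", "MPI_Reduce", "MPI_Gather", "MPI_Scatter", "MPI_Barrier",
}
LATENCY_BOUND_BYTES = 4096  # messages < 4 KB count as latency-bound


def classify(call: str, msg_bytes: int) -> tuple:
    """Return (size class, call type) for one MPI call."""
    size = "small" if msg_bytes < LATENCY_BOUND_BYTES else "large"
    kind = "collective" if call in COLLECTIVES else "point-to-point"
    return size, kind


def comm_breakdown(records) -> Counter:
    """Sum time per (size, kind) class; records are (call, bytes, seconds)."""
    totals = Counter()
    for call, msg_bytes, seconds in records:
        totals[classify(call, msg_bytes)] += seconds
    return totals
```

Under this scheme an fvCAM-like profile would put most of its communication time in the two "large" classes, which is what the note says sets it apart from the other applications.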