High Performance Cloud Computing



Deepak Singh
Principal Product Manager
Via butteryflysha under a CC-BY license
Image: Simon Cockell under CC-BY
High Scale Computing
using a large number of computers at the same time to solve a problem
1. High Throughput Computing
scale out
“embarrassingly parallel”
constraints
constrained by capacity
  More molecules
  Bigger systems
  More simulations
  More dimensions
constrained by time
  Upcoming conference
  Grant submissions
  Impatience!
  Exploratory “spike” run
EC2: Elastic Compute Cloud
elastic
programmatic
ec2-run-instances
AWS CloudFormation
EC2 instance types
  standard “m1”
  high cpu “c1”
  high memory “m2”
http://aws.amazon.com/ec2/instance-types/
ec2-terminate-instances
rapid provisioning
10K in 45 minutes
design patterns
optimize for throughput
[Diagram: Tasks → Queue → Instances]
vertical scaling
[Diagram: Tasks → Queue → Instances; increase instance size]
horizontal scaling
[Diagram: Tasks → Queue → Instances; increase instance count]
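Both kinds of scaling shorten the time needed to drain the queue. A minimal sketch, assuming perfectly linear scaling, no queue contention, and hypothetical throughput figures (50 tasks per hour for a small instance, 200 for a larger one); `hours_to_drain` and `instances_for_deadline` are illustrative helpers, not AWS tools:

```python
import math


def hours_to_drain(num_tasks: int, instances: int, tasks_per_instance_hour: float) -> float:
    """Hours to process num_tasks when every instance pulls from one shared queue."""
    return num_tasks / (instances * tasks_per_instance_hour)


def instances_for_deadline(num_tasks: int, deadline_hours: float,
                           tasks_per_instance_hour: float) -> int:
    """Smallest instance count that drains the queue within deadline_hours."""
    return math.ceil(num_tasks / (deadline_hours * tasks_per_instance_hour))


# Hypothetical workload and per-instance throughput (assumptions, not measurements).
backlog = 10_000
print(hours_to_drain(backlog, 10, 50))         # 20.0  baseline: 10 small instances
print(hours_to_drain(backlog, 10, 200))        # 5.0   vertical: same count, bigger instances
print(hours_to_drain(backlog, 40, 50))         # 5.0   horizontal: 4x the small instances
print(instances_for_deadline(backlog, 2, 50))  # 100   small instances for a 2-hour deadline
```

Real per-instance throughput has to be measured, and the linear model becomes optimistic once the queue or the storage layer turns into a bottleneck.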
[Diagram: Tasks → Queue → Instances → Results Store]
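The queue pattern can be sketched in a single process: threads stand in for instances, a `queue.Queue` stands in for a hosted task queue, and a dictionary stands in for the results store. This is a minimal illustration under those assumptions; `run_pool` is a hypothetical helper, and a real deployment would use a network queue and durable storage shared by separate machines.

```python
import queue
import threading
from typing import Any, Callable, Dict, Iterable


def run_pool(tasks: Iterable[Any], work: Callable[[Any], Any], num_workers: int) -> Dict[int, Any]:
    """Fan tasks out to workers through a shared queue; collect results in a store."""
    task_queue = queue.Queue()
    for task_id, payload in enumerate(tasks):
        task_queue.put((task_id, payload))   # producer side: enqueue every task

    results: Dict[int, Any] = {}   # stand-in for the results store
    lock = threading.Lock()        # serializes writes from worker threads

    def worker() -> None:
        # Each worker (a stand-in for an instance) pulls tasks until the queue is empty.
        while True:
            try:
                task_id, payload = task_queue.get_nowait()
            except queue.Empty:
                return
            value = work(payload)
            with lock:
                results[task_id] = value

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return results


squares = run_pool(range(10), lambda x: x * x, num_workers=4)
print(sorted(squares.items())[:3])   # [(0, 0), (1, 1), (2, 4)]
```

Because workers pull tasks rather than being assigned them, adding instances or switching to larger ones requires no change on the side that produces tasks.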
[Diagram: Tasks → Queue → Instances (plus On-premise) → Results Store]
optimize for cost
  maximize bang for the buck
on-demand instances
reserved instances
spot instances
ideal for batch
persistent requests
all or nothing
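The trade-off between pricing models comes down to simple arithmetic. A sketch with made-up placeholder prices (not AWS list prices) and a hypothetical 3,000 instance-hours of work: a reserved instance trades an upfront fee for a lower hourly rate, so it only pays off above a break-even usage, while spot capacity is cheapest but can be interrupted, which is why it suits batch work.

```python
def total_cost(hours: float, hourly: float, upfront: float = 0.0) -> float:
    """Total spend: any one-time fee plus instance-hours times the hourly rate."""
    return upfront + hours * hourly


def reserved_break_even(on_demand_hourly: float, reserved_hourly: float,
                        reserved_upfront: float) -> float:
    """Instance-hours above which a reserved instance costs less than on-demand."""
    return reserved_upfront / (on_demand_hourly - reserved_hourly)


# Placeholder prices for illustration only; they are not AWS list prices.
ON_DEMAND, RESERVED, RESERVED_UPFRONT, SPOT = 0.75, 0.25, 1000.0, 0.125
HOURS = 3000  # hypothetical instance-hours of work

print(total_cost(HOURS, ON_DEMAND))                                # 2250.0
print(total_cost(HOURS, RESERVED, RESERVED_UPFRONT))               # 1750.0
print(total_cost(HOURS, SPOT))                                     # 375.0
print(reserved_break_even(ON_DEMAND, RESERVED, RESERVED_UPFRONT))  # 2000.0
```

Below the break-even point on-demand is the cheaper choice; spot's lower rate comes with the risk of interruption, which batch jobs tolerate best.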
use cases galore
Credit: Angel Pizarro, U. Penn
2. Cluster Computing
tightly coupled
MPI
Cluster Compute
  Dual Intel X5570 “Nehalem”
  23GB RAM
  GPGPU
  HVM
  1.7TB scratch
  10 Gigabit Ethernet




[Diagram: Cluster Compute instances in a Placement Group]




Cluster Compute
231
450
Cores: 7040
Rmax: 41.82 TFlop/s
Rpeak: 82.51 TFlop/s
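Rmax is the measured LINPACK result and Rpeak the theoretical peak, so their ratio is the cluster's LINPACK efficiency. A quick check of the slide's numbers, assuming (not stated on the slide) a 2.93 GHz clock for the X5570, 4 double-precision floating-point operations per core per cycle, and 8 cores per instance:

```python
def rpeak_tflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak = cores x clock x flops per cycle, converted to TFlop/s."""
    return cores * clock_ghz * flops_per_cycle / 1000.0


def linpack_efficiency(rmax: float, rpeak: float) -> float:
    """Fraction of the theoretical peak achieved in the LINPACK run."""
    return rmax / rpeak


CORES = 7040
CORES_PER_INSTANCE = 8   # assumption: two quad-core X5570s per instance

print(round(rpeak_tflops(CORES, 2.93, 4), 2))      # 82.51
print(round(linpack_efficiency(41.82, 82.51), 3))  # 0.507
print(CORES // CORES_PER_INSTANCE)                 # 880
```

Under these assumptions the computed peak matches the slide's Rpeak, and the LINPACK run sustained roughly half of it.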
GPGPU
2 x Tesla M2050
Getting Started
http://aws.amazon.com/hpc
4 steps
15 minutes
http://aws.amazon.com/ec2
performance
WIEN2K Parallel Performance
  H size 56,000 (25GB)
  Runtime (16x8 processors)
    Local (InfiniBand): 3h 48m
    Cloud (10Gbps): 1h 30m ($40)
  1200 atom unit cell; ScaLAPACK+MPI diagonalization, matrix size 50k-100k

Credit: K. Jorissen, F. D. Villa, and J. J. Rehr (U. Washington)
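The WIEN2K figures translate into a speedup and a rough cost per processor-hour. A sketch assuming the "16x8 processors" label means 128 processors for the whole cloud run and that the $40 covers only that run; the helper names are illustrative:

```python
def to_minutes(hours: int, minutes: int) -> int:
    return hours * 60 + minutes


def speedup(baseline_min: float, new_min: float) -> float:
    """How many times faster the new run is than the baseline."""
    return baseline_min / new_min


def cost_per_processor_hour(cost: float, processors: int, runtime_min: float) -> float:
    """Spend divided by processors x runtime in hours."""
    return cost / (processors * runtime_min / 60)


local = to_minutes(3, 48)   # 228 minutes on the local InfiniBand cluster
cloud = to_minutes(1, 30)   # 90 minutes on the cloud cluster
print(round(speedup(local, cloud), 2))                       # 2.53
print(round(cost_per_processor_hour(40, 16 * 8, cloud), 3))  # 0.208
```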
customer examples
Example Use Case #1

Computational Fluid Dynamics

      Dynamic Clusters

   40-180 CC1 instances
Example Use Case #2

    Molecular Dynamics

       Steady Usage

   32-40 CG1 instances
Example Use Case #3

     Machine Learning

    Spiky, Experimental

    8-20 CG1 instances
Customer Case Study: Bioproximity




          http://aws.amazon.com/solutions/case-studies/bioproximity/
Customer Case Study: Cyclopic Energy




                           OpenFOAM®


         http://aws.amazon.com/solutions/case-studies/cyclopic-energy/
Customer Case Study: PSR
                  Stochastic Dual Dynamic Programming




 44,000 CPU hrs in Oct 2010
             http://aws.amazon.com/solutions/case-studies/psr/
familiar tools
Oracle Grid Engine
MIT StarCluster
LSF
Moab/Torque
Condor
StackIQ Rocks+
Slurm
deesingh@amazon.com
                                                             Twitter:@mndoci
                                               http://slideshare.net/mndoci
                                                          http://mndoci.com




Inspiration and ideas from Matt Wood, James Hamilton & Larry Lessig

Credit: Oberazzi under a CC-BY-NC-SA license
