Launch a Thousand Core HPC Cluster in Minutes with AWS CfnCluster

This webinar will provide an overview of the AWS High Performance Computing (HPC) tool CfnCluster. We will cover the basics of what CfnCluster is and how it can help with the migration of traditional HPC applications to the cloud. The webinar will also provide guidance on how to install and configure CfnCluster in a way that will allow you to scale to thousands of cores in just a few minutes on AWS.

  1. Launch a thousand core HPC cluster in minutes with AWS CfnCluster
     Adam Boeglin, HPC Solutions Architect
     Monday, October 31, 2016
     © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  2. Webinar Highlights
     • What CfnCluster is and when to use it
     • Architecture guidance to fit your security models
     • How to install and configure CfnCluster
     • Demo: Review of CfnCluster and managing compute at scale
  3. Introduction to CfnCluster
     • AWS CloudFormation + Cluster = CfnCluster
     • Simple to install, easy to manage
     • Everything you need to get a cluster up and running in minutes
       • Head node with scheduler
       • Shared NFS Storage
         • /home
         • /shared
       • OpenMPI
       • Compute nodes that grow and shrink on demand
  4. Workloads Well Suited for CfnCluster
     • Computational Fluid Dynamics
     • Semiconductor Design
     • Weather Modeling
     • Genomics and Molecular Simulation
     • Seismic and reservoir simulations
     • 3D rendering and visualizations
     • … anything that uses a traditional HPC scheduler
  5. Cluster HPC and Grid HPC
     • Cluster HPC: Tightly coupled, latency sensitive applications. Use larger EC2 compute instances, placement groups, Enhanced Networking.
     • Grid HPC: Loosely coupled, pleasingly parallel. Requires very little node-to-node interaction.
     • Grids of Clusters: Use a grid strategy on the cloud to run a group of parallel, individually clustered HPC jobs.
  6. Computational Fluid Dynamics: ANSYS Fluent
     • AWS c4.8xlarge
     • 140M cells
     • F1 car CFD benchmark
     http://www.ansys-blog.com/simulation-on-the-cloud/
  7. https://aws.amazon.com/hpc/cfncluster/
  8. Configuration Options
     • Operating System
       • Amazon Linux
       • CentOS 6
       • CentOS 7
       • Ubuntu 14.04
     • Scheduler
       • Sun Grid Engine (SGE)
       • OpenLava
       • Torque
       • SLURM
     • Storage Size & IOPS
     • EBS & Instance Store Encryption
     • Scaling Speed & Limits
     • Provisioning Scripts
  9. Many AWS services to tie it all together
     • CloudFormation manages the state of the cluster
     • Amazon CloudWatch & Auto Scaling let the compute fleet grow and shrink on demand
     • Amazon SQS & Amazon SNS allow compute nodes to signal to the master when they’re online
     • AWS Identity and Access Management (IAM) allows for fine-grained access control
     • Amazon S3 stores the CloudFormation templates
  10. Standalone CfnCluster (architecture diagram)
     • Network: Internet Gateway (IGW), region-1a
     • Cluster: Master Server, Auto Scaling Compute Fleet
     • AWS services: Amazon S3, DynamoDB, Amazon SQS, CloudWatch, CloudFormation
  11. Isolated CfnCluster (architecture diagram)
     • Network: Internet Gateway (IGW), VPC NAT gateway, Public Subnet, Private Subnet
     • Hosts: Bastion Server, Master Server, Auto Scaling Compute Fleet
     • AWS services: Amazon S3, DynamoDB, Amazon SQS, CloudWatch, CloudFormation
     • Private Subnet Route Table
       • VPC Traffic -> Local
       • 0.0.0.0 -> NAT Gateway
     • Public Subnet Route Table
       • VPC Traffic -> Local
       • 0.0.0.0 -> Internet Gateway
  12. Isolated CfnCluster w/ VPN (architecture diagram)
     • Network: Internet Gateway (IGW), VPC NAT gateway, Public Subnet, Private Subnet, VPN Connection
     • Hosts: Master Server, Auto Scaling Compute Fleet
     • On premises: Corporate Data Center, Engineer
     • AWS services: Amazon S3, DynamoDB, Amazon SQS, CloudWatch, CloudFormation
     • Private Subnet Route Table
       • VPC Traffic -> Local
       • Corp IP Range -> VPN
       • 0.0.0.0 -> NAT Gateway
     • Public Subnet Route Table
       • VPC Traffic -> Local
       • Corp IP Range -> VPN
       • 0.0.0.0 -> Internet Gateway
  13. Private CfnCluster w/ VPN & Proxy (architecture diagram)
     • Network: Private Subnet, VPN Connection, Internet Connection
     • Hosts: Master Server, Auto Scaling Compute Fleet
     • On premises: Corporate Data Center, Proxy Server
     • AWS services: Amazon S3, DynamoDB, Amazon SQS, CloudWatch, CloudFormation
     • Private Subnet Route Table
       • VPC Traffic -> Local
       • Corp IP Range -> VPN
       • 0.0.0.0 -> VPN
  14. Creating an IAM User
     • Create an IAM user with administrative privileges
     • Fine-grained access controls can be added later
     • Generate an access key and secret key, and keep them safe
  15. Create an SSH Key
     • Generate or import the key you’ll use for user login
  16. Installing the CfnCluster CLI
     • On your desktop or a bastion server
     $ sudo pip install cfncluster
  17. Creating the Base Configuration
     • First, create the base config required to start a cluster.
     $ cfncluster configure
  18. Edit the configuration file to meet your needs
     • Reference the configuration docs
     • http://cfncluster.readthedocs.io/en/latest/configuration.html
     $ vim ~/.cfncluster/config
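
Slide 18 leaves the contents of the file to the linked docs. As a rough illustration only, the sketch below writes a minimal INI-style config to a scratch path rather than to `~/.cfncluster/config`, so nothing real is overwritten. Only `compute_instance_type` appears in the slides; the section names and the other keys (`aws_region_name`, `cluster_template`, `key_name`, `vpc_settings`, `vpc_id`, `master_subnet_id`) are assumptions to verify against the configuration docs, and every value is a placeholder.

```shell
#!/bin/bash
# Sketch only: apart from compute_instance_type, the section and key names
# below are assumptions to check against the CfnCluster configuration docs.
set -eu

# Work in a scratch directory so the real ~/.cfncluster/config is untouched.
scratch_dir="$(mktemp -d)"
config_file="$scratch_dir/config"

cat > "$config_file" <<'EOF'
[aws]
aws_region_name = us-east-1

[global]
cluster_template = default

[cluster default]
key_name = my-key-pair
compute_instance_type = c3.8xlarge
vpc_settings = public

[vpc public]
vpc_id = vpc-xxxxx
master_subnet_id = subnet-xxxxx
EOF

echo "Wrote example config to $config_file"
```

The file produced by `cfncluster configure` is the better starting point; a sketch like this is mainly useful for seeing how the cluster section refers to the VPC section by name.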
  19. Launch the Cluster
     $ cfncluster create mycluster
     • Cluster creation usually takes ~15 minutes
     • Completely managed by CloudFormation
  20. Submit your first job
     [ec2-user@ip-10-0-0-17 ~]$ cat hw.qsub
     #!/bin/bash
     #
     #$ -cwd
     #$ -j y
     #$ -pe mpi 2
     #$ -S /bin/bash
     #
     module load openmpi-x86_64
     mpirun -np 2 hostname
     [ec2-user@ip-10-0-0-17 ~]$ qsub hw.qsub
     Your job 1 ("hw.qsub") has been submitted
     [ec2-user@ip-10-0-0-17 ~]$ qstat
     job-ID  prior    name     user      state  submit/start at      queue                           slots  ja-task-ID
     ------------------------------------------------------------------------------------------------
     1       0.55500  hw.qsub  ec2-user  r      02/01/2015 05:57:25  all.q@ip-10-0-0-44.ap-southeas  2
     [ec2-user@ip-10-0-0-17 ~]$ ls -l
     total 8
     -rw-rw-r-- 1 ec2-user ec2-user 110 Feb 1 05:57 hw.qsub
     -rw-r--r-- 1 ec2-user ec2-user 26 Feb 1 05:57 hw.qsub.o1
     [ec2-user@ip-10-0-0-17 ~]$ cat hw.qsub.o1
     ip-10-0-0-44
     ip-10-0-0-45
  21. EBS Snapshots for Software & Storage Management
     • Install your applications and store any working data to /shared
     • Create a snapshot of that volume
     • Re-use that snapshot every time you launch your cluster
     ebs_snapshot_id = snap-xxxxx
     Diagram: Master Server with Root & Home Volume (/ & /home) and NFS Shared Volume (/shared); Amazon EBS Snapshot (snap-xxxxx)
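
Slide 21 shows the `ebs_snapshot_id` setting without its surrounding sections. The sketch below is one guess at how it might sit in the config file, written to a scratch path. It assumes that the cluster section points at a named EBS section through an `ebs_settings` key; that key, the `[ebs ...]` section header, and the label `custom` are assumptions rather than anything shown in the slides, so confirm them in the configuration docs.

```shell
#!/bin/bash
# Sketch only: ebs_settings and the [ebs custom] section are assumptions;
# ebs_snapshot_id = snap-xxxxx is the placeholder value from slide 21.
set -eu

scratch_dir="$(mktemp -d)"
config_file="$scratch_dir/config"

# The cluster section refers to an EBS section by its label ("custom" here).
cat > "$config_file" <<'EOF'
[cluster default]
ebs_settings = custom

[ebs custom]
ebs_snapshot_id = snap-xxxxx
EOF

# Read the snapshot ID back out of the file as a quick sanity check.
snapshot_id="$(sed -n 's/^ebs_snapshot_id *= *//p' "$config_file")"
echo "Snapshot configured for the shared volume: $snapshot_id"
```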
  22. Upgrading Hardware is Easy!
     • Simple upgrade from Ivy Bridge to Haswell
     1. Let all compute nodes stop
     2. Edit ~/.cfncluster/config and change
        compute_instance_type = c3.8xlarge
        to
        compute_instance_type = c4.8xlarge
     3. Update the cluster
        $ cfncluster update mycluster
     Diagram: C3 -> C4
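
Step 2 on slide 22 is a one-line edit, so it can also be scripted. The sketch below performs the same substitution with `sed` on a scratch copy of the config; the scratch file and its `[cluster default]` header are stand-ins for illustration, and the `cfncluster update` command from step 3 is left as a comment because it needs a real cluster.

```shell
#!/bin/bash
# Sketch only: operates on a scratch file, not on ~/.cfncluster/config.
set -eu

scratch_dir="$(mktemp -d)"
config_file="$scratch_dir/config"

# Stand-in config holding the "before" value from slide 22.
printf '%s\n' '[cluster default]' 'compute_instance_type = c3.8xlarge' > "$config_file"

# Step 2: swap c3.8xlarge for c4.8xlarge. Writing to a new file and moving it
# into place avoids depending on any particular sed's in-place option.
sed 's/^compute_instance_type *= *c3\.8xlarge$/compute_instance_type = c4.8xlarge/' \
    "$config_file" > "$config_file.new"
mv "$config_file.new" "$config_file"

new_type="$(sed -n 's/^compute_instance_type *= *//p' "$config_file")"
echo "compute_instance_type is now: $new_type"

# Step 3, run against the real config once the compute nodes have stopped:
#   cfncluster update mycluster
```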
  23. Demo: Launching a Cluster
  24. Thank you!
