
(CMP306) Dynamic, On-Demand Windows HPC Clusters On AWS



In today’s world, grid computing needs are dynamic due to business, market, and technology changes. With AWS, you can easily create grid computing clusters running Microsoft HPC Pack 2012 R2 to meet these dynamic computing needs. This session covers architectural patterns and best practices using Amazon EC2, Amazon S3, AWS Directory Service, and AWS CloudFormation to create on-demand Windows HPC clusters. We also review automation frameworks to more easily and dynamically provision Windows HPC clusters in an on-demand fashion.

Published in: Technology


  1. 1. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Timothy DiLauro, AWS Solutions Architect Julien Lépine, AWS Solutions Architect October 2015 CMP306 On-Demand Windows HPC on AWS Windows Clusters for Dynamic Needs
  2. 2. What to Expect from the Session • HPC on AWS • AWS Architecture for HPC • AWS Architecture for Windows HPC • Best Practices for Windows HPC • Demonstration
  3. 3. HPC on AWS
  4. 4. Why AWS for HPC? • Low cost with flexible pricing • Efficient clusters • Unlimited infrastructure • Faster time to results • Concurrent clusters on demand • Increased collaboration
  5. 5. Popular HPC workloads on AWS • Genome processing • Modeling and simulation • Government and educational research • Monte Carlo simulations • Transcoding and encoding • Computational chemistry
  6. 6. Benefits of Agility • Rigid on-premises resources: capacity follows predicted demand, so any gap against actual demand shows up as waste or customer dissatisfaction • Elastic cloud-based resources: resources scaled to actual demand
  7. 7. Cost Benefits of HPC in the Cloud • Pay-as-you-go model: use only what you need, multiple pricing models • On-premises capital expense model: high upfront capital cost, high cost of ongoing support
  8. 8. AWS Journey for HPC Customer • Dev, Test, Eval: development and test, eval and training • True Production: build new production apps, migrate production apps • Mission Critical: build mission-critical apps, migrate mission-critical apps • All-in: corporate standard, “Cloud First”
  9. 9. AWS Architecture for HPC
  10. 10. On-Demand HPC on AWS With AWS, deploy multiple clusters running at the same time and match the architectures to the jobs
  11. 11. AWS Architecture for HPC • Amazon Virtual Private Cloud • Amazon Simple Storage Service • Amazon Elastic Block Store • Amazon Elastic Compute Cloud • Amazon CloudWatch • AWS CloudFormation • Auto Scaling
  12. 12. Amazon Elastic Compute Cloud: continuing to enable customer choice and right sizing of clusters (Figure: timeline of new and existing EC2 instance types per year, 2006 to 2015, growing from m1.small in 2006 to the m4, c4, d2, t2, and g2 types listed for 2015)
  13. 13. Auto Scaling and Amazon CloudWatch: match the demands of the cluster queue with appropriate compute capacity (Diagram: Windows HPC Job Manager, CloudWatch, Auto Scaling group)
  14. 14. Amazon Elastic Block Store: network-attached persistent block storage volumes for Amazon EC2 • Designed for five nines of availability • Attaches to Amazon EC2 within the same Availability Zone • Point-in-time snapshots to Amazon S3 • Checkbox-enabled encryption • Volume types: Magnetic, General Purpose (SSD), Provisioned IOPS (SSD) • When performance matters, use SSD-backed volumes!
  15. 15. Amazon EBS, Windows boot volume: decrease the launch time of instances by leveraging General Purpose SSD • Default 30 GB volume • Gets an initial I/O credit of 5.4M • Bursts for up to 30 minutes at 3,000 IOPS • Accumulates 90 I/O credits per second
  16. 16. Amazon Simple Storage Service: store input and result datasets for dynamic and transient Windows HPC clusters • Redundancy: durability designed for 99.999999999%, availability designed for 99.9% • Capacity: consumption-based storage model, virtually unlimited capacity • Security: encryption in transit (HTTPS/TLS), encryption at rest (SSE, SSE-C, SSE-KMS) • Ease of use: storage classes (Standard, RRS, Glacier), lifecycle policies (archive, expiration)
  17. 17. Amazon S3: migrate data to AWS and Windows HPC clusters with AWS Tools for PowerShell • Bucket: mybucket • Key namespace (prefix): SampleScripts • Local folder: .\Scripts • Copy data to Amazon S3 and enable SSE: Write-S3Object -BucketName mybucket -Folder .\Scripts -KeyPrefix SampleScripts -ServerSideEncryption AES256 • Copy data from Amazon S3 to a local folder: Read-S3Object -BucketName mybucket -KeyPrefix SampleScripts -Folder .
  18. 18. AWS CloudFormation: consistently and easily deploy Windows HPC clusters based on workflow needs • Templated resource provisioning: create templates to describe the AWS resources used to run your application, provision identical copies of a stack • Infrastructure as code: templates can be stored in a source control system, track all changes made to your infrastructure stack, modify and update resources in a controlled and predictable way • Declarative and flexible: just choose what resources and configurations you need, customize your template via parameters
  19. 19. AWS Architecture for HPC • Core infrastructure: users directory, bastion host • Cluster infrastructure: head node, compute nodes (Diagram: Amazon VPC containing a core group with users and bastion, and a cluster group with one head node and eight compute nodes)
  20. 20. AWS Architecture for HPC: choose the right deployment architecture for the use case • Core infrastructure, users directory: on-premises (hybrid or “burst”), AWS Directory Service (all-in AWS) • Core infrastructure, bastion host: AWS (hybrid), Amazon EC2 (all-in) • Cluster infrastructure, head node: AWS (hybrid), Amazon EC2 (all-in) • Cluster infrastructure, compute node: AWS (hybrid), Amazon EC2 (all-in) • Cluster infrastructure, storage: on-premises/AWS (hybrid), Amazon S3 (all-in) • User workstations: on-premises (hybrid), Amazon WorkSpaces (all-in)
  21. 21. AWS Architecture for HPC: “burst” to virtually unlimited compute capacity in AWS (Diagram: on-premises HPC users, workstations, core, and head node connected to an Amazon VPC containing a bastion, a head node, and eight compute nodes)
  22. 22. AWS Architecture for HPC: deploy users, infrastructure, and cluster all in AWS (Diagram: Amazon VPC containing workstations, a core group with users and bastion, and a cluster with one head node and eight compute nodes)
  23. 23. AWS Architecture for Windows HPC
  24. 24. Windows Server on AWS • Easy licensing: OS $/hr, BYOL • Optimized: AWS software for Windows, EC2Config, drivers • Experience: October 2008, every use case, every industry • OS choice: 2003 R2, 2008, 2008 R2, 2012, 2012 R2 • Microsoft portfolio: SQL Server, SharePoint, Exchange, Lync • Customize systems: 50+ EC2 instances, 32 and 64 bits, CPU, GPU
  25. 25. AWS Architecture for Windows HPC: networking best practices for Windows HPC clusters • Network design: leverage both public and private subnets, manage sizing • Availability: use a multi-AZ design • Access control: use a VPC endpoint and NAT for external access (Diagram: Availability Zones A and B with public subnets 10.0.0.0/24 and 10.0.1.0/24, private subnets 10.0.10.0/24 and 10.0.11.0/24, two NATs, and a VPC endpoint)
  26. 26. AWS Architecture for Windows HPC: core infrastructure best practices for Windows HPC clusters • Domain controller: highly available extension of your existing environment • Remote Desktop Gateway: increase security posture (Diagram: the same four subnets across Availability Zones A and B, with two domain controllers (DC) in the core and a Remote Desktop Gateway (RDGW))
  27. 27. AWS Architecture for Windows HPC: cluster infrastructure best practices for Windows HPC clusters • Head node: size independently of the compute nodes, General Purpose family • Compute nodes: use Auto Scaling groups and cluster instances • S3 bucket: persistent, secure, available storage of cluster input and results (Diagram: the same four subnets across Availability Zones A and B, with a cluster of one head node and eight compute nodes, an S3 bucket, and a VPC endpoint)
  28. 28. AWS Architecture for Windows HPC: all at once, the complete Windows HPC infrastructure on AWS (Diagram: the same four subnets across Availability Zones A and B, with two DCs, an RDGW, two NATs, a cluster of one head node and eight compute nodes, an S3 bucket, and a VPC endpoint)
  29. 29. AWS Architecture for Windows HPC: launch multiple clusters, each right-sized to complete its work in the amount of time specified (Diagram: the complete infrastructure from the previous slide, now with three clusters of different sizes, each with its own head node)
  30. 30. Best Practices for Windows HPC
  31. 31. Secure Windows HPC Workloads on AWS: leverage native AWS security features to enhance the security posture of Windows HPC • AWS resource access: enable access to AWS resources through policies in IAM roles • Encryption at rest: enable encryption on EBS volumes and specify server-side encryption for objects in Amazon S3 • Private access: reach input and output results stored in Amazon S3 via VPC endpoints • Auditability: audit the AWS account by enabling AWS CloudTrail
  32. 32. Optimized network for Windows HPC: get the most out of AWS networking for your HPC workloads • Enhanced Networking: the SR-IOV feature provides higher PPS performance, lower latencies, and very low network jitter • Placement groups: all instances get low-latency, full-bisection, 10 Gbps bandwidth between instances • EBS optimization: get up to 4,000 Mbps of additional throughput dedicated to your storage needs • AWS PV drivers / Intel drivers: make sure you stay current with the latest versions
  33. 33. Optimized processing with Windows HPC: get the most out of your instances for your HPC workloads • Hyper-threading: most current-generation AWS instances provide hyper-threading; keep it or deactivate it based on your needs • Turbo Boost: the latest generation of instances lets you control the C-state and P-state registers of your processors • The right instance: choose your constraints (price, CPU, GPU, RAM, network) and get the instance type that fits your use case • The right storage: choose the amount and type of instance storage or Amazon EBS storage required, and leverage storage services such as Amazon S3
  34. 34. Automated Windows HPC computing: get your cluster as code, running in minutes from scratch • Windows PowerShell®: all the installation and configuration of the instances can be done automatically • AWS Tools for Windows PowerShell: your cluster can become aware of the infrastructure it is running on • Auto Scaling: automate provisioning and scaling of your cluster so that your workloads finish when you need them • AWS CloudFormation: deploy your clusters in a few clicks, create test clusters in minutes
  35. 35. Demonstration
  36. 36. Windows HPC AWS CloudFormation Template: enable automated deployments of clusters with a pre-built template (Diagram: Amazon VPC containing a core with DC and RDGW, and a cluster with one head node and eight compute nodes)
  37. 37. AWS CloudFormation Templates: Prerequisites (things to do before starting the template) • Select your region and base image: VPC + subnet (just input the selected CIDR), instance types (for all instances), optionally create a VPC placement group • Prepare installation media, then snapshot: download Microsoft HPC Pack and unzip to HPCPack2012R2-Full, extract the SQL Server installation to SQLInstall, download the Intel SR-IOV drivers and extract to PROWinx64, download the latest AWS PV drivers and extract to AWSPVDriverSetup • Select installation configuration: define domain configuration and credentials
  38. 38. AWS CloudFormation Template: Core (building the core Windows infrastructure) • Base network: VPC + public subnet (select your CIDR), DHCP option set (configured to use the DC), security groups (for bastion and cluster) • Core infrastructure: domain controller in a new forest, Remote Desktop bastion host (outside of the domain), domain user with “Join Computer to Domain” privileges
  39. 39. AWS CloudFormation Template: Cluster (building the Microsoft HPC cluster on AWS) • Head node: multi-role (database, HPC head node, share), monitored (Amazon CloudWatch custom metrics) • Compute nodes: automated (automatic configuration to join the cluster), scalable (Auto Scaling group resizing the cluster based on load), up-to-date (automatic upgrade of AWS and Intel drivers)
  40. 40. Windows HPC AWS CloudFormation Template In < 30 minutes, your cluster will be ready to accept jobs.
  41. 41. Getting Started Collateral • QwikLAB, Launching Microsoft HPC Pack on AWS: https://www.qwiklab.com/focuses/preview/1604?search=19103 • Reference CloudFormation Template: https://github.com/awslabs/aws-cfn-windows-hpc--template
  42. 42. Remember to complete your evaluations!
  43. 43. Thank you!
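
Slides 13 and 39 describe an Auto Scaling group that resizes the cluster to match the job queue, but the transcript does not give the scaling rule. The following is a minimal sketch of one plausible rule, assuming the head node publishes the number of queued cores as a custom metric; the function name, its parameters, and the sample numbers are all hypothetical.

```python
import math


def desired_node_count(queued_cores, cores_per_node, min_nodes, max_nodes):
    """Return the node count needed for the queued cores, clamped to group limits.

    All four inputs are illustrative assumptions; the session's template may
    use a different metric and a different rule.
    """
    if cores_per_node <= 0:
        raise ValueError("cores_per_node must be positive")
    if min_nodes > max_nodes:
        raise ValueError("min_nodes must not exceed max_nodes")
    # Round up so that a partially filled node is still provisioned.
    needed = math.ceil(queued_cores / cores_per_node)
    return max(min_nodes, min(max_nodes, needed))


if __name__ == "__main__":
    # 100 queued cores on 16-core nodes need 7 nodes (6.25 rounded up).
    print(desired_node_count(100, 16, min_nodes=0, max_nodes=8))
```

Clamping to the group's minimum and maximum keeps the result inside the bounds an Auto Scaling group would enforce anyway, which makes the rule easy to reason about before wiring it to a real metric.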
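
The boot-volume figures on slide 15 can be checked with a little arithmetic. This sketch uses only the numbers on the slide (5.4M initial credits, 3,000 IOPS burst, 90 credits per second for a 30 GB volume); the function name is hypothetical, and the model of one credit per I/O operation is an assumption.

```python
INITIAL_CREDITS = 5_400_000   # initial I/O credit balance from the slide
BURST_IOPS = 3_000            # burst rate from the slide
REFILL_PER_SECOND = 90        # credit accumulation rate from the slide


def burst_seconds(credits, burst_iops, refill_per_second=0):
    """Seconds a volume can sustain burst_iops before its credits run out.

    Assumes one credit is spent per I/O operation and that credits refill
    at a constant rate while bursting.
    """
    drain_rate = burst_iops - refill_per_second
    if drain_rate <= 0:
        raise ValueError("burst rate must exceed the refill rate")
    return credits / drain_rate


if __name__ == "__main__":
    # Ignoring refill, the balance lasts exactly 30 minutes at 3,000 IOPS.
    print(burst_seconds(INITIAL_CREDITS, BURST_IOPS) / 60)
    # Counting the refill, the burst lasts slightly longer.
    print(burst_seconds(INITIAL_CREDITS, BURST_IOPS, REFILL_PER_SECOND) / 60)
```

Ignoring the refill reproduces the slide's "30 minutes" figure, which suggests the slide quotes the conservative case.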
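
Slide 18 describes templates that are parameterized and kept under source control. As an illustration only, the sketch below builds a small template for a compute-node group as a Python dictionary and prints it as JSON. It is not the session's reference template: the logical names, parameter names, defaults, and size limits are hypothetical, and it uses a launch configuration because that was the common approach at the time of the talk.

```python
import json


def build_compute_template():
    """Return a minimal, illustrative CloudFormation template as a dict."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative compute-node group (hypothetical names)",
        "Parameters": {
            # Parameters let one template serve clusters of different shapes.
            "ComputeInstanceType": {"Type": "String", "Default": "c4.8xlarge"},
            "ComputeNodeCount": {"Type": "Number", "Default": 2, "MinValue": 0},
            "ImageId": {"Type": "AWS::EC2::Image::Id"},
            "SubnetId": {"Type": "AWS::EC2::Subnet::Id"},
        },
        "Resources": {
            "ComputeLaunchConfig": {
                "Type": "AWS::AutoScaling::LaunchConfiguration",
                "Properties": {
                    "ImageId": {"Ref": "ImageId"},
                    "InstanceType": {"Ref": "ComputeInstanceType"},
                },
            },
            "ComputeGroup": {
                "Type": "AWS::AutoScaling::AutoScalingGroup",
                "Properties": {
                    "LaunchConfigurationName": {"Ref": "ComputeLaunchConfig"},
                    "VPCZoneIdentifier": [{"Ref": "SubnetId"}],
                    "MinSize": "0",
                    "MaxSize": "16",  # arbitrary ceiling for the sketch
                    "DesiredCapacity": {"Ref": "ComputeNodeCount"},
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_compute_template(), indent=2))
```

Generating the template from code is one way to keep it reviewable in source control; writing the JSON by hand works equally well for a template this small.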
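
Slide 25 asks for deliberate subnet sizing. The sketch below checks the four subnets shown in the diagrams for containment and overlap and reports usable addresses per subnet. The VPC range 10.0.0.0/16 is an assumption (the slides never state it), the subnet labels are hypothetical, and the deduction of five addresses reflects the addresses AWS reserves in every subnet.

```python
import ipaddress
from itertools import combinations

# Assumption: the slides do not give the VPC range; 10.0.0.0/16 is a guess.
VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")

# Subnet CIDRs from the diagrams; the dictionary keys are hypothetical labels.
SUBNETS = {
    "public-1": ipaddress.ip_network("10.0.0.0/24"),
    "public-2": ipaddress.ip_network("10.0.1.0/24"),
    "private-1": ipaddress.ip_network("10.0.10.0/24"),
    "private-2": ipaddress.ip_network("10.0.11.0/24"),
}

AWS_RESERVED_PER_SUBNET = 5  # addresses AWS reserves in each subnet


def usable_addresses(vpc, subnets):
    """Validate the subnet plan and return usable addresses per subnet."""
    for name, net in subnets.items():
        if not net.subnet_of(vpc):
            raise ValueError(f"{name} ({net}) is outside the VPC range {vpc}")
    for (name_a, net_a), (name_b, net_b) in combinations(subnets.items(), 2):
        if net_a.overlaps(net_b):
            raise ValueError(f"{name_a} and {name_b} overlap")
    return {
        name: net.num_addresses - AWS_RESERVED_PER_SUBNET
        for name, net in subnets.items()
    }


if __name__ == "__main__":
    for name, count in usable_addresses(VPC_CIDR, SUBNETS).items():
        print(f"{name}: {count} usable addresses")
```

A /24 private subnet leaves room for a few hundred compute nodes; a cluster expected to grow beyond that would need a larger private range.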
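
Slide 31 recommends granting access through policies in IAM roles and encrypting objects in Amazon S3. One possible shape for such a policy is sketched below: it allows reads from a bucket and allows writes only when the upload requests AES256 server-side encryption. The function name, statement IDs, and the choice of actions are hypothetical; the bucket and prefix reuse the example values from slide 17.

```python
import json


def s3_cluster_policy(bucket, prefix=""):
    """Return an illustrative IAM policy document for cluster nodes."""
    resource = f"arn:aws:s3:::{bucket}/{prefix}*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadClusterData",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": resource,
            },
            {
                # Writes are allowed only when the request asks for SSE.
                "Sid": "WriteEncryptedResults",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": resource,
                "Condition": {
                    "StringEquals": {
                        "s3:x-amz-server-side-encryption": "AES256"
                    }
                },
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(s3_cluster_policy("mybucket", "SampleScripts/"), indent=2))
```

Attaching a policy like this to the role used by the compute nodes avoids storing access keys on the instances, which is the point the slide makes about IAM roles.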
