Intro to High Performance Computing in the AWS Cloud


Visit http://aws.amazon.com/hpc for more information about HPC on AWS.

High Performance Computing (HPC) allows scientists and engineers to solve complex science, engineering, and business problems using applications that require high-bandwidth, low-latency networking and very high compute capabilities. AWS lets you increase the speed of research by running HPC in the cloud, and reduce costs by providing Cluster Compute or Cluster GPU servers on demand without large capital investments. You have access to a full-bisection, high-bandwidth network for tightly coupled, I/O-intensive workloads, which enables you to scale out across thousands of cores for throughput-oriented applications.


  1. Ben Butler, Sr. Mgr., Big Data & HPC Marketing, Amazon Web Services. butlerb@amazon.com, @bensbutler, aws.amazon.com/hpc. High Performance Computing on AWS.
  2. Take a typical big computation task…
  3. …that an average cluster is too small for (or that simply takes too long to complete)…
  4. …optimization of algorithms can give some leverage…
  5. …and complete the task in hand…
  6. Applying a large cluster…
  7. …can sometimes be overkill and too expensive.
  8. AWS instance clusters can be balanced to the job in hand…
  9. …neither too large (for a small job)…
  10. …nor too small (for a bigger job)…
  11. …with multiple clusters running at the same time.
  12. Why AWS for HPC? Low cost with flexible pricing. Efficient clusters. Unlimited infrastructure. Faster time to results. Concurrent clusters on demand. Increased collaboration.
  13. Customers running HPC workloads on AWS
  14. Popular HPC workloads on AWS: transcoding and encoding, Monte Carlo simulations, computational chemistry, government and educational research, modeling and simulation, genome processing.
  15. Across several key industry verticals: academic research, biopharma, utilities, manufacturing, materials design, auto & aerospace.
  16. Jun 2014 Top 500 list: 484.2 TFlop/s on the LINPACK benchmark from 26,496 cores in a cluster of EC2 C3 instances (Intel Xeon E5-2680v2 10-core 2.80 GHz processors). TOP500: the 76th-fastest supercomputer, on demand.
  17. Elastic cloud-based resources scale to actual demand, the benefit of agility. Rigid on-premises resources sized to predicted demand produce either waste (over-provisioning) or customer dissatisfaction (under-provisioning).
  18. Unilever: augmenting existing HPC capacity. “The key advantage that AWS has over running this workflow on Unilever’s existing cluster is the ability to scale up to a much larger number of parallel compute nodes on demand.” Pete Keeley, Unilever Research’s eScience IT Lead for Cloud Solutions. • Unilever’s digital data program now processes genetic sequences twenty times faster.
  19. Scalability on AWS. Time +00h: scale using elastic capacity, <10 cores.
  20. Scalability on AWS. Time +24h: scale using elastic capacity, >1,500 cores.
  21. Scalability on AWS. Time +72h: scale using elastic capacity, <10 cores.
  22. Scalability on AWS. Time +120h: scale using elastic capacity, >600 cores.
  23. Schrödinger & Cycle Computing: computational chemistry. Simulation by Mark Thompson of the University of Southern California to see which of 205,000 organic compounds could be used for photovoltaic cells in solar panel material. An estimated 264 years of computation completed in 18 hours. • 156,314-core cluster across 8 regions • 1.21 petaFLOPS (Rpeak) • $33,000, or 16¢ per molecule.
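The headline numbers on this slide can be sanity-checked with back-of-envelope arithmetic. A minimal sketch, using only the figures quoted above (264 years serial, 18 hours elapsed, $33,000 for 205,000 molecules):

```python
# Back-of-envelope check of the Schrödinger / Cycle Computing run,
# using only the figures quoted on the slide.

SERIAL_YEARS = 264        # estimated single-machine computation time
WALL_CLOCK_HOURS = 18     # actual elapsed time on the cluster
TOTAL_COST_USD = 33_000
MOLECULES = 205_000

def speedup(serial_years: float, wall_hours: float) -> float:
    """Effective parallel speedup: serial time / elapsed time."""
    return serial_years * 365 * 24 / wall_hours

def cost_per_molecule_cents(total_usd: float, molecules: int) -> float:
    """Cost in cents per molecule screened."""
    return total_usd * 100 / molecules

if __name__ == "__main__":
    print(f"speedup ≈ {speedup(SERIAL_YEARS, WALL_CLOCK_HOURS):,.0f}x")
    print(f"cost ≈ {cost_per_molecule_cents(TOTAL_COST_USD, MOLECULES):.1f}¢/molecule")
```

The result (about 128,000x speedup, roughly 16¢ per molecule) is consistent with the slide's per-molecule figure, which suggests the 264-year estimate assumes a single core-equivalent baseline.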
  24. Cost benefits of HPC in the cloud. Pay-as-you-go model: use only what you need; multiple pricing models. On-premises capital-expense model: high upfront capital cost, high cost of ongoing support.
  25. Many pricing models to support different workloads. Reserved: make a low, one-time payment and receive a significant discount on the hourly charge; for committed utilization. Free Tier: get started on AWS with free usage and no commitment; for POCs and getting started. On-Demand: pay for compute capacity by the hour with no long-term commitments; for spiky workloads, or to define needs. Spot: bid for unused capacity, charged at a Spot price that fluctuates based on supply and demand; for time-insensitive or transient workloads. Dedicated: launch instances within Amazon VPC that run on hardware dedicated to a single customer; for highly sensitive or compliance-related workloads.
  26. Optimize cost by using various EC2 instance pricing models: cover the steady baseline with Heavy Utilization Reserved Instances, the predictable upper layers with Light Utilization Reserved Instances, and the variable load above that with On-Demand and Spot instances. (Chart: utilization over time, 20–100%.)
  27. Harvard Medical School: simulation development. “The combination of our approach to biomedical computing and AWS allowed us to focus our time and energy on simulation development, rather than technology, to get results quickly. Without the benefits of AWS, we certainly would not be as far along as we are.” Dr. Peter Tonellato, LPM, Center for Biomedical Informatics, Harvard Medical School. • Leveraged EC2 Spot instances in workflows • One day’s worth of effort resulted in 50% cost savings.
  28. Characterizing HPC. Loosely coupled: embarrassingly parallel, elastic, batch workloads. Tightly coupled: interconnected jobs, network sensitivity, job-specific algorithms.
  29. Characterizing HPC. Loosely coupled: embarrassingly parallel, elastic, batch workloads. Tightly coupled: interconnected jobs, network sensitivity, job-specific algorithms. Supporting services: data management, task distribution, workflow management.
  30. Characterizing HPC (same loosely coupled / tightly coupled / supporting services breakdown as slide 29).
  31. Compute services: Elastic Compute Cloud (EC2). The basic unit of compute capacity: virtual machines. A range of CPU, memory, and local disk options, with a choice of instance types from micro to cluster compute (e.g. c3.8xlarge, g2.medium, m3.large).
  32. Automation & control: CLI, API, and Console; scripted configurations.
  33. Auto Scaling: automatic resizing of compute clusters based upon demand and policies.
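The "demand and policies" idea can be illustrated with a toy sizing rule. This is only a sketch of the kind of decision a scaling policy encodes; the real Auto Scaling service works from CloudWatch alarms, cooldowns, and group min/max settings, and the job-queue metric here is an assumption for illustration:

```python
# Toy illustration of a demand-driven scaling policy: size the cluster to
# the pending work, clamped to policy bounds. Not the actual AWS Auto
# Scaling logic; the queue-depth metric is a hypothetical input.

def desired_capacity(queue_depth: int, jobs_per_node: int,
                     min_nodes: int = 1, max_nodes: int = 100) -> int:
    """Nodes needed to drain the queue at jobs_per_node per node,
    never scaling below min_nodes or above max_nodes."""
    needed = -(-queue_depth // jobs_per_node)  # ceiling division
    return max(min_nodes, min(max_nodes, needed))
```

An idle queue keeps the minimum footprint, a burst of 950 jobs at 10 per node asks for 95 nodes, and anything beyond `max_nodes` is capped by policy rather than scaling without bound.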
  34. Characterizing HPC (recap): loosely coupled, tightly coupled, supporting services.
  35. What if you need to implement MPI, or code for GPUs?
  36. Tightly coupled: cluster compute instances. HVM process execution, Intel® Xeon® processors, 10 Gigabit Ethernet (c3 adds enhanced networking via SR-IOV). cc2.8xlarge: 32 vCPUs, 2.6 GHz Intel Xeon E5-2670 (Sandy Bridge), 60.5 GB RAM, 2 x 320 GB local SSD. c3.8xlarge: 32 vCPUs, 2.8 GHz Intel Xeon E5-2680v2 (Ivy Bridge), 60 GB RAM, 2 x 320 GB local SSD.
  37. Tightly coupled: network placement groups. Cluster instances deployed in a placement group enjoy low-latency, full-bisection 10 Gbps bandwidth.
  38. Tightly coupled: GPU compute instances. cg1.8xlarge: 33.5 EC2 Compute Units, 20 GB RAM, 2x NVIDIA GPU (448 cores, 3 GB memory each). g2.2xlarge: 26 EC2 Compute Units, 16 GB RAM, 1x NVIDIA GPU (1,536 cores, 4 GB memory). G2 instances: Intel Xeon E5-2670, 1 NVIDIA Kepler GK104 GPU, very high I/O performance (10 Gigabit Ethernet). CG1 instances: Intel® Xeon® X5570 processors, 2 x NVIDIA Tesla “Fermi” M2050 GPUs, very high I/O performance (10 Gigabit Ethernet).
  39. National Taiwan University: shortest vector problem. “Our purpose is to break the record of solving the shortest vector problem (SVP) in Euclidean lattices…the vectors we found are considered the hardest SVP anyone has solved so far.” Prof. Chen-Mou Cheng, Principal Investigator, Fast Crypto Lab. • $2,300 for using 100x Tesla M2050 for ten hours.
  40. Tightly coupled: CUDA & OpenCL. Massively parallel workloads running on GPUs; NVIDIA Tesla cards in specialized instance types.
  41. Characterizing HPC (recap): loosely coupled (embarrassingly parallel, elastic, batch workloads), tightly coupled (interconnected jobs, network sensitivity, job-specific algorithms), supporting services (data management, task distribution, workflow management).
  42. Supporting services: data management. Fully managed SQL, NoSQL, and object storage. Relational Database Service: fully managed databases (MySQL, Oracle, MSSQL, PostgreSQL). DynamoDB: NoSQL, schemaless, provisioned-throughput database. S3: object datastore with up to 5 TB per object and Internet accessibility.
  43. “Big Data” changes the dynamic of computation and data sharing: move compute closer to the data. Collection: Direct Connect, Import/Export, S3, DynamoDB. Computation: EC2, GPUs, Elastic MapReduce. Collaboration: CloudFormation, Simple Workflow, S3.
  44. Tradeworx: Market Information Data Analytics System (MIDAS). “For the growing team of quant types now employed at the SEC, MIDAS is becoming the world’s greatest data sandbox. And the staff is planning to use it to make the SEC a leader in its use of market data.” Elisse B. Walter, Chairman of the SEC. • Powerful AWS-based system for market analytics • 2M transaction messages/sec; 20B records and 1 TB/day.
  45. Supporting services: feeding workloads. Use the highly available Simple Queue Service (SQS) to feed EC2 nodes: one queue carries processing tasks/triggers, another carries processing results.
  46. Supporting services: coordinating workloads & task clusters. Handle long-running processes across many nodes and task steps with Simple Workflow (e.g. Task A → Task B (auto-scaling) → Task C).
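The queue-fed worker pattern from slide 45 can be sketched as a small polling loop. The client is injected, so the same loop runs against boto3's real SQS client (`boto3.client("sqs")`, whose `receive_message`/`delete_message` calls this mirrors) or the in-memory stand-in below; the queue URL and handler are illustrative, not from the deck:

```python
# Sketch of the SQS worker pattern: receive a message, process it, and
# delete it only after successful processing (so failures are redelivered).

def drain_queue(sqs, queue_url, handler):
    """Pull messages until the queue is empty, returning handler results."""
    results = []
    while True:
        resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1)
        messages = resp.get("Messages", [])
        if not messages:
            return results
        for msg in messages:
            results.append(handler(msg["Body"]))
            sqs.delete_message(QueueUrl=queue_url,
                               ReceiptHandle=msg["ReceiptHandle"])

class InMemorySQS:
    """Minimal local stand-in mimicking the two SQS calls used above."""
    def __init__(self, bodies):
        self._msgs = [{"Body": b, "ReceiptHandle": str(i)}
                      for i, b in enumerate(bodies)]

    def receive_message(self, QueueUrl, MaxNumberOfMessages=1):
        if not self._msgs:
            return {}
        return {"Messages": self._msgs[:MaxNumberOfMessages]}

    def delete_message(self, QueueUrl, ReceiptHandle):
        self._msgs = [m for m in self._msgs
                      if m["ReceiptHandle"] != ReceiptHandle]
```

Deleting only after the handler returns is the important detail: if a worker node dies mid-task, SQS's visibility timeout makes the message reappear for another node, which is what makes the fleet elastic and fault-tolerant.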
  47. Architecture of large-scale computing with huge data sets
  48. NYU School of Medicine: transferring large data sets. “Transferring data is a large bottleneck; our datasets are extremely large, and it often takes more time to move the data than to generate it. Since our collaborators are all over the world, if we can’t move it they can’t use it.” Dr. Stratos Efstathiadis, Technical Director of the HPC facility, NYU. • Uses Globus Online • Data transfer speeds of up to 50 MB/s.
  49. Architecture of financial-services grid computing
  50. Bankinter: credit-risk simulation. “With AWS, we now have the power to decide how fast we want to obtain simulation results. More important, we have the ability to run simulations that were not possible before due to the large amount of infrastructure required.” Javier Roldán, Director of Technological Innovation, Bankinter. • Reduced processing time of 5,000,000 simulations from 23 hours to 20 minutes.
  51. Architecture of queue-based batch processing
  52. When to consider running HPC workloads on AWS. New ideas: a new HPC project, proof of concept, new application features, training, benchmarking algorithms. Improvement: remove the queue, escape the hardware refresh cycle, reduce costs, collaborate on results, increase innovation speed, reduce time to results.
  53. AWS Grants Program: aws.amazon.com/grants
  54. AWS Public Data Sets: aws.amazon.com/datasets (free for everyone)
  55. AWS Marketplace, HPC category: aws.amazon.com/marketplace
  56. Getting started with HPC on AWS: aws.amazon.com/hpc. Contact us, we are here to help: Sales and Solutions Architects, Enterprise Support, Trusted Advisor, Professional Services.
  57. cfncluster (“CloudFormation cluster”): a command-line interface tool to deploy and demo an HPC cluster. For more info: aws.amazon.com/hpc/resources. Try out our HPC CloudFormation-based demo.
  58. HPC Partners and Apps
  59. Customers are using AWS for more and more HPC workloads. Oil and gas: seismic data processing; reservoir simulations and modeling; geospatial applications; predictive maintenance. Manufacturing & engineering: computational fluid dynamics (CFD); finite element analysis (FEA); wind simulation. Life sciences: genome analysis; molecular modeling; protein docking. Media & entertainment: transcoding and encoding; DRM and encryption; rendering. Scientific computing: computational chemistry; high-energy physics; stochastic modeling; quantum analysis; climate models. Financial: Monte Carlo simulations; wealth management simulations; portfolio and credit-risk analytics; high-frequency trading analytics.
  60. So what does scale mean on AWS? Accelerating materials science: 2.3M core-hours for $33K.
  61. cyclopic energy: computational fluid dynamics. “AWS makes it possible for us to deliver state-of-the-art technologies to clients within timeframes that allow us to be dynamic, without having to make large investments in physical hardware.” Rick Morgans, Technical Director (CTO), cyclopic energy. • Two months’ worth of simulations finished in two days.
  62. Mentor Graphics: virtual lab for design and simulation. “Thanks to AWS, the Mentor Graphics customer experience is now fast, fluid, and simple.” Ron Fuller, Senior Director of Engineering, Mentor Graphics. • Developed a virtual lab for ASIC design and simulation for product evaluation and training.
  63. AeroDynamic Solutions: turbine engine simulation. “We’re delighted to be working closely with the U.S. Air Force and AWS to make time-accurate simulation a reality for designers large and small.” George Fan, CEO, AeroDynamic Solutions. • A time-accurate simulation was turned around in 72 hours with infrastructure costs well below $1,000.
  64. HGST: molecular dynamics simulation. “HGST is using AWS for a higher-performance, lower-cost, faster-deployed solution vs. buying a huge on-site cluster.” Steve Philpott, CIO, HGST. • Uses HPC on AWS for CAD, CFD, and CDA.
  65. Pfizer: large-scale data analytics and modeling. “AWS enables Pfizer’s Worldwide Research and Development to explore specific difficult or deep scientific questions in a timely, scalable manner and helps Pfizer make better decisions more quickly.” Dr. Michael Miller, Head of HPC for R&D, Pfizer. • Pfizer avoids having to procure new HPC hardware by using AWS for peak workloads.
  66. Thank you! Please visit aws.amazon.com/hpc for more info.
