Cloud HPC at AWS
Dr. Jeffrey Layton, Principal Architect - HPC
November, 2015
Research Computing
How is AWS used for Scientific Computing?
• HPC for Engineering and Simulation
• High Throughput Computing (HTC) for Data Intensive
Analytics
• Hybrid Supercomputing Centers
• Collaborative Research Environments
• Citizen Science
• Science-as-a-Service
• Machine Learning
Why do researchers love using AWS?
• Time to Science
– Access to research infrastructure in minutes
• Low Cost
– Pay as you go computing
• Elastic
– Easily add or remove capacity
• Globally Accessible
– Easily collaborate with researchers around the world
• Secure
– A collection of tools to protect data and privacy
• Scalable
– Access to effectively limitless capacity
Why does AWS care about Scientific Computing?
• We want to improve our world by accelerating the pace of scientific
discovery
• It is a great application of AWS with a broad customer base
• The scientific community helps us innovate on behalf of all
customers
– Streaming data processing and analytics
– Exabyte scale data management solutions and exaflop scale computing
– Collaborative research tools and techniques
– New AWS regions
– Significant advances in low-power compute, storage, and data centers
– Efficiencies which lower our costs and therefore pricing for all customers
Peering with all global research networks
• Internet2
• AARNet
• GÉANT
• SINET
• ESnet
• Pacific Gigapop
Public Datasets
• Landsat
• NEX
• 1000 Genomes Project
• Human Microbiome Project
HPC
Why Cloud for HPC?
• Scalability
– If you need to run on lots of nodes, just spin them up
– If you don’t need nodes, turn them off (and don’t pay for them)
• Time to Research
– Usually on-prem HPC resources are centralized (shared)
– Researchers like to have their own nodes when they need them
• World-Wide Collaboration
– Share data via the cloud
• Latest Technology
• Can save $$$
• Different kinds of nodes (instances)
HPC Architectures
AWS HPC Architecture – Phases of Deployment
• Forklift
– Make it look like on-premises
• Cloud “port”
– Adapt to cloud features
– Autoscaling
– Spot
• Born in the Cloud
– Rethink application
You must think in “cloud”
You cannot think in “on-prem” and transpose
You must think in “cloud”
Do you think you can do that Mr. Gant?
AWS HPC Architecture
[Diagram: the same cluster layout shown twice, on-premises and in the AWS Cloud]
• On-Premises: Master Node, Compute Nodes, Storage (NFS, Parallel)
• AWS Cloud: Master Instance, Compute Instances, Storage (NFS, Parallel)
Architecture points
• Why be limited to the number of nodes in your on-prem
cluster?
– Cloud allows you to scale up and down as needed
• Why limit yourself to a “single” cluster for all users?
– Why not give each user their own cluster?
– They can scale up and down as needed
• Leave your data in the cloud
– Compute on it as needed
– Share it as needed
– Life cycle control
– Visualization
Queues
The Hidden Cost of Queues
Conflicting goals
• HPC users seek fastest possible time-to-results
• Simulations are not steady-state workloads
• IT support team seeks highest possible utilization
Result:
• The job queue becomes the capacity buffer
• Job completion times are hard to predict
• Users are frustrated and run fewer jobs
On AWS, deploy multiple clusters running at the same time and match the architectures to the jobs
Example: TACC Portal – 4/6/2015
Need capacity
Too much capacity
Too much capacity
Example: NERSC Portal – 4/30/2015
Need more capacity!!
Edison – 4:1 backlog
Hopper – 2:1 backlog
Carver – 1.5:1 backlog
Other Stats:
• XSEDE:
– In 2012, ~32% of jobs used only 1 core
– 72% of all jobs used 16 cores or fewer (a single c4 instance)
• ECMWF (European weather forecasting)
– 82% of all jobs are either single-node or single-core
Spot is the Bomb!
Multiple Pricing Models
• Reserved
– Make a low, one-time payment and receive a significant discount on the hourly charge
– For committed utilization
• Free Tier
– Get started on AWS with free usage & no commitment
– For POCs and getting started
• On-Demand
– Pay for compute capacity by the hour with no long-term commitments
– For spiky workloads, or to define needs
• Spot
– Bid for unused capacity, charged at a Spot Price which fluctuates based on supply and demand
– For time-insensitive or transient workloads
AWS Spot is a
game-changer
for HPC
Quick Spot Comparison:
• Compare the most expensive “Spot” to the least expensive “On-Demand”
• Master Node:
– c4.8xlarge
– 2x gp2 1TB EBS volumes
– On-Demand
• us-east - $1.856/hour
• us-west-1 - $2.208/hour
• Compute Nodes
– c4.8xlarge
• On-demand us-east - $1.856/hour
• Spot (us-west-1) - $0.28/hour
Spot vs. On-Demand
4 compute nodes, 2 hours
• On-Demand, us-east
– $19.13
• Spot (us-west-1)
– $7.22
• Ratio: 2.64
16 compute nodes, 32 hours
• On-Demand, us-east
– $1,018.77
• Spot (us-west-1)
– $223.11
• Ratio: 4.57
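The totals on this slide can be roughly reproduced with a short calculation. The sketch below is only that: the hourly instance prices come from the Quick Spot Comparison slide, but the EBS price ($0.10 per GB-month), the 1 TB = 1024 GB conversion, and the 720-hour month are assumptions chosen because they land within a cent or two of the slide's numbers. It also assumes the master in the Spot scenario runs On-Demand in us-west-1. The function names are made up for illustration.

```python
# Hourly prices taken from the "Quick Spot Comparison" slide (USD per hour).
ON_DEMAND_US_EAST = 1.856    # c4.8xlarge, On-Demand, us-east
ON_DEMAND_US_WEST_1 = 2.208  # c4.8xlarge, On-Demand, us-west-1
SPOT_US_WEST_1 = 0.28        # c4.8xlarge, Spot, us-west-1

# ASSUMPTIONS (not stated on the slides): gp2 storage at $0.10 per GB-month,
# 1 TB = 1024 GB, and a 720-hour month.
EBS_USD_PER_GB_MONTH = 0.10
EBS_GB = 2 * 1024            # the master's two 1 TB gp2 volumes
HOURS_PER_MONTH = 720


def cluster_cost(master_rate, compute_rate, nodes, hours):
    """Cost of one master plus `nodes` compute instances running for `hours`."""
    ebs_per_hour = EBS_GB * EBS_USD_PER_GB_MONTH / HOURS_PER_MONTH
    return (master_rate + nodes * compute_rate + ebs_per_hour) * hours


def compare(nodes, hours):
    """Return (on_demand_cost, spot_cost, ratio) for the two scenarios."""
    on_demand = cluster_cost(ON_DEMAND_US_EAST, ON_DEMAND_US_EAST, nodes, hours)
    # ASSUMPTION: in the Spot scenario only the compute nodes are Spot;
    # the master stays On-Demand in us-west-1.
    spot = cluster_cost(ON_DEMAND_US_WEST_1, SPOT_US_WEST_1, nodes, hours)
    return on_demand, spot, on_demand / spot


if __name__ == "__main__":
    for nodes, hours in [(4, 2), (16, 32)]:
        od, sp, ratio = compare(nodes, hours)
        print(f"{nodes} nodes, {hours} h: On-Demand ${od:,.2f}, "
              f"Spot ${sp:,.2f}, ratio {ratio:.2f}")
```

The ratio grows with cluster size because the On-Demand master and its storage are a fixed cost, while the Spot discount applies to every compute node.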
Cluster Tools
MIT StarCluster - an HPC cluster in minutes
http://star.mit.edu/cluster/
StarCluster is a utility for creating and managing distributed computing
clusters hosted on Amazon's Elastic Compute Cloud.
It uses Amazon's EC2 API to create and destroy clusters of Linux virtual
machines on demand. It’s an easy-to-use and extensible cluster
computing toolkit for the cloud.
15 minutes
http://bit.ly/starclusterArticle
Bright Cluster Manager
http://www.brightcomputing.com
Bright Cluster Manager is an established, very popular HPC
cluster management platform that can simultaneously manage
both on-premises clusters and infrastructure in the cloud,
all using the same system images.
Bright has offices in the UK, Netherlands (HQ) and US.
cfnCluster - provision an HPC cluster in minutes
#cfncluster
https://github.com/awslabs/cfncluster
cfncluster is a sample code framework that deploys and maintains
clusters on AWS. It is reasonably agnostic to what the cluster is for and
can easily be extended to support different frameworks. The CLI is
stateless, everything is done using CloudFormation or resources within
AWS.
10 minutes
Infrastructure as code
#cfncluster
The creation process might take a few minutes
(maybe up to 5 minutes or so, depending on how you
configured it).
Because the API to CloudFormation (the service
that does all the orchestration) is asynchronous, we
can kill the terminal session if we want to and
watch the whole show from the AWS console
(where you’ll find it all under the “CloudFormation”
dashboard, in the Events tab for this stack).
$ cfncluster create boof-cluster
Starting: boof-cluster
Status: cfncluster-boof-cluster - CREATE_COMPLETE
Output:"MasterPrivateIP"="10.0.0.17"
Output:"MasterPublicIP"="54.66.174.113"
Output:"GangliaPrivateURL"="http://10.0.0.17/ganglia/"
Output:"GangliaPublicURL"="http://54.66.174.113/ganglia/"
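Since stack creation is asynchronous, the terminal is not needed to find out when it finishes. As a rough sketch, the stack status could be polled from a script instead of watched in the console. The helper name `wait_for_stack` and the polling interval are invented for illustration. The function only relies on a client object that offers `describe_stacks(StackName=...)`, which is how the boto3 CloudFormation client behaves. The stack name is the one printed in the Status line above.

```python
import time

# Standard CloudFormation statuses after which a create attempt stops changing.
TERMINAL_STATUSES = {
    "CREATE_COMPLETE",
    "CREATE_FAILED",
    "ROLLBACK_COMPLETE",
    "ROLLBACK_FAILED",
}


def wait_for_stack(cfn_client, stack_name, poll_seconds=15, max_polls=80):
    """Poll describe_stacks until the stack reaches a terminal status.

    `cfn_client` is any object with a boto3-style describe_stacks method;
    `poll_seconds` and `max_polls` are arbitrary illustrative defaults.
    """
    for _ in range(max_polls):
        stacks = cfn_client.describe_stacks(StackName=stack_name)["Stacks"]
        status = stacks[0]["StackStatus"]
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"{stack_name} did not settle after {max_polls} polls")
```

With boto3 installed and AWS credentials configured, this might be called as `wait_for_stack(boto3.client("cloudformation"), "cfncluster-boof-cluster")`.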
Yes, it’s a real HPC cluster
#cfncluster
Now you have a cluster, probably running CentOS 6.x, with Sun Grid Engine as the default scheduler, and Open MPI and a bunch of
other stuff installed. You also have a shared filesystem in /shared and an autoscaling group ready to expand the number of
compute nodes in the cluster when the existing ones get busy.
You can customize quite a lot via the .cfncluster/config file - check out the comments.
arthur ~ [26] $ cfncluster create boof-cluster
Starting: boof-cluster
Status: cfncluster-boof-cluster - CREATE_COMPLETE
Output:"MasterPrivateIP"="10.0.0.17"
Output:"MasterPublicIP"="54.66.174.113"
Output:"GangliaPrivateURL"="http://10.0.0.17/ganglia/"
Output:"GangliaPublicURL"="http://54.66.174.113/ganglia/"
arthur ~ [27] $ ssh ec2-user@54.66.174.113
The authenticity of host '54.66.174.113 (54.66.174.113)' can't be established.
RSA key fingerprint is 45:3e:17:76:1d:01:13:d8:d4:40:1a:74:91:77:73:31.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '54.66.174.113' (RSA) to the list of known hosts.
[ec2-user@ip-10-0-0-17 ~]$ df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/xvda1 10185764 7022736 2639040 73% /
tmpfs 509312 0 509312 0% /dev/shm
/dev/xvdf 20961280 32928 20928352 1% /shared
[ec2-user@ip-10-0-0-17 ~]$ qhost
HOSTNAME ARCH NCPU NSOC NCOR NTHR LOAD MEMTOT MEMUSE SWAPTO SWAPUS
----------------------------------------------------------------------------------------------
global - - - - - - - - - -
ip-10-0-0-136 lx-amd64 8 1 4 8 - 14.6G - 1024.0M -
ip-10-0-0-154 lx-amd64 8 1 4 8 - 14.6G - 1024.0M -
[ec2-user@ip-10-0-0-17 ~]$ qstat
[ec2-user@ip-10-0-0-17 ~]$
[ec2-user@ip-10-0-0-17 ~]$ ed hw.qsub
hw.qsub: No such file or directory
a
#!/bin/bash
#
#$ -cwd
#$ -j y
#$ -pe mpi 2
#$ -S /bin/bash
#
module load openmpi-x86_64
mpirun -np 2 hostname
.
w
110
q
[ec2-user@ip-10-0-0-17 ~]$ ll
total 4
-rw-rw-r-- 1 ec2-user ec2-user 110 Feb 1 05:57 hw.qsub
[ec2-user@ip-10-0-0-17 ~]$ qsub hw.qsub
Your job 1 ("hw.qsub") has been submitted
[ec2-user@ip-10-0-0-17 ~]$
[ec2-user@ip-10-0-0-17 ~]$ qstat
job-ID prior name user state submit/start at queue slots ja-task-ID
------------------------------------------------------------------------------------------------
1 0.55500 hw.qsub ec2-user r 02/01/2015 05:57:25 all.q@ip-10-0-0-44.ap-southeas 2
[ec2-user@ip-10-0-0-17 ~]$ qstat
[ec2-user@ip-10-0-0-17 ~]$ ls -l
total 8
-rw-rw-r-- 1 ec2-user ec2-user 110 Feb 1 05:57 hw.qsub
-rw-r--r-- 1 ec2-user ec2-user 26 Feb 1 05:57 hw.qsub.o1
[ec2-user@ip-10-0-0-17 ~]$ cat hw.qsub.o1
ip-10-0-0-136
ip-10-0-0-154
[ec2-user@ip-10-0-0-17 ~]$
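The job script typed into ed above hard-codes two MPI slots in two places. A small generator keeps them in sync when the slot count changes. This is only a sketch: the function name is invented, and the directives are copied from the session, not from any scheduler documentation.

```python
def render_hello_job(slots):
    """Return the text of an SGE job script that runs `hostname` on `slots` ranks."""
    lines = [
        "#!/bin/bash",
        "#",
        "#$ -cwd",                # run in the submission directory
        "#$ -j y",                # merge stderr into stdout
        f"#$ -pe mpi {slots}",    # request slots from the "mpi" parallel environment
        "#$ -S /bin/bash",        # interpret the script with bash
        "#",
        "module load openmpi-x86_64",
        f"mpirun -np {slots} hostname",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the two-slot version; redirect to hw.qsub and submit with qsub.
    print(render_hello_job(2), end="")
```

For two slots the generated text is 110 bytes, matching the byte count ed reported when the file was written.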
System-wide Upgrade from Ivy Bridge to Haswell
#cfncluster
Yes, really :-)
$ ed ~/.cfncluster/config
/compute_instance_type/
compute_instance_type = c3.8xlarge
s/c3/c4/p
compute_instance_type = c4.8xlarge
w
949
$ cfncluster update boof-cluster
Downgrading is just as easy. Honest.
Config options to explore …
#cfncluster
Many options, but the most interesting ones immediately are:
# (defaults to t2.micro for default template)
compute_instance_type = t2.micro
# Master Server EC2 instance type
# (defaults to t2.micro for default template)
#master_instance_type = t2.micro
# Initial number of EC2 instances to launch as compute nodes in the cluster.
# (defaults to 2 for default template)
#initial_queue_size = 1
# Maximum number of EC2 instances that can be launched in the cluster.
# (defaults to 10 for the default template)
#max_queue_size = 10
# Boolean flag to set autoscaling group to maintain initial size and scale back
# (defaults to false for the default template)
#maintain_initial_size = true
# Cluster scheduler
# (defaults to sge for the default template)
scheduler = sge
# Type of cluster to launch i.e. ondemand or spot
# (defaults to ondemand for the default template)
#cluster_type = ondemand
# Spot price for the ComputeFleet
#spot_price = 0.00
# Cluster placement group. This placement group must already exist.
# (defaults to NONE for the default template)
#placement_group = NONE
• compute_instance_type: t2.micro is tiny; c3.4xlarge might be more interesting …
• initial_queue_size / max_queue_size: min & max size of your cluster.
• maintain_initial_size: whether to fall back when things get quiet.
• scheduler: can also use ‘openlava’ or ‘torque’.
• cluster_type / spot_price: explore the Spot market if you want to save money :-)
• placement_group: a placement group will provision your instances very close to each other on the network.
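Before running an update, it can help to sanity-check these options in a script. The sketch below parses an ini-style fragment with Python's standard configparser and flags a few obvious mistakes. The section name `cluster default`, the sample values (including the 0.30 spot price), and the checks themselves are assumptions made for illustration, not cfncluster's own validation. The fallback values mirror the defaults named in the config comments.

```python
import configparser

# ASSUMPTION: the section name and all values here are illustrative; consult
# the comments in your own ~/.cfncluster/config for the real layout.
SAMPLE_CONFIG = """
[cluster default]
compute_instance_type = c3.4xlarge
initial_queue_size = 2
max_queue_size = 10
maintain_initial_size = true
scheduler = sge
cluster_type = spot
spot_price = 0.30
"""

KNOWN_SCHEDULERS = {"sge", "openlava", "torque"}  # the three named on the slide


def check_cluster_section(text, section="cluster default"):
    """Parse config text and return a list of human-readable problems."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    cluster = parser[section]
    problems = []
    # Fallbacks follow the defaults stated in the config comments.
    initial = cluster.getint("initial_queue_size", fallback=2)
    maximum = cluster.getint("max_queue_size", fallback=10)
    if initial > maximum:
        problems.append("initial_queue_size exceeds max_queue_size")
    if cluster.get("scheduler", fallback="sge") not in KNOWN_SCHEDULERS:
        problems.append("unknown scheduler")
    if cluster.get("cluster_type", fallback="ondemand") == "spot":
        if cluster.getfloat("spot_price", fallback=0.0) <= 0.0:
            problems.append("cluster_type is spot but spot_price is not set")
    return problems


if __name__ == "__main__":
    print(check_cluster_section(SAMPLE_CONFIG) or "no problems found")
```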
Notable HPC Examples
C3 Instance Cluster*
484 TFLOPS
Making it the 64th fastest
supercomputer in the world
*Representing a tiny fraction of total AWS compute capacity
The Problem for Cancer Drug Design:
• Cancer researcher needed 50,000 cores,
(not available in-house)
The options they didn’t choose:
• Buy infrastructure: Spend Millions, wait 6 months
• Spend months writing software
“We contacted our friends at Cycle Computing,
AWS, … to create a system that was fast,
extremely secure, … inexpensive, and easy to
use.”
Accelerating Science
Final Solution:
• 3 new compounds
– 40 years of computing
– 10,466 servers World-Wide
• $44M cluster for 8 hours for $4,362
– Multiple AZ’s
– Multiple Regions
– Automated bidding
– Optimized orchestration
University of Southern California
• USC Chemistry Professor
Dr. Mark Thompson
• “Solar energy has the potential
to replace some of our
dependence on fossil fuels, but
only if the solar panels can be
made very inexpensively and
have reasonable to high
efficiencies. Organic solar cells
have this potential.”
Challenge:
• Examine possible organic compounds for producing
solar energy
– Computational testing of 205,000 compounds
• Requires 2,312,959 core-hours
– (264 compute years)
• $68M on-premise system
Solution:
• CycleServer from Cycle Computing
• 16,788 Spot Instances
• 156,314 cores
– Average of 9.3 cores per instance
– 1.21 PFLOPS (Rpeak)
Region Deployment
US-West-1
US-East
EU
US-West-2
Brazil
Singapore
Tokyo
Australia
Resilient Workload Scheduling
What Does Scale Mean in the Cloud?
18 hours
205,000 materials analyzed
156,314 AWS Spot cores at peak
2.3M core-hours
Total spending: $33K
(Under 1.5 cents per core-hour)
Summary:
• 205,000 molecules
• 264 years of computing
• Done in 18 hours on $68M system
• Cost only $33,000
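The headline numbers in this case study are consistent with each other, which a few lines of arithmetic can confirm. The inputs below are the figures quoted on the slides. The 8,760-hour year (no leap days) is an assumption, the $33K total is treated as a rounded value, and the "average busy cores" figure is derived here and does not appear on the slides.

```python
HOURS_PER_YEAR = 24 * 365    # 8,760; ASSUMPTION: leap days ignored

core_hours = 2_312_959       # from the Challenge slide
spot_instances = 16_788      # from the Solution slide
peak_cores = 156_314         # from the Solution slide
run_hours = 18               # wall-clock duration of the run
total_cost_usd = 33_000      # "$33K" on the slide, so treat as rounded

compute_years = core_hours / HOURS_PER_YEAR
cores_per_instance = peak_cores / spot_instances
cents_per_core_hour = 100 * total_cost_usd / core_hours
# Derived, not on the slides: cores kept busy on average over the run.
average_busy_cores = core_hours / run_hours

print(f"compute years:        {compute_years:.0f}")
print(f"cores per instance:   {cores_per_instance:.1f}")
print(f"cents per core-hour:  {cents_per_core_hour:.2f}")
print(f"average busy cores:   {average_busy_cores:,.0f} of {peak_cores:,} at peak")
```

The average comes out below the peak core count, as expected for a fleet that ramps up and down during the run.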
NASA Head in the Clouds Project
• Project Goal
– Using NGA data to estimate tree and bush biomass over the entire
arid and semi-arid zone on the south side of the Sahara
• Project Summary
– Estimate carbon stored in trees and bushes in arid and semi-arid
south Sahara
– Establish carbon baseline for later research on expected CO2
uptake on the south side of the Sahara
• Principal Investigators
– Dr. Compton J. Tucker, NASA Goddard Space Flight Center
– Dr. Paul Morin, University of Minnesota
• Participants:
– NASA GSFC, AWS, Intel
Existing Sub-Saharan Arid and Semi-arid Sub-meter Commercial Imagery
9600 Strips (~80TB) to be delivered to GSFC
~1600 strips (~20TB) at GSFC
Area Of Interest (AOI) for Sub-Saharan Arid and Semi-arid Africa
The DigitalGlobe Constellation
The Entire Archive is Licensed to the USG
Geoeye
Quickbird
Ikonos
Worldview 1
Worldview 2
Worldview 3 (Available Q1 2015)
Panchromatic & Multi-spectral Mapping
at the 40 & 50 cm scale
First Phase Results
• Approximately 1/3 of data processed
• 200 Spot instances
• 6 hours of processing
• Run in us-west-2 region
– Carbon Neutral
– “Helping the planet not harming it”
• $80
Thank You
AWS pricing
• Three ways to pay:
– On-Demand
• You can start an instance anytime you want
• Most expensive
– Reserved Instances
• Can have a significant discount (up to 75%) compared to On-Demand
• Reserved Instances provide you with a capacity reservation, so you can
have confidence that you will be able to launch the instances you have
reserved when you need them
– Spot
• Spot Instances enable you to bid for unused Amazon EC2 capacity
• Instances are charged the Spot Price, which is set by Amazon EC2 and
fluctuates periodically depending on the supply of and demand for Spot
Instance capacity

Migrating existing open source machine learning to azureMigrating existing open source machine learning to azure
Migrating existing open source machine learning to azure
 
RTP NPUG: Ansible Intro and Integration with ACI
RTP NPUG: Ansible Intro and Integration with ACIRTP NPUG: Ansible Intro and Integration with ACI
RTP NPUG: Ansible Intro and Integration with ACI
 
Introduction to HPC & Supercomputing in AI
Introduction to HPC & Supercomputing in AIIntroduction to HPC & Supercomputing in AI
Introduction to HPC & Supercomputing in AI
 
Designing Lean CloudStack Environments for the Edge - IndiQus - CloudStack E...
 Designing Lean CloudStack Environments for the Edge - IndiQus - CloudStack E... Designing Lean CloudStack Environments for the Edge - IndiQus - CloudStack E...
Designing Lean CloudStack Environments for the Edge - IndiQus - CloudStack E...
 
Migrating Existing Open Source Machine Learning to Azure
Migrating Existing Open Source Machine Learning to AzureMigrating Existing Open Source Machine Learning to Azure
Migrating Existing Open Source Machine Learning to Azure
 
Distributed Tensorflow with Kubernetes - data2day - Jakob Karalus
Distributed Tensorflow with Kubernetes - data2day - Jakob KaralusDistributed Tensorflow with Kubernetes - data2day - Jakob Karalus
Distributed Tensorflow with Kubernetes - data2day - Jakob Karalus
 
AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...
 
AWS Webcast - An Introduction to High Performance Computing on AWS
AWS Webcast - An Introduction to High Performance Computing on AWSAWS Webcast - An Introduction to High Performance Computing on AWS
AWS Webcast - An Introduction to High Performance Computing on AWS
 
Ceph QoS: How to support QoS in distributed storage system - Taewoong Kim
Ceph QoS: How to support QoS in distributed storage system - Taewoong KimCeph QoS: How to support QoS in distributed storage system - Taewoong Kim
Ceph QoS: How to support QoS in distributed storage system - Taewoong Kim
 
Introduction to Kubernetes
Introduction to KubernetesIntroduction to Kubernetes
Introduction to Kubernetes
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Recently uploaded

Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 

Recently uploaded (20)

Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 

HPC in the Cloud

  • 1. Cloud HPC at AWS Dr. Jeffrey Layton, Principal Architect - HPC November, 2015
  • 3. How is AWS used for Scientific Computing? • HPC for Engineering and Simulation • High Throughput Computing (HTC) for Data Intensive Analytics • Hybrid Supercomputing Centers • Collaborative Research Environments • Citizen Science • Science-as-a-Service • Machine Learning
  • 4. Why do researchers love using AWS? • Time to Science – Access to research infrastructure in minutes • Low Cost – Pay as you go computing • Elastic – Easily add or remove capacity • Globally Accessible – Easily Collaborate with researchers around the world • Secure – A collection of tools to protect data and privacy • Scalable – Access to effectively limitless capacity
  • 5. Why does AWS care about Scientific Computing? • We want to improve our world by accelerating the pace of scientific discovery • It is a great application of AWS with a broad customer base • The scientific community helps us innovate on behalf of all customers – Streaming data processing and analytics – Exabyte scale data management solutions and exaflop scale computing – Collaborative research tools and techniques – New AWS regions – Significant advances in low-power compute, storage, and data centers – Efficiencies which lower our costs and therefore pricing for all customers
  • 6. Peering with all global research networks • Internet2 • aarnet • GEANT • Sinet • ESnet • Pacific Gigapop
  • 7. Public Datasets • Landsat • NEX • 1000 Genomes Project • Human Microbiome Project
  • 8. HPC
  • 9. Why Cloud for HPC? • Scalability – If you need to run on lots of nodes, just spin them up – If you don’t need nodes, turn them off (and don’t pay for them) • Time to Research – Usually on-prem HPC resources are centralized (shared) – Researchers like to have their own nodes when they need them • World-Wide Collaboration – Share data via the cloud • Latest Technology • Can save $$$ • Different kinds of nodes (instances)
  • 11. AWS HPC Architecture – Phases of Deployment • Fork lift – Make it look like on-premise • Cloud “port” – Adapt to cloud features • Autoscaling • Spot • Born in the Cloud • Rethink application. You must think in “cloud”. You cannot think in “on-prem” and transpose. You must think in “cloud”. Do you think you can do that, Mr. Gant?
  • 12. AWS HPC Architecture [Diagram. On-Premise: Master Node, Compute Nodes, Storage (NFS, Parallel). AWS Cloud: Master Instance, Compute Instances, Storage (NFS, Parallel).]
  • 13. Architecture points • Why be limited to the number of nodes in your on-prem cluster? – Cloud allows you to scale up and down as needed • Why limit yourself to a “single” cluster for all users? – Why not give each user their own cluster? – They can scale up and down as needed • Leave your data in the cloud – Compute on it as needed – Share it as needed – Life cycle control – Visualization
  • 15. The Hidden Cost of Queues Conflicting goals: • HPC users seek fastest possible time-to-results • Simulations are not steady-state workloads • IT support team seeks highest possible utilization Result: • The job queue becomes the capacity buffer • Job completion times are hard to predict • Users are frustrated and run fewer jobs
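The tension on this slide between high utilization and fast time-to-results can be illustrated with a textbook queueing formula. The sketch below uses the M/M/1 model (one server, random arrivals and run times), which is not from the slides and is a strong simplification of a real batch scheduler; the function name is hypothetical. In that model the mean time a job waits in the queue, measured in multiples of the mean job run time, is ρ/(1 − ρ), where ρ is the utilization.

```python
def mean_queue_wait(utilization: float) -> float:
    """Mean M/M/1 queue wait, in multiples of the mean job run time.

    Illustrative only: a real HPC scheduler is not an M/M/1 queue.
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return utilization / (1.0 - utilization)


if __name__ == "__main__":
    # The wait grows sharply as utilization approaches 100%.
    for rho in (0.50, 0.80, 0.90, 0.95):
        wait = mean_queue_wait(rho)
        print(f"utilization {rho:.0%}: mean wait is about {wait:.1f}x the mean run time")
```

Under this simplified model, pushing utilization higher makes waits grow much faster than linearly, which is one way to read "the job queue becomes the capacity buffer".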
  • 16. On AWS, deploy multiple clusters running at the same time and match the architectures to the jobs
  • 17. Example: TACC Portal – 4/6/2015 [Figure annotations: “Need capacity”, “Too much capacity”, “Too much capacity”]
  • 18. Example: NERSC Portal – 4/30/2015 Need more capacity!! • Edison – 4:1 backlog • Hopper – 2:1 backlog • Carver – 1.5:1 backlog
  • 19. Other Stats: • XSEDE: – In 2012, ~32% of jobs only used 1 core • 72% of all jobs used 16 cores or less (single c4 instance) • ECMWF (European weather forecasting) – 82% of all jobs are either single node or single-core
  • 20. Spot is the Bomb!
  • 21. Multiple Pricing Models
      • Reserved – Make a low, one-time payment and receive a significant discount on the hourly charge. For committed utilization.
      • Free Tier – Get started on AWS with free usage & no commitment. For POCs and getting started.
      • On-Demand – Pay for compute capacity by the hour with no long-term commitments. For spiky workloads, or to define needs.
      • Spot – Bid for unused capacity, charged at a Spot Price which fluctuates based on supply and demand. For time-insensitive or transient workloads.
      AWS Spot is a game-changer for HPC
  • 22. Quick Spot Comparison:
      • Compare most expensive “Spot” to least expensive “On-demand”
      • Master Node:
          – c4.8xlarge
          – 2x gp2 1TB EBS volumes
          – On-Demand
              • us-east - $1.856/hour
              • us-west-1 - $2.208/hour
      • Compute Nodes
          – c4.8xlarge
              • On-demand us-east - $1.856/hour
              • Spot (us-west-1) - $0.28/hour
  • 23. Spot vs. On-Demand
      4 compute nodes, 2 hours
      • On-Demand, us-east – $19.13
      • Spot (us-west-1) – $7.22
      • Ratio: 2.64
      16 compute nodes, 32 hours
      • On-Demand, us-east – $1,018.77
      • Spot (us-west-1) – $223.11
      • Ratio: 4.57
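The totals on slide 23 can be approximately rebuilt from the hourly rates on slide 22. The sketch below is one possible reconstruction, not the author's worksheet. It assumes one on-demand master plus N compute instances billed per hour, and it assumes the master's two 1 TB gp2 volumes cost $0.10 per GB-month, prorated over a 720-hour month. The EBS price and the month length are not given on the slides; they are assumptions under which the totals come out within about a cent of the slide figures. The function and constant names are hypothetical.

```python
HOURS_PER_MONTH = 720          # assumption: 30-day month for prorating EBS
EBS_GP2_PER_GB_MONTH = 0.10    # assumption: gp2 USD per GB-month, not on the slides
MASTER_EBS_GB = 2 * 1024       # slide 22: 2x gp2 1TB volumes on the master

ON_DEMAND_US_EAST = 1.856      # c4.8xlarge on-demand, us-east, USD/hour (slide 22)
ON_DEMAND_US_WEST_1 = 2.208    # c4.8xlarge on-demand, us-west-1, USD/hour (slide 22)
SPOT_US_WEST_1 = 0.28          # c4.8xlarge Spot, us-west-1, USD/hour (slide 22)


def cluster_cost(nodes: int, hours: float, master_rate: float, compute_rate: float) -> float:
    """USD cost of one master, `nodes` compute instances, and prorated master EBS."""
    instances = (master_rate + nodes * compute_rate) * hours
    ebs = MASTER_EBS_GB * EBS_GP2_PER_GB_MONTH * hours / HOURS_PER_MONTH
    return instances + ebs


if __name__ == "__main__":
    for nodes, hours in ((4, 2), (16, 32)):
        on_demand = cluster_cost(nodes, hours, ON_DEMAND_US_EAST, ON_DEMAND_US_EAST)
        # In the Spot case the master stays on-demand; only compute nodes use Spot.
        spot = cluster_cost(nodes, hours, ON_DEMAND_US_WEST_1, SPOT_US_WEST_1)
        print(f"{nodes} nodes, {hours} h: on-demand ${on_demand:,.2f}, "
              f"spot ${spot:,.2f}, ratio {on_demand / spot:.2f}")
```

The master stays on-demand in both cases, so its fixed cost dominates short, small runs. That is why the savings ratio grows with node count in the slide's two scenarios.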
  • 25. MIT STARcluster - an HPC cluster in minutes http://star.mit.edu/cluster/ StarCluster is a utility for creating and managing distributed computing clusters hosted on Amazon's Elastic Compute Cloud. It uses Amazon's EC2 API to create and destroy clusters of Linux virtual machines on demand. It’s an easy-to-use and extensible cluster computing toolkit for the cloud. 15 minutes http://bit.ly/starclusterArticle
  • 26. Bright Cluster Manager http://www.brightcomputing.com Bright Cluster Manager is an established, very popular HPC cluster management platform that can simultaneously manage both on-premises clusters and infrastructure in the cloud, all using the same system images. Bright has offices in the UK, Netherlands (HQ) and US.
  • 27. cfnCluster - provision an HPC cluster in minutes #cfncluster https://github.com/awslabs/cfncluster cfncluster is a sample code framework that deploys and maintains clusters on AWS. It is reasonably agnostic to what the cluster is for and can easily be extended to support different frameworks. The CLI is stateless; everything is done using CloudFormation or resources within AWS. 10 minutes
  • 28. Infrastructure as code #cfncluster The creation process might take a few minutes (maybe up to 5 minutes or so, depending on how you configured it). Because the API to CloudFormation (the service that does all the orchestration) is asynchronous, we could kill the terminal session if we wanted to and watch the whole show from the AWS console (where you’ll find it all under the “CloudFormation” dashboard, in the events tab for this stack).
      $ cfncluster create boof-cluster
      Starting: boof-cluster
      Status: cfncluster-boof-cluster - CREATE_COMPLETE
      Output:"MasterPrivateIP"="10.0.0.17"
      Output:"MasterPublicIP"="54.66.174.113"
      Output:"GangliaPrivateURL"="http://10.0.0.17/ganglia/"
      Output:"GangliaPublicURL"="http://54.66.174.113/ganglia/"
  • 29. Yes, it’s a real HPC cluster #cfncluster Now you have a cluster, probably running CentOS 6.x, with Sun Grid Engine as the default scheduler, and OpenMPI and a bunch of other stuff installed. You also have a shared filesystem in /shared and an autoscaling group ready to expand the number of compute nodes in the cluster when the existing ones get busy. You can customize quite a lot via the .cfncluster/config file - check out the comments.
      arthur ~ [26] $ cfncluster create boof-cluster
      Starting: boof-cluster
      Status: cfncluster-boof-cluster - CREATE_COMPLETE
      Output:"MasterPrivateIP"="10.0.0.17"
      Output:"MasterPublicIP"="54.66.174.113"
      Output:"GangliaPrivateURL"="http://10.0.0.17/ganglia/"
      Output:"GangliaPublicURL"="http://54.66.174.113/ganglia/"
      arthur ~ [27] $ ssh ec2-user@54.66.174.113
      The authenticity of host '54.66.174.113 (54.66.174.113)' can't be established.
      RSA key fingerprint is 45:3e:17:76:1d:01:13:d8:d4:40:1a:74:91:77:73:31.
      Are you sure you want to continue connecting (yes/no)? yes
      Warning: Permanently added '54.66.174.113' (RSA) to the list of known hosts.
      [ec2-user@ip-10-0-0-17 ~]$ df
      Filesystem     1K-blocks    Used Available Use% Mounted on
      /dev/xvda1      10185764 7022736   2639040  73% /
      tmpfs             509312       0    509312   0% /dev/shm
      /dev/xvdf       20961280   32928  20928352   1% /shared
      [ec2-user@ip-10-0-0-17 ~]$ qhost
      HOSTNAME       ARCH      NCPU NSOC NCOR NTHR LOAD MEMTOT MEMUSE SWAPTO  SWAPUS
      -------------------------------------------------------------------------------
      global         -            -    -    -    -    -      -      -      -       -
      ip-10-0-0-136  lx-amd64     8    1    4    8    -  14.6G      - 1024.0M      -
      ip-10-0-0-154  lx-amd64     8    1    4    8    -  14.6G      - 1024.0M      -
      [ec2-user@ip-10-0-0-17 ~]$ qstat
      [ec2-user@ip-10-0-0-17 ~]$
      [ec2-user@ip-10-0-0-17 ~]$ ed hw.qsub
      hw.qsub: No such file or directory
      a
      #!/bin/bash
      #
      #$ -cwd
      #$ -j y
      #$ -pe mpi 2
      #$ -S /bin/bash
      #
      module load openmpi-x86_64
      mpirun -np 2 hostname
      .
      w
      110
      q
      [ec2-user@ip-10-0-0-17 ~]$ ll
      total 4
      -rw-rw-r-- 1 ec2-user ec2-user 110 Feb 1 05:57 hw.qsub
      [ec2-user@ip-10-0-0-17 ~]$ qsub hw.qsub
      Your job 1 ("hw.qsub") has been submitted
      [ec2-user@ip-10-0-0-17 ~]$
      [ec2-user@ip-10-0-0-17 ~]$ qstat
      job-ID prior   name    user     state submit/start at     queue                          slots ja-task-ID
      ----------------------------------------------------------------------------------------------------------
      1      0.55500 hw.qsub ec2-user r     02/01/2015 05:57:25 all.q@ip-10-0-0-44.ap-southeas 2
      [ec2-user@ip-10-0-0-17 ~]$ qstat
      [ec2-user@ip-10-0-0-17 ~]$ ls -l
      total 8
      -rw-rw-r-- 1 ec2-user ec2-user 110 Feb 1 05:57 hw.qsub
      -rw-r--r-- 1 ec2-user ec2-user 26 Feb 1 05:57 hw.qsub.o1
      [ec2-user@ip-10-0-0-17 ~]$ cat hw.qsub.o1
      ip-10-0-0-136
      ip-10-0-0-154
      [ec2-user@ip-10-0-0-17 ~]$
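The job script typed into `ed` on this slide can also be generated programmatically, which helps when the slot count changes from job to job. The helper below is a minimal sketch: the function name and its parameters are hypothetical, while the Grid Engine directives and the `module load` line are the ones shown on the slide.

```python
def sge_mpi_job_script(slots: int, command: str = "hostname") -> str:
    """Return a Grid Engine job script like the slide's hw.qsub, for `slots` MPI ranks."""
    if slots < 1:
        raise ValueError("slots must be at least 1")
    lines = [
        "#!/bin/bash",
        "#",
        "#$ -cwd",                       # run the job from the submission directory
        "#$ -j y",                       # merge stderr into the stdout file
        f"#$ -pe mpi {slots}",           # request slots in the 'mpi' parallel environment
        "#$ -S /bin/bash",               # shell used to interpret the script
        "#",
        "module load openmpi-x86_64",    # module name as shown on the slide
        f"mpirun -np {slots} {command}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the two-slot script; redirect to a file and submit it with qsub.
    print(sge_mpi_job_script(2), end="")
```

With `slots=2` and the default command, the generated text is 110 bytes, matching the size `ed` reports when writing hw.qsub.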
  • 30. System-wide Upgrade from Ivy Bridge to Haswell #cfncluster
    Yes, really :-)
    $ ed ~/.cfncluster/config
    /compute_instance_type/
    compute_instance_type = c3.8xlarge
    s/c3/c4/p
    compute_instance_type = c4.8xlarge
    w
    949
    $ cfncluster update boof-cluster
    Downgrading is just as easy. Honest.
  • 31. Config options to explore … #cfncluster
    Many options, but the most interesting ones immediately are:
    # (defaults to t2.micro for default template)
    compute_instance_type = t2.micro
    # Master Server EC2 instance type
    # (defaults to t2.micro for default template)
    #master_instance_type = t2.micro
    # Initial number of EC2 instances to launch as compute nodes in the cluster.
    # (defaults to 2 for default template)
    #initial_queue_size = 1
    # Maximum number of EC2 instances that can be launched in the cluster.
    # (defaults to 10 for the default template)
    #max_queue_size = 10
    # Boolean flag to set autoscaling group to maintain initial size and scale back
    # (defaults to false for the default template)
    #maintain_initial_size = true
    # Cluster scheduler
    # (defaults to sge for the default template)
    scheduler = sge
    # Type of cluster to launch i.e. ondemand or spot
    # (defaults to ondemand for the default template)
    #cluster_type = ondemand
    # Spot price for the ComputeFleet
    #spot_price = 0.00
    # Cluster placement group. This placement group must already exist.
    # (defaults to NONE for the default template)
    #placement_group = NONE
    Slide callouts:
    – compute_instance_type: t2.micro is tiny; c3.4xlarge might be more interesting …
    – initial_queue_size / max_queue_size: Min & Max size of your cluster.
    – maintain_initial_size: Whether to fall back when things get quiet
    – scheduler: Also can use ‘openlava’ or ‘torque’
    – cluster_type / spot_price: Explore the SPOT market if you want to save money :-)
    – placement_group: A placement group will provision your instances very close to each other on the network.
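Because the config is a plain INI-style file, a quick sanity check before running `cfncluster create` or `cfncluster update` is easy to script. The sketch below uses Python's standard `configparser`. Several things in it are assumptions rather than facts from the slide: the section name `[cluster default]`, the sample values, and the rule that a spot cluster should carry a positive `spot_price`. The fallback values are the defaults quoted in the slide's comments.

```python
import configparser

# Hypothetical sample; option names come from the slide, values are made up.
SAMPLE = """
[cluster default]
compute_instance_type = c3.4xlarge
initial_queue_size = 2
max_queue_size = 10
maintain_initial_size = true
scheduler = sge
cluster_type = spot
spot_price = 0.50
"""

# Schedulers named on the slide.
KNOWN_SCHEDULERS = {"sge", "openlava", "torque"}


def check_cluster_section(text, section="cluster default"):
    """Return a list of human-readable problems found in one cluster section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    cfg = parser[section]
    problems = []

    initial = cfg.getint("initial_queue_size", fallback=2)
    maximum = cfg.getint("max_queue_size", fallback=10)
    if initial > maximum:
        problems.append(
            f"initial_queue_size ({initial}) exceeds max_queue_size ({maximum})"
        )

    scheduler = cfg.get("scheduler", fallback="sge")
    if scheduler not in KNOWN_SCHEDULERS:
        problems.append(f"unknown scheduler: {scheduler}")

    cluster_type = cfg.get("cluster_type", fallback="ondemand")
    spot_price = cfg.getfloat("spot_price", fallback=0.0)
    if cluster_type == "spot" and spot_price <= 0:
        # Assumed rule for this sketch, not documented cfncluster behaviour.
        problems.append("cluster_type is spot but spot_price is not positive")

    return problems


if __name__ == "__main__":
    print(check_cluster_section(SAMPLE) or "no problems found")
```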
  • 33. C3 Instance Cluster* 484 TFLOPS Making it the 64th fastest supercomputer in the world *Representing a tiny fraction of total AWS compute capacity
  • 34. The Problem for Cancer Drug Design: • Cancer researcher needed 50,000 cores (not available in-house) The options they didn’t choose: • Buy infrastructure: spend millions, wait 6 months • Spend months writing software “We contacted our friends at Cycle Computing, AWS, … to create a system that was fast, extremely secure, … inexpensive, and easy to use.” Accelerating Science
  • 35. Final Solution: • 3 new compounds – 40 years of computing – 10,466 servers world-wide • $44M cluster for 8 hours for $4,362 – Multiple AZs – Multiple Regions – Automated bidding – Optimized orchestration
  • 36. University of Southern California • USC Chemistry Professor Dr. Mark Thompson • “Solar energy has the potential to replace some of our dependence on fossil fuels, but only if the solar panels can be made very inexpensively and have reasonable to high efficiencies. Organic solar cells have this potential.”
  • 37. Challenge: • Examine possible organic compounds for producing solar energy – Computational testing of 205,000 compounds • Requires 2,312,959 core-hours – (264 compute years) • $68M on-premise system
  • 38. Solution: • CycleServer from Cycle Computing • 16,788 Spot Instances • 156,314 cores – Average of 9.3 cores per instance – 1.21 PFLOPS (Rpeak)
  • 41. What Does Scale Mean in the Cloud? 18 hours 205,000 materials analyzed 156,314 AWS Spot cores at peak 2.3M core-hours Total spending: $33K (Under 1.5 cents per core-hour)
  • 42. Summary: • 205,000 molecules • 264 years of computing • Done in 18 hours on the equivalent of a $68M system • Cost only $33,000
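The headline numbers for the USC run can be cross-checked with a few lines of arithmetic. This is only a sketch using the figures quoted on the slides; it assumes a 365-day year and treats the rounded "$33K" as exactly $33,000.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: non-leap year

core_hours = 2_312_959     # from the "Challenge" slide
spot_instances = 16_788    # from the "Solution" slide
peak_cores = 156_314       # from the "Solution" slide
total_cost_usd = 33_000    # "$33K" on the slide, treated as exact here

compute_years = core_hours / HOURS_PER_YEAR
cores_per_instance = peak_cores / spot_instances
cents_per_core_hour = 100 * total_cost_usd / core_hours

print(f"compute years:       {compute_years:.0f}")
print(f"cores per instance:  {cores_per_instance:.1f}")
print(f"cents per core-hour: {cents_per_core_hour:.2f}")
```

The results line up with the slides: about 264 compute years, an average of 9.3 cores per instance, and a cost under 1.5 cents per core-hour.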
  • 43. NASA Head in the Clouds Project • Project Goal – Using NGA data to estimate tree and bush biomass over the entire arid and semi-arid zone on the south side of the Sahara • Project Summary – Estimate carbon stored in trees and bushes in arid and semi-arid south Sahara – Establish carbon baseline for later research on expected CO2 uptake on the south side of the Sahara • Principal Investigators – Dr. Compton J. Tucker, NASA Goddard Space Flight Center – Dr. Paul Morin, University of Minnesota • Participants: – NASA GSFC, AWS, Intel
  • 44. Existing Sub-Saharan Arid and Semi-arid Sub-meter Commercial Imagery 9600 Strips (~80TB) to be delivered to GSFC ~1600 strips (~20TB) at GSFC Area Of Interest (AOI) for Sub-Saharan Arid and Semi-arid Africa
  • 45. The DigitalGlobe Constellation The Entire Archive is Licensed to the USG Geoeye Quickbird Ikonos Worldview 1 Worldview 2 Worldview 3 (Available Q1 2015)
  • 46. Panchromatic & Multi-spectral Mapping at the 40 & 50 cm scale
  • 47. First Phase Results • Approximately 1/3 of data processed • 200 Spot instances • 6 hours of processing • Run in us-west-2 region – Carbon Neutral – “Helping the planet not harming it” • $80
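The first-phase figures allow a rough projection for the whole dataset. The sketch below rests on assumptions the slide does not state: every instance ran for the full 6 hours, the remaining two thirds of the data cost the same per unit to process, and Spot prices stay unchanged.

```python
instances = 200              # Spot instances, from the slide
hours = 6                    # processing time, from the slide
cost_usd = 80                # first-phase cost, from the slide
fraction_processed = 1 / 3   # "approximately 1/3 of data processed"

# Upper bound on usage: assumes all instances ran the whole time.
instance_hours = instances * hours
cost_per_instance_hour = cost_usd / instance_hours

# Linear extrapolation to the full dataset (an assumption, not a result).
projected_total_usd = cost_usd / fraction_processed

print(f"instance-hours:            {instance_hours}")
print(f"USD per instance-hour:     {cost_per_instance_hour:.3f}")
print(f"projected full-run cost:   {projected_total_usd:.0f} USD")
```

Under those assumptions the full run would cost a few hundred dollars, which gives a sense of scale rather than a forecast.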
  • 49. AWS pricing • Three ways to pay: – On-Demand • You can start an instance anytime you want • Most expensive – Reserved Instances • Can have a significant discount (up to 75%) compared to On-Demand • Reserved Instances provide you with a capacity reservation, so you can have confidence that you will be able to launch the instances you have reserved when you need them – Spot • Spot Instances enable you to bid for unused Amazon EC2 capacity • Instances are charged the Spot Price, which is set by Amazon EC2 and fluctuates periodically depending on the supply of and demand for Spot Instance capacity
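The three payment models can be compared with a toy calculation. This is a simplified sketch, not AWS's billing logic. All prices below are hypothetical. The reserved rate uses the slide's "up to 75%" figure as a best case. The Spot model assumes hourly prices, that the instance is charged the Spot Price rather than the bid (as the slide says), and that it stops as soon as the Spot Price exceeds the bid.

```python
def spot_cost(bid, hourly_spot_prices):
    """Return (hours_run, total_cost) under a simplified Spot model.

    The instance runs hour by hour, pays the Spot Price for each hour,
    and is reclaimed the first time the Spot Price exceeds the bid.
    """
    hours_run = 0
    total = 0.0
    for price in hourly_spot_prices:
        if price > bid:
            break  # outbid: capacity is taken back in this model
        hours_run += 1
        total += price
    return hours_run, total


# Hypothetical hourly rates in USD, for illustration only.
on_demand_rate = 1.00
reserved_rate = on_demand_rate * (1 - 0.75)  # best-case discount from the slide
spot_prices = [0.20, 0.25, 0.22, 0.90, 0.30]

if __name__ == "__main__":
    hours, cost = spot_cost(bid=0.50, hourly_spot_prices=spot_prices)
    print(f"spot: ran {hours} h for {cost:.2f} USD before being outbid")
    print(f"on-demand for the same hours: {hours * on_demand_rate:.2f} USD")
    print(f"reserved (best case) for the same hours: {hours * reserved_rate:.2f} USD")
```

The interruption in the fourth hour is the trade-off that comes with the lower Spot price: workloads need to tolerate losing instances, which is why the case studies above pair Spot with automated bidding and orchestration.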