© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Pierre-Yves Aquilanti, Ph.D. – Senior HPC Specialized Solution Architect
Anh Tran – Senior HPC Specialized Solution Architect
Monday, August 27, 2018
High Performance Computing
on AWS
AWS Global Infrastructure
AWS Global Infrastructure
Over 100 Global CloudFront PoPs
Regions
Amazon Global Network
• Redundant 100GbE network
• Redundant private capacity between all Regions except China
AWS Global Infrastructure
18 Regions – 55 Availability Zones – 119 Points of Presence*
Region & Number of Availability Zones
US West: Oregon (3), Northern California (3)
US East: N. Virginia (6), Ohio (3)
Canada: Central (2)
South America: São Paulo (3)
GovCloud: US-West (3)
EU: Ireland (3), Frankfurt (3), London (3), Paris (3)
Asia Pacific: Singapore (3), Sydney (3), Tokyo (4), Seoul (2), Mumbai (2)
China: Beijing (2), Ningxia (2)
Announced Regions: Bahrain, Hong Kong SAR (China), GovCloud (US-East)
* 103 Edge Locations and 11 Regional Edge Caches
Why HPC on AWS?
Amazon Running HPC Workloads Every Day
 Logistics
 Machine learning
 Data Center, network, and
server design
 Consumer product design
 Robotics
 Semiconductor design
 Retail and financial analytics
AWS Advantages for HPC Workload Types
Workload types:
• Tightly Coupled Parallel Computing
• Loosely Coupled Parallel Computing
• Accelerated Computing
• Visualization and Interpretation
• High Performance Data Storage and Analytics
AWS advantages:
• Scale
• EC2 Spot Pricing
• Early Access to Technology
• Choice
• Performance
• Derive unique insights with AI/ML
• Skip the Queue
• View results instantly
Why HPC on AWS - Multiple Clusters
$ qsub -q monolith iwait.sh
$ qsub dev.sh
$ qsub prod.sh
$ qsub critical.sh
$ qsub bigrun.sh
On-Prem
Launch clusters by group, user,
application – no more waiting!
Building an HPC Infrastructure in AWS
Understanding the Drivers
What are the motivations to use Cloud computing?
How would running on AWS differ from running on-premises?
What would you need to launch a PoC on AWS today?
What are the requirements for your application?
Do you need to visualize your data?
HPC Solutions
Compute: EC2 Instance, EC2 Spot, Auto Scaling
Accelerated Compute: FPGA, GPU
Storage: EBS, EFS, S3
Networking: Enhanced Networking, Placement Groups
Automation & Orchestration: AWS Batch, CfnCluster, NICE EnginFrame
Visualization: NICE DCV, AppStream 2.0
Amazon EC2 Instances
Optimize the price/performance of your HPC Workloads with the widest range of compute instances
General purpose: M4, M5, M5D
General purpose burstable: T2, T2 Unlimited
Compute optimized: C4, C5, C5D
Memory optimized: R4, R5, R5D, X1, X1e
Storage optimized, High I/O: I2, I3
Dense storage: D2, H1
GPU Compute: P2, P3
Graphics intensive: G2, G3
FPGA: F1
EC2 Bare Metal: Direct access to physical server resources
AWS Storage is a Platform
File: Amazon EFS
Block: Amazon EBS, Amazon EC2 Instance Store
Object: Amazon S3 / S3-IA, Amazon Glacier
Data Transfer: AWS Direct Connect, AWS Snowball, ISV Connectors, Amazon Kinesis Firehose, S3 Transfer Acceleration, Storage Gateway, Amazon CloudFront, Internet/VPN
Network Performance
AWS Proprietary Network, 10Gbps & 25Gbps
 Highest performance in largest EC2 instance sizes
 Full bi-section bandwidth in Placement Groups, with no network
oversubscription
Enhanced Networking
 Over 1M PPS performance, reduced instance-to-instance
latencies, more consistent network performance
EC2 to S3
 Traffic to and from S3 can now take advantage of up to 25 Gbps
of bandwidth
Orchestration
Managed: AWS Batch, AWS Lambda
• Fully-managed services
• Run large-scale compute workloads or simple functions
• Focus on your jobs and their resources instead of the infrastructure
Un-Managed: CfnCluster, Traditional Scheduler
• Quickly deploy a cluster using third-party schedulers
• Bring your own scheduler or use AWS Marketplace solutions
Application Services: AWS Step Functions, Amazon SWF
• Design and orchestrate workflows, with support for branching and callouts to other AWS services
• Easily integrated with AWS Batch, AWS Lambda…
Graphics and Collaboration with DCV and AppStream
Pre- and post-processing as well as HPC
Use GPUs in the cloud for remote
rendering and remote desktops
Collaborating Securely
Encrypt the data in flight and at rest
Manage your own keys and credentials
Deliver pixels to your collaborators, not the
actual data
Deploying HPC Systems on AWS
3D GRAPHICS VIRTUAL WORKSTATION
LICENSE MANAGERS AND CLUSTER
HEAD NODES WITH JOB SCHEDULERS
CLOUD-BASED, AUTO-SCALING HPC CLUSTERS
SHARED FILE
STORAGE
STORAGE CACHE
Amazon S3
and Amazon Glacier
ON-PREMISES
HPC RESOURCES
Corporate Datacenter
AWS SNOWBALL
AWS DIRECT
CONNECT
THIN OR ZERO CLIENT - NO LOCAL DATA
APPSTREAM 2.0
AWS BATCH
On AWS, secure and
well-optimized HPC
clusters can be
automatically created,
operated, and torn down
in a matter of minutes
Customer Use Cases
Several Kinds of HPC Workloads
Storage axis:
• Data Light: Minimal requirements for high performance storage
• Data Heavy: Benefits from access to high performance storage
Coupling axis:
• Clustered (Tightly coupled)
• Distributed / Grid (Loosely coupled)
Example workloads:
• Fluid dynamics
• Weather forecasting
• Materials simulations
• Crash simulations
• Risk simulations
• Molecular modeling
• Contextual search
• Logistics simulations
• Animation and VFX
• Semiconductor verification
• Image processing/GIS
• Genomics
• Seismic processing
• Metagenomics
• Astrophysics
• Deep learning
HPC Grids in Financial Services
“Using AWS helps us
reduce a 10-day
process to 10 minutes.
That’s transformative: it
broadens our ability to
discover.”
Peter Phillips
Managing Director
Aon Benfield Securities
Using GPU acceleration
The Challenge
• Spinning up large numbers of GPUs quickly and inexpensively to meet ABSI’s customers’ financial modeling & reporting needs
 ABSI uses proprietary algorithms (Monte Carlo simulations) running
millions of times
The Solution
 ABSI moved its infrastructure to AWS and deprecated its co-located data
center
 ABSI built a front-end on AWS for its processing solution, automatically
running GPU instances on Amazon EC2 using EBS in an Amazon VPC for
security
The Result
 Can be as much as 500 times more efficient in terms of performance per
dollar for some clients
HPC clusters in Healthcare & LifeSciences
“By spinning up a few hundred
nodes on AWS and getting results in
less than a day, our scientific
researchers have a lot more
freedom to ask questions that
weren’t even possible before. The
speed is important, but equally
important is the additional
intellectual curiosity this enables for
researchers”
Lance Smith
Associate Director of IT, Celgene
HPC on AWS for Cancer Drug Research
The Challenge
 Slower time to results due to wait times and longer times to run jobs
on fixed configurations available
 Hard to collaborate with external entities due to security and
compliance issues
 Inability to scale beyond the fixed number of cores that were
available on premises
The Solution
• The company runs many HPC workloads on hundreds of Amazon EC2 instances and uses Amazon S3 and Amazon Glacier to store hundreds of terabytes of genomic data
• Using Amazon VPC, AWS Identity and Access Management (IAM), and AWS Direct Connect to collaborate securely
The Result
 HPC job time reduced to hours instead of weeks
 More parallel work being achieved leading to increased productivity
HPC in Design & Engineering
Boom leverages Rescale and AWS to enable supersonic travel
“Rescale’s ScaleX cloud
platform is a game-changer
for engineering. It gives
Boom computing resources
comparable to building a
large on-premise HPC
center. Rescale lets us
move fast with minimal
capital spending and
resources overhead.”
Josh Krall
CTO & Co-Founder
 Simulated vortex lift with 200M cell models on 512+ cores
 Increased simulation throughput: 100 jobs in parallel with 6x
speedup per job → 600x speedup
 Eliminated IT overhead, including server capital costs & in-house IT
and software costs
 Elastic HPC capacity and pay-as-you-go AWS clusters allow business
agility & ability to scale
1.1M vCPUs for Machine Learning
A group of researchers from Clemson University achieved a remarkable milestone while studying topic modeling, an important component of machine learning associated with natural language processing, breaking the record for creating the largest high-performance cluster in the cloud by using more than 1,100,000 vCPUs on Amazon EC2 Spot Instances running in a single AWS region.
• The graph highlights the elastic, automatic expansion of resources.
• Clemson took advantage of the new per-second billing for EC2 instances.
• The vCPU count usage is comparable to the core count on the largest supercomputers in the world.
[Architecture diagram. Labels: provisioning and workflow automation software; CloudyCluster; APIs; job script; login; scheduler (SLURM); CCQ; Auto Scaling; Spot Fleet; S3; DDB; VPC.]
https://aws.amazon.com/blogs/aws/natural-language-processing-at-clemson-university-1-1-million-vcpus-ec2-spot-instances/
HPC on AWS Deep Dive
Compute
In the past
• First launched in August 2006
• M1 instance
• “One size fits all”
Elastic Compute Cloud (EC2) Instance Naming
c5.9xlarge
• Instance family: c
• Instance generation: 5
• Instance size: 9xlarge
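The naming scheme on this slide can be illustrated with a small parser. This is only a sketch: the helper name `parse_instance_type`, the `InstanceType` tuple, and the regular expression are assumptions made for illustration, and the pattern only covers names shaped like the ones in this deck (for example c5.9xlarge, x1e.32xlarge, i3.metal).

```python
import re
from typing import NamedTuple


class InstanceType(NamedTuple):
    family: str      # e.g. "c"
    generation: int  # e.g. 5
    attributes: str  # extra letters after the generation, e.g. "d" in "c5d"
    size: str        # e.g. "9xlarge"


# Assumed pattern: letters, digits, optional letters, a dot, then the size.
_PATTERN = re.compile(r"^([a-z]+)(\d+)([a-z]*)\.([a-z0-9]+)$")


def parse_instance_type(name: str) -> InstanceType:
    """Split a name such as "c5.9xlarge" into family, generation, and size."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized instance type: {name!r}")
    family, generation, attributes, size = match.groups()
    return InstanceType(family, int(generation), attributes, size)


if __name__ == "__main__":
    print(parse_instance_type("c5.9xlarge"))
```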
Instance sizing
c5.18xlarge ≈ 2 x c5.9xlarge ≈ 4 x c5.4xlarge ≈ 8 x c5.2xlarge
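Why the slide uses "≈" rather than "=" can be checked with a few lines of arithmetic. The sketch below assumes the published C5 vCPU counts (8, 16, 36, and 72 for the 2xlarge, 4xlarge, 9xlarge, and 18xlarge sizes); the table and helper name are illustrative, not part of the original deck.

```python
# vCPU counts per C5 size (assumed from the public C5 specifications).
C5_VCPUS = {
    "c5.2xlarge": 8,
    "c5.4xlarge": 16,
    "c5.9xlarge": 36,
    "c5.18xlarge": 72,
}


def total_vcpus(instance_type: str, count: int) -> int:
    """Total vCPUs provided by `count` instances of the given type."""
    return C5_VCPUS[instance_type] * count


if __name__ == "__main__":
    for name, count in [("c5.18xlarge", 1), ("c5.9xlarge", 2),
                        ("c5.4xlarge", 4), ("c5.2xlarge", 8)]:
        print(f"{count} x {name}: {total_vcpus(name, count)} vCPUs")
```

Two c5.9xlarge match one c5.18xlarge exactly, while four c5.4xlarge or eight c5.2xlarge come out slightly lower, so the combinations are close but not identical.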
Original EC2 Host Architecture
• All resources were on the server
• Instance Goals:
  • Security
  • Performance
  • Familiarity
[Diagram: a single SERVER block containing Hypervisor; Management, Security, and Monitoring; Storage; Network; and Customer Instances.]
EC2 C5 Instance
• Nearly 100% of available compute resources available to customers’ workload
• Improved security
[Diagram: SERVER and NITRO blocks. Labels: Hypervisor; Management, Security, and Monitoring; Storage; Network; Customer Instances.]
C5 Instances - Intel® Xeon® Scalable Processor
• Intel Skylake @ 3.0 GHz (turbo to 3.5 GHz)
• Supports AVX-512
• C-state controls
• Nitro System, a combination of dedicated hardware and lightweight hypervisor
• Up to 25 Gbps network
C5 (“Skylake”): 72 vCPUs, 144 GiB memory, 12 Gbps to EBS, AVX-512
C4 (“Haswell”): 36 vCPUs, 60 GiB memory, 4 Gbps to EBS
C5 compared with C4: 2X vCPUs, 3X throughput, 2.4X memory
Performance Considerations
Test using real-world examples
• Use large cases for testing: do not benchmark scalability using only small examples
MPI libraries
• Test with Intel MPI and OpenMPI 3.0, and make use of available tunings
Domain decomposition
• Choose number of cells per core for either per-core efficiency or for faster results
Network
• Use a placement group
• Enable enhanced networking
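The domain decomposition advice comes down to simple arithmetic: fewer cells per core means more cores and faster results, more cells per core means better per-core efficiency. The sketch below is illustrative only; the function names, the 200M-cell mesh, and the 400,000 cells-per-core target are hypothetical values, and it assumes 2 vCPUs per physical core on a 72-vCPU instance.

```python
import math


def cores_needed(total_cells: int, cells_per_core: int) -> int:
    """Physical cores required to hit a target number of cells per core."""
    return math.ceil(total_cells / cells_per_core)


def instances_needed(cores: int, vcpus_per_instance: int = 72,
                     threads_per_core: int = 2) -> int:
    """Instances required, counting one MPI rank per physical core."""
    cores_per_instance = vcpus_per_instance // threads_per_core
    return math.ceil(cores / cores_per_instance)


if __name__ == "__main__":
    cores = cores_needed(200_000_000, 400_000)  # hypothetical mesh and target
    print(cores, "cores on", instances_needed(cores), "instances")
```

Halving the cells-per-core target doubles the core count, which is the trade-off between per-core efficiency and time to result that the slide describes.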
What’s a Virtual CPU? (vCPU)
 A vCPU is typically an Intel hyper-threaded physical core*
 On Linux, “A” threads enumerated before “B” threads
 On Windows, threads are interleaved
 Divide vCPU count by 2 to get core count
 Cores by EC2 & RDS DB Instance type:
https://aws.amazon.com/ec2/virtualcores/
* The “T” family is special
Disable Hyper-Threading If You Need To
• Useful for CPU heavy applications
• Use ‘lscpu’ to validate layout
• Disable Hyper-Threading without reboot:
# Take offline every CPU listed after the first entry of each thread_siblings_list
for cpunum in $(cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list |
    cut -s -d, -f2- | tr ',' '\n' | sort -un); do
  echo 0 | sudo tee /sys/devices/system/cpu/cpu${cpunum}/online
done
• Set grub to only initialize the first half of all threads (kernel boot parameter, shown for an instance with 128 threads):
maxcpus=64
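The selection logic of the shell loop above can be sketched in Python to make it easier to reason about. This is an illustration, not a replacement for the loop: the function name `secondary_threads` is hypothetical, and it assumes each `thread_siblings_list` entry is comma-separated (such as "0,36"), which is the same assumption the `cut -d,` in the shell version makes.

```python
def secondary_threads(sibling_lists):
    """Return the sorted CPU numbers that are not the first thread of a core.

    `sibling_lists` holds the text of each thread_siblings_list file,
    for example "0,36". Entries with a single CPU contribute nothing.
    """
    secondary = set()
    for entry in sibling_lists:
        cpus = entry.strip().split(",")
        secondary.update(int(cpu) for cpu in cpus[1:])
    return sorted(secondary)


if __name__ == "__main__":
    # Each core appears once per thread, so entries repeat.
    print(secondary_threads(["0,2", "1,3", "0,2", "1,3"]))
```

The returned numbers are the CPUs the shell loop would take offline by writing 0 to their `online` file.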
RHEL6 and the 2.6 Kernel
• The 2.6.32 Linux kernel was released in 2009
• Use 3.10+ kernel (Up to 40% boost!)
• Amazon Linux 13.09 or later
• Ubuntu 14.04 or later
• RHEL/CentOS 7 or later
• Etc.
Networking
Network Performance
• 25 Gigabit & 10 Gigabit
• Measured one-way, double that for bi-directional (full duplex)
• High, Moderate, Low – instance size and EBS optimization
• Not all created equal – Test with iperf if it’s important!
• https://aws.amazon.com/premiumsupport/knowledge-center/network-throughput-benchmark-linux-ec2/
• Use placement groups when you need high and consistent instance-to-instance bandwidth
• 25 Gbps to S3
• All traffic limited to 5 Gb/s when exiting EC2 (e.g., VPN or DC)
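To see what the 25 Gbps and 5 Gb/s figures mean in practice, the ideal transfer time can be estimated as size divided by link rate. This is a back-of-the-envelope sketch: the function name is hypothetical, sizes are in decimal gigabytes, and protocol overhead and per-stream limits are ignored, so real transfers will be slower.

```python
def transfer_seconds(size_gb: float, link_gbps: float) -> float:
    """Ideal time in seconds to move size_gb gigabytes at link_gbps gigabits/s."""
    return size_gb * 8 / link_gbps


if __name__ == "__main__":
    for label, gbps in [("EC2 to S3 at 25 Gbps", 25), ("leaving EC2 at 5 Gb/s", 5)]:
        minutes = transfer_seconds(1000, gbps) / 60
        print(f"1 TB, {label}: about {minutes:.1f} minutes")
```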
Split driver model
[Diagram: Hardware (Physical CPU, Physical Memory, Network Device) sits under the VMM (CPU Scheduling), which presents Virtual CPU and Virtual Memory. The Driver Domain holds the Device Driver and Backend driver; each Guest Domain holds a Frontend driver, Sockets, and the Application. Numbered markers 1–5 trace the I/O path.]
Device Pass Through: Enhanced Networking
• SR-IOV allows the physical network device to be exposed directly to the instance
• Provides faster, more consistent network performance
• Higher rate of packets per second
• Requires a specialized driver, which means:
  • Your instance OS needs to know about it
  • EC2 needs to be told your instance can use it
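After Enhanced Networking
[Diagram: Hardware (Physical CPU, Physical Memory, SR-IOV Network Device) sits under the VMM (CPU Scheduling), which presents Virtual CPU and Virtual Memory. The Driver Domain and the Guest Domain each hold a NIC Driver; the Guest Domain also holds Sockets and the Application. Numbered markers 1–3 trace the I/O path.]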
Elastic Network Adapter
• Next generation of enhanced networking
• Hardware checksums
• Multi-queue support
• Receive side steering
• 25 Gbps in a placement group
• New open source Amazon network driver
ENA on r4, i3 and beyond
• 8xlarge: consistent 10 Gbps
• 16xlarge: consistent 25 Gbps
• Smaller instances
  • Up to 10 Gbps with baseline
  • Accrue credits when network usage below baseline
• Full bandwidth without placement group, but do need multiple streams
• Single stream limited to 10 Gbps in placement group
• 5 Gbps per stream across AZs, but still 25 Gbps total
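The per-stream limits above explain why multiple streams are needed to reach full bandwidth. The sketch below models this under stated assumptions: the function name is hypothetical, the 10 Gbps per-stream figure is applied inside a placement group, and the 5 Gbps cross-AZ figure is applied to everything outside one, which is a simplification of the slide.

```python
def aggregate_gbps(streams: int, in_placement_group: bool,
                   instance_limit_gbps: int = 25) -> int:
    """Upper bound on throughput for `streams` parallel flows on a 16xlarge."""
    per_stream = 10 if in_placement_group else 5
    return min(streams * per_stream, instance_limit_gbps)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(n, "streams:",
              aggregate_gbps(n, True), "Gbps in a placement group,",
              aggregate_gbps(n, False), "Gbps across AZs")
```

Under this model, three streams saturate the instance inside a placement group, while five are needed across AZs.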
Network Performance
Instance Family   25 Gbps        10 Gbps        Family supports "Up to 10 Gbps"
m4                m4.16xlarge    m4.10xlarge    -
m5                m5.24xlarge    m5.12xlarge    Yes
c5                c5.18xlarge    c5.9xlarge     Yes
c4                -              c4.8xlarge     -
r4                r4.16xlarge    r4.8xlarge     Yes
p3                p3.16xlarge    p3.8xlarge     Yes
p2                p2.16xlarge    p2.8xlarge     -
g3                g3.16xlarge    g3.8xlarge     Yes
i3.metal          i3.metal       -              -
x1                x1.32xlarge    x1.16xlarge    -
x1e               x1e.32xlarge   x1e.16xlarge   Yes
f1                f1.16xlarge    -              Yes
i3                i3.16xlarge    i3.8xlarge     Yes
Storage
Instance Store
Temporary block-level storage
Physically attached to host computer
Lifetime
• Data lost when:
• drive failure
• instance stops
• instance terminates
• Data persists on reboot
Instance store data loss prevention:
• Create RAID 1/5/6
• Move data to S3 or EBS
• Create a fault tolerant FS
Instance Store
Choose your instance wisely**
Supported Instances: i3, d2, f1, h1, x1 (current generation)
FS Disk servers find balance between storage and network
i3.* family NVMe SSD instance store
• I/O intensive workloads
• Up to 3.3 million IOPS at a 4 KB block
• Up to 16 GB/s sequential throughput
Virtual devices on instance are ephemeral[0-23] or nvme[0-7]
** http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html
Instance Store on the AWS Console
i2.8xlarge (8 x 800 GB SSD)
No additional instance volumes
Only additional EBS volumes
Size of Instance Store is not optional
Instance Store on the EC2 Instance
Block device mapping for Instance Store
Use lsblk on the instance:
[ec2-user@ip-172-31-16-91 ~]$ lsblk
NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
xvda    202:0    0     8G  0 disk
└─xvda1 202:1    0     8G  0 part /
xvdb    202:16   0 745.2G  0 disk
xvdc    202:32   0 745.2G  0 disk
xvdd    202:48   0 745.2G  0 disk
xvde    202:64   0 745.2G  0 disk
xvdf    202:80   0 745.2G  0 disk
xvdg    202:96   0 745.2G  0 disk
xvdh    202:112  0 745.2G  0 disk
xvdi    202:128  0 745.2G  0 disk
Use AWS CLI to show EBS. It shows only the EBS volumes, not instance store volumes:
$ aws ec2 describe-volumes --filters Name=attachment.instance-id,Values=i-asdf1234 --output text
VOLUMES us-west-2b 2016-12-21T17:54:04.171Z False 100 8 snap-15cfb226 in-use vol-XXXXXXXX gp2
ATTACHMENTS 2016-12-21T17:54:04.000Z True /dev/xvda i-asdf1234 attached vol-XXXXXXXX
EBS – Elastic Block Store
Two Block Storage options for EC2 Instances: EBS and Instance Store
[Diagram: the Block Device Mapping of an EC2 Instance (/dev/xvda, /dev/xvdb, /dev/xvdc, /dev/xvdd) points to Instance Store volumes (ephemeral0, ephemeral1) and EBS Volumes (vol-xxxxxxxx, vol-xxxxxxxx).]
EBS Volume Types
SSD
• General Purpose SSD (gp2): balance price and performance for a wide variety of transactional data
• Provisioned IOPS SSD (io1): latency-sensitive transactional workloads
HDD
• Throughput Optimized HDD (st1): frequently accessed, throughput intensive workloads
• Cold HDD (sc1): less frequently accessed data
EBS – What volume type should I use?
 | Solid-State Drives (SSD) | Solid-State Drives (SSD) | Hard Disk Drives (HDD) | Hard Disk Drives (HDD)
Volume Type | General Purpose SSD (gp2*) | Provisioned IOPS SSD (io1) | Throughput Optimized HDD (st1) | Cold HDD (sc1)
Use Cases | Most workloads, Boot volumes, Low-latency interactive apps, Dev and Test | I/O Intensive, Large database, Parallel FS | Streaming, Big data, Data warehouses, Log processing, Not boot vol, Parallel FS | Lowest cost, infrequently accessed, not boot vol
Volume Size | 1 GiB - 16 TiB | 4 GiB - 16 TiB | 500 GiB - 16 TiB | 500 GiB - 16 TiB
Max. IOPS**/Volume | 10,000 | 32,000 | 500 | 250
Max. Throughput/Volume† | 160 MiB/s | 500 MiB/s*** | 500 MiB/s | 250 MiB/s
Max. IOPS/Instance | 80,000 | 80,000 | 80,000 | 80,000
Max. Throughput/Instance | 1,750 MiB/s | 1,750 MiB/s | 1,750 MiB/s | 1,750 MiB/s
Dominant Performance Attribute | IOPS (3 IOPS per GiB) | IOPS (50 IOPS per GiB) | MiB/s | MiB/s
*Default volume type
**gp2/io1 based on 16KiB I/O size, st1/sc1 based on 1 MiB I/O size
***An io1 volume created before 12/6/2017 will not achieve this throughput until modified in some way.
† To achieve this throughput, you must have an instance that supports it, such as r4.8xlarge or x1.32xlarge.
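The per-GiB figures in the table translate into a simple sizing rule. The sketch below uses the numbers shown on this slide (3 IOPS per GiB capped at 10,000 for gp2, up to 50 IOPS per GiB capped at 32,000 for io1); the function names are hypothetical, and the 100 IOPS floor for small gp2 volumes is an assumption taken from the EBS documentation of that period rather than from the slide.

```python
def gp2_baseline_iops(size_gib: int, per_gib: int = 3,
                      floor: int = 100, cap: int = 10_000) -> int:
    """Baseline IOPS of a gp2 volume: scales with size between floor and cap."""
    return max(floor, min(size_gib * per_gib, cap))


def io1_max_iops(size_gib: int, per_gib: int = 50, cap: int = 32_000) -> int:
    """Largest IOPS value that can be provisioned on an io1 volume of this size."""
    return min(size_gib * per_gib, cap)


if __name__ == "__main__":
    for size in (10, 1000, 4000):
        print(f"{size} GiB: gp2 {gp2_baseline_iops(size)} IOPS, "
              f"io1 up to {io1_max_iops(size)} IOPS")
```

A gp2 volume therefore needs to be about 3,334 GiB before it reaches the 10,000 IOPS ceiling, while io1 reaches its ceiling at 640 GiB.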
EBS Performance
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSOptimized.html - As of July 25, 2017
Instance type | EBS-optimized by default | Max EBS bandwidth (Mbps)* | Expected throughput (MB/s)** | Max. IOPS (16 KB I/O size)** | Max Network bandwidth | 3 Year Reserved $/Hour (N. Virginia)
r4.16xlarge | Yes | 14,000 | 1,750 | 75,000 | 25 Gb/s | $1.600
m4.16xlarge | Yes | 10,000 | 1,250 | 65,000 | 25 Gb/s | $1.203
c5.18xlarge | Yes | 9,000 | 1,125 | 64,000 | 25 Gb/s | $1.928
g3.16xlarge | Yes | 14,000 | 1,750 | 75,000 | 25 Gb/s | $2.023
i3.16xlarge | Yes | 14,000 | 1,750 | 65,000 | 25 Gb/s | $2.112
f1.16xlarge | Yes | 14,000 | 1,750 | 75,000 | 25 Gb/s | $5.734
p2.16xlarge | Yes | 10,000 | 1,250 | 65,000 | 25 Gb/s | $6.392
x1.32xlarge | Yes | 10,000 | 1,250 | 65,000 | 10 Gb/s | $3.732
Choose the right instance:
RAID multiple EBS volumes together to achieve max performance
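How many volumes to stripe together follows from dividing the instance limit by the per-volume limit. This is a rough sketch: the function name is hypothetical, it treats the MB/s figures in this table and the MiB/s per-volume figures from the volume-type table as comparable, and it ignores RAID overhead.

```python
import math


def volumes_to_saturate(instance_throughput: float,
                        volume_throughput: float) -> int:
    """Smallest number of striped volumes whose combined throughput
    reaches the instance's EBS throughput limit."""
    return math.ceil(instance_throughput / volume_throughput)


if __name__ == "__main__":
    # r4.16xlarge (1,750 MB/s) with gp2 (160), io1 (500), and st1 (500) volumes
    for name, per_volume in [("gp2", 160), ("io1", 500), ("st1", 500)]:
        print(name, volumes_to_saturate(1750, per_volume))
```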
Elastic File System (EFS)
 Fully managed file system for EC2 instances
 POSIX file system semantics
 Works with standard operating system APIs
 Sharable across thousands of instances
 Elastically grows to petabyte scale
 Delivers performance for a wide variety of workloads
 Highly available and durable
 NFS v4–based
 Accessible from on-prem servers New!
EFS Performance
File System Size (GiB) | Baseline Aggregate Throughput (MiB/s) | Burst Aggregate Throughput (MiB/s) | Maximum Burst Duration (Min/Day) | % of Time File System Can Burst (Per Day)
10 | 0.5 | 100 | 7.2 | 0.5%
256 | 12.5 | 100 | 180 | 12.5%
512 | 25.0 | 100 | 360 | 25.0%
1024 | 50.0 | 100 | 720 | 50.0%
1536 | 75.0 | 150 | 720 | 50.0%
2048 | 100.0 | 200 | 720 | 50.0%
3072 | 150.0 | 300 | 720 | 50.0%
4096 | 200.0 | 400 | 720 | 50.0%
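The rows of this table follow a pattern that can be written down directly: the baseline is 50 MiB/s per TiB stored, the burst rate is 100 MiB/s per TiB with a 100 MiB/s minimum, and the daily burst time is the baseline-to-burst ratio applied to 1,440 minutes. The sketch below is inferred from the table itself, not taken from an EFS API, and the function name is hypothetical.

```python
def efs_throughput(size_gib: float):
    """Return (baseline MiB/s, burst MiB/s, max burst minutes per day)."""
    size_tib = size_gib / 1024
    baseline = 50 * size_tib
    burst = max(100.0, 100 * size_tib)
    burst_minutes = baseline / burst * 1440
    return baseline, burst, burst_minutes


if __name__ == "__main__":
    for size in (256, 1024, 4096):
        print(size, efs_throughput(size))
```

For the 10 GiB row the formula gives roughly 0.49 MiB/s and 7.0 minutes; the table rounds the baseline to 0.5 MiB/s, which yields the 7.2 minutes shown.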
Optimize HPC storage
EFS
• Highly available, multi-AZ, fully managed network-attached elastic file system.
• For near-line, highly-available storage of files in a traditional NFS format (NFSv4).
• Use for read-often, temporary working storage
EBS + EC2
• Create a single-AZ shared file system using EC2 and EBS, with third-party or open source software (ZFS, Weka.io, Intel Lustre, etc).
• For near-line storage of files optimized for high IOPS.
• Use for high-IOPS, temporary working storage
Amazon S3
• Secure, durable, highly-scalable object storage. Fast access, low cost.
• For long-term durable storage of data, in a readily accessible get/put access format.
• Primary durable and scalable storage for critical data
Amazon Glacier
• Secure, durable, long term, highly cost-effective object storage.
• For long-term storage and archival of data that is infrequently accessed.
• Use for long-term, lower-cost archival of critical data
HPC Data Flow on AWS Storage
[Diagram: data moves between the corporate data center and AWS (Ingress, Egress) through the Data Transfer options: AWS Direct Connect, ISV Connectors, Storage Gateway, AWS Snowball, Internet/VPN, Amazon Kinesis Firehose, S3 Transfer Acceleration, Amazon CloudFront. Object, Block, File Storage on AWS: Amazon S3 (Lifecycle to Amazon Glacier), EC2 Instance with EBS and Instance Store, EFS, Other Shared File System. 25 Gbps to S3.]
Data ingestion into AWS storage services
AWS Snowball
• Accelerate PBs with AWS-provided appliances
• 100/80 TB model, global availability
Amazon Kinesis Firehose
• Ingest device streams directly into AWS data stores
AWS Direct Connect
• COLO to AWS
ISV Connectors
• CommVault
• Veritas
• etc.
Amazon S3 Transfer Acceleration
• Move data up to 300% faster using AWS’s private network
AWS Storage Gateway
• Instant hybrid cloud
• Up to 120 MB/s cloud upload rate
(4x improvement), and
Amazon S3 Transfer Acceleration
[Diagram: Uploader → AWS Edge Location → S3 Bucket, marked "Optimized Throughput!"; path labels "Internet Only" and "Internet and AWS".]
• Typically 50%–300% faster
• Change your endpoint, not your code
• Global edge locations
• No firewall exceptions
• No client software required
Edge connections: Constantly monitored, optimized network paths
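"Change your endpoint, not your code" refers to pointing the client at the bucket's accelerate hostname instead of the regular one. The sketch below only builds the two URLs for comparison; the function name and the bucket name are hypothetical, and it assumes acceleration has already been enabled on the bucket. Bucket names containing periods are rejected because Transfer Acceleration does not support them.

```python
def s3_endpoint(bucket: str, accelerate: bool = False) -> str:
    """Return the virtual-hosted-style endpoint URL for a bucket."""
    if accelerate and "." in bucket:
        raise ValueError("Transfer Acceleration requires a bucket name without periods")
    host = "s3-accelerate.amazonaws.com" if accelerate else "s3.amazonaws.com"
    return f"https://{bucket}.{host}"


if __name__ == "__main__":
    print(s3_endpoint("my-hpc-results"))                   # regular endpoint
    print(s3_endpoint("my-hpc-results", accelerate=True))  # accelerated endpoint
```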
How fast is S3 transfer acceleration?
[Chart: time in hours for a 500 GB upload from these edge locations to a bucket in Singapore, comparing Public Internet with S3 Transfer Acceleration. Locations: Rio De Janeiro, Warsaw, New York, Atlanta, Madrid, Virginia, Melbourne, Paris, Los Angeles, Seattle, Tokyo, Singapore.]
Speed Comparison - S3 transfer acceleration
http://s3speedtest.com
 Multipart uploads from your
browser to Amazon S3 regions
 In general, the farther away
from an Amazon S3 region,
the more improvement you
can expect
AWS Snowball
 Fast Data Transfer
 Encryption
 Rugged and Portable
 Tamper Resistant
 End-to-End Tracking
 Secure Erasure
 Programmable
What is AWS Snowball? Petabyte-scale data transport
E-ink shipping
label
Ruggedized case
“8.5G impact”
All data encrypted
end-to-end
Rain- and dust-
resistant
Tamper-resistant
case and
electronics
80 TB
10 GE network
Storage Classes and Tiering on Amazon S3
Standard
• Primary data
• Big Data Analytics
• Small objects
• Temporary scratch space
• Archive data
• Deep/offline archives
• Tape vaulting replacement
• WORM-compliant data
• File sync and share
• Active Archive
• Enterprise backup
• Media transcoding
• Geo-redundancy/DR
Standard - Infrequent Access Amazon Glacier
AWS storage migration expansion:
AWS Snowmobile
Automation and Batch
Processing
Traditional Job Schedulers Integrate Easily
Bring your scheduler to AWS, or build your own
 IBM Platform LSF
 Univa Grid Engine
 Altair PBS Pro
 SLURM
 Design your own using AWS services
 Do you actually need a scheduler?
AWS Batch
 AWS Batch dynamically provisions resources
 Plans, schedules, and executes
 No batch software to install
Focus on your applications and results!
HPC Automation and Orchestration
Choose from several options to adapt your workloads
 CfnCluster
 AWS Batch
 AWS-NICE DCV and EnginFrame
 Build your own CloudFormation templates
 ISV offerings on Marketplace or use an SI
Launch a Cluster in minutes
 Cluster creation usually
takes ~15 minutes
 Completely managed by
CloudFormation
$ cfncluster create mycluster
CfnCluster Configuration Options
• Operating System
  • Amazon Linux
  • CentOS 6
  • CentOS 7
  • Ubuntu 14.04
• Scheduler
  • Sun Grid Engine (SGE)
  • PBS/Torque
  • SLURM
• Storage Size & IOPS
• EBS & Instance Store Encryption
• Scaling Speed & Limits
• Provisioning Scripts
License Server Management
 FLEXlm works natively on
AWS
 Each EC2 instance has a
unique hostname & hardware
address that can’t be spoofed
 Set the ENI (Network Interface) for
your license server not to
“Delete on termination”
 Allows for simple license
failover and migration
HPC Architecture on AWS
• Three file systems: EFS, Local NFS, and Parallel FS
• Snapshot of EBS to S3
• Data tiering FS to S3
• AutoScaling allows for scaling when needed
[Diagram labels: corporate data center; data ingress/egress; S3; availability zone; master instance ("$ qsub job.sh"); EBS; local NFS; autoscaling group; parallel FS; EFS.]
POC using EFS and EBS
1. Copy Data from S3 to EBS on startup
2. Start job
3. Use EBS vols while job is running
4. Access mounted EFS directories while job is running (/lib and /binary)
5. Record pass/fail
6. Update Data with delta
[Diagram labels: availability zone; S3; EFS; DynamoDB; r4.16xlarge running "$ ./run_job"; Mounted File Systems: EBS (/scratch), EFS (/lib and /binary); step markers 1–6.]
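The six steps can be read as a small driver around the job. The sketch below only shows the control flow: `run_poc` and its four callables are hypothetical placeholders for whatever copies data from S3, launches the job, records the result, and uploads the delta, and it assumes the delta is uploaded only after a passing run, which the slide does not state.

```python
def run_poc(copy_input, run_job, record_result, upload_delta) -> bool:
    """Run one proof-of-concept job following steps 1-6.

    copy_input():      step 1, copy data from S3 to the EBS scratch volume
    run_job():         steps 2-4, run the job; returns True on success
    record_result(ok): step 5, record pass/fail
    upload_delta():    step 6, push changed data back
    """
    copy_input()
    try:
        passed = bool(run_job())
    except Exception:
        passed = False  # a crashed job counts as a failure
    record_result(passed)
    if passed:
        upload_delta()
    return passed


if __name__ == "__main__":
    run_poc(lambda: print("copy S3 -> /scratch"),
            lambda: True,
            lambda ok: print("recorded:", "pass" if ok else "fail"),
            lambda: print("upload delta"))
```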
Visualization
• Using a GPU optimized instance and AWS-NICE DCV to visualize results
[Diagram labels: corporate data center; availability zone; modeling cluster; GPU instance.]
Amazon S3
DynamoDB
Amazon SQS
CloudWatch
Internet
Gateway
(IGW)
region-1a
Master Server
Auto Scaling
Compute Fleet
CloudFormation
Public Subnet
Isolated in a Private Subnet w/ Bastion
Public Subnet: Bastion Server, VPC NAT gateway
Private Subnet: Master Server, Auto Scaling Compute Fleet
Other components: Internet Gateway (IGW), CloudFormation, Amazon S3, DynamoDB, Amazon SQS, CloudWatch
Private Subnet Route Table
VPC Traffic -> Local
0.0.0.0/0 -> NAT Gateway
Public Subnet Route Table
VPC Traffic -> Local
0.0.0.0/0 -> Internet Gateway
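The route tables on this slide are resolved by longest-prefix match: traffic inside the VPC range stays local, and everything else falls through to the default route. The sketch below illustrates that lookup with the standard `ipaddress` module; the function name, the `10.0.0.0/16` VPC range, and the target labels are hypothetical values chosen for the example.

```python
import ipaddress


def next_hop(route_table: dict, destination: str) -> str:
    """Return the target of the most specific route matching `destination`."""
    address = ipaddress.ip_address(destination)
    best_network, best_target = None, None
    for cidr, target in route_table.items():
        network = ipaddress.ip_network(cidr)
        if address in network and (
                best_network is None or network.prefixlen > best_network.prefixlen):
            best_network, best_target = network, target
    if best_target is None:
        raise LookupError(f"no route to {destination}")
    return best_target


# Assumed VPC range 10.0.0.0/16; the default route sends the rest to the NAT gateway.
PRIVATE_SUBNET_ROUTES = {"10.0.0.0/16": "local", "0.0.0.0/0": "nat-gateway"}
PUBLIC_SUBNET_ROUTES = {"10.0.0.0/16": "local", "0.0.0.0/0": "internet-gateway"}

if __name__ == "__main__":
    print(next_hop(PRIVATE_SUBNET_ROUTES, "10.0.1.25"))  # stays inside the VPC
    print(next_hop(PRIVATE_SUBNET_ROUTES, "52.95.110.1"))  # leaves through NAT
```

Instances in the private subnet can therefore reach AWS service endpoints through the NAT gateway without being reachable from the internet themselves.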
Isolated with a VPN
Public Subnet: VPC NAT gateway
Private Subnet: Master Server, Auto Scaling Compute Fleet
Corporate Data Center: Engineer, VPN Connection
Other components: Internet Gateway (IGW), CloudFormation, Amazon S3, DynamoDB, Amazon SQS, CloudWatch
Private Subnet Route Table
VPC Traffic -> Local
Corp IP Range -> VPN
0.0.0.0/0 -> NAT Gateway
Public Subnet Route Table
VPC Traffic -> Local
Corp IP Range -> VPN
0.0.0.0/0 -> Internet Gateway
Private Subnet
Master Server
Auto Scaling
Compute Fleet
Amazon S3
DynamoDB
Amazon SQS
CloudWatch
CloudFormation
Corporate Data Center
Proxy Server
VPN Connection
Internet Connection
Private Subnet Route Table
VPC Traffic -> Local
Corp IP Range -> VPN
0.0.0.0/0 -> VPN
Completely Private w/ VPN & Proxy
Security Overview on AWS
AWS Shared Responsibility Model
Compliance Programs
Program groups shown: Global (including SOC 1, SOC 2, SOC 3), United States, Asia Pacific, Europe
https://aws.amazon.com/compliance/pci-data-privacy-protection-hipaa-soc-fedramp-faqs/
It is always YOUR data!
• Customers choose where to place their data
• AWS regions are geographically isolated by design
• Data is not replicated to other AWS regions and does not move unless the customer tells us to do so
• Customers always own their data and retain the ability to encrypt it, move it, and delete it
AWS Customer Agreement
https://aws.amazon.com/agreement/
Ubiquitous, Fully-Managed Encryption
Services: EBS, RDS, Amazon Redshift, S3, Amazon Glacier, Amazon EC2
Encrypted in transit and at rest
Fully auditable (AWS CloudTrail)
Restricted access (IAM)
Key options: fully managed keys in KMS, imported keys, your KMI
From this
To This
Media Destruction
Cost Optimization
EC2 Purchasing Options
On-Demand
Pay for compute capacity by the second with no long-term commitments
Spiky workloads, to define needs
Reserved
Make a 1- or 3-year commitment and receive a significant discount off On-Demand prices
Committed, steady-state usage
Spot
Spare EC2 capacity at savings of up to 90% off On-Demand prices
Fault-tolerant, dev/test, time-flexible, stateless workloads
Per-second billing for EC2 Linux instances & EBS volumes
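The effect of per-second billing and the Spot discount on a single HPC run can be worked through with a little arithmetic. The sketch below is illustrative only: the hourly price of 1.00 is a made-up figure, the 90% discount is the ceiling quoted on the slide (actual Spot savings vary), and the 60-second minimum charge comes from AWS's per-second billing terms rather than from the slide. The function names are hypothetical.

```python
ON_DEMAND_HOURLY = 1.00  # hypothetical price per instance-hour
MAX_SPOT_DISCOUNT = 0.90  # "up to 90%" from the slide; a ceiling, not a typical value


def discounted_price(on_demand_hourly: float, discount: float) -> float:
    """Hourly price after applying a fractional discount (0.0 to 1.0)."""
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    return on_demand_hourly * (1.0 - discount)


def billed_cost(runtime_seconds: float, hourly_price: float,
                instances: int = 1, minimum_seconds: int = 60) -> float:
    """Cost of running `instances` machines for `runtime_seconds` each.

    Billing is per second, subject to a minimum charge per instance
    (assumed to be 60 seconds here).
    """
    billed_seconds = max(runtime_seconds, minimum_seconds)
    return instances * billed_seconds * hourly_price / 3600.0


# Ten instances for a 30-minute job at the hypothetical On-Demand price.
on_demand_total = billed_cost(1800, ON_DEMAND_HOURLY, instances=10)

# The same job at the best-case Spot discount.
spot_total = billed_cost(
    1800, discounted_price(ON_DEMAND_HOURLY, MAX_SPOT_DISCOUNT), instances=10
)
```

Under these assumptions the job is charged for 30 minutes per instance rather than a full hour, and the best-case Spot total is one tenth of the On-Demand total.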
Cost Optimization
Weather Forecasting and Modeling
On Demand
Spot
Reserved
Instances
Forecasting
00z, 06z, 12z, 18z
Climate
Modeling
Weather
Events
Daily Forecasts
Climate
Modeling
Hurricane
Why use Spot – customer examples
“By using AWS Spot instances, we've been able to save 75% a month simply by changing four lines of code. It makes perfect sense for saving money when you're running continuous integration workloads or pipeline processing.” - Matthew Leventi, Lead Engineer, Lyft
AWS Data Centers
Take a virtual tour of an AWS data center
aws.amazon.com/compliance/data-center
Thank you, and how can I help you run HPC workloads on AWS?
aws.amazon.com/hpc

High Performance Computing on AWS

  • 1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Pierre-Yves Aquilanti, Ph.D. – Senior HPC Specialized Solution Architect Anh Tran – Senior HPC Specialized Solution Architect Monday, August 27, 2018 High Performance Computing on AWS
  • 2. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS Global Infrastructure
  • 3. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Over 100 Global CloudFront PoPs AWS Global Infrastructure Regions Amazon Global Network • Redundant 100GbE network • Redundant private capacity between all Regions except China
  • 4. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS Global Infrastructure 18 Regions – 55 Availability Zones – *119 Points of Presence Region & Number of Availability Zones US West EU Oregon (3) Ireland (3) Northern California (3) Frankfurt (3) London (3) US East Paris (3) N. Virginia (6), Ohio (3) Asia Pacific Canada Singapore (3) Central (2) Sydney (3), Tokyo (4), Seoul (2), Mumbai (2) GovCloud US-West (3) China South America Beijing (2) São Paulo (3) Ningxia (2) Announced Regions Bahrain, Hong Kong, SAR(China), GovCloud (US-East)*103 Edge Locations and 11 Regional Edge Caches
  • 5. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Why HPC on AWS?
  • 6. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Running HPC Workloads Everyday  Logistics  Machine learning  Data Center, network, and server design  Consumer product design  Robotics  Semiconductor design  Retail and financial analytics
  • 7. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Tightly Coupled Parallel Computing Loosely Coupled Parallel Computing Accelerated Computing Visualization and Interpretation High Performance Data Storage and Analytics Scale EC2 Spot Pricing Early Access to Technology Choice Performance Derive unique insights with AI/ML Skip the Queue View results instantly AWS Advantages for HPC Workload Types
  • 8. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Why HPC on AWS - Multiple Clusters $ qsub –q monolith iwait.sh $ qsub dev.sh $ qsub prod.sh $ qsub critical.sh $ qsub bigrun.sh On-Prem Launch clusters by group, user, application – no more waiting!
  • 9. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Building an HPC Infrastructure in AWS
  • 10. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Understanding the Drivers What are the motivations to use Cloud computing? How running on AWS would be different from on-premises? What would you need to launch a PoC on AWS today? What are the requirements for your application? Do you need to visualize your data?
  • 11. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. HPC Solutions Storage EBS EFS S3 Networking Enhanced Networking Placement Groups Automation & Orchestration AWS Batch CfnCluster NICE EnginFrame Visualization NICE DCV Appstream 2.0 Compute EC2 Instance EC2 Spot Auto Scaling Accelerated Compute FPGA GPU
  • 12. Amazon EC2 Instances. Optimize the price/performance of your HPC workloads with the widest range of compute instances. General purpose: M4, M5, M5D. General purpose burstable: T2, T2 Unlimited. Compute optimized: C4, C5, C5D. Memory optimized: R4, R5, R5D, X1, X1e. Storage optimized (high I/O): I2, I3. Dense storage: D2, H1. GPU compute: P2, P3. Graphics intensive: G2, G3. FPGA: F1. EC2 Bare Metal: direct access to physical server resources.
  • 13. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon EFS Amazon EBS Amazon EC2 Instance Store Amazon S3 / S3-IA Amazon Glacier Object Data Transfer AWS Direct Connect ISV Connectors Amazon Kinesis Firehose Storage Gateway S3 Transfer Acceleration AWS Storage is a Platform AWS Snowball Amazon CloudFront Internet/ VPN BlockFile
  • 14. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Network Performance AWS Proprietary Network, 10Gbps & 25Gbps  Highest performance in largest EC2 instance sizes  Full bi-section bandwidth in Placement Groups, with no network oversubscription Enhanced Networking  Over 1M PPS performance, reduced instance-to-instance latencies, more consistent network performance EC2 to S3  Traffic to and from S3 can now take advantage of up to 25 Gbps of bandwidth
  • 15. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Orchestration AWS Batch Managed AWS Lambda CfnCluster Un-Managed Traditional Scheduler AWS Step Functions Application Services Amazon SWF Fully-managed services Run large-scale compute workloads or simple functions Focus on your jobs and their resources instead of the infrastructure Quickly deploy a cluster using third-party schedulers Bring your own scheduler or use AWS Marketplace solutions Design and orchestrate workflows, with support for branching and callouts to other AWS services. Easily integrated with AWS Batch, AWS Lambda…
  • 16. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Graphics and Collaboration with DCV and AppStream Pre-and post processing as well as HPC Use GPUs in the cloud for remote rendering and remote desktops Collaborating Securely Encrypt the data in flight and at rest Manage your own keys and credentials Deliver pixels to your collaborators, not the actual data
  • 17. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Deploying HPC Systems on AWS 3D GRAPHICS VIRTUAL WORKSTATION LICENSE MANAGERS AND CLUSTER HEAD NODES WITH JOB SCHEDULERS CLOUD-BASED, AUTO-SCALING HPC CLUSTERS SHARED FILE STORAGE STORAGE CACHE Amazon S3 and Amazon Glacier ON-PREMISES HPC RESOURCES Corporate Datacenter AWS SNOWBALL AWS DIRECT CONNECT THIN - NO LOCAL DATA - OR ZERO CLIENT APPSTREAM 2.0 AWS BATCH On AWS, secure and well-optimized HPC clusters can be automatically created, operated, and torn down in a matter of minutes
  • 18. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Customer Use Cases
  • 19. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Several Kinds of HPC Workloads Data Light Minimal requirements for high performance storage Data Heavy Benefits from access to high performance storage Clustered (Tightly coupled) Distributed / Grid (Loosely coupled) • Fluid dynamics • Weather forecasting • Materials simulations • Crash simulations • Risk simulations • Molecular modeling • Contextual search • Logistics simulations • Animation and VFX • Semiconductor verification • Image processing/GIS • Genomics • Seismic processing • Metagenomics • Astrophysics • Deep learning
  • 20. HPC Grids in Financial Services: Using GPU acceleration. “Using AWS helps us reduce a 10-day process to 10 minutes. That’s transformative: it broadens our ability to discover.” Peter Phillips, Managing Director, Aon Benfield Securities. The Challenge  Spinning up large numbers of GPUs quickly and inexpensively to meet ABSI’s customers’ financial modeling & reporting needs  ABSI uses proprietary algorithms (Monte Carlo simulations) running millions of times The Solution  ABSI moved its infrastructure to AWS and deprecated its co-located data center  ABSI built a front-end on AWS for its processing solution, automatically running GPU instances on Amazon EC2 using EBS in an Amazon VPC for security The Result  Can be as much as 500 times more efficient in terms of performance per dollar for some clients
  • 21. HPC clusters in Healthcare & Life Sciences: HPC on AWS for Cancer Drug Research. “By spinning up a few hundred nodes on AWS and getting results in less than a day, our scientific researchers have a lot more freedom to ask questions that weren’t even possible before. The speed is important, but equally important is the additional intellectual curiosity this enables for researchers.” Lance Smith, Associate Director of IT, Celgene. The Challenge  Slower time to results due to wait times and longer times to run jobs on the fixed configurations available  Hard to collaborate with external entities due to security and compliance issues  Inability to scale beyond the fixed number of cores that were available on premises The Solution  The company runs many HPC workloads on hundreds of Amazon EC2 instances and uses Amazon S3 and Amazon Glacier to store hundreds of terabytes of genomic data  Using Amazon VPC, AWS Identity and Access Management, and AWS Direct Connect to collaborate securely The Result  HPC job time reduced to hours instead of weeks  More parallel work being achieved, leading to increased productivity
  • 22. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. HPC in Design & Engineering Boom leverages Rescale and AWS to enable supersonic travel “Rescale’s ScaleX cloud platform is a game-changer for engineering. It gives Boom computing resources comparable to building a large on-premise HPC center. Rescale lets us move fast with minimal capital spending and resources overhead.” Josh Krall CTO & Co-Founder  Simulated vortex lift with 200M cell models on 512+ cores  Increased simulation throughput: 100 jobs in parallel with 6x speedup per job → 600x speedup  Eliminated IT overhead, including server capital costs & in-house IT and software costs  Elastic HPC capacity and pay-as-you-go AWS clusters allow business agility & ability to scale
  • 23. 1.1M vCPUs for Machine Learning A group of researchers from Clemson University achieved a remarkable milestone while studying topic modeling, an important component of machine learning associated with natural language processing, breaking the record for creating the largest high-performance cluster in the cloud by using more than 1,100,000 vCPUs on Amazon EC2 Spot Instances running in a single AWS region. The graph highlights the elastic, automatic expansion of resources. Clemson took advantage of the new per-second billing for EC2 instances. The vCPU count usage is comparable to the core count on the largest supercomputers in the world. [Diagram labels: provisioning and workflow automation software, job script, CloudyCluster APIs, login, scheduler, SLURM, CCQ, Auto Scaling, Spot Fleet, S3, DDB, VPC] https://aws.amazon.com/blogs/aws/natural-language-processing-at-clemson-university-1-1-million-vcpus-ec2-spot-instances/
  • 24. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Pierre-Yves Aquilanti, Ph.D. – Senior HPC Specialized Solution Architect Anh Tran – Senior HPC Specialized Solution Architect Monday, August 27, 2018 HPC on AWS Deep Dive
  • 25. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Compute
  • 26. In the past  First launched in August 2006  M1 instance  “One size fits all”
  • 27. Amazon EC2 Instances. Optimize the price/performance of your HPC workloads with the widest range of compute instances. General purpose: M4, M5, M5D. General purpose burstable: T2, T2 Unlimited. Compute optimized: C4, C5, C5D. Memory optimized: R4, R5, R5D, X1, X1e. Storage optimized (high I/O): I2, I3. Dense storage: D2, H1. GPU compute: P2, P3. Graphics intensive: G2, G3. FPGA: F1. EC2 Bare Metal: direct access to physical server resources.
  • 28. Elastic Compute Cloud (EC2) Instance Naming: in c5.9xlarge, “c” is the instance family, “5” is the instance generation, and “9xlarge” is the instance size.
  • 29. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Instance sizing c5.18xlarge 2 x c5.9xlarge ≈ 4 x c5.4xlarge ≈ 8 x c5.2xlarge ≈
  • 30. Original EC2 Host Architecture  All resources were on the server  Instance Goals: • Security • Performance • Familiarity [Diagram labels: Hypervisor; Management, Security, and Monitoring; Storage; Customer Instances; Network; SERVER]
  • 31. EC2 C5 Instance  Nearly 100% of available compute resources available to customers’ workload  Improved security [Diagram labels: Hypervisor; Management, Security, and Monitoring; Storage; Customer Instances; Network; SERVER; NITRO]
  • 32. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. C5 Instances - Intel® XEON® Scalable Processor  Intel Skylake @ 3.0 GHz (turbo to 3.5GHz)  Supports AVX512  C-state controls  Nitro System, a combination of dedicated hardware and lightweight hypervisor  Up to 25 Gbps network AVX 512 72 vCPUs “Skylake” 144 GiB memory C5 12 Gbps to EBS 2X vCPUs 3X throughput 2.4X memory C4 36 vCPUs “Haswell” 4 Gbps to EBS 60 GiB memory
  • 33. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Performance Considerations Test using real-world examples  Use large cases for testing: do not benchmark scalability using only small examples MPI libraries  Test with Intel MPI and OpenMPI 3.0, and make use of available tunings Domain decomposition  Choose number of cells per core for either per-core efficiency or for faster results Network  Use a placement group  Enable enhanced networking
  • 34. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. What’s a Virtual CPU? (vCPU)  A vCPU is typically an Intel hyper-threaded physical core*  On Linux, “A” threads enumerated before “B” threads  On Windows, threads are interleaved  Divide vCPU count by 2 to get core count  Cores by EC2 & RDS DB Instance type: https://aws.amazon.com/ec2/virtualcores/ * The “T” family is special
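The divide-by-two rule on this slide can be written as a one-line helper. This is a minimal sketch, assuming two hyper-threads per physical core (so it does not apply to the T family); `cores_from_vcpus` is a hypothetical name, not an AWS tool.

```shell
#!/bin/sh
# Hypothetical helper: estimate physical cores from a vCPU count,
# assuming each physical core exposes two hyper-threads.
cores_from_vcpus() {
    echo $(( $1 / 2 ))
}

# Example: the C5 slide lists 72 vCPUs for the largest C5 size.
cores_from_vcpus 72    # prints 36
```

On a running Linux instance the same function could be fed the output of `nproc`, with `lscpu` used to confirm the threads-per-core value.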
  • 36. Disable Hyper-Threading If You Need To  Useful for CPU heavy applications  Use ‘lscpu’ to validate layout  Disable Hyper-Threading without reboot: for cpunum in $(cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list | cut -s -d, -f2- | tr ',' '\n' | sort -un); do echo 0 | sudo tee /sys/devices/system/cpu/cpu${cpunum}/online; done  Or set grub to only initialize the first half of all threads, for example maxcpus=64 on an instance with 128 vCPUs
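Before taking CPUs offline it can help to list which logical CPUs the loop on this slide would touch. The sketch below wraps the same pipeline in a function; `list_sibling_threads` and its optional directory argument are hypothetical additions so the logic can be tried against a copy of the sysfs tree. It assumes sibling lists are comma-separated (for example `0,36`); kernels that report siblings as a range such as `0-1` would need extra handling.

```shell
#!/bin/sh
# List the secondary hyper-thread ("B" thread) of every core.
# $1 is an optional root directory; it defaults to the real sysfs path.
list_sibling_threads() {
    root="${1:-/sys/devices/system/cpu}"
    cat "$root"/cpu*/topology/thread_siblings_list 2>/dev/null |
        cut -s -d, -f2- |   # keep everything after the first sibling
        tr ',' '\n' |       # one CPU number per line
        sort -un            # numeric sort, duplicates removed
}
```

Running `list_sibling_threads` with no argument only prints CPU numbers; writing `0` to each CPU's `online` file remains a separate, privileged step.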
  • 38. RHEL6 and the 2.6 Kernel  The 2.6.32 Linux kernel was released in 2009  Use 3.10+ kernel (Up to 40% boost!)  Amazon Linux 13.09 or later  Ubuntu 14.04 or later  RHEL/CentOS 7 or later  Etc.
  • 39. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Networking
  • 40. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Network Performance AWS Proprietary Network, 10Gbps & 25Gbps  Highest performance in largest EC2 instance sizes  Full bi-section bandwidth in Placement Groups, with no network oversubscription Enhanced Networking  Over 1M PPS performance, reduced instance-to-instance latencies, more consistent network performance EC2 to S3  Traffic to and from S3 can now take advantage of up to 25 Gbps of bandwidth
  • 41. Network Performance  25 Gigabit & 10 Gigabit  Measured one-way, double that for bi-directional (full duplex)  High, Moderate, Low – instance size and EBS optimization  Not all created equal – Test with iperf if it’s important!  https://aws.amazon.com/premiumsupport/knowledge-center/network-throughput-benchmark-linux-ec2/  Use placement groups when you need high and consistent instance-to-instance bandwidth  25 Gbps to S3  All traffic limited to 5 Gb/s when exiting EC2 (e.g. VPN or DC)
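The bandwidth figures on this slide translate directly into best-case transfer times. The helper below is a rough sketch of that arithmetic; `transfer_seconds` is a hypothetical name, sizes are decimal gigabytes, and the result ignores protocol overhead, so measured numbers (for example from iperf) will be lower.

```shell
#!/bin/sh
# Best-case seconds to move a data set over a link.
# $1 = data size in GB (decimal), $2 = link rate in Gb/s
transfer_seconds() {
    awk -v gb="$1" -v gbps="$2" 'BEGIN { printf "%.0f\n", gb * 8 / gbps }'
}

transfer_seconds 1000 5     # 1 TB leaving EC2 at 5 Gb/s: prints 1600
transfer_seconds 1000 25    # 1 TB at 25 Gb/s: prints 320
```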
  • 42. Split driver model [Diagram labels: Hardware (Physical CPU, Physical Memory, Network Device); VMM (Virtual CPU, Virtual Memory, CPU Scheduling); Driver Domain (Backend driver, Device Driver); Guest Domains (Application, Sockets, Frontend driver); numbered steps 1 to 5]
  • 43. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Device Pass Through: Enhanced Networking  SR-IOV allows the physical network device exposed to the instance  Provides faster, more consistent network performance  Higher rate of packets per second  Requires a specialized driver, which means:  Your instance OS needs to know about it  EC2 needs to be told your instance can use it
  • 44. After Enhanced Networking [Diagram labels: Hardware (Physical CPU, Physical Memory, SR-IOV Network Device); VMM (Virtual CPU, Virtual Memory, CPU Scheduling); Driver Domain; Guest Domains (Application, Sockets, NIC Driver); numbered steps 1 to 3]
  • 45. Elastic Network Adapter  Next generation of enhanced networking  Hardware checksums  Multi-queue support  Receive side steering  25 Gbps in a placement group  New open source Amazon network driver
  • 46. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. ENA on r4, i3 and beyond  8xlarge: consistent 10 Gbps 16xlarge: consistent 25 Gbps  Smaller instances  Up to 10 Gbps with baseline  Accrue credits when network usage below baseline  Full bandwidth without placement group, but do need multiple streams  Single stream limited to 10Gbps in placement group  5 Gbps per stream across AZ’s, but still 25Gbps total
  • 47. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Network Performance Instance Family 25 Gbps 10 Gbps Family supports "Up to 10 Gbps" m4 m4.16xlarge m4.10xlarge m5 m5.24xlarge m5.12xlarge Yes c5 c5.18xlarge c5.9xlarge Yes c4 c4.8xlarge r4 r4.16xlarge r4.8xlarge Yes p3 p3.16xlarge p3.8xlarge Yes p2 p2.16xlarge p2.8xlarge g3 g3.16xlarge g3.8xlarge Yes i3.metal i3.metal x1 x1.32xlarge x1.16xlarge x1e x1e.32xlarge x1e.16xlarge Yes f1 f1.16xlarge Yes i3 i3.16xlarge i3.8xlarge Yes
  • 48. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Storage
  • 49. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Instance Store Temporary block-level storage Physically attached to host computer Lifetime • Data lost when: • drive failure • instance stops • instance terminates • Data persists on reboot Instance store data loss prevention: • Create RAID 1/5/6 • Move data to S3 or EBS • Create a fault tolerant FS XX
  • 50. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Instance Store Choose your instance wisely** Supported Instances: i3, d2, f1, h1, x1 (current generation) FS Disk servers find balance between storage and network i3.* family NVMe SSD instance store • I/O intensive workloads • Up to 3.3 million IOPS at a 4 KB block • Up to 16 GB/s sequential throughput Virtual devices on instance are ephemeral[0-23] or nvme[0-7] ** http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html
  • 51. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Instance Store on the AWS Console i2.8xlarge (8 x 800 SSD) No additional instance volumes Only additional EBS volumes Size of Instance Store is not optional
  • 52. Instance Store on the EC2 Instance: Block device mapping for Instance Store. Use lsblk on the instance: [ec2-user@ip-172-31-16-91 ~]$ lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT xvda 202:0 0 8G 0 disk └─xvda1 202:1 0 8G 0 part / xvdb 202:16 0 745.2G 0 disk xvdc 202:32 0 745.2G 0 disk xvdd 202:48 0 745.2G 0 disk xvde 202:64 0 745.2G 0 disk xvdf 202:80 0 745.2G 0 disk xvdg 202:96 0 745.2G 0 disk xvdh 202:112 0 745.2G 0 disk xvdi 202:128 0 745.2G 0 disk Use AWS CLI to show EBS (shows only the EBS volumes, not instance store volumes): $ aws ec2 describe-volumes --filters Name=attachment.instance-id,Values=i-asdf1234 --output text VOLUMES us-west-2b 2016-12-21T17:54:04.171Z False 100 8 snap-15cfb226 in-use vol-XXXXXXXX gp2 ATTACHMENTS 2016-12-21T17:54:04.000Z True /dev/xvda i-asdf1234 attached vol-XXXXXXXX
  • 53. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. EBS – Elastic Block Storage Two Block Storage options for EC2 Instances: EBS and Instance Store EC2 Instance /dev/xvda /dev/xvdb /dev/xvdc Block Device Mapping Instance Store ephemeral0 ephemeral1 vol-xxxxxxxx vol-xxxxxxxx /dev/xvdd EBS Volumes
  • 54. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. EBS Volume Types General Purpose SSD balance price and performance for a wide variety of transactional data gp2 Provisioned IOPS SSD latency-sensitive transactional workloads io1 Throughput Optimized HDD frequently accessed, throughput intensive workloads st1 Cold HDD less frequently accessed data sc1 SSD HDD
  • 55. EBS – What volume type should I use? Solid-State Drives (SSD): General Purpose SSD (gp2*) and Provisioned IOPS SSD (io1). Hard Disk Drives (HDD): Throughput Optimized HDD (st1) and Cold HDD (sc1). Use cases: gp2: most workloads, boot volumes, low-latency interactive apps, dev and test; io1: I/O intensive, large database, parallel FS; st1: streaming, big data, data warehouses, log processing, parallel FS, not boot vol; sc1: lowest cost, infrequently accessed, not boot vol. Volume size: gp2 1 GiB - 16 TiB; io1 4 GiB - 16 TiB; st1 500 GiB - 16 TiB; sc1 500 GiB - 16 TiB. Max. IOPS**/volume: gp2 10,000; io1 32,000; st1 500; sc1 250. Max. throughput/volume†: gp2 160 MiB/s; io1 500 MiB/s***; st1 500 MiB/s; sc1 250 MiB/s. Max. IOPS/instance: 80,000 for all types. Max. throughput/instance: 1,750 MiB/s for all types. Dominant performance attribute: gp2 IOPS (3 IOPS per GiB); io1 IOPS (50 IOPS per GiB); st1 MiB/s; sc1 MiB/s. *Default volume type **gp2/io1 based on 16 KiB I/O size, st1/sc1 based on 1 MiB I/O size ***An io1 volume created before 12/6/2017 will not achieve this throughput until modified in some way. † To achieve this throughput, you must have an instance that supports it, such as r4.8xlarge or x1.32xlarge.
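The gp2 figures on this slide tie baseline IOPS to volume size. A small sketch of that arithmetic follows, assuming the 3 IOPS per GiB ratio and the 10,000 IOPS per-volume cap quoted in the table. The 100 IOPS floor for small volumes comes from the EBS documentation of the same period rather than from the slide, and `gp2_baseline_iops` is a hypothetical helper name.

```shell
#!/bin/sh
# Baseline IOPS of a gp2 volume for a given size in GiB.
gp2_baseline_iops() {
    size_gib="$1"
    iops=$(( size_gib * 3 ))                            # 3 IOPS per GiB
    if [ "$iops" -lt 100 ]; then iops=100; fi           # assumed floor (not on the slide)
    if [ "$iops" -gt 10000 ]; then iops=10000; fi       # per-volume cap from the table
    echo "$iops"
}

gp2_baseline_iops 500     # prints 1500
gp2_baseline_iops 4000    # prints 10000 (capped)
```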
  • 56. EBS Performance http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSOptimized.html - As of July 25, 2017. Columns: Instance type | EBS-optimized by default | Max EBS bandwidth (Mbps)* | Expected throughput (MB/s)** | Max. IOPS (16 KB I/O size)** | Max network bandwidth | 3 Year Reserved $/Hour (N. Virginia). r4.16xlarge | Yes | 14,000 | 1,750 | 75,000 | 25 Gb/s | $1.600; m4.16xlarge | Yes | 10,000 | 1,250 | 65,000 | 25 Gb/s | $1.203; c5.18xlarge | Yes | 9,000 | 1,125 | 64,000 | 25 Gb/s | $1.928; g3.16xlarge | Yes | 14,000 | 1,750 | 75,000 | 25 Gb/s | $2.023; i3.16xlarge | Yes | 14,000 | 1,750 | 65,000 | 25 Gb/s | $2.112; f1.16xlarge | Yes | 14,000 | 1,750 | 75,000 | 25 Gb/s | $5.734; p2.16xlarge | Yes | 10,000 | 1,250 | 65,000 | 25 Gb/s | $6.392; x1.32xlarge | Yes | 10,000 | 1,250 | 65,000 | 10 Gb/s | $3.732. Choose the right instance: RAID multiple EBS volumes together to achieve max performance
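The advice to RAID volumes together comes down to a ceiling division: how many volumes of a given per-volume throughput are needed to reach what the instance can drive. The sketch below uses the 1,750 MiB/s per-instance and 160 MiB/s (gp2) or 500 MiB/s (io1) per-volume figures from the EBS slides; `volumes_for_throughput` is a hypothetical helper, and real striping efficiency will vary.

```shell
#!/bin/sh
# Number of volumes to stripe so their combined throughput reaches the target.
# $1 = target throughput in MiB/s, $2 = per-volume throughput in MiB/s
volumes_for_throughput() {
    echo $(( ($1 + $2 - 1) / $2 ))    # integer ceiling of $1 / $2
}

volumes_for_throughput 1750 160    # gp2: prints 11
volumes_for_throughput 1750 500    # io1: prints 4
```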
  • 57. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Elastic File System (EFS)  Fully managed file system for EC2 instances  POSIX file system semantics  Works with standard operating system APIs  Sharable across thousands of instances  Elastically grows to petabyte scale  Delivers performance for a wide variety of workloads  Highly available and durable  NFS v4–based  Accessible from on-prem servers New!
  • 58. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. EFS Performance File System Size (GiB) Baseline Aggregate Throughput (MiB/s) Burst Aggregate Throughput (MiB/s) Maximum Burst Duration (Min/Day) % of Time File System Can Burst (Per Day) 10 0.5 100 7.2 0.5% 256 12.5 100 180 12.5% 512 25.0 100 360 25.0% 1024 50.0 100 720 50.0% 1536 75.0 150 720 50.0% 2048 100.0 200 720 50.0% 3072 150.0 300 720 50.0% 4096 200.0 400 720 50.0%
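The rows of this table follow a simple pattern that can be reproduced for other file system sizes. The sketch below is inferred from the table itself, not from an official formula: baseline throughput of 50 MiB/s per TiB stored, burst throughput of 100 MiB/s per TiB with a 100 MiB/s floor, and a daily burst budget equal to the baseline-to-burst ratio. `efs_throughput` is a hypothetical helper, and the smallest row (10 GiB) differs slightly because the slide rounds the baseline first.

```shell
#!/bin/sh
# Print "baseline_MiBps burst_MiBps burst_minutes_per_day" for a size in GiB.
efs_throughput() {
    awk -v gib="$1" 'BEGIN {
        baseline = gib * 50 / 1024           # 50 MiB/s per TiB stored
        burst = gib * 100 / 1024             # 100 MiB/s per TiB ...
        if (burst < 100) burst = 100         # ... with a 100 MiB/s floor
        minutes = baseline / burst * 1440    # share of a 1440-minute day
        printf "%.1f %.0f %.0f\n", baseline, burst, minutes
    }'
}

efs_throughput 256     # prints 12.5 100 180
efs_throughput 2048    # prints 100.0 200 720
```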
  • 59. Optimize HPC storage. EFS: Highly available, multi-AZ, fully managed network-attached elastic file system. For near-line, highly-available storage of files in a traditional NFS format (NFSv4). Use for read-often, temporary working storage. EBS + EC2: Create a single-AZ shared file system using EC2 and EBS, with third-party or open source software (ZFS, Weka.io, Intel Lustre, etc). For near-line storage of files optimized for high IOPS. Use for high-IOPS, temporary working storage. Amazon S3: Secure, durable, highly-scalable object storage. Fast access, low cost. For long-term durable storage of data, in a readily accessible get/put access format. Primary durable and scalable storage for critical data. Amazon Glacier: Secure, durable, long term, highly cost-effective object storage. For long-term storage and archival of data that is infrequently accessed. Use for long-term, lower-cost archival of critical data.
  • 60. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Data Transfer HPC Data Flow on AWS Storage corporate data center Amazon Glacier Amazon S3 AWS Direct Connect ISV Connectors Storage Gateway AWS Snowball Internet/VPN Ingress Egress Lifecycle EC2 Instance EBS Instance Store Object, Block, File Storage Amazon Kinesis Firehose S3 Transfer Acceleration Amazon CloudFront Other Shared File System EFS 25 Gbps to S3
  • 61. Data ingestion into AWS storage services. AWS Snowball • Accelerate PBs with AWS-provided appliances • 100/80 TB model, global availability Amazon Kinesis Firehose • Ingest device streams directly into AWS data stores AWS Direct Connect • COLO to AWS ISV Connectors • CommVault • Veritas • etcetera Amazon S3 Transfer Acceleration • Move data up to 300% faster using AWS’s private network AWS Storage Gateway • Instant hybrid cloud • Up to 120 MB/s cloud upload rate (4x improvement), and
  • 62. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon S3 Transfer Acceleration S3 Bucket AWS Edge Location Uploader Optimized Throughput! Typically 50%–300% faster Change your endpoint, not your code Global edge locations No firewall exceptions No client software required Internet Only Internet and AWS Edge connections: Constantly monitored, optimized network paths
  • 63. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Rio De Janeiro Warsaw New York Atlanta Madrid Virginia Melbourne Paris Los Angeles Seattle Tokyo Singapore Time[hrs] 500 GB upload from these edge locations to a bucket in Singapore Public Internet How fast is S3 transfer acceleration? S3 Transfer Acceleration
  • 64. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Speed Comparison - S3 transfer acceleration http://s3speedtest.com  Multipart uploads from your browser to Amazon S3 regions  In general, the farther away from an Amazon S3 region, the more improvement you can expect
  • 65. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS Snowball  Fast Data Transfer  Encryption  Rugged and Portable  Tamper Resistant  End-to-End Tracking  Secure Erasure  Programmable
  • 66. What is AWS Snowball? Petabyte-scale data transport E-ink shipping label Ruggedized case “8.5G impact” All data encrypted end-to-end Rain- and dust-resistant Tamper-resistant case and electronics 80 TB 10 GE network
  • 67. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Storage Classes and Tiering on Amazon S3 Standard • Primary data • Big Data Analytics • Small objects • Temporary scratch space • Archive data • Deep/offline archives • Tape vaulting replacement • WORM-compliant data • File sync and share • Active Archive • Enterprise backup • Media transcoding • Geo-redundancy/DR Standard - Infrequent Access Amazon Glacier
  • 68. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS storage migration expansion: AWS Snowmobile
  • 69. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Automation and Batch Processing
  • 70. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Traditional Job Schedulers Integrate Easily Bring your scheduler to AWS, or build your own  IBM Platform LSF  Univa Grid Engine  Altair PBS Pro  SLURM  Design your own using AWS services  Do you actually need a scheduler?
  • 71. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS Batch  AWS Batch dynamically provisions resources  Plans, schedules, and executes  No batch software to install Focus on your applications and results!
  • 72. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. HPC Automation and Orchestration Choose from several options to adapt your workloads  CfnCluster  AWS Batch  AWS-NICE DCV and EnginFrame  Build your own CloudFormation templates  ISV offerings on Marketplace or use an SI
  • 73. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Launch a Cluster in minutes  Cluster creation usually takes ~15 minutes  Completely managed by CloudFormation $ cfncluster create mycluster
  • 74. CfnCluster Configuration Options  Operating System  Amazon Linux  CentOS 6  CentOS 7  Ubuntu 14.04  Scheduler  Sun Grid Engine (SGE)  PBS/Torque  SLURM  Storage Size & IOPS  EBS & Instance Store Encryption  Scaling Speed & Limits  Provisioning Scripts
  • 75. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. License Server Management  FLEXlm works natively on AWS  Each EC2 instance has a unique hostname & hardware address that can’t be spoofed  Set the ENI (Network Interface) for your license server not to “Delete on termination”  Allows for simple license failover and migration
  • 76. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. HPC Architecture on AWS corporate data center availability zone autoscaling group parallel FS local NFS s3 data ingress/egress EFS  Three file systems: EFS, Local NFS, and Parallel FS  Snapshot of EBS to s3  Data tiering FS to s3  AutoScaling allows for scaling when needed master instance $ qsub job.sh EBS
  • 77. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. POC using EFS and EBS availability zone S3 EFS 1. Copy Data from S3 to EBS on startup 2. Start job 3. Use EBS vols while job is running 4. Access mounted EFS directories while job is running (/lib and /binary) 5. Record pass/fail 6. Update Data with delta 1 $ ./run_job 2 Mounted File Systems: EBS 4 3 DynamoDB 5 6 r4.16xlarge /scratch /lib and /binary
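The six numbered steps on this slide can be sketched as a short wrapper script. Everything named here (the bucket, the DynamoDB table, the `/scratch` layout, the `job_id` and `status` attributes, and the `run` and `poc_job` functions) is a hypothetical placeholder; only `aws s3 sync` and `aws dynamodb put-item` are real AWS CLI commands. Setting `DRY_RUN` prints the commands instead of executing them, which is a convenient way to review the flow before pointing it at real resources.

```shell
#!/bin/sh
# Hypothetical placeholders; override them through the environment.
BUCKET="${BUCKET:-s3://example-bucket}"
TABLE="${TABLE:-example-job-results}"
SCRATCH="${SCRATCH:-/scratch}"

# Print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

poc_job() {
    job_id="$1"
    # 1. Copy data from S3 to the EBS-backed scratch area.
    run aws s3 sync "$BUCKET/input" "$SCRATCH/input"
    # 2-4. Run the job; it reads /lib and /binary from the mounted EFS.
    if run ./run_job "$SCRATCH/input" "$SCRATCH/output"; then
        status=pass
    else
        status=fail
    fi
    # 5. Record pass/fail in DynamoDB.
    run aws dynamodb put-item --table-name "$TABLE" \
        --item "{\"job_id\":{\"S\":\"$job_id\"},\"status\":{\"S\":\"$status\"}}"
    # 6. Upload only new or changed output back to S3.
    run aws s3 sync "$SCRATCH/output" "$BUCKET/output"
}
```

A dry run such as `DRY_RUN=1; poc_job job-001` lists the four commands in order without touching S3 or DynamoDB.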
  • 78. Visualization  Using a GPU optimized instance and AWS-NICE DCV to visualize results [Diagram labels: availability zone, corporate data center, modeling cluster, GPU instance]
  • 79. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon S3 DynamoDB Amazon SQS CloudWatch Internet Gateway (IGW) region-1a Master Server Auto Scaling Compute Fleet CloudFormation Public Subnet
  • 80. Isolated in a Private Subnet w/ Bastion [Diagram labels: Amazon S3, DynamoDB, Amazon SQS, CloudWatch, CloudFormation, Internet Gateway (IGW), VPC NAT gateway, Public Subnet, Private Subnet, Bastion Server, Master Server, Auto Scaling Compute Fleet] Private Subnet Route Table: VPC Traffic -> Local; 0.0.0.0 -> NAT Gateway. Public Subnet Route Table: VPC Traffic -> Local; 0.0.0.0 -> Internet Gateway.
  • 81. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon S3 DynamoDB Amazon SQS CloudWatch Internet Gateway (IGW) Private Subnet Master Server Auto Scaling Compute Fleet CloudFormation Public Subnet VPC NAT gateway Corporate Data Center Engineer VPN Connection Private Subnet Route Table VPC Traffic -> Local Corp IP Range -> VPN 0.0.0.0 -> Nat Gateway Public Subnet Route Table VPC Traffic -> Local Corp IP Range -> VPN 0.0.0.0 -> Internet Gateway Isolated with a VPN
  • 82. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Private Subnet Master Server Auto Scaling Compute Fleet Amazon S3 DynamoDB Amazon SQS CloudWatch CloudFormation Corporate Data Center Proxy Server VPN Connection Internet Connection Private Subnet Route Table VPC Traffic -> Local Corp IP Range -> VPN 0.0.0.0 -> VPN Completely Private w/ VPN & Proxy
  • 83. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Security Overview on AWS
  • 84. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS Shared Responsibility Model
  • 85. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Compliance Programs SOC 1 Global SOC 2 SOC 3 https://aws.amazon.com/compliance/pci-data-privacy-protection-hipaa-soc-fedramp-faqs/ United States Asia Pacific Europe
  • 86. It is always YOUR data!  Customers choose where to place their data  AWS regions are geographically isolated by design  Data is not replicated to other AWS regions and does not move unless the customer tells us to do so  Customers always own their data, and the ability to encrypt it, move it, and delete it AWS Customer Agreement https://aws.amazon.com/agreement/
  • 87. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Ubiquitous, Fully-Managed Encryption EBS RDS Amazon Redshift S3 Amazon Glacier Encrypted in transit AWS CloudTrail IAM Fully auditable Restricted access and at rest Fully managed keys in KMS Imported keys Your KMI Amazon EC2
  • 88. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. From this To This Media Destruction
  • 89. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Cost Optimization
  • 90. EC2 Purchasing Options. On-Demand: pay for compute capacity by the second with no long-term commitments; for spiky workloads, to define needs. Reserved: make a 1 or 3 year commitment and receive a significant discount off On-Demand prices; for committed, steady-state usage. Spot: spare EC2 capacity at savings of up to 90% off On-Demand prices; for fault-tolerant, dev/test, time-flexible, stateless workloads. Per-second billing for EC2 Linux instances & EBS volumes.
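Per-second billing and the Spot discount combine into a simple cost estimate for a burst of HPC work. The sketch below is illustrative only: `run_cost` is a hypothetical helper, the $2.00 hourly rate and the 70% discount are made-up inputs rather than AWS prices, and actual Spot savings vary over time (the slide quotes up to 90%).

```shell
#!/bin/sh
# Estimated cost in dollars of a run billed per second.
# $1 = instance count, $2 = run time in seconds,
# $3 = On-Demand price in $/hour, $4 = discount in percent (0 for On-Demand)
run_cost() {
    awk -v n="$1" -v s="$2" -v rate="$3" -v disc="$4" \
        'BEGIN { printf "%.2f\n", n * s / 3600 * rate * (1 - disc / 100) }'
}

run_cost 100 1800 2.00 0     # 100 instances for 30 minutes On-Demand: prints 100.00
run_cost 100 1800 2.00 70    # same run at a 70% Spot discount: prints 30.00
```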
  • 91. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Cost Optimization Weather Forecasting and Modeling On Demand Spot Reserved Instances Forecasting 00z, 06z, 12z, 18z Climate Modeling Weather Events Daily Forecasts Climate Modeling Hurricane
  • 92. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. “By using AWS Spot instances, we've been able to save 75% a month simply by changing four lines of code. It makes perfect sense for saving money when you're running continuous integration workloads or pipeline processing.” - Matthew Leventi, Lead Engineer, Lyft Why use Spot – customer examples
  • 93. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. aws.amazon.com/compliance/data-center AWS Data Centers Take a virtual tour of an AWS data center
  • 95. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Thank you, and how can I help you run HPC workloads on AWS? aws.amazon.com/hpc