© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS re:INVENT
STORAGE FOR HPC IN THE CLOUD
Isaiah Weiner
Sr. Mgr. Solutions Architecture
GPSTEC324
HPC IS COMPLEX
I/O, Cost, TTR
National Labs; Research; Energy & Utilities; Genomics; Analytics, AI/ML; EDA; M&E; Finance
GROWTH IN CLOUD
CLOUD MARKET FORECAST
                 2015  2016  2017  2018  2019  2020
On-Prem           70%   65%   61%   59%   58%   55%
Public Cloud      10%   15%   26%   26%   26%   28%
Private Cloud     10%   12%   13%   15%   16%   17%
Source: IDC Worldwide Quarterly Cloud IT Infrastructure Tracker
GROWTH IN STORAGE
[Chart: exabytes by year, 2016 to 2020 (axis 0 to 80,000); series: Enterprise, HPC]
Source: Gartner for Enterprise and IDC for High Performance
DIY – NFS
[Diagram: three NFS servers, each with two volumes, and three groups of NFS clients]
AMAZON EFS ARCHITECTURE
[Diagram: three groups of clients, three mount targets, a single namespace]
HPC CLUSTER NODE ANATOMY
[Diagram labels: Data, Metadata, Tiering, Backend; Routing, Monitoring, VIPs, Clustering; Storage Access Protocols, Frontend; Network]
SEMICONDUCTOR CUSTOMER
Challenge: Deliver local SSD performance with centralized management. Local SSDs required data to be copied around, and the multiple copies took up space.
Solution delivers: Scalable, sharable, simplified; one copy of the data, on par with local SSD for performance.
[Chart: Elapsed Time (lower is better), axis 0 to 250; Local SSD vs. WekaIO vs. NFSv4]
GENOMICS CUSTOMER
Challenge: Small-file workload with some large files in the mix. Pre-solution workaround: more jobs! All the jobs!
Solution delivers: Scalable, sharable, simplified; one copy of the data, on par with local SSD for performance.
[Chart: Elapsed Time (lower is better), axis 0 to 140; WekaIO vs. On-Prem AFA; 1 conversion and 6 conversions]
Cluster Sizing
SOFTWARE ASSUMPTIONS
THEN
The hardware is reliable.
Trust the kernel, it is wise.
The MTBF is millions of hours.
Hardware is up until it dies.
NOW
…Is the hardware reliable, really?
DPDK + SR-IOV, SPDK, RoCE…
200K hours is more likely.
EC2 Spot could live for 15 minutes!
AVAILABILITY VS. DURABILITY
%               Downtime Per Year       Probability of Loss
99.999          5 minutes 15 seconds    1 in 100,000
99.9999         31 seconds              1 in 1,000,000
99.99999        3 seconds               1 in 10,000,000
99.999999999    -                       1 in 100,000,000,000
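The figures in this table follow from simple arithmetic on the percentage. The sketch below is one way to reproduce them, assuming a 365-day year; the function names are illustrative and not from the talk. Percentages are passed as strings so that `Decimal` keeps all eleven nines exact.

```python
from decimal import Decimal

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: 365-day year


def unavailable_fraction(percent: str) -> Decimal:
    """Fraction left over once the given percentage is subtracted from 100%."""
    return 1 - Decimal(percent) / 100


def downtime_seconds_per_year(availability_percent: str) -> Decimal:
    """Expected seconds of downtime per year at this availability."""
    return unavailable_fraction(availability_percent) * SECONDS_PER_YEAR


def loss_odds(durability_percent: str) -> int:
    """Return N such that the probability of loss is 1 in N."""
    return int(1 / unavailable_fraction(durability_percent))


for pct in ("99.999", "99.9999", "99.99999"):
    # Truncate to whole seconds, as the slide appears to do.
    minutes, seconds = divmod(int(downtime_seconds_per_year(pct)), 60)
    print(f"{pct}%: about {minutes} min {seconds} s of downtime per year")

print(f"99.999999999%: 1 in {loss_odds('99.999999999'):,}")
```

Availability answers "how long is the system unreachable", while durability answers "how likely is data to be lost", which is why the eleven-nines row carries odds and no downtime.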
MTBF WITH SINGLE NODE
1 x MTBF: 200K hours
Failure every 22 years: 1/200,000
8.5 hour repair
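As a rough check on the single-node numbers, the sketch below converts the MTBF into years between failures and, as an added illustration, into steady-state availability using the standard MTBF / (MTBF + repair time) formula. The availability figure is not on the slide; the function names and the 365-day year are assumptions.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: 365-day year, 8,760 hours


def years_between_failures(mtbf_hours: float, nodes: int = 1) -> float:
    """Average years between failures anywhere in a group of identical nodes."""
    return mtbf_hours / nodes / HOURS_PER_YEAR


def steady_state_availability(mtbf_hours: float, repair_hours: float) -> float:
    """Long-run fraction of time a single node is up."""
    return mtbf_hours / (mtbf_hours + repair_hours)


print(f"Failure every {years_between_failures(200_000):.1f} years")
print(f"Availability with an 8.5 hour repair: {steady_state_availability(200_000, 8.5):.5%}")
```

The first figure comes out a little under 23 years, which the slide states as 22.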
MTBF WITH CLUSTER
12 x MTBF: 200K hours
Failure every 1.9 years: 12/200,000
2nd failure probability: (11 x 8.5)/200,000
2nd failure frequency: 1.9 x 2,134
2nd failure every 4,060 years: 1 out of 2,134 repairs
5.4 hour 2nd repair
3rd failure probability: (10 x 5.4)/200,000
3rd failure frequency: 4,060 x 3,688
3rd failure every 14,977,000 years: 1 out of 3,688 2nd failures
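The chain of estimates on this slide can be sketched as follows. It uses the slide's linear approximation, in which the chance that another node fails during a repair is (surviving nodes x repair hours) / MTBF. Names are illustrative and a 365-day year is assumed. Because the repair times are taken as the rounded 8.5 and 5.4 hours, the results land near the slide's 2,134, 4,060, 3,688 and 14,977,000 rather than exactly on them.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: 365-day year
MTBF_HOURS = 200_000       # per-node MTBF from the slide
NODES = 12


def overlap_odds(surviving_nodes: int, repair_hours: float,
                 mtbf_hours: float = MTBF_HOURS) -> float:
    """N such that about 1 in N repairs sees another node fail before it finishes."""
    return mtbf_hours / (surviving_nodes * repair_hours)


# Any one of 12 nodes failing: 12/200,000 per hour.
first_every_years = MTBF_HOURS / NODES / HOURS_PER_YEAR

# One of the 11 survivors failing during the 8.5 hour repair.
second_odds = overlap_odds(NODES - 1, 8.5)
second_every_years = first_every_years * second_odds

# One of the 10 survivors failing during the 5.4 hour second repair.
third_odds = overlap_odds(NODES - 2, 5.4)
third_every_years = second_every_years * third_odds

print(f"1st failure every {first_every_years:.1f} years")
print(f"2nd failure: 1 in {second_odds:,.0f} repairs, every {second_every_years:,.0f} years")
print(f"3rd failure: 1 in {third_odds:,.0f} 2nd failures, every {third_every_years:,.0f} years")
```

Each extra overlapping failure multiplies the interval by a few thousand, so how many simultaneous failures the cluster tolerates matters far more than small changes in per-node MTBF.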
FINDING METADATA LIMITS
srun -n $i -N 128 mdtest -i 5 -b 3 -z 3 -I 10 -w 1024 -y -d $PFS/testdir
$i = number of compute processes
$PFS = HPC storage mountpoint
[Chart: File creates/process/second (32 nodes); x-axis: number of client processes, 1 to 2048; y-axis: creates/second, 0 to 25,000; series: Lustre (single MDS), WekaIO v3.1]
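The sweep behind this chart can be sketched as a loop over the process counts on the x-axis, substituting each value for `$i` in the command above. The mountpoint `/mnt/pfs` and the helper names are hypothetical; the flags are copied from the slide unchanged. The tree-size helpers assume mdtest's usual reading of `-b` (branching factor), `-z` (depth) and `-I` (items per directory in the tree), so treat the item count as an estimate rather than a guarantee.

```python
import shlex

# 1, 2, 4, ..., 2048: the process counts shown on the chart's x-axis.
PROCESS_COUNTS = [2 ** k for k in range(12)]


def mdtest_command(processes: int, pfs: str) -> str:
    """Build the slide's mdtest invocation for one process count."""
    args = ["srun", "-n", str(processes), "-N", "128",
            "mdtest", "-i", "5", "-b", "3", "-z", "3", "-I", "10",
            "-w", "1024", "-y", "-d", f"{pfs}/testdir"]
    return shlex.join(args)


def dirs_in_tree(branch: int, depth: int) -> int:
    """Directories in a full tree with the given branching factor and depth."""
    return sum(branch ** level for level in range(depth + 1))


def items_per_process(branch: int, depth: int, items_per_dir: int) -> int:
    """Assumed item count per process: items per directory times directories."""
    return items_per_dir * dirs_in_tree(branch, depth)


PFS = "/mnt/pfs"  # hypothetical mountpoint, stands in for $PFS
for n in PROCESS_COUNTS:
    print(mdtest_command(n, PFS))

print("directories per tree:", dirs_in_tree(3, 3))
print("items per process (assumed):", items_per_process(3, 3, 10))
```

Doubling the process count at each step makes it easy to spot where creates per second stop scaling, which is where the metadata service saturates.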
FINDING DATA PATH LIMITS
srun -n $i -N 128 ior -a POSIX -o $PFS/iortest -z -w -F -b 1g -t 1m -i 8
$i = number of compute processes
$PFS = HPC storage mountpoint
[Chart: File-per-process throughput (32 nodes); x-axis: number of client processes, 1 to 2048; y-axis: throughput (MB/sec), 0 to 20,000; series: Lustre (single MDS), WekaIO v3.1]
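To put the throughput axis in context, the sketch below estimates how much data one repetition of the IOR command above writes. It assumes IOR's binary size suffixes (`1g` = 1 GiB, `1m` = 1 MiB) and the default segment count of 1; with `-F`, each process writes its own file. The function names are illustrative.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def ior_write_volume(processes: int, block_bytes: int = GIB,
                     transfer_bytes: int = MIB, segments: int = 1) -> dict:
    """Estimate the data written by one repetition of a file-per-process run."""
    per_process = block_bytes * segments
    return {
        "transfers_per_process": per_process // transfer_bytes,
        "bytes_per_process": per_process,
        "aggregate_bytes": per_process * processes,
    }


def mib_per_second(aggregate_bytes: int, elapsed_seconds: float) -> float:
    """Aggregate throughput for one repetition, in MiB per second."""
    return aggregate_bytes / MIB / elapsed_seconds


for n in (1, 128, 2048):
    volume = ior_write_volume(n)
    print(f"{n} processes: {volume['aggregate_bytes'] // GIB} GiB written per repetition, "
          f"{volume['transfers_per_process']} transfers per process")
```

At the right-hand end of the chart each repetition writes 2 TiB, so storage capacity and run time are worth checking before running all eight repetitions.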
FINDING DATA PATH LIMITS (DIRECT I/O)
srun -n $i -N 128 ior -a POSIX -o $PFS/iortest -z -w -F -b 1g -t 1m -i 8
vs.
srun -n $i -N 128 ior -a POSIX -o $PFS/iortest -z -w -F -B -b 1g -t 1m -i 8
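The two commands differ only in `-B`, which asks IOR to use O_DIRECT for POSIX I/O so that writes bypass the client's I/O buffers. A minimal sketch for generating the pair side by side, with a hypothetical mountpoint and helper name, could look like this:

```python
import shlex


def ior_command(processes: int, pfs: str, direct_io: bool = False) -> str:
    """Build the slide's IOR invocation, optionally adding -B for direct I/O."""
    args = ["srun", "-n", str(processes), "-N", "128",
            "ior", "-a", "POSIX", "-o", f"{pfs}/iortest", "-z", "-w", "-F"]
    if direct_io:
        args.append("-B")  # O_DIRECT: bypass I/O buffers
    args += ["-b", "1g", "-t", "1m", "-i", "8"]
    return shlex.join(args)


PFS = "/mnt/pfs"  # hypothetical mountpoint, stands in for $PFS
for n in (1, 32, 1024):
    print(ior_command(n, PFS))
    print(ior_command(n, PFS, direct_io=True))
```

Running both variants at each process count separates what the storage system delivers from what client-side buffering contributes.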
Demo
TECHNOLOGY SUMMARY
• Why RPO and RTO matter for HPC in the Cloud
• Lustre
  • Supports DNE (Distributed Namespace Environment)
  • Still no durability after all these years
  • EBS performance limitations
• WekaIO
  • Distributed metadata
  • Scalable data plane
  • Durable, plus S3 persistence
EXTERNAL RESOURCES
AWS Competency Program
• https://aws.amazon.com/partners/competencies
AWS Quick Start
• https://aws.amazon.com/quickstart
AWS Marketplace
• https://aws.amazon.com/marketplace
THANK YOU!

GPS: Storage for HPC in the Cloud - GPSTEC324 - re:Invent 2017

  • 1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS re:INVENT: STORAGE FOR HPC IN THE CLOUD. Isaiah Weiner, Sr. Mgr. Solutions Architecture. GPSTEC324
  • 2. HPC IS COMPLEX. Trade-offs: I/O, Cost, TTR. Segments: National Labs; Research, Energy & Utilities; Genomics; Analytics, AI/ML; EDA; M&E; Finance
  • 3. GROWTH IN CLOUD: CLOUD MARKET FORECAST
      Legend: On-Prem, Public Cloud, Private Cloud
      Years: 2015, 2016, 2017, 2018, 2019, 2020
      Data labels: 70%, 65%, 61%, 59%, 58%, 55%
      Data labels: 10%, 15%, 26%, 26%, 26%, 28%
      Data labels: 10%, 12%, 13%, 15%, 16%, 17%
      Source: IDC Worldwide Quarterly Cloud IT Infrastructure Tracker
  • 4. GROWTH IN STORAGE. [Chart: storage in exabytes (axis 0 to 80,000) by year, 2016 to 2020; series: Enterprise, HPC.] Source: Gartner for Enterprise and IDC for High Performance
  • 5. DIY – NFS. [Diagram: three NFS servers, each with two volumes; three groups of NFS clients.]
  • 6. AMAZON EFS ARCHITECTURE. [Diagram: three groups of clients, three mount targets, single namespace.]
  • 7. HPC CLUSTER NODE ANATOMY. [Diagram labels: Data, Metadata, Tiering, Backend, Routing, Monitoring, VIPs, Clustering, Storage Access Protocols, Frontend, Network.]
  • 8. SEMICONDUCTOR CUSTOMER
      Challenge: Local SSD performance with centralized management. Local SSDs required data to be copied around, and multiple copies took up space.
      Solution delivers: Scalable, sharable, simplified; one copy of the data, on par with local SSD for performance.
      [Chart: Elapsed Time (Lower is Better), axis 0 to 250; series: Local SSD, WekaIO, NFSv4.]
  • 9. GENOMICS CUSTOMER
      Challenge: Small-file workload with some large files in the mix. Pre-solution workaround: more jobs! All the jobs!
      Solution delivers: Scalable, sharable, simplified; one copy of the data, on par with local SSD for performance.
      [Chart: Elapsed Time (Lower is Better), axis 0 to 140; WekaIO vs. On-Prem AFA; series: 1 conversion, 6 conversions.]
  • 10. Cluster Sizing
  • 11. SOFTWARE ASSUMPTIONS
      THEN: The hardware is reliable. Trust the kernel, it is wise. The MTBF is millions of hours. Hardware is up until it dies.
      NOW: …Is the hardware reliable, really? DPDK + SR-IOV, SPDK, RoCE… 200K hours is more likely. EC2 Spot could live for 15 minutes!
  • 12. AVAILABILITY VS. DURABILITY
      % | Downtime Per Year | Probability of Loss
      99.999 | 5 minutes 15 seconds | 1 in 100,000
      99.9999 | 31 seconds | 1 in 1,000,000
      99.99999 | 3 seconds | 1 in 10,000,000
      99.999999999 | (not given) | 1 in 100,000,000,000
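The figures on the availability vs. durability slide follow from simple arithmetic. The sketch below is one way to reproduce them, assuming a 365-day year and treating the probability of loss as the complement of the percentage; the function names are illustrative and do not come from the deck. `Decimal` is used because binary floats lose precision at eleven nines.

```python
from decimal import Decimal

# Assumption: a non-leap year of 365 days (31,536,000 seconds).
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def complement(percent: str) -> Decimal:
    """Fraction of time (or of objects) NOT covered by the given percentage."""
    return 1 - Decimal(percent) / 100


def downtime_seconds_per_year(availability_percent: str) -> Decimal:
    """Expected downtime per year, in seconds, for an availability percentage."""
    return complement(availability_percent) * SECONDS_PER_YEAR


def loss_odds(durability_percent: str) -> int:
    """The N in '1 in N' for a durability percentage."""
    return int(1 / complement(durability_percent))


if __name__ == "__main__":
    for pct in ("99.999", "99.9999", "99.99999", "99.999999999"):
        seconds = downtime_seconds_per_year(pct)
        minutes, rest = divmod(float(seconds), 60)
        print(f"{pct}%: {int(minutes)} min {rest:.2f} s/year, 1 in {loss_odds(pct):,}")
```

Five nines works out to 315.36 seconds, which is the slide's 5 minutes 15 seconds; the slide appears to truncate the smaller values to whole seconds.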
  • 13. MTBF WITH SINGLE NODE
      1 x MTBF 200K hours
      8.5 hour repair
      Failure every 22 years: 1/200,000
  • 14. MTBF WITH CLUSTER
      12 x MTBF 200K hours
      Failure every 1.9 years: 12/200,000
      2nd failure probability: (11 x 8.5)/200,000
      2nd failure frequency: 1.9 x 2,134
      2nd failure every 4,060 years: 1 out of 2,134 repairs
      5.4 hour 2nd repair
      3rd failure probability: (10 x 5.4)/200,000
      3rd failure frequency: 4,060 x 3,688
      3rd failure every 14,977,000 years: 1 out of 3,688 2nd failures
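The MTBF slides chain three estimates: how often any node fails, then the chance that another node fails during each repair window. A minimal sketch of that chain follows, assuming the slides' simplified model (probability of an overlapping failure ≈ surviving nodes × repair hours / MTBF) and an 8,760-hour year; `failure_chain` is an illustrative name, not something from the deck. Evaluating the formulas directly gives values close to, but not identical to, the rounded figures on the slide (for example, about 1 in 2,139 rather than 1 in 2,134).

```python
HOURS_PER_YEAR = 24 * 365  # assumption: 8,760-hour year


def failure_chain(nodes, mtbf_hours, repair_hours):
    """Return a list of (years_between_events, one_in_n) tuples.

    Entry 0 is the first failure anywhere in the cluster (one_in_n is None).
    Entry k is the (k+1)-th overlapping failure, which must land inside
    repair window repair_hours[k-1] while the earlier failures are unrepaired.
    """
    years = mtbf_hours / nodes / HOURS_PER_YEAR
    chain = [(years, None)]
    survivors = nodes - 1
    for window in repair_hours:
        # Simplified model from the slides: (survivors x window) / MTBF.
        probability = survivors * window / mtbf_hours
        years = years / probability
        chain.append((years, 1 / probability))
        survivors -= 1
    return chain


if __name__ == "__main__":
    # Single node from slide 13, then the 12-node cluster from slide 14.
    print(failure_chain(1, 200_000, []))
    for years, one_in_n in failure_chain(12, 200_000, [8.5, 5.4]):
        odds = "" if one_in_n is None else f" (1 in {one_in_n:,.0f})"
        print(f"every {years:,.1f} years{odds}")
```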
  • 15. FINDING METADATA LIMITS
      srun -n $i -N 128 mdtest -i 5 -b 3 -z 3 -I 10 -w 1024 -y -d $PFS/testdir
      $i = number of compute processes; $PFS = HPC storage mountpoint
      [Chart: File creates/process/second (32 nodes); x-axis: number of client processes, 1 to 2048 in powers of two; y-axis: creates/second, 0 to 25,000; series: Lustre (single MDS), WekaIO v3.1.]
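The metadata chart sweeps the process count from 1 to 2048 in powers of two. One way to generate that sweep is sketched below; it only builds the command strings, using the exact mdtest flags from the slide, and leaves `$PFS` for the shell to expand. The helper names are hypothetical.

```python
def process_counts(max_procs=2048):
    """Yield 1, 2, 4, ... up to max_procs, matching the chart's x-axis."""
    n = 1
    while n <= max_procs:
        yield n
        n *= 2


def mdtest_commands(pfs="$PFS"):
    """Build one srun/mdtest command line per process count.

    The flags are copied verbatim from the slide; only -n varies.
    """
    return [
        f"srun -n {n} -N 128 mdtest -i 5 -b 3 -z 3 -I 10 -w 1024 -y -d {pfs}/testdir"
        for n in process_counts()
    ]


if __name__ == "__main__":
    for command in mdtest_commands():
        print(command)
```

Printing the commands rather than running them keeps the sketch independent of any particular scheduler setup; the output can be piped into a shell on a cluster where `$PFS` is set.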
  • 16. FINDING DATA PATH LIMITS
      srun -n $i -N 128 ior -a POSIX -o $PFS/iortest -z -w -F -b 1g -t 1m -i 8
      $i = number of compute processes; $PFS = HPC storage mountpoint
      [Chart: File-per-process throughput (32 nodes); x-axis: number of client processes, 1 to 2048 in powers of two; y-axis: throughput (MB/sec), 0 to 20,000; series: Lustre (single MDS), WekaIO v3.1.]
  • 17. FINDING DATA PATH LIMITS (DIRECT I/O)
      srun -n $i -N 128 ior -a POSIX -o $PFS/iortest -z -w -F -b 1g -t 1m -i 8
      vs.
      srun -n $i -N 128 ior -a POSIX -o $PFS/iortest -z -w -F -B -b 1g -t 1m -i 8
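The two ior command lines on the direct I/O slide differ only by the `-B` flag, which in IOR requests O_DIRECT for the POSIX backend. The sketch below builds both variants for a given process count so they can be run back to back; `ior_command` and its parameters are illustrative names, and every flag is taken from the slide.

```python
def ior_command(procs, pfs="$PFS", direct_io=False):
    """Build the slide's srun/ior command line, optionally adding -B."""
    flags = ["-a", "POSIX", "-o", f"{pfs}/iortest", "-z", "-w", "-F"]
    if direct_io:
        flags.append("-B")  # bypass the page cache (O_DIRECT)
    flags += ["-b", "1g", "-t", "1m", "-i", "8"]
    return " ".join(["srun", "-n", str(procs), "-N", "128", "ior", *flags])


if __name__ == "__main__":
    for procs in (1, 64, 2048):
        print(ior_command(procs))                  # buffered I/O
        print(ior_command(procs, direct_io=True))  # direct I/O
```

Comparing the two runs at the same process count separates what the client page cache contributes from what the storage system itself sustains.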
  • 18. Demo
  • 19. TECHNOLOGY SUMMARY
      • Why RPO and RTO matter for HPC in the Cloud
      • Lustre
          • Supports DNE (Distributed Namespace)
          • Still no durability after all these years
          • EBS performance limitations
      • WekaIO
          • Distributed Metadata
          • Scalable data plane
          • Durable, plus S3 persistence
  • 20. EXTERNAL RESOURCES
      • AWS Competency Program: https://aws.amazon.com/partners/competencies
      • AWS Quick Start: https://aws.amazon.com/quickstart
      • AWS Marketplace: https://aws.amazon.com/marketplace
  • 21. THANK YOU!