© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Nathan McGuirt, Manager, Solutions Architecture, AWS
Gabriele Garzoglio, HEP Cloud Facility Project Manager, Fermilab
December 2016
Building HPC Clusters as Code
in the (Almost) Infinite Cloud
CMP318
What to Expect from the Session
• Why customers are using AWS for HPC/HTC
• Leveraging Spot Instances for big compute at low cost
• Accelerating deployment with automation and managed
services
Agenda
• Why AWS for HPC?
• Automating cluster deployment
• Fermi National Accelerator Laboratory
• Demo of scaling jobs on a budget
High Performance Computing (HPC) vs.
High Throughput Computing (HTC)
HPC: High performance computing
(cluster computing)
- Tightly clustered
- Latency sensitive
HTC: High throughput computing
(grid computing)
- Less inter-node communication
- More horizontal scalability (pleasingly
parallel)
Why AWS for HPC?
Time to research
Innovation and performance
Scalability and flexibility
Data
AWS Snowball AWS Direct Connect
Cost
Cost – Spot market
Request   Bid Price   Spot Price
1         $1.00       $0.20
2         $0.55       $0.20
3         $0.50       $0.20
4         $0.33       $0.20
5         $0.20       $0.20
6         $0.18       -
7         $0.15       -
8         $0.10       -
9         $0.05       -
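The bids and Spot prices listed above can be read as a simple clearing rule. The following is a minimal sketch, assuming the 2016-era Spot model in which a request is fulfilled when its bid is at or above the current Spot price and every fulfilled request pays the Spot price rather than its own bid. The function name `clear_spot_requests` is hypothetical and is not an AWS API.

```python
def clear_spot_requests(bids, spot_price):
    """Split requests into fulfilled (paying the Spot price) and rejected.

    Assumption: a bid at or above the Spot price is fulfilled, and the
    price paid is the Spot price, not the bid.
    """
    fulfilled = {}   # request number -> price actually paid
    rejected = []    # request numbers whose bid is below the Spot price
    for request_id, bid in enumerate(bids, start=1):
        if bid >= spot_price:
            fulfilled[request_id] = spot_price
        else:
            rejected.append(request_id)
    return fulfilled, rejected


# Bid prices from the slide, in request order 1..9.
BIDS = [1.00, 0.55, 0.50, 0.33, 0.20, 0.18, 0.15, 0.10, 0.05]

fulfilled, rejected = clear_spot_requests(BIDS, spot_price=0.20)
print("fulfilled:", sorted(fulfilled))
print("rejected:", rejected)
```

Under this reading, a high bid does not raise the price paid; it only lowers the chance of being outbid when the Spot price moves.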
Spot Bid Advisor
Spot Fleet
Clusters as code
Automation
• Fully custom
• APIs
• AWS CloudFormation
• Managed services
• Amazon EMR
• AWS Batch
• Software cluster management solutions
• CFNCluster
• Alces Flight
• Partner offerings
API - SDKs
Java, Python, PHP, .NET, Ruby, Node.js, iOS, Android
AWS Toolkit for Visual Studio
AWS Toolkit for Eclipse
Tools for Windows PowerShell
CLI
CloudFormation
Resources:
  Ec2Instance:
    Type: AWS::EC2::Instance
    Properties:
      SecurityGroups:
        - Ref: InstanceSecurityGroup
      KeyName: mykey
      ImageId: ''
  InstanceSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Enable SSH access via port 22
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: '22'
          ToPort: '22'
          CidrIp: 0.0.0.0/0
EMR
AWS Batch
AWS CFNCluster
$ pip install cfncluster
...
$ cfncluster configure
...
$ cfncluster create mycluster
Alces Flight
Alces Flight is a software offering that provides self-service
supercomputers via the AWS Marketplace.
It creates self-scaling clusters with more than
750 popular scientific applications pre-installed,
complete with libraries and various compiler
optimizations, ready to run. The clusters use
AWS Spot Instances by default.
AWS Partners in the HPC Space
Gabriele Garzoglio, HEP Cloud Facility Co-Project Manager, Fermilab
December 2016
The HEP Cloud Facility
Elastic Computing for High Energy Physics
Computing at the Fermi National Accelerator Laboratory
Lead United States particle physics laboratory
• Funded by the Department of Energy
• ~100 PB of data on tape
• High Throughput Computing characterized by:
• “Pleasingly parallel” tasks
• High CPU instruction / Bytes IO ratio
• But still lots of I/O. See Pfister: “In Search of
Clusters”
Focus on Neutrino Physics
• Including the NOvA Experiment
Strong collaborations with international
laboratories
• CERN / Large Hadron Collider (LHC)
Experiments
• Brookhaven National Laboratory (BNL)
• Lead institution (“Tier-1”) for the Compact Muon
Solenoid (CMS)
Drivers of Facility Evolution: Capacity / Cost / Elasticity
Price of one core-year on Commercial Clouds
HEP needs: 10-100x today's capacity
Facility size: 15k cores
NOvA experiment jobs in queue at FNAL
Usage is not steady-state
CMS Analysis Users – Yearly Cycle
Vision for Facility Evolution
• Strategic Plan for U.S. Particle Physics (P5 Report to the U.S. funding agencies)
Fermilab Facility
HTC, HPC Cores: 68.7K
Disk Systems: 37.6 PB
Tape: 101 PB
Networking: 10/100 Gbit
~5k internal network ports
The Facility Today is “Fixed”
Rapidly evolving computer architectures
and increasing data volumes require
effective crosscutting solutions that are
being developed in other science
disciplines and in industry.
• HEP Cloud Vision Statement
– HEPCloud is envisioned as a portal to an ecosystem of diverse computing resources, commercial or
academic
– Provides “complete solutions” to users, with agreed upon levels of service
– The Facility routes to local or remote resources based on workflow requirements, cost, and efficiency of
accessing various resources
– Manages allocations of users to target compute engines
• Pilot project to explore feasibility, capability of HEPCloud
– Goal of moving into production during FY18
– Seed money provided by industry
HEP Cloud Architecture
Overview External Relationships
Basic idea: Add
disparate resources
(Cloud VM, HPC slots,
Grid nodes, local
resources) into a
central resource pool.
Fermilab HEPCloud: Expanding to the Cloud
Reference herein to any specific commercial product, process, or service by trade name, trademark,
manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or
favoring by the United States Government or any agency thereof.
• Where to start?
– Market leader: Amazon Web Services (AWS)
• Integration challenges that need to be managed to run at scale:
– Provisioning
– Performance
– Image portability
– On-demand services
– Networking
– Storage and data movement
– Monitoring and accounting
– Security
Integration Challenges: Provisioning – Create an Overlay Batch System with
GlideinWMS and HTCondor
[Diagram: condor submit, HTCondor Schedulers, VO Frontend, HTCondor Central Manager, GlideinWMS Factory, HTCondor-G. Resources: Grid Site, Local Resources, High Performance Computers, Cloud Provider. Each resource hosts a Virtual Machine running a Glidein whose HTCondor Startd pulls a Job.]
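The overlay idea can be illustrated with a toy pull model. This is a minimal sketch and not GlideinWMS or HTCondor code: the hypothetical `Pilot` class stands in for a glidein started on any kind of resource, and a plain `queue.Queue` stands in for the scheduler's job queue.

```python
import queue


class Pilot:
    """Stand-in for a glidein: sits on some resource and pulls jobs."""

    def __init__(self, resource):
        self.resource = resource
        self.completed = []

    def pull_one(self, job_queue):
        """Pull and 'run' one job; return False when the queue is empty."""
        try:
            job = job_queue.get_nowait()
        except queue.Empty:
            return False
        self.completed.append(job)
        return True


def run_pool(jobs, resources):
    """Drain one central queue with pilots running on disparate resources."""
    job_queue = queue.Queue()
    for job in jobs:
        job_queue.put(job)
    pilots = [Pilot(resource) for resource in resources]
    progress = True
    while progress:
        progress = False
        # Each pilot pulls at most one job per pass over the pool.
        for pilot in pilots:
            if pilot.pull_one(job_queue):
                progress = True
    return {pilot.resource: pilot.completed for pilot in pilots}


placement = run_pool(range(6), ["grid", "local", "hpc", "cloud"])
print(placement)
```

The point of the sketch is that jobs are never pushed to a specific site: whichever pilot is alive asks for work, so all resource types look like one pool to the submitter.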
Integration Challenges: Provisioning – Containing costs
• Using AWS Spot market to
contain costs
• Workflows are already engineered
to sustain preemption from the
Grid
– Jobs are “short”, i.e., killed jobs are
affordable w/o checkpointing
– Preempted jobs are automatically
resubmitted
– Data management systems
identify files in a dataset that were
not processed and allow recovery
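The resubmission behaviour described above can be sketched as a small simulation. It is an illustration under stated assumptions, not the HEPCloud workflow code: each attempt is preempted independently with a made-up probability `preempt_prob`, a preempted job restarts from scratch, and the function name `run_with_resubmission` is hypothetical. The result is a histogram of how many times each job started, which is the same quantity used as a measure of preemption in the CMS use case.

```python
import random
from collections import Counter


def run_with_resubmission(num_jobs, preempt_prob, rng, max_starts=100):
    """Return a Counter mapping 'number of starts' -> 'number of jobs'.

    Assumption: every attempt is preempted with the same independent
    probability, and a preempted job is resubmitted automatically.
    A job that is still unfinished after max_starts attempts is counted
    with max_starts starts.
    """
    starts_per_job = []
    for _ in range(num_jobs):
        starts = 0
        while starts < max_starts:
            starts += 1
            if rng.random() >= preempt_prob:
                break  # this attempt ran to completion
        starts_per_job.append(starts)
    return Counter(starts_per_job)


histogram = run_with_resubmission(10_000, preempt_prob=0.15, rng=random.Random(0))
for starts in sorted(histogram):
    print(starts, histogram[starts])
```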
CMS use case: histogram of the number of times each job started (a measure of preemption): 2.5M jobs with no preemption, 400K jobs with one preemption
NOvA use case: number of VMs running (blue) and preempted (red) every hour: 240 VM / h, 60 VMs preempted in 1 h
Integration Challenges: Provisioning – Containing costs
• The Decision Engine oversees costs and optimizes VM placement using the status of the
facility, the historical prices, and the job characteristics
• Bidding at 25% of the on-demand price has the lowest expected cost
• Based on preemption history, calculate the probability that a 5-24 h job finishes within
a week even though it has to restart after each preemption, for various bidding algorithms
$0.25 / h
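The probability estimate mentioned above can be sketched with a small Monte Carlo simulation. This is only an illustration: it assumes a constant, independent preemption probability per hour (`hourly_preempt_prob`, a made-up parameter) instead of the real Spot price history, and it assumes a preempted job restarts from scratch because the jobs are not checkpointed. The function names are hypothetical.

```python
import random


def finishes_within(job_hours, hourly_preempt_prob, deadline_hours, rng):
    """Simulate one job; True if it completes before the deadline."""
    elapsed = 0
    progress = 0
    while elapsed < deadline_hours:
        elapsed += 1
        if rng.random() < hourly_preempt_prob:
            progress = 0          # preempted: restart from scratch
        else:
            progress += 1
            if progress == job_hours:
                return True
    return False


def completion_probability(job_hours, hourly_preempt_prob,
                           deadline_hours=7 * 24, trials=2000, seed=0):
    """Estimate P(job finishes within the deadline) over many trials."""
    rng = random.Random(seed)
    wins = sum(
        finishes_within(job_hours, hourly_preempt_prob, deadline_hours, rng)
        for _ in range(trials)
    )
    return wins / trials


for hours in (5, 12, 24):
    print(hours, completion_probability(hours, hourly_preempt_prob=0.05))
```

A bidding algorithm would enter this picture through the preemption probability: a lower bid costs less per hour but is preempted more often, so the expected cost has to be weighed against the chance of missing the one-week window.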
Integration Challenges: Performance
Benchmarks used to compare workflow duration on AWS (and $$) with local execution
[Benchmark chart annotations: need EBS; 32 cores, scale w/ cores; need parallel streams; c3.2xlarge; good candidate – want > 1]
From AWS to FNAL: 7 Gbps
Access to S3 always saturates the 1 Gbps interface
Integration Challenges: Performance
CMS Use Case:
Wallclock distribution by AWS instance type
Integration Challenges: Image Portability
Build VM management tool, considering:
• HVM virtualization (HW VM + Xen) on AWS: gives access to all AWS resources
• Contain VM size (saves import time and cost)
• Import process covers multiple AWS accounts and regions
• AuthN with AWS uses short-lived role-based tokens, rather than long-term keys
Build “Golden Image” from standard Fermilab Worker Node configuration VM.
Integration Challenges: On-demand Services
Jobs depend on software services to run
Automating the deployment of these services on AWS on demand enables scalability and cost savings
• Services include data caching (e.g., Squid), WMS, submission service, data transfer, etc.
• As services are made deployable on demand, instantiate an ensemble of services together (e.g.,
through AWS CloudFormation)
Example: on-demand Squid
• Deploy Squid via
auto-scaling services.
Squid is deployed if average
group bandwidth utilization
is too high. Server is
deployed or destroyed in
30 seconds.
• Front Squids with a
load balancer.
• Name the load balancer for that
region via Route 53
Auto Scaling
group
CloudFormation
"SquidInstanceType" : { "Type" : "String", "Default" : "c3.xlarge", … },
"SquidLaunchConfiguration" : { "Type" : "AWS::AutoScaling::LaunchConfiguration",
  "Properties" : {
    "InstanceType" : { "Ref" : "SquidInstanceType" },
    "ImageId" : { "Fn::FindInMap" : [ "AMIRegionMap", {"Ref":"AWS::Region"}, "SquidAMI" ]},
    "SecurityGroups" : [ { "Fn::FindInMap" :
      ["SecurityGroupRegionMap",{"Ref":"AWS::Region"}, "SquidSG" ] } ],
    … } },
"SquidAutoscalingGroup" : { "Type" : "AWS::AutoScaling::AutoScalingGroup",
  "Properties" : {
    "AvailabilityZones" : {"Ref" : "AvailabilityZones"},
    "LaunchConfigurationName" : {"Ref" : "SquidLaunchConfiguration" },
    "LoadBalancerNames" : [ {"Ref" : "SquidLoadBalancer" } ],
    … } },
"SquidAutoscaleUpPolicy" : { "Type" : "AWS::AutoScaling::ScalingPolicy",
  "Properties" : {
    "AdjustmentType" : "ChangeInCapacity",
    "AutoScalingGroupName" : { "Ref" : "SquidAutoscalingGroup" },
    "ScalingAdjustment" : "1",
    … } },
…
Integration Challenges: On-demand Services – CloudFormation
"SquidNetworkBandwidthHighAlarm" : { "Type" : "AWS::CloudWatch::Alarm",
  "Properties" : {
    "AlarmDescription" : "Scale up if average NetworkOut > threshold for 5 minutes",
    "MetricName" : "NetworkOut",
    "Statistic" : "Average",
    "Period" : "300",
    "Threshold" : "1100000000",
    "AlarmActions" : [ { "Ref" : "SquidAutoscaleUpPolicy" } ],
    "ComparisonOperator" : "GreaterThanThreshold",
    … } }
…
"SecurityGroupRegionMap" : {
  "us-west-2" : { "SquidSG" : "sg-xxxxf6cb" },
  "us-east-1" : { "SquidSG" : "sg-xxxx70ca" },
  … }
"SquidLoadBalancer" : {"Type" : "AWS::ElasticLoadBalancing::LoadBalancer",
  "Properties" : {
    "CrossZone" : "false",
    "SecurityGroups" : [ {"Fn::FindInMap" :
      [ "SecurityGroupRegionMap", { "Ref" : "AWS::Region" } , "SquidSG" ] } ],
    "Listeners" : [ { "LoadBalancerPort":"3128", "InstancePort":"3128", "Protocol":"TCP" } ],
    "HealthCheck" : { "Target" : "TCP:3128", "HealthyThreshold" : "3", … }
    … } }
Integration Challenges: On-demand Services – CloudFormation
"elbHostedZone": { "Type" : "AWS::Route53::HostedZone",
  "Properties" : {
    "HostedZoneConfig" : {
      "Comment" : "auto-generated private hosting zone for ELB" },
    "Name" : { "Fn::Join" : ["", [{"Ref":"AvailabilityZone"},".elb.fnaldata.org."]]},
    "VPCs" : [{
      "VPCId" : { … },
      "VPCRegion" : { "Ref" : "AWS::Region"} }]
  } },
"elbDNS" : { "Type" : "AWS::Route53::RecordSet",
  "Properties" : {
    "HostedZoneId" : { "Ref" : "elbHostedZone" },
    "Name" : { "Fn::Join" :
      ["", ["elb2.",{"Ref":"AvailabilityZone"},".elb.fnaldata.org."]]},
    "ResourceRecords" : [ { "Fn::GetAtt" : [ "SquidLoadBalancer", "DNSName" ] } ]
    … } }
Clients call Squid as elb2.<AvailabilityZone>.elb.fnaldata.org
Integration Challenges: On-demand Services – CloudFormation
Integration Challenges: Networking
Implement routing / firewall configuration
to use peered ESNet / AWS to route
data flow through ESNet
AWS / ESNet data egress cost waiver
• For data transferred through
ESNet, transfer charges are
waived for data costs up to 15%
of the total
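The effect of the waiver can be shown with a small arithmetic sketch. It rests on an assumption about how to read the bullet above: "the total" is taken to be the total bill for the period, and egress charges are waived up to 15% of that amount. The function name `egress_charge_after_waiver` and the dollar figures are hypothetical.

```python
def egress_charge_after_waiver(egress_cost, total_bill, waiver_cap=0.15):
    """Egress cost still owed after waiving up to waiver_cap * total_bill.

    Assumption: the cap applies to the total bill for the billing period.
    """
    waived = min(egress_cost, waiver_cap * total_bill)
    return egress_cost - waived


# Egress well under the cap is fully waived.
print(egress_charge_after_waiver(egress_cost=100.0, total_bill=1000.0))
# Egress above the cap: only the part beyond 15% of the bill is charged.
print(egress_charge_after_waiver(egress_cost=200.0, total_bill=1000.0))
```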
Integration Challenges: Storage and Data Movement
Integrate S3 storage stage-in/-out for AWS internal /
external access - enables flexibility on data
management
• Consider O(1000) jobs finishing on the cloud and
transferring output to remote storage
• Storage bandwidth capacity is limited
• Two main strategies for data transfers:
1. Fill the available network transfer by having some
jobs wait - Put the jobs on a queue and transfer
data from as many jobs as possible - idle VMs
have a cost
2. Store data on S3 almost concurrently (due to high
scalability) and transfer data back asynchronously
- data on S3 has a cost
• The cheapest strategy depends on the storage
bandwidth, number of jobs, etc.
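The trade-off between the two strategies can be sketched with a deliberately simple cost model. Everything here is an assumption for illustration: outputs drain to remote storage one at a time at the full storage bandwidth, every job produces the same output size, uploads to S3 take negligible VM time, and all prices are made-up parameters. The function names are hypothetical.

```python
def total_wait_hours(num_jobs, gb_per_job, bandwidth_gb_per_s):
    """Sum over jobs of the time until each job's output has been drained."""
    seconds_per_transfer = gb_per_job / bandwidth_gb_per_s
    # Job k (1-based) is done after k transfers: 1 + 2 + ... + N transfers in total.
    total_seconds = seconds_per_transfer * num_jobs * (num_jobs + 1) / 2
    return total_seconds / 3600.0


def idle_vm_cost(num_jobs, gb_per_job, bandwidth_gb_per_s, vm_price_per_hour):
    """Strategy 1: VMs stay up (and billed) until their output is transferred."""
    return total_wait_hours(num_jobs, gb_per_job, bandwidth_gb_per_s) * vm_price_per_hour


def s3_staging_cost(num_jobs, gb_per_job, bandwidth_gb_per_s, s3_price_per_gb_hour):
    """Strategy 2: outputs wait on S3 (and are billed per GB-hour) instead."""
    wait = total_wait_hours(num_jobs, gb_per_job, bandwidth_gb_per_s)
    return wait * gb_per_job * s3_price_per_gb_hour


def cheaper_strategy(num_jobs, gb_per_job, bandwidth_gb_per_s,
                     vm_price_per_hour, s3_price_per_gb_hour):
    """Name the cheaper of the two strategies under this model."""
    queue_cost = idle_vm_cost(num_jobs, gb_per_job, bandwidth_gb_per_s, vm_price_per_hour)
    s3_cost = s3_staging_cost(num_jobs, gb_per_job, bandwidth_gb_per_s, s3_price_per_gb_hour)
    return "queue" if queue_cost <= s3_cost else "s3"


print(cheaper_strategy(1000, 2.0, 0.1, vm_price_per_hour=0.10,
                       s3_price_per_gb_hour=0.00004))
```

In this toy model both costs grow with the same waiting time, so the choice reduces to comparing the hourly VM price with the hourly cost of keeping one job's output on S3. A more realistic model would add request charges, partial overlap of transfers, and the egress path.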
S3
Integration Challenges: Monitoring and Accounting
Monitor # GCloud VMs (S. Korea Priv. Cloud)
Monitor # AWS VMs
Accounting: $ by VO and VM Type
Monitor HEP Cloud Slots
NOvA Data Processing
Processing the 2014/2015 dataset
3 use cases: Particle ID, Monte Carlo,
Data Reconstruction
Received AWS research grant
Dark Energy Survey
Gravitational Waves
Search for optical
counterpart of events
detected by LIGO/VIRGO
gravitational wave detectors (FNAL LDRD)
Modest CPU needs, but want 5-10 hour turnaround
Burst activity driven entirely by physical phenomena
(gravitational wave events are transient)
Rapid provisioning to peak
CMS Monte Carlo Simulation
Generation (and detector simulation, digitization,
reconstruction) of simulated events in time for
Moriond conference.
58,000 compute cores, steady-state
Demonstrates scalability
Received AWS research grant
Initial HEPCloud Use Cases
Results from the CMS Use Case
• All CMS simulation requests fulfilled by the conference
deadline (Rencontres de Moriond 2016)
– 2.9 million jobs, 15.1 million wall hours
• 9.5% badput – includes preemption from spot pricing
• 87% CPU efficiency
– 518 million events generated
CMS Reaching ~60k slots on AWS with HEPCloud
10% Test 25%
60000 slots
10000 VM
Each color corresponds to a
different region / zone /machine type
HEPCloud AWS: 25% of CMS global capacity
Production
Analysis
Reprocessing
Production on AWS
via FNAL HEPCloud
On-premises vs. cloud cost comparison
Average cost per core-hour
• On-premises resource: 0.9 cents per
core-hour
• Includes power, cooling, staff,
but assumes 100% utilization
• Off-premises at AWS (CMS use case):
1.4 cents per core-hour
• Off-premises at AWS (NOvA use case):
3.0 cents per core-hour
• Use case demanded bigger VM
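Because the on-premises figure assumes 100% utilization, it helps to see how the comparison shifts at lower utilization. The sketch below is a worked calculation under one assumption that is not in the slides: on-premises costs are fixed, so the cost per used core-hour scales as 1 / utilization. The function names are hypothetical; the cent values are the ones quoted above.

```python
def effective_on_prem_cost(cost_at_full_utilization, utilization):
    """Cost per used core-hour when only a fraction of the capacity is busy."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return cost_at_full_utilization / utilization


def break_even_utilization(on_prem_cost, cloud_cost):
    """Utilization below which the cloud price per core-hour is lower."""
    return on_prem_cost / cloud_cost


ON_PREM_CENTS = 0.9   # per core-hour, at 100% utilization
CMS_CENTS = 1.4       # AWS, CMS use case
NOVA_CENTS = 3.0      # AWS, NOvA use case

print(round(break_even_utilization(ON_PREM_CENTS, CMS_CENTS), 3))
print(round(break_even_utilization(ON_PREM_CENTS, NOVA_CENTS), 3))
print(round(effective_on_prem_cost(ON_PREM_CENTS, 0.5), 3))
```

Under that assumption, the CMS cloud price matches on-premises capacity that is busy roughly 64% of the time, and the NOvA price matches capacity that is busy about 30% of the time.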
Benchmarks
• Specialized (“ttbar”) benchmark focused on HEP workflows
• On-premises: 0.0163 ttbar /s (higher = better)
• Off-premises: 0.0158 ttbar /s
Raw compute performance roughly equivalent
Cloud costs approaching equivalence
Amazon provisions/retires 60k cores for our system in ~1 hour
Acknowledgements
The support from the Computing Sector
The Fermilab HEPCloud Facility team
AWS and their engagement team, in particular Jamie Baker
The HTCondor team
The collaboration and contributions from KISTI, in particular Dr. Seo-Young Noh
The Illinois Institute of Technology (IIT) students and professors Ioan Raicu and
Shangping Ren
The Italian National Institute of Nuclear Physics (INFN) summer student program
For More Information:
• NOvA: http://cd-docdb.fnal.gov/cgi-bin/ShowDocument?docid=5774
• CMS: http://cd-docdb.fnal.gov/cgi-bin/ShowDocument?docid=5750
demonstration
Thank you!
Remember to complete
your evaluations!
Related Sessions
CMP201 - Auto Scaling – The Fleet Management Solution for Planet Earth

More Related Content

What's hot

AWS re:Invent 2016: Workshop: Migrating Microsoft Applications to AWS (ENT216)
AWS re:Invent 2016: Workshop: Migrating Microsoft Applications to AWS (ENT216)AWS re:Invent 2016: Workshop: Migrating Microsoft Applications to AWS (ENT216)
AWS re:Invent 2016: Workshop: Migrating Microsoft Applications to AWS (ENT216)Amazon Web Services
 
AWS re:Invent 2016: Advanced Tips for Amazon EC2 Networking and High Availabi...
AWS re:Invent 2016: Advanced Tips for Amazon EC2 Networking and High Availabi...AWS re:Invent 2016: Advanced Tips for Amazon EC2 Networking and High Availabi...
AWS re:Invent 2016: Advanced Tips for Amazon EC2 Networking and High Availabi...Amazon Web Services
 
AWS re:Invent 2016: Global Traffic Management with Amazon Route 53 Traffic Fl...
AWS re:Invent 2016: Global Traffic Management with Amazon Route 53 Traffic Fl...AWS re:Invent 2016: Global Traffic Management with Amazon Route 53 Traffic Fl...
AWS re:Invent 2016: Global Traffic Management with Amazon Route 53 Traffic Fl...Amazon Web Services
 
AWS re:Invent 2016: 5 Security Automation Improvements You Can Make by Using ...
AWS re:Invent 2016: 5 Security Automation Improvements You Can Make by Using ...AWS re:Invent 2016: 5 Security Automation Improvements You Can Make by Using ...
AWS re:Invent 2016: 5 Security Automation Improvements You Can Make by Using ...Amazon Web Services
 
AWS re:Invent 2016: Relational and NoSQL Databases on AWS: NBC, MarkLogic, an...
AWS re:Invent 2016: Relational and NoSQL Databases on AWS: NBC, MarkLogic, an...AWS re:Invent 2016: Relational and NoSQL Databases on AWS: NBC, MarkLogic, an...
AWS re:Invent 2016: Relational and NoSQL Databases on AWS: NBC, MarkLogic, an...Amazon Web Services
 
AWS re:Invent 2016: Best practices for running enterprise workloads on AWS (E...
AWS re:Invent 2016: Best practices for running enterprise workloads on AWS (E...AWS re:Invent 2016: Best practices for running enterprise workloads on AWS (E...
AWS re:Invent 2016: Best practices for running enterprise workloads on AWS (E...Amazon Web Services
 
Accelerate your Business with SAP on AWS - AWS Summit Cape Town 2017
Accelerate your Business with SAP on AWS - AWS Summit Cape Town 2017 Accelerate your Business with SAP on AWS - AWS Summit Cape Town 2017
Accelerate your Business with SAP on AWS - AWS Summit Cape Town 2017 Amazon Web Services
 
Strategic Uses for Cost Efficient Long-Term Cloud Storage
Strategic Uses for Cost Efficient Long-Term Cloud StorageStrategic Uses for Cost Efficient Long-Term Cloud Storage
Strategic Uses for Cost Efficient Long-Term Cloud StorageAmazon Web Services
 
AWS re:Invent 2016: Getting Started with the Hybrid Cloud: Enterprise Backup ...
AWS re:Invent 2016: Getting Started with the Hybrid Cloud: Enterprise Backup ...AWS re:Invent 2016: Getting Started with the Hybrid Cloud: Enterprise Backup ...
AWS re:Invent 2016: Getting Started with the Hybrid Cloud: Enterprise Backup ...Amazon Web Services
 
AWS re:Invent 2016: Design, Deploy, and Optimize Microsoft SharePoint on AWS ...
AWS re:Invent 2016: Design, Deploy, and Optimize Microsoft SharePoint on AWS ...AWS re:Invent 2016: Design, Deploy, and Optimize Microsoft SharePoint on AWS ...
AWS re:Invent 2016: Design, Deploy, and Optimize Microsoft SharePoint on AWS ...Amazon Web Services
 
AWS re:Invent 2016: Deep Learning, 3D Content Rendering, and Massively Parall...
AWS re:Invent 2016: Deep Learning, 3D Content Rendering, and Massively Parall...AWS re:Invent 2016: Deep Learning, 3D Content Rendering, and Massively Parall...
AWS re:Invent 2016: Deep Learning, 3D Content Rendering, and Massively Parall...Amazon Web Services
 
AWS re:Invent 2016: Introduction to Managed Database Services on AWS (DAT307)
AWS re:Invent 2016: Introduction to Managed Database Services on AWS (DAT307)AWS re:Invent 2016: Introduction to Managed Database Services on AWS (DAT307)
AWS re:Invent 2016: Introduction to Managed Database Services on AWS (DAT307)Amazon Web Services
 
Architectures for HPC and HTC Workloads on AWS | AWS Public Sector Summit 2017
Architectures for HPC and HTC Workloads on AWS | AWS Public Sector Summit 2017Architectures for HPC and HTC Workloads on AWS | AWS Public Sector Summit 2017
Architectures for HPC and HTC Workloads on AWS | AWS Public Sector Summit 2017Amazon Web Services
 
AWS Services for Content Production
AWS Services for Content ProductionAWS Services for Content Production
AWS Services for Content ProductionAmazon Web Services
 
AWS re:Invent 2016: Dollars and Sense: Technical Tips for Continual Cost Opti...
AWS re:Invent 2016: Dollars and Sense: Technical Tips for Continual Cost Opti...AWS re:Invent 2016: Dollars and Sense: Technical Tips for Continual Cost Opti...
AWS re:Invent 2016: Dollars and Sense: Technical Tips for Continual Cost Opti...Amazon Web Services
 
AWS re:Invent 2016: How Mapbox Uses the AWS Edge to Deliver Fast Maps for Mob...
AWS re:Invent 2016: How Mapbox Uses the AWS Edge to Deliver Fast Maps for Mob...AWS re:Invent 2016: How Mapbox Uses the AWS Edge to Deliver Fast Maps for Mob...
AWS re:Invent 2016: How Mapbox Uses the AWS Edge to Deliver Fast Maps for Mob...Amazon Web Services
 
Introduction to Storage on AWS - AWS Summit Cape Town 2017
Introduction to Storage on AWS - AWS Summit Cape Town 2017Introduction to Storage on AWS - AWS Summit Cape Town 2017
Introduction to Storage on AWS - AWS Summit Cape Town 2017Amazon Web Services
 
AWS re:Invent 2016: Bring Microsoft Applications to AWS to Save Money and Sta...
AWS re:Invent 2016: Bring Microsoft Applications to AWS to Save Money and Sta...AWS re:Invent 2016: Bring Microsoft Applications to AWS to Save Money and Sta...
AWS re:Invent 2016: Bring Microsoft Applications to AWS to Save Money and Sta...Amazon Web Services
 
How to Migrate your Startup to AWS
How to Migrate your Startup to AWSHow to Migrate your Startup to AWS
How to Migrate your Startup to AWSAmazon Web Services
 
Ceate a Scalable Cloud Architecture
Ceate a Scalable Cloud ArchitectureCeate a Scalable Cloud Architecture
Ceate a Scalable Cloud ArchitectureAmazon Web Services
 

What's hot (20)

AWS re:Invent 2016: Workshop: Migrating Microsoft Applications to AWS (ENT216)
AWS re:Invent 2016: Workshop: Migrating Microsoft Applications to AWS (ENT216)AWS re:Invent 2016: Workshop: Migrating Microsoft Applications to AWS (ENT216)
AWS re:Invent 2016: Workshop: Migrating Microsoft Applications to AWS (ENT216)
 
AWS re:Invent 2016: Advanced Tips for Amazon EC2 Networking and High Availabi...
AWS re:Invent 2016: Advanced Tips for Amazon EC2 Networking and High Availabi...AWS re:Invent 2016: Advanced Tips for Amazon EC2 Networking and High Availabi...
AWS re:Invent 2016: Advanced Tips for Amazon EC2 Networking and High Availabi...
 
AWS re:Invent 2016: Global Traffic Management with Amazon Route 53 Traffic Fl...
AWS re:Invent 2016: Global Traffic Management with Amazon Route 53 Traffic Fl...AWS re:Invent 2016: Global Traffic Management with Amazon Route 53 Traffic Fl...
AWS re:Invent 2016: Global Traffic Management with Amazon Route 53 Traffic Fl...
 
AWS re:Invent 2016: 5 Security Automation Improvements You Can Make by Using ...
AWS re:Invent 2016: 5 Security Automation Improvements You Can Make by Using ...AWS re:Invent 2016: 5 Security Automation Improvements You Can Make by Using ...
AWS re:Invent 2016: 5 Security Automation Improvements You Can Make by Using ...
 
AWS re:Invent 2016: Relational and NoSQL Databases on AWS: NBC, MarkLogic, an...
AWS re:Invent 2016: Relational and NoSQL Databases on AWS: NBC, MarkLogic, an...AWS re:Invent 2016: Relational and NoSQL Databases on AWS: NBC, MarkLogic, an...
AWS re:Invent 2016: Relational and NoSQL Databases on AWS: NBC, MarkLogic, an...
 
AWS re:Invent 2016: Best practices for running enterprise workloads on AWS (E...
AWS re:Invent 2016: Best practices for running enterprise workloads on AWS (E...AWS re:Invent 2016: Best practices for running enterprise workloads on AWS (E...
AWS re:Invent 2016: Best practices for running enterprise workloads on AWS (E...
 
Accelerate your Business with SAP on AWS - AWS Summit Cape Town 2017
Accelerate your Business with SAP on AWS - AWS Summit Cape Town 2017 Accelerate your Business with SAP on AWS - AWS Summit Cape Town 2017
Accelerate your Business with SAP on AWS - AWS Summit Cape Town 2017
 
Strategic Uses for Cost Efficient Long-Term Cloud Storage
Strategic Uses for Cost Efficient Long-Term Cloud StorageStrategic Uses for Cost Efficient Long-Term Cloud Storage
Strategic Uses for Cost Efficient Long-Term Cloud Storage
 
AWS re:Invent 2016: Getting Started with the Hybrid Cloud: Enterprise Backup ...
AWS re:Invent 2016: Getting Started with the Hybrid Cloud: Enterprise Backup ...AWS re:Invent 2016: Getting Started with the Hybrid Cloud: Enterprise Backup ...
AWS re:Invent 2016: Getting Started with the Hybrid Cloud: Enterprise Backup ...
 
AWS re:Invent 2016: Design, Deploy, and Optimize Microsoft SharePoint on AWS ...
AWS re:Invent 2016: Design, Deploy, and Optimize Microsoft SharePoint on AWS ...AWS re:Invent 2016: Design, Deploy, and Optimize Microsoft SharePoint on AWS ...
AWS re:Invent 2016: Design, Deploy, and Optimize Microsoft SharePoint on AWS ...
 
AWS re:Invent 2016: Deep Learning, 3D Content Rendering, and Massively Parall...
AWS re:Invent 2016: Deep Learning, 3D Content Rendering, and Massively Parall...AWS re:Invent 2016: Deep Learning, 3D Content Rendering, and Massively Parall...
AWS re:Invent 2016: Deep Learning, 3D Content Rendering, and Massively Parall...
 
AWS re:Invent 2016: Introduction to Managed Database Services on AWS (DAT307)
AWS re:Invent 2016: Introduction to Managed Database Services on AWS (DAT307)AWS re:Invent 2016: Introduction to Managed Database Services on AWS (DAT307)
AWS re:Invent 2016: Introduction to Managed Database Services on AWS (DAT307)
 
Architectures for HPC and HTC Workloads on AWS | AWS Public Sector Summit 2017
Architectures for HPC and HTC Workloads on AWS | AWS Public Sector Summit 2017Architectures for HPC and HTC Workloads on AWS | AWS Public Sector Summit 2017
Architectures for HPC and HTC Workloads on AWS | AWS Public Sector Summit 2017
 
AWS Services for Content Production
AWS Services for Content ProductionAWS Services for Content Production
AWS Services for Content Production
 
AWS re:Invent 2016: Dollars and Sense: Technical Tips for Continual Cost Opti...
AWS re:Invent 2016: Dollars and Sense: Technical Tips for Continual Cost Opti...AWS re:Invent 2016: Dollars and Sense: Technical Tips for Continual Cost Opti...
AWS re:Invent 2016: Dollars and Sense: Technical Tips for Continual Cost Opti...
 
AWS re:Invent 2016: How Mapbox Uses the AWS Edge to Deliver Fast Maps for Mob...
AWS re:Invent 2016: How Mapbox Uses the AWS Edge to Deliver Fast Maps for Mob...AWS re:Invent 2016: How Mapbox Uses the AWS Edge to Deliver Fast Maps for Mob...
AWS re:Invent 2016: How Mapbox Uses the AWS Edge to Deliver Fast Maps for Mob...
 
Introduction to Storage on AWS - AWS Summit Cape Town 2017
Introduction to Storage on AWS - AWS Summit Cape Town 2017Introduction to Storage on AWS - AWS Summit Cape Town 2017
Introduction to Storage on AWS - AWS Summit Cape Town 2017
 
AWS re:Invent 2016: Bring Microsoft Applications to AWS to Save Money and Sta...
AWS re:Invent 2016: Bring Microsoft Applications to AWS to Save Money and Sta...AWS re:Invent 2016: Bring Microsoft Applications to AWS to Save Money and Sta...
AWS re:Invent 2016: Bring Microsoft Applications to AWS to Save Money and Sta...
 
How to Migrate your Startup to AWS
How to Migrate your Startup to AWSHow to Migrate your Startup to AWS
How to Migrate your Startup to AWS
 
Ceate a Scalable Cloud Architecture
Ceate a Scalable Cloud ArchitectureCeate a Scalable Cloud Architecture
Ceate a Scalable Cloud Architecture
 

Viewers also liked

Intro to High Performance Computing in the AWS Cloud
Intro to High Performance Computing in the AWS CloudIntro to High Performance Computing in the AWS Cloud
Intro to High Performance Computing in the AWS CloudAmazon Web Services
 
AWS re:Invent 2016: Deep Dive on Amazon Relational Database Service (DAT305)
AWS re:Invent 2016: Deep Dive on Amazon Relational Database Service (DAT305)AWS re:Invent 2016: Deep Dive on Amazon Relational Database Service (DAT305)
AWS re:Invent 2016: Deep Dive on Amazon Relational Database Service (DAT305)Amazon Web Services
 
AWS re:Invent 2016: Streaming ETL for RDS and DynamoDB (DAT315)
AWS re:Invent 2016: Streaming ETL for RDS and DynamoDB (DAT315)AWS re:Invent 2016: Streaming ETL for RDS and DynamoDB (DAT315)
AWS re:Invent 2016: Streaming ETL for RDS and DynamoDB (DAT315)Amazon Web Services
 
AWS re:Invent 2016: Analyzing Streaming Data in Real-time with Amazon Kinesis...
AWS re:Invent 2016: Analyzing Streaming Data in Real-time with Amazon Kinesis...AWS re:Invent 2016: Analyzing Streaming Data in Real-time with Amazon Kinesis...
AWS re:Invent 2016: Analyzing Streaming Data in Real-time with Amazon Kinesis...Amazon Web Services
 
IMCSummit 2015 - Day 2 IT Business Track - 4 Myths about In-Memory Databases ...
IMCSummit 2015 - Day 2 IT Business Track - 4 Myths about In-Memory Databases ...IMCSummit 2015 - Day 2 IT Business Track - 4 Myths about In-Memory Databases ...
IMCSummit 2015 - Day 2 IT Business Track - 4 Myths about In-Memory Databases ...In-Memory Computing Summit
 
"Big Data" Bioinformatics
"Big Data" Bioinformatics"Big Data" Bioinformatics
"Big Data" BioinformaticsBrian Repko
 
Building an HPC Cluster in 10 Minutes
Building an HPC Cluster in 10 MinutesBuilding an HPC Cluster in 10 Minutes
Building an HPC Cluster in 10 MinutesMonica Rut Avellino
 
Best Practices for Genomic and Bioinformatics Analysis Pipelines on AWS
Best Practices for Genomic and Bioinformatics Analysis Pipelines on AWS Best Practices for Genomic and Bioinformatics Analysis Pipelines on AWS
Best Practices for Genomic and Bioinformatics Analysis Pipelines on AWS Amazon Web Services
 
Multi-Tenant Pharma HPC Clusters
Multi-Tenant Pharma HPC ClustersMulti-Tenant Pharma HPC Clusters
Multi-Tenant Pharma HPC ClustersChris Dagdigian
 
(CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC
(CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC(CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC
(CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPCAmazon Web Services
 
HPC Cluster Computing from 64 to 156,000 Cores 
HPC Cluster Computing from 64 to 156,000 Cores HPC Cluster Computing from 64 to 156,000 Cores 
HPC Cluster Computing from 64 to 156,000 Cores inside-BigData.com
 
AWS Webcast - An Introduction to High Performance Computing on AWS
AWS Webcast - An Introduction to High Performance Computing on AWSAWS Webcast - An Introduction to High Performance Computing on AWS
AWS Webcast - An Introduction to High Performance Computing on AWSAmazon Web Services
 
AWS re:Invent 2016: [JK REPEAT] The Enterprise Fast Lane - What Your Competit...
AWS re:Invent 2016: [JK REPEAT] The Enterprise Fast Lane - What Your Competit...AWS re:Invent 2016: [JK REPEAT] The Enterprise Fast Lane - What Your Competit...
AWS re:Invent 2016: [JK REPEAT] The Enterprise Fast Lane - What Your Competit...Amazon Web Services
 
AWS re:Invent 2016: Leverage the Power of the Crowd To Work with Amazon Mecha...
AWS re:Invent 2016: Leverage the Power of the Crowd To Work with Amazon Mecha...AWS re:Invent 2016: Leverage the Power of the Crowd To Work with Amazon Mecha...
AWS re:Invent 2016: Leverage the Power of the Crowd To Work with Amazon Mecha...Amazon Web Services
 
AWS re:Invent 2016: Chalice: A Serverless Microframework for Python (DEV308)
AWS re:Invent 2016: Chalice: A Serverless Microframework for Python (DEV308)AWS re:Invent 2016: Chalice: A Serverless Microframework for Python (DEV308)
AWS re:Invent 2016: Chalice: A Serverless Microframework for Python (DEV308)Amazon Web Services
 
AWS re:Invent 2016: Turbocharge Your Microsoft .NET Developments with AWS (DE...
AWS re:Invent 2016: Turbocharge Your Microsoft .NET Developments with AWS (DE...AWS re:Invent 2016: Turbocharge Your Microsoft .NET Developments with AWS (DE...
AWS re:Invent 2016: Turbocharge Your Microsoft .NET Developments with AWS (DE...Amazon Web Services
 

AWS re:Invent 2016: Building HPC Clusters as Code in the (Almost) Infinite Cloud (CMP318)

  • 1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Nathan McGuirt, Manager, Solutions Architecture, AWS Gabriele Garzoglio, HEP Cloud Facility Project Manager, Fermilab December 2016 Building HPC Clusters as Code in the (Almost) Infinite Cloud CMP318
  • 2. What to Expect from the Session • Why customers are using AWS for HPC/HTC • Leveraging Spot Instances for big compute at low cost • Accelerating deployment with automation and managed services
  • 3. Agenda • Why AWS for HPC? • Automating cluster deployment • Fermi National Accelerator Laboratory • Demo of scaling jobs on a budget
  • 4. High Performance Computing (HPC) vs. High Throughput Computing (HTC)
      HPC: High performance computing (cluster computing)
        - Tightly clustered
        - Latency sensitive
      HTC: High throughput computing (grid computing)
        - Less inter-node communication
        - More horizontal scalability (pleasingly parallel)
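
The "pleasingly parallel" shape of an HTC workload can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not from the talk: `simulate_event` is a hypothetical stand-in for one independent job, and a local thread pool stands in for a fleet of worker nodes. The point is only the structure: each task depends on its own input and nothing else, so adding workers needs no inter-task communication.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate_event(seed):
    """Hypothetical independent job: depends only on its own seed."""
    state = seed
    for _ in range(1000):
        # Simple linear congruential step as placeholder "work".
        state = (1103515245 * state + 12345) % 2**31
    return state % 100


def run_htc(seeds, workers=4):
    """Fan independent tasks out to a pool; no task talks to another."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, whichever worker ran them.
        return list(pool.map(simulate_event, seeds))


results = run_htc(range(8))
print(results)
```

A tightly coupled HPC job would not fit this pattern, because its tasks exchange data during the run and are therefore sensitive to inter-node latency. In CPython a thread pool does not speed up CPU-bound work; it is used here only to keep the sketch short.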
  • 5. Why AWS for HPC?
  • 10. Data AWS Snowball AWS Direct Connect
  • 11. Cost
  • 12. Cost – Spot market Request 1 2 3 4 5 6 7 8 9 Bid Price $1.00 $0.55 $0.50 $0.33 $0.20 $0.18 $0.15 $0.10 $0.05 Spot Price $0.20 $0.20 $0.20 $0.20 $0.20
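The table on this slide can be read as a simple rule: a request is fulfilled when its bid is at or above the current Spot price, and every fulfilled request pays the Spot price rather than its own bid. The following is a minimal sketch of that rule using the slide's numbers. The function name `fulfilled_requests` is hypothetical, and the model reflects the bidding scheme as presented in this 2016 session, not any AWS API.

```python
from decimal import Decimal

# Bid prices from the slide, in request order 1..9 (USD per instance-hour).
BIDS = [Decimal(b) for b in
        ("1.00", "0.55", "0.50", "0.33", "0.20", "0.18", "0.15", "0.10", "0.05")]
SPOT_PRICE = Decimal("0.20")


def fulfilled_requests(bids, spot_price):
    """Return {request_number: price_paid} for bids at or above the spot price."""
    return {
        number: spot_price
        for number, bid in enumerate(bids, start=1)
        if bid >= spot_price
    }


if __name__ == "__main__":
    for number, paid in fulfilled_requests(BIDS, SPOT_PRICE).items():
        print(f"request {number}: bid ${BIDS[number - 1]}, pays ${paid}")
```

With the slide's values, requests 1 through 5 are fulfilled and each pays $0.20, which matches the five Spot Price entries in the table.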
  • 17. Automation • Fully custom • APIs • AWS CloudFormation • Managed services • Amazon EMR • AWS Batch • Software cluster management solutions • CFNCluster • Alces Flight • Partner offerings
  • 18. API - SDKs Java Python PHP .NET Ruby nodeJS iOS Android AWS Toolkit for Visual Studio AWS Toolkit for Eclipse Tools for Windows PowerShell CLI
  • 20. CloudFormation Resources: Ec2Instance: Type: AWS::EC2::Instance Properties: SecurityGroups: - Ref: InstanceSecurityGroup KeyName: mykey ImageId: '' InstanceSecurityGroup: Type: AWS::EC2::SecurityGroup Properties: GroupDescription: Enable SSH access via port 22 SecurityGroupIngress: - IpProtocol: tcp FromPort: '22' ToPort: '22' CidrIp: 0.0.0.0/0
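One way to treat a template like this as code is to build it as a plain data structure and check it before deployment. The sketch below reproduces the slide's two resources as a Python dict and looks for `Ref` targets that are not defined in the template. The helper names `build_template` and `dangling_refs` are hypothetical, no AWS calls are made, and the `ImageId` is left empty as on the slide.

```python
import json


def build_template(key_name="mykey", image_id=""):
    """Build the slide's template as a plain dict (no AWS calls are made)."""
    return {
        "Resources": {
            "Ec2Instance": {
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "SecurityGroups": [{"Ref": "InstanceSecurityGroup"}],
                    "KeyName": key_name,
                    "ImageId": image_id,  # left empty on the slide
                },
            },
            "InstanceSecurityGroup": {
                "Type": "AWS::EC2::SecurityGroup",
                "Properties": {
                    "GroupDescription": "Enable SSH access via port 22",
                    "SecurityGroupIngress": [{
                        "IpProtocol": "tcp",
                        "FromPort": "22",
                        "ToPort": "22",
                        "CidrIp": "0.0.0.0/0",
                    }],
                },
            },
        }
    }


def dangling_refs(template):
    """List Ref targets that are neither resources, parameters, nor AWS:: pseudo parameters."""
    defined = set(template.get("Resources", {})) | set(template.get("Parameters", {}))
    missing = []

    def walk(node):
        if isinstance(node, dict):
            ref = node.get("Ref")
            if len(node) == 1 and isinstance(ref, str):
                if ref not in defined and not ref.startswith("AWS::"):
                    missing.append(ref)
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(template)
    return missing


if __name__ == "__main__":
    template = build_template()
    print("dangling refs:", dangling_refs(template))
    print(json.dumps(template, indent=2))
```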
  • 21. EMR
  • 23. AWS CFNCluster $ pip install cfncluster ... $ cfncluster configure ... $ cfncluster run mycluster
  • 24. Alces Flight Alces Flight is a software offering self-service supercomputers via the AWS Marketplace. Creates self-scaling clusters with more than 750 popular scientific applications pre-installed, complete with libraries and various compiler optimizations, ready to run. The clusters use the AWS Spot Instances by default.
  • 25. AWS Partners in the HPC Space
  • 26. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Gabriele Garzoglio, HEP Cloud Facility Co-Project Manager, Fermilab December 2016 The HEP Cloud Facility Elastic Computing for High Energy Physics
  • 27. Computing at the Fermi National Accelerator Laboratory Lead United States particle physics laboratory • Funded by the Department of Energy • ~100 PB of data on tape • High Throughput Computing characterized by: • “Pleasingly parallel” tasks • High CPU instruction / Bytes IO ratio • But still lots of I/O. See Pfister: “In Search of Clusters” Focus on Neutrino Physics • Including the NOvA Experiment Strong collaborations with international laboratories • CERN / Large Hadron Collider (LHC) Experiments • Brookhaven National Laboratory (BNL) • Lead institution (“Tier-1”) for the Compact Muon Solenoid (CMS)
  • 28. Drivers of Facility Evolution: Capacity / Cost / Elasticity • Price of one core-year on commercial clouds • HEP needs: 10-100x today's capacity • Facility size: 15k cores • NOvA experiment jobs in queue at FNAL • Usage is not steady-state • CMS Analysis Users – Yearly Cycle
  • 29. Vision for Facility Evolution • Strategic Plan for U.S. Particle Physics (P5 Report to the U.S. funding agencies) Fermilab Facility HTC, HPC Cores 68.7K Disk Systems 37.6 PB Tape 101 PB 10/100 Gbit Networking ~5k internal network ports The Facility Today is “Fixed” Rapidly evolving computer architectures and increasing data volumes require effective crosscutting solutions that are being developed in other science disciplines and in industry. • HEP Cloud Vision Statement – HEPCloud is envisioned as a portal to an ecosystem of diverse computing resources commercial or academic – Provides “complete solutions” to users, with agreed upon levels of service – The Facility routes to local or remote resources based on workflow requirements, cost, and efficiency of accessing various resources – Manages allocations of users to target compute engines • Pilot project to explore feasibility, capability of HEPCloud – Goal of moving into production during FY18 – Seed money provided by industry
  • 30. HEP Cloud Architecture Overview External Relationships
  • 31. HEP Cloud Architecture Overview External Relationships Basic idea: Add disparate resources (Cloud VM, HPC slots, Grid nodes, local resources) into a central resource pool.
  • 32. Fermilab HEPCloud: Expanding to the Cloud • Where to start? – Market leader: Amazon Web Services (AWS) • Integration challenges that need to be managed to run at scale: – Provisioning – Performance – Image portability – On-demand services – Networking – Storage and data movement – Monitoring and accounting – Security Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof.
  • 33. Integration Challenges: Provisioning – Create an Overlay Batch System with GlideinWMS and HTCondor [Diagram labels: condor submit; VO Frontend; HTCondor Central Manager; HTCondor Schedulers; Frontend; GlideinWMS Factory; HTCondor-G; Grid Site; Local Resources; High Performance Computers; Cloud Provider; Virtual Machine; VM; Glidein; HTCondor Startd; Job; Pull Job]
  • 34. Integration Challenges: Provisioning – Containing costs • Using AWS Spot market to contain costs • Workflows are already engineered to sustain preemption from the Grid – Jobs are “short”, i.e., killed jobs are affordable w/o checkpointing – Preempted jobs are automatically resubmitted – Data management systems identify files in a dataset that were not processed and allow recovery CMS use case: Histogram of number of times each job started (measure of preemption) NOvA use case: number of VMs running (blue) and preempted (red) every hour 2.5M jobs with no preemption 240 VM / h 60 VM preempted in 1h 400K jobs with one preemption
  • 35. Integration Challenges: Provisioning – Containing costs • The Decision Engine oversees costs and optimizes VM placement using the status of the facility, the historical prices, and the job characteristics • Bid at 25% x on-demand price has lowest expected cost • Based on preemption history, calculating the probability that a 5-24 h job finishes within a week although it has to restart due to preemption, for various bidding algorithms. $0.25 / h
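The probability mentioned on this slide can be illustrated with a small model. The sketch below is not the Decision Engine's algorithm, which the slides do not describe. It assumes that a VM is preempted in any given hour with a fixed, independent probability, that a preempted job is resubmitted immediately, and that it restarts from scratch (the previous slide notes that jobs run without checkpointing). In practice the hourly preemption probability would come from the preemption history for a given bid level. The value used here is purely illustrative, and the function name is hypothetical.

```python
def prob_finish_within(job_hours, horizon_hours, p_preempt_per_hour):
    """Probability that a job of `job_hours` completes within `horizon_hours`.

    The job needs `job_hours` consecutive hours without preemption. After a
    preemption it restarts from zero in the next hour.
    """
    # streak[k] = probability the job is unfinished with k uninterrupted hours done
    streak = [0.0] * job_hours
    streak[0] = 1.0
    finished = 0.0
    survive = 1.0 - p_preempt_per_hour

    for _ in range(horizon_hours):
        nxt = [0.0] * job_hours
        for hours_done, prob in enumerate(streak):
            if prob == 0.0:
                continue
            nxt[0] += prob * p_preempt_per_hour      # preempted: restart from zero
            if hours_done + 1 == job_hours:
                finished += prob * survive           # last hour completed
            else:
                nxt[hours_done + 1] += prob * survive
        streak = nxt
    return finished


if __name__ == "__main__":
    week = 7 * 24
    for hours in (5, 12, 24):
        # 0.02 per hour is an illustrative assumption, not a measured value.
        print(hours, "h job:", round(prob_finish_within(hours, week, 0.02), 4))
```

Running the same calculation with preemption rates estimated for different bid levels, and weighting by the price paid at each level, is one way an expected-cost comparison between bidding strategies could be set up.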
  • 36. Integration Challenges: Performance Benchmarks used to compare workflow duration on AWS (and $$) with local execution [Benchmark chart annotations: Need EBS; 32 cores; scale w/ cores; Need parallel streams; c3.2xlarge good candidate – want > 1] From AWS to FNAL: 7 Gbps Access to S3 always saturates the 1 Gbps interface
  • 37. Integration Challenges: Performance CMS Use Case: Wallclock distribution by AWS instance type
  • 38. Integration Challenges: Image Portability Build VM management tool, considering: • HVM virtualization (HW VM + Xen) on AWS: gives access to all AWS resources • Contain VM size (saves import time and cost) • Import process covers multiple AWS accounts and regions • AuthN with AWS uses short-lived role-based tokens, rather than long-term keys Build “Golden Image” from standard Fermilab Worker Node configuration VM.
  • 39. Integration Challenges: On-demand Services Jobs depend on software services to run Automating the deployment of these services on AWS on-demand - enables scalability and cost savings • Services include data caching (e.g., Squid), WMS, submission service, data transfer, etc. • As services are made deployable on-demand, instantiate ensemble of services together (e.g., through AWS CloudFormation) Example: on-demand Squid • Deploy Squid via auto-scaling services. Squid is deployed if average group bandwidth utilization is too high. Server is deployed or destroyed in 30 seconds. • Front Squids with a load balancer. • Name the load balancer for that region via Route 53
  • 40. Integration Challenges: On-demand Services – CloudFormation "SquidInstanceType" : { "Type" : "String", "Default" : "c3.xlarge", … }, "SquidLaunchConfiguration" : { "Type" : "AWS::AutoScaling::LaunchConfiguration", "Properties" : { "InstanceType" : { "Ref" : "SquidInstanceType" }, "ImageId" : { "Fn::FindInMap" : [ "AMIRegionMap", {"Ref":"AWS::Region"}, "SquidAMI" ]}, "SecurityGroups" : [ { "Fn::FindInMap" : ["SecurityGroupRegionMap",{"Ref":"AWS::Region"}, "SquidSG" ] } ], … } }, "SquidAutoscalingGroup" : { "Type" : "AWS::AutoScaling::AutoScalingGroup", "Properties" : { "AvailabilityZones" : {"Ref" : "AvailabilityZones"}, "LaunchConfigurationName" : {"Ref" : "SquidLaunchConfiguration" }, "LoadBalancerNames" : [ {"Ref" : "SquidLoadBalancer" } ], … } }, "SquidAutoscaleUpPolicy" : { "Type" : "AWS::AutoScaling::ScalingPolicy", "Properties" : { "AdjustmentType" : "ChangeInCapacity", "AutoScalingGroupName" : { "Ref" : "SquidAutoscalingGroup" }, "ScalingAdjustment" : "1" … } }, …
  • 41. Integration Challenges: On-demand Services – CloudFormation "SquidNetworkBandwidthHighAlarm" : { "Type" : "AWS::CloudWatch::Alarm", "Properties" : { "AlarmDescription" : "Scale up if average NetworkIn > for 5 minutes", "MetricName" : "NetworkOut", "Statistic" : "Average", "Period" : "300", "Threshold" : "1100000000", "AlarmActions" : [ { "Ref" : "SquidAutoscaleUpPolicy" } ], "ComparisonOperator" : "GreaterThanThreshold", … } } … "SecurityGroupRegionMap" : { "us-west-2" : { "SquidSG" : "sg-xxxxf6cb" }, "us-east-1" : { "SquidSG" : "sg-xxxx70ca" }, … } "SquidLoadBalancer" : {"Type" : "AWS::ElasticLoadBalancing::LoadBalancer", "Properties" : { "CrossZone" : "false", "SecurityGroups" : [ {"Fn::FindInMap" : [ "SecurityGroupRegionMap", { "Ref" : "AWS::Region" } , "SquidSG" ] } ], "Listeners" : [ { "LoadBalancerPort":"3128", "InstancePort":"3128", "Protocol":"TCP" } ], "HealthCheck" : { "Target" : "TCP:3128", "HealthyThreshold" : "3", … } … } }
  • 42. Integration Challenges: On-demand Services – CloudFormation "elbHostedZone": { "Type" : "AWS::Route53::HostedZone", "Properties" : { "HostedZoneConfig" : { "Comment" : "auto-generated private hosting zone for ELB" }, "Name" : { "Fn::Join" : ["", [{"Ref":"AvailabilityZone"},".elb.fnaldata.org."]]}, "VPCs" : [{ "VPCId" : { … }, "VPCRegion" : { "Ref" : "AWS::Region"} }] } }, "elbDNS" : { "Type" : "AWS::Route53::RecordSet", "Properties" : { "HostedZoneId" : { "Ref" : "elbHostedZone" }, "Name" : { "Fn::Join" : ["", ["elb2.",{"Ref":"AvailabilityZone"},".elb.fnaldata.org."]]}, "ResourceRecords" : [ { "Fn::GetAtt" : [ "SquidLoadBalancer", "DNSName" ] } ] … } } Clients call Squid as elb2.<AvailabilityZone>.elb.fnaldata.org
  • 43. Integration Challenges: Networking Implement routing / firewall configuration to use peered ESNet / AWS to route data flow through ESNet AWS / ESNet data egress cost waiver • For data transferred through ESNet, transfer charges are waived for data costs up to 15% of the total
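The waiver on this slide can be sketched as a simple cap. The slide does not say exactly what "the total" refers to, so the sketch assumes it is the total bill for the period, including egress charges before the waiver is applied. The function name and the dollar amounts are hypothetical.

```python
def billed_egress(total_bill_usd, egress_usd, waiver_fraction=0.15):
    """Egress charges left after waiving up to `waiver_fraction` of the total bill.

    Assumption: `total_bill_usd` includes the egress charges before the waiver.
    """
    waived = min(egress_usd, waiver_fraction * total_bill_usd)
    return egress_usd - waived


if __name__ == "__main__":
    # Egress below the 15% cap is fully waived.
    print(billed_egress(1000.0, 100.0))
    # Egress above the cap: only the part beyond 15% of the total is billed.
    print(billed_egress(1000.0, 200.0))
```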
  • 44. Integration Challenges: Storage and Data Movement Integrate S3 storage stage-in/-out for AWS internal / external access - enables flexibility on data management • Consider O(1000) jobs finishing on the cloud and transferring output to remote storage • Storage bandwidth capacity is limited • Two main strategies for data transfers: 1. Fill the available network transfer by having some jobs wait - Put the jobs on a queue and transfer data from as many jobs as possible - idle VMs have a cost 2. Store data on S3 almost concurrently (due to high scalability) and transfer data back asynchronously - data on S3 has a cost • The cheapest strategy depends on the storage bandwidth, number of jobs, etc. S3
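The trade-off between the two strategies can be made concrete with a rough cost model. The sketch below assumes that all jobs finish at the same moment, that outputs are drained to remote storage one after another at the full storage bandwidth, and that writing to S3 takes negligible time. Request and transfer charges are ignored since the same data leaves the cloud in both cases. The prices and sizes in the example are illustrative assumptions, not figures from the talk, and all function names are hypothetical.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_MONTH = 30 * 24 * 3600  # 30-day month, for per-GB-month pricing


def total_wait_seconds(n_jobs, output_gb, bandwidth_gb_per_s):
    """Sum over jobs of the time until each job's output has left the cloud.

    Outputs are drained one at a time, so the i-th output is done after
    i * (output_gb / bandwidth_gb_per_s) seconds.
    """
    per_output = output_gb / bandwidth_gb_per_s
    return per_output * n_jobs * (n_jobs + 1) / 2


def queue_cost(n_jobs, output_gb, bandwidth_gb_per_s, vm_usd_per_hour):
    """Strategy 1: each VM stays up, and is paid for, until its output is out."""
    wait = total_wait_seconds(n_jobs, output_gb, bandwidth_gb_per_s)
    return wait / SECONDS_PER_HOUR * vm_usd_per_hour


def s3_cost(n_jobs, output_gb, bandwidth_gb_per_s, s3_usd_per_gb_month):
    """Strategy 2: outputs land on S3 at once and are billed until drained."""
    gb_seconds = output_gb * total_wait_seconds(n_jobs, output_gb, bandwidth_gb_per_s)
    return gb_seconds / SECONDS_PER_MONTH * s3_usd_per_gb_month


if __name__ == "__main__":
    # Illustrative inputs: 1000 jobs, 1 GB each, 0.1 GB/s to remote storage.
    print("queue on VMs:", round(queue_cost(1000, 1.0, 0.1, 0.10), 2), "USD")
    print("stage on S3: ", round(s3_cost(1000, 1.0, 0.1, 0.03), 4), "USD")
```

This model leaves out effects that matter in practice, such as jobs finishing at different times and the second transfer hop from S3, which is why the slide's conclusion that the cheapest strategy depends on bandwidth and job count still applies.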
  • 45. Integration Challenges: Monitoring and Accounting Monitor # GCloud VMs (S. Korea Priv. Cloud) Monitor # AWS VMs Accounting: $ by VO and VM Type Monitor HEP Cloud Slots
  • 46. Initial HEPCloud Use Cases • NOvA Data Processing: Processing the 2014/2015 dataset; 3 use cases: Particle ID, Monte Carlo, Data Reconstruction; Received AWS research grant • Dark Energy Survey Gravitational Waves: Search for optical counterpart of events detected by LIGO/VIRGO gravitational wave detectors (FNAL LDRD); Modest CPU needs, but want 5-10 hour turnaround; Burst activity driven entirely by physical phenomena (gravitational wave events are transient); Rapid provisioning to peak • CMS Monte Carlo Simulation: Generation (and detector simulation, digitization, reconstruction) of simulated events in time for Moriond conference; 58,000 compute cores, steady-state; Demonstrates scalability; Received AWS research grant
  • 47. Results from the CMS Use Case • All CMS simulation requests fulfilled by the conference deadline (Rencontres de Moriond 2016) – 2.9 million jobs, 15.1 million wall hours • 9.5% badput – includes preemption from spot pricing • 87% CPU efficiency – 518 million events generated
  • 48. CMS Reaching ~60k slots on AWS with HEPCloud 10% Test 25% 60000 slots 10000 VM Each color corresponds to a different region / zone /machine type
  • 49. HEPCloud AWS: 25% of CMS global capacity [Chart legend: Production; Analysis; Reprocessing; Production on AWS via FNAL HEPCloud]
  • 50. On-premises vs. cloud cost comparison Average cost per core-hour • On-premises resource: 0.9 cents per core-hour • Includes power, cooling, staff, but assumes 100% utilization • Off-premises at AWS (CMS use case): 1.4 cents per core-hour • Off-premises at AWS (NOvA use case): 3.0 cents per core-hour • Use case demanded bigger VM Benchmarks • Specialized (“ttbar”) benchmark focused on HEP workflows • On-premises: 0.0163 ttbar/s (higher = better) • Off-premises: 0.0158 ttbar/s Raw compute performance roughly equivalent Cloud costs approaching equivalence Amazon provisions/retires 60k cores for our system in ~1 hour
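The figures on this slide can be combined into a cost per unit of benchmark work, and into the utilization at which on-premises and cloud costs would match. The sketch below assumes that the ttbar/s rates are per core and that the on-premises cost per useful core-hour scales inversely with utilization (the 0.9 cent figure assumes 100%). Neither assumption is stated on the slide, and the function names are hypothetical.

```python
SECONDS_PER_HOUR = 3600


def cents_per_ttbar(cents_per_core_hour, ttbar_per_s):
    """Cost of one benchmark unit, assuming the rate is per core."""
    return cents_per_core_hour / (ttbar_per_s * SECONDS_PER_HOUR)


def break_even_utilization(on_prem_cents, cloud_cents):
    """On-premises utilization below which the cloud price per core-hour is lower.

    Assumes the on-premises cost per useful core-hour is on_prem_cents / utilization.
    """
    return on_prem_cents / cloud_cents


if __name__ == "__main__":
    print("on-premises:", round(cents_per_ttbar(0.9, 0.0163), 4), "cents per ttbar")
    print("AWS (CMS):  ", round(cents_per_ttbar(1.4, 0.0158), 4), "cents per ttbar")
    print("break-even utilization vs. CMS use case:",
          round(break_even_utilization(0.9, 1.4), 3))
```

Under these assumptions, an on-premises facility running below roughly 64% utilization would pay more per useful core-hour than the CMS use case paid on AWS, which is one way to read the slide's remark that cloud costs are approaching equivalence.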
  • 51. Acknowledgements The support from the Computing Sector The Fermilab HEPCloud Facility team AWS and their engagement team, in particular Jamie Baker The HTCondor team The collaboration and contributions from KISTI, in particular Dr. Seo-Young Noh The Illinois Institute of Technology (IIT) students and professors Ioan Raicu and Shangping Ren The Italian National Institute of Nuclear Physics (INFN) summer student program For More Information: • NOvA: http://cd-docdb.fnal.gov/cgi-bin/ShowDocument?docid=5774 • CMS: http://cd-docdb.fnal.gov/cgi-bin/ShowDocument?docid=5750
  • 56. Related Sessions CMP201 - Auto Scaling – The Fleet Management Solution for Planet Earth