High Throughput Computing, AWS and the God Particle:
Finding New Sub-Atomic Particles on the AWS Cloud
Jamie Kinney (Sr. Manager Scientific Computing, AWS)
Miron Livny (Professor of Computer Science, University of Wisconsin)
November 13, 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Jamie Kinney
First, Some Background
Amazon EC2 Instance Types
Standard (m1,m3)
Micro (t1)
High Memory (m2)
High CPU (c1)
Cluster Compute
Intel Nehalem (cc1.4xlarge)
Intel Sandy Bridge E5-2670 (cc2.8xlarge)
Sandy Bridge, NUMA, 240GB RAM (cr1.4xlarge)
NVIDIA GRID GPU a.k.a. “Kepler” (g2.2xlarge)
2TB of SSD 120,000 IOPS (hi1.4xlarge)
48 TB of ephemeral storage (hs1.8xlarge)
Multiple Purchase Models
• Free Tier
– Get started on AWS with free usage & no commitment
– For POCs and getting started
• On Demand
– Pay for compute capacity by the hour with no long-term commitments
– For spiky workloads, or to define needs
• Reserved
– Make a low, one-time payment and receive a significant discount on the hourly charge
– For committed utilization
• Spot
– Bid for unused capacity, charged at a Spot Price which fluctuates based on supply and demand
– For time-insensitive or transient workloads
• Dedicated
– Launch instances within Amazon VPC that run on hardware dedicated to a single customer
– For highly sensitive or compliance-related workloads
Amazon EC2 Spot Instances
• Priced to deliver up to 92% discount off of On-Demand Instance
– $2.40/hour vs. $0.253/hour* for cc2.8xlarge in us-west-2
• Elastic
• Potential to get capacity not otherwise available
• Minimum Commitment (1 hour)
• Caveat - potential for interruption

* as of November 8th
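As a quick check of the price example above, the discount implied by the two quoted hourly rates can be computed directly. This is only an illustrative sketch: the function name is an assumption, and the prices are the snapshot quoted on the slide.

```python
def spot_discount(on_demand_per_hour: float, spot_per_hour: float) -> float:
    """Return the fractional saving of the Spot price relative to On-Demand."""
    return 1.0 - spot_per_hour / on_demand_per_hour


# Prices quoted above for cc2.8xlarge in us-west-2 (as of November 8th).
discount = spot_discount(on_demand_per_hour=2.40, spot_per_hour=0.253)
print(f"Spot saving for this snapshot: {discount:.1%}")
```

For this particular snapshot the saving comes out a little under 90%, which is consistent with the "up to 92%" headline figure.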
Miron Livny
Armed with 5σ significance delivered by more than 6K scientists
from the ATLAS and CMS experiments, the Director General of
CERN, Rolf Heuer, asked on July 4, 2012:

“I think we have it, do you agree?”
“We have now found the missing cornerstone of particle physics.
We have a discovery. We have observed a new particle that is
consistent with a Higgs boson.”
“only possible because of the extraordinary performance of the
accelerators, experiments and the computing grid.”
Finding New Sub-Atomic Particles on the AWS Cloud (BDT402) | AWS re:Invent 2013
High Energy Physics has been
a perfect (and challenging!)
example of High Throughput
Computing – an endless
stream of independent but
interrelated jobs
In 1996 I introduced the distinction between High
Performance Computing (HPC) and High
Throughput Computing (HTC) in a seminar at the NASA
Goddard Flight Center and, a month later, at the European
Laboratory for Particle Physics (CERN).
High Throughput Computing
is a 24-7-365 activity and
therefore requires
automation
FLOPY ≠ (60*60*24*7*52)*FLOPS
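One way to read this inequality: multiplying peak speed by the number of seconds in a year overstates what a system actually delivers over that year, because resources are not available and busy every second. The sketch below is a hedged illustration, not a formula from the talk; the `availability` factor and the sample numbers are assumptions.

```python
# Seconds in a 52-week year, the factor that appears in the slide.
SECONDS_PER_YEAR = 60 * 60 * 24 * 7 * 52


def peak_flopy(flops: float) -> float:
    """Upper bound: peak FLOPS sustained for every second of the year."""
    return SECONDS_PER_YEAR * flops


def delivered_flopy(flops: float, availability: float) -> float:
    """Hypothetical model: only a fraction of the year does useful work."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return peak_flopy(flops) * availability


# Assumed example: a 1 TFLOPS resource doing useful work 60% of the time.
peak = peak_flopy(1e12)
delivered = delivered_flopy(1e12, availability=0.6)
print(f"peak: {peak:.3e} FLOPY, delivered: {delivered:.3e} FLOPY")
```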
HTCondors
“The members of the Open Science Grid (OSG) are united by a
commitment to promote the adoption and to advance the state of
the art of distributed high throughput computing (DHTC)
– shared utilization of autonomous resources where all the
elements are optimized for maximizing computational
throughput.”
OSG in numbers: 2M core hours and 1 PB per
day on 120 US sites. 60% of the core hours are
used by the LHC experiments (ATLAS & CMS)
Submit Locally and run
Globally
Here is my work and here are the resources
(local cluster or money) that I bring to
the table
HTCondor uses a two phase
matchmaking process to first
allocate a resource to a
requestor and then to select a
task to be delegated to the
resource
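[Figure: matchmaking between a SchedD and a StartD through the matchmaker (MM). The SchedD, holding work items Wi, says “I am S and am looking for a resource”. The StartD says “I am D and I am willing to offer you a resource”. The MM answers “Match!”.]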
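The two phases described above can be sketched in a few lines. This is a toy model under stated assumptions, not HTCondor's API: the class names, the memory-only requirement, and the first-fit policy are all hypothetical stand-ins for the real matchmaking machinery.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Resource:          # stands in for a StartD offering a slot
    name: str
    memory_gb: float
    claimed_by: Optional[str] = None


@dataclass
class Task:              # stands in for one work item Wi
    name: str
    memory_gb: float


@dataclass
class Requestor:         # stands in for a SchedD with a queue of tasks
    name: str
    queue: List[Task] = field(default_factory=list)


def allocate(requestor: Requestor, pool: List[Resource]) -> Optional[Resource]:
    """Phase 1: the matchmaker allocates a free resource to the requestor."""
    for res in pool:
        fits = any(t.memory_gb <= res.memory_gb for t in requestor.queue)
        if res.claimed_by is None and fits:
            res.claimed_by = requestor.name
            return res
    return None


def select_task(requestor: Requestor, res: Resource) -> Optional[Task]:
    """Phase 2: the requestor picks which task to delegate to the resource."""
    for task in requestor.queue:
        if task.memory_gb <= res.memory_gb:
            requestor.queue.remove(task)
            return task
    return None


pool = [Resource("D1", 2.0), Resource("D2", 8.0)]
schedd = Requestor("S", [Task("W1", 4.0), Task("W2", 1.0), Task("W3", 4.0)])
claimed = allocate(schedd, pool)
chosen = select_task(schedd, claimed)
print(claimed.name, chosen.name)
```

Keeping the two decisions separate means the matchmaker only needs to know that the requestor has suitable work, while the choice of which task runs stays with the requestor.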
Since the HTCondor SchedD
can also submit (via grid CEs
or SSH) jobs to remote batch
systems we can do the
following -
[Figure: glidein provisioning on the grid. Local side: User Code/DAGMan, HTCondor (MM, SchedD) and the Factory Front End. OSG Factory: HTCondor (MM, SchedD) submitting through Grid CEs to remote batch systems (LSF, PBS, HTCondor). Remote side: StartDs running G-apps and C-apps.]
The OSG GlideIn factory uses the
SchedD as a resource provisioning agent
on behalf of the (local) SchedD. It
decides when and from where to acquire a
resource, and for how long to keep it.
Since the HTCondor SchedD
can also manage VMs on
remote clouds (e.g. AWS &
Spot), the OSG factory can
also do the following -
[Figure: the same architecture with cloud resources. Local side: User Code/DAGMan, HTCondor (MM, SchedD) and the Factory Front End. OSG Cloud Factory: HTCondor (MM, SchedD) launching VMs on EC2, OpSt and Spot. Remote side: StartDs inside the VMs running C-apps.]
This (natural) potential of adding
AWS resources to the OSG triggered
the following exploratory efforts by
ATLAS (John Hover from BNL) and
CMS (Dan Bradley from UW-Madison)
Benchmarked a variety of EC2
instance types with a standard
HEP benchmark (HepSpec06)
Machine      HS06  HS06 stddev  Cores  HS06/core  $/kHS06-hour (spot)  $/kHS06-hour (on-demand)
m1.medium      10          1.3      1         10                  1.3                        13
m1.large       20          2.4      2         10                  1.3                        13
m1.xlarge      39          7.0      4         10                  1.3                        14
m2.xlarge      28          1.1      2         14                  1.3                        16
m2.2xlarge     55          0.4      4         14                  1.3                        17
m2.4xlarge     98          2.5      8         12                  1.4                        18
m3.xlarge      48          0.7      4         12                  1.2                        12
m3.2xlarge     91          1.8      8         11                  1.3                        13
cc1.4xlarge   139          0.3     16          9                  1.5                       9.3
cc2.8xlarge   285          8.1     32          9                  1.0                       8.4

Prices and benchmarks in us-east-1 zone (N. Virginia), Nov 2012.
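The derived columns of the benchmark table can be reproduced from the raw ones. The sketch below assumes the fourth number in each row is HS06/core and the last two are the spot and on-demand prices in $/kHS06-hour; the function names are hypothetical, and only two rows are copied in for illustration.

```python
def hs06_per_core(hs06: float, cores: int) -> float:
    """Benchmark score per core."""
    return hs06 / cores


def implied_hourly_price(dollars_per_khs06_hour: float, hs06: float) -> float:
    """Convert $/kHS06-hour back into $ per instance-hour."""
    return dollars_per_khs06_hour * hs06 / 1000.0


# (HS06, cores, spot $/kHS06-hour, on-demand $/kHS06-hour), from the table.
rows = {
    "m1.medium": (10, 1, 1.3, 13),
    "cc2.8xlarge": (285, 32, 1.0, 8.4),
}

results = {}
for machine, (hs06, cores, spot, on_demand) in rows.items():
    results[machine] = (
        hs06_per_core(hs06, cores),
        implied_hourly_price(spot, hs06),
        implied_hourly_price(on_demand, hs06),
    )
    per_core, spot_hr, od_hr = results[machine]
    print(f"{machine}: {per_core:.1f} HS06/core, "
          f"spot ${spot_hr:.3f}/h, on-demand ${od_hr:.2f}/h")
```

Normalizing by HS06 rather than by core is what makes instance types with different core speeds comparable on price.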
Dan Bradley

Budget – 10k x 10HS06, one week
Item                                 Quantity  Expected Cost
Instances: cc2.8xlarge, us-west-2    200       $8.5k
Instances: m3.xlarge, us-east-1      1199      $12k
Output Transfer                      168TB     $13k
Total                                          $33k ($2.0/kHS06-hour)

Assumptions:
• Need 15% extra time due to instance termination
• Transfer out 0.01GB/HS06-hour (no direct connect)
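The budget figures above can be cross-checked with simple arithmetic. The sketch below is an illustration under stated assumptions: it takes 1 TB as 1000 GB, and it reuses the HS06 scores from the benchmark slide (285 for cc2.8xlarge, 48 for m3.xlarge); the variable names are hypothetical.

```python
HOURS_PER_WEEK = 7 * 24

# Target capacity: 10k slots of 10 HS06 each, for one week.
target_hs06 = 10_000 * 10
khs06_hours = target_hs06 / 1000 * HOURS_PER_WEEK

# Output transfer at 0.01 GB per HS06-hour, expressed in TB (1 TB = 1000 GB).
transfer_tb = 0.01 * target_hs06 * HOURS_PER_WEEK / 1000

# Unit cost implied by the quoted total of $33k.
total_cost = 33_000
cost_per_khs06_hour = total_cost / khs06_hours

# Capacity actually provisioned, using HS06 scores from the benchmark slide.
provisioned_hs06 = 200 * 285 + 1199 * 48
overprovision = provisioned_hs06 / target_hs06

print(f"{khs06_hours:.0f} kHS06-hours, {transfer_tb:.0f} TB out, "
      f"${cost_per_khs06_hour:.2f}/kHS06-hour, "
      f"{overprovision:.2f}x provisioned")
```

The provisioned capacity comes out about 15% above the target, which matches the stated allowance for instance termination.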
Two Trial Runs of Cmsprod
1. 3 cores for one month
2. 100 cores for one week
• Attached EC2 VMs to T2_US_Wisconsin
– Output → Wisconsin SE
– Cmsprod Glideins → Wisconsin CE
– Cmssoft → cvmfs
– Frontier and cvmfs caches → Wisconsin squids (2)
Simple Purchasing Strategy
• Bid $0.03/core-hour
– price is typically about half that
– (Note: this bid does not include bandwidth cost)

• Used mix of m1.medium and m1.large instances
– m1.medium: 1 core, 3.75GB RAM
– m1.large: 2 cores, 7.5GB RAM

• Used us-east-1 region (N. Virginia)
– No preference for zone within region (there are 3)
Results: Cost
• Total cost: $0.035/T2-core-hour
– (for equivalent work done/hour in T2_US_Wisconsin)
– In terms of HS06: $2.6/kHS06-hour

• 55% of cost was for the machine
– Price: $0.0131/core-hour

• 45% of cost was for data transfer
– Price: $0.12/GB out (input is currently free)
– jobs produced 0.1GB/hour
• (this likely included merge jobs – not smart to run them in cloud!)

– At higher volumes, price/GB is lower
• e.g. at 100TB/month, price is $0.07/GB
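The cost split above can be turned into dollar amounts, and the effect of the volume discount on data transfer can be estimated. This is a rough sketch only: the total is quoted per T2-equivalent core-hour while the machine price and data rate are quoted per cloud hour, so the figures are not expected to reconcile exactly; all names are hypothetical.

```python
total_per_t2_core_hour = 0.035
machine_share, transfer_share = 0.55, 0.45

# Dollar amounts implied by the quoted percentages.
machine_cost = total_per_t2_core_hour * machine_share
transfer_cost = total_per_t2_core_hour * transfer_share


def transfer_cost_per_hour(gb_per_hour: float, price_per_gb: float) -> float:
    """Hourly data-transfer cost for a job producing gb_per_hour of output."""
    return gb_per_hour * price_per_gb


# Jobs produced 0.1 GB/hour; compare the two quoted prices per GB.
at_list_price = transfer_cost_per_hour(0.1, 0.12)
at_volume_price = transfer_cost_per_hour(0.1, 0.07)

print(f"machine ${machine_cost:.5f}, transfer ${transfer_cost:.5f} "
      f"per T2-core-hour")
print(f"transfer per job-hour: ${at_list_price:.3f} at $0.12/GB, "
      f"${at_volume_price:.3f} at $0.07/GB")
```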

Scalability (and stability)
Elastic Cluster: Components
Static HTCondor central manager
• Standalone, used only for Cloud work
AutoPyFactory (APF) configured with two queues
• One observes a Panda queue; when jobs are activated, it submits pilots to the local cluster Condor queue.
• Another observes the local Condor pool. When jobs are Idle, it submits WN VMs to IaaS (up to some limit). When WNs are Unclaimed, it shuts them down.
Worker Node VMs
• Generic Condor startds connect back to the local Condor cluster. All VMs are identical, don’t need public IPs, and don't need to know about each other.
• CVMFS software access
Panda Site:
• Associated with static BNL SE, LFC, etc.
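The behavior of the second APF queue (grow when jobs are Idle, shrink when worker nodes are Unclaimed) can be sketched as a single decision function. This is not AutoPyFactory code; the function, its parameters, and the exact policy are assumptions made for illustration.

```python
from typing import Tuple


def scaling_decision(idle_jobs: int, unclaimed_nodes: int,
                     running_nodes: int, max_nodes: int) -> Tuple[int, int]:
    """Return (vms_to_launch, vms_to_shut_down) for one polling cycle."""
    # Room left under the configured limit on worker-node VMs.
    headroom = max(0, max_nodes - running_nodes)
    # Idle jobs that existing unclaimed nodes cannot absorb need new VMs.
    needed = max(0, idle_jobs - unclaimed_nodes)
    to_launch = min(needed, headroom)
    # Unclaimed nodes beyond what the idle jobs could use are shut down.
    to_shut_down = max(0, unclaimed_nodes - idle_jobs)
    return to_launch, to_shut_down


print(scaling_decision(idle_jobs=10, unclaimed_nodes=0,
                       running_nodes=95, max_nodes=100))
print(scaling_decision(idle_jobs=0, unclaimed_nodes=3,
                       running_nodes=50, max_nodes=100))
```

Because all worker VMs are identical and connect back on their own, the decision only needs counts, not the identity of any particular node.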
Condor Scaling 1
RACF received a $50K grant from AWS: Great opportunity to test:
• Condor scaling to thousands of nodes over WAN
• Empirically determine costs

Naïve Approach:
• Single Condor host (schedd, collector, etc.)
• Single process for each daemon
• Password authentication
• Condor Connection Broker (CCB)

Result: Maxed out at ~3,000 nodes
• Collector load causing timeouts of schedd daemon
• CCB overload?
• Network connections exceeding open file limits
• Collector duty cycle -> .99
Condor Scaling 2
Refined approach:
• Tune OS limits: 1M open files, 65K max processes
• Split schedd from (collector, negotiator, CCB)
• Run 20 collector processes. Startds randomly choose one. Enable collector reporting: sub-collectors report to a non-public collector.
• Enable shared port daemon on all nodes: multiplexes TCP connections. Results in dozens of connections rather than thousands.
• Enable session auth, so that connections after the first bypass password auth check.

Result:
• Smooth operations up to 5,000 startds, even with large bursts
• No disruption of schedd operation on other host
• Collector duty cycle ~.35. Substantial headroom left. Switching to 7-slot startds would get us to ~35,000 slots, with marginal additional load.
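To see why letting each startd pick one of 20 collectors at random spreads the load, the choice can be simulated. This is a toy simulation, not HTCondor configuration; the function name and the fixed seed are assumptions.

```python
import random
from collections import Counter


def spread_startds(n_startds: int, n_collectors: int, seed: int = 0) -> Counter:
    """Simulate each startd independently picking one collector at random."""
    rng = random.Random(seed)
    return Counter(rng.randrange(n_collectors) for _ in range(n_startds))


# 5,000 startds choosing among 20 collector processes, as in the slide.
load = spread_startds(n_startds=5000, n_collectors=20)
print("busiest collector handles", max(load.values()), "startds")
print("average per collector:", 5000 / 20)

# Slot count if each of the 5,000 startds advertised 7 slots.
total_slots = 5000 * 7
print("slots with 7-slot startds:", total_slots)
```

Each collector ends up with roughly 1/20 of the startds, so no single process carries the update traffic that saturated the collector in the naive setup.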
Condor Scaling 3
Overall results:
• Ran ~5,000 nodes for several weeks
• Production simulation jobs. Stageout to BNL.
• Spent approximately $13K. Only $750 was for data transfer
• Moderate failure rate due to spot terminations.
• Actual spot price paid very close to baseline, e.g. still less than $0.01/hr for m1.small
• No solid statistics on efficiency/cost yet, beyond a rough appearance of "competitive"
Clean “Separation” of the
StartD from a HTCondor pool
• Spot Instance reclaimed by AWS due to
increase in Spot Price – detect “shutdown”
signal and make good use of the time until
“unplugged”
• On demand instances released by owner when
replaced by a Spot Instance – bring
computation(s) to a “safe” state and maximize
return on investment
Who else is using this
approach?
ESA Gaia Mission Overview
• ESA’s Gaia is an ambitious mission to chart a three-dimensional map of the Milky Way Galaxy in order to reveal the composition, formation and evolution of our galaxy.
• Gaia will repeatedly analyze and record the positions and magnitude of approximately one billion stars over the course of several years.
• 1 billion stars x 80 observations x 10 readouts = ~1 x 10^12 samples.
• 1ms processing time/sample = more than 30 years of processing
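The Gaia estimate above is easy to verify. The sketch below computes the serial processing time both for the exact product (8 x 10^11 samples) and for the rounded 10^12 figure used on the slide; the 365.25-day year and the variable names are assumptions.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

stars = 1_000_000_000
observations_per_star = 80
readouts_per_observation = 10
seconds_per_sample = 1e-3  # 1 ms of processing per sample

# Exact product of the three factors.
samples_exact = stars * observations_per_star * readouts_per_observation
years_exact = samples_exact * seconds_per_sample / SECONDS_PER_YEAR

# The slide rounds the sample count up to ~1e12.
samples_rounded = 1e12
years_rounded = samples_rounded * seconds_per_sample / SECONDS_PER_YEAR

print(f"exact: {samples_exact:.1e} samples, {years_exact:.1f} years")
print(f"rounded: {samples_rounded:.1e} samples, {years_rounded:.1f} years")
```

Either way the job takes decades on a single core, and since the samples are independent it is a natural fit for high throughput computing.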
Multiwavelength Atlas of the Galactic Plane

• Collaboration between AWS, Caltech/IPAC and USC/ISI
• All images are publicly accessible via direct download and VAO APIs
• 16 wavelength infrared atlas spanning 1µm to 70µm
• Datasets from GLIMPSE and MIPSGAL, 2MASS, MSX, WISE
• Spatial sampling of 1 arcsec with ±180° longitude and ±20° latitude
• Mosaics generated by Montage (http://montage.ipac.caltech.edu) running on HTCondor
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Recently uploaded

CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024Brian Pichman
 
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedInOutage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedInThousandEyes
 
UiPath Studio Web workshop series - Day 2
UiPath Studio Web workshop series - Day 2UiPath Studio Web workshop series - Day 2
UiPath Studio Web workshop series - Day 2DianaGray10
 
UiPath Studio Web workshop Series - Day 3
UiPath Studio Web workshop Series - Day 3UiPath Studio Web workshop Series - Day 3
UiPath Studio Web workshop Series - Day 3DianaGray10
 
TrustArc Webinar - How to Live in a Post Third-Party Cookie World
TrustArc Webinar - How to Live in a Post Third-Party Cookie WorldTrustArc Webinar - How to Live in a Post Third-Party Cookie World
TrustArc Webinar - How to Live in a Post Third-Party Cookie WorldTrustArc
 
Flow Control | Block Size | ST Min | First Frame
Flow Control | Block Size | ST Min | First FrameFlow Control | Block Size | ST Min | First Frame
Flow Control | Block Size | ST Min | First FrameKapil Thakar
 
Q4 2023 Quarterly Investor Presentation - FINAL - v1.pdf
Q4 2023 Quarterly Investor Presentation - FINAL - v1.pdfQ4 2023 Quarterly Investor Presentation - FINAL - v1.pdf
Q4 2023 Quarterly Investor Presentation - FINAL - v1.pdfTejal81
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
 
From the origin to the future of Open Source model and business
From the origin to the future of  Open Source model and businessFrom the origin to the future of  Open Source model and business
From the origin to the future of Open Source model and businessFrancesco Corti
 
Novo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNovo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNeo4j
 
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024Alkin Tezuysal
 
UiPath Studio Web workshop series - Day 1
UiPath Studio Web workshop series  - Day 1UiPath Studio Web workshop series  - Day 1
UiPath Studio Web workshop series - Day 1DianaGray10
 
March Patch Tuesday
March Patch TuesdayMarch Patch Tuesday
March Patch TuesdayIvanti
 
Planetek Italia Srl - Corporate Profile Brochure
Planetek Italia Srl - Corporate Profile BrochurePlanetek Italia Srl - Corporate Profile Brochure
Planetek Italia Srl - Corporate Profile BrochurePlanetek Italia Srl
 
Where developers are challenged, what developers want and where DevEx is going
Where developers are challenged, what developers want and where DevEx is goingWhere developers are challenged, what developers want and where DevEx is going
Where developers are challenged, what developers want and where DevEx is goingFrancesco Corti
 
Automation Ops Series: Session 2 - Governance for UiPath projects
Automation Ops Series: Session 2 - Governance for UiPath projectsAutomation Ops Series: Session 2 - Governance for UiPath projects
Automation Ops Series: Session 2 - Governance for UiPath projectsDianaGray10
 
Technical SEO for Improved Accessibility WTS FEST
Technical SEO for Improved Accessibility  WTS FESTTechnical SEO for Improved Accessibility  WTS FEST
Technical SEO for Improved Accessibility WTS FESTBillieHyde
 
3 Pitfalls Everyone Should Avoid with Cloud Data
3 Pitfalls Everyone Should Avoid with Cloud Data3 Pitfalls Everyone Should Avoid with Cloud Data
3 Pitfalls Everyone Should Avoid with Cloud DataEric D. Schabell
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightSafe Software
 
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptxGraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptxNeo4j
 

Recently uploaded (20)

CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024
 
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedInOutage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
 
UiPath Studio Web workshop series - Day 2
UiPath Studio Web workshop series - Day 2UiPath Studio Web workshop series - Day 2
UiPath Studio Web workshop series - Day 2
 
UiPath Studio Web workshop Series - Day 3
UiPath Studio Web workshop Series - Day 3UiPath Studio Web workshop Series - Day 3
UiPath Studio Web workshop Series - Day 3
 
TrustArc Webinar - How to Live in a Post Third-Party Cookie World
TrustArc Webinar - How to Live in a Post Third-Party Cookie WorldTrustArc Webinar - How to Live in a Post Third-Party Cookie World
TrustArc Webinar - How to Live in a Post Third-Party Cookie World
 
Flow Control | Block Size | ST Min | First Frame
Flow Control | Block Size | ST Min | First FrameFlow Control | Block Size | ST Min | First Frame
Flow Control | Block Size | ST Min | First Frame
 
Q4 2023 Quarterly Investor Presentation - FINAL - v1.pdf
Q4 2023 Quarterly Investor Presentation - FINAL - v1.pdfQ4 2023 Quarterly Investor Presentation - FINAL - v1.pdf
Q4 2023 Quarterly Investor Presentation - FINAL - v1.pdf
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
From the origin to the future of Open Source model and business
From the origin to the future of  Open Source model and businessFrom the origin to the future of  Open Source model and business
From the origin to the future of Open Source model and business
 
Novo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNovo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4j
 
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
 
UiPath Studio Web workshop series - Day 1
UiPath Studio Web workshop series  - Day 1UiPath Studio Web workshop series  - Day 1
UiPath Studio Web workshop series - Day 1
 
March Patch Tuesday
March Patch TuesdayMarch Patch Tuesday
March Patch Tuesday
 
Planetek Italia Srl - Corporate Profile Brochure
Planetek Italia Srl - Corporate Profile BrochurePlanetek Italia Srl - Corporate Profile Brochure
Planetek Italia Srl - Corporate Profile Brochure
 
Where developers are challenged, what developers want and where DevEx is going
Where developers are challenged, what developers want and where DevEx is goingWhere developers are challenged, what developers want and where DevEx is going
Where developers are challenged, what developers want and where DevEx is going
 
Automation Ops Series: Session 2 - Governance for UiPath projects
Automation Ops Series: Session 2 - Governance for UiPath projectsAutomation Ops Series: Session 2 - Governance for UiPath projects
Automation Ops Series: Session 2 - Governance for UiPath projects
 
Technical SEO for Improved Accessibility WTS FEST
Technical SEO for Improved Accessibility  WTS FESTTechnical SEO for Improved Accessibility  WTS FEST
Technical SEO for Improved Accessibility WTS FEST
 
3 Pitfalls Everyone Should Avoid with Cloud Data
3 Pitfalls Everyone Should Avoid with Cloud Data3 Pitfalls Everyone Should Avoid with Cloud Data
3 Pitfalls Everyone Should Avoid with Cloud Data
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and Insight
 
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptxGraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
 

Finding New Sub-Atomic Particles on the AWS Cloud (BDT402) | AWS re:Invent 2013

  • 1. High Throughput Computing, AWS and the God Particle: Finding New Sub-Atomic Particles on the AWS Cloud Jamie Kinney (Sr. Manager Scientific Computing, AWS) Miron Livny (Professor of Computer Science, University of Wisconsin) November 13, 2013 © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
  • 5. Standard (m1,m3) Micro (t1) High Memory (m2) High CPU (c1)
  • 6. Cluster Compute Intel Nehalem (cc1.4xlarge) Intel Sandy Bridge E5-2670 (cc2.8xlarge) Sandy Bridge, NUMA, 240GB RAM (cr1.4xlarge) NVIDIA GRID GPU a.k.a. “Kepler” (g2.2xlarge) 2TB of SSD 120,000 IOPS (hi1.4xlarge) 48 TB of ephemeral storage (hs1.8xlarge)
  • 7. Multiple Purchase Models • Free Tier: Get started on AWS with free usage & no commitment. For POCs and getting started. • On Demand: Pay for compute capacity by the hour with no long-term commitments. For spiky workloads, or to define needs. • Reserved: Make a low, one-time payment and receive a significant discount on the hourly charge. For committed utilization. • Spot: Bid for unused capacity, charged at a Spot Price which fluctuates based on supply and demand. For time-insensitive or transient workloads. • Dedicated: Launch instances within Amazon VPC that run on hardware dedicated to a single customer. For highly sensitive or compliance related workloads.
  • 8. Amazon EC2 Spot Instances
  • 9. Amazon EC2 Spot Instances
  • 10. Amazon EC2 Spot Instances • Priced to deliver up to 92% discount off of On-Demand Instance – $2.40/hour vs. $0.253/hour* for cc2.8xlarge in us-west-2 • Elastic • Potential to get capacity not otherwise available • Minimum Commitment (1 hour) • Caveat - potential for interruption * as of November 8th
  • 12. Armed with 5σ significance delivered by more than 6K scientists from the ATLAS and CMS experiments, the Director General of CERN, Rolf Heuer, asked on July 4, 2012: “I think we have it, do you agree?” “We have now found the missing cornerstone of particle physics. We have a discovery. We have observed a new particle that is consistent with a Higgs boson.” “only possible because of the extraordinary performance of the accelerators, experiments and the computing grid.”
  • 15. High Energy Physics has been a perfect (and challenging!) example of High Throughput Computing – an endless stream of independent but interrelated jobs
  • 16. In 1996 I introduced the distinction between High Performance Computing (HPC) and High Throughput Computing (HTC) in a seminar at the NASA Goddard Flight Center and, a month later, at the European Laboratory for Particle Physics (CERN).
  • 17. High Throughput Computing is a 24-7-365 activity and therefore requires automation FLOPY ≠ (60*60*24*7*52)*FLOPS
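The inequality on slide 17 can be read as: operations delivered per year are not simply the peak rate multiplied by the seconds in a year. The sketch below illustrates that reading. The `availability` and `efficiency` factors are hypothetical parameters introduced only for illustration; they do not appear in the slides.

```python
# Seconds in a year, using the same factor as the slide (52 weeks of 7 days).
SECONDS_PER_YEAR = 60 * 60 * 24 * 7 * 52


def ideal_flopy(peak_flops: float) -> float:
    """Upper bound: the peak rate sustained for every second of the year."""
    return peak_flops * SECONDS_PER_YEAR


def delivered_flopy(peak_flops: float, availability: float, efficiency: float) -> float:
    """Hypothetical model of what is actually delivered.

    availability: fraction of the year the resource is up and has work (assumed).
    efficiency:   fraction of the peak rate achieved while running (assumed).
    """
    return ideal_flopy(peak_flops) * availability * efficiency
```

With any availability or efficiency below 1.0 the delivered figure falls short of the ideal one, which is why keeping resources busy around the clock requires automation.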
  • 19. “The members of the Open Science Grid (OSG) are united by a commitment to promote the adoption and to advance the state of the art of distributed high throughput computing (DHTC) – shared utilization of autonomous resources where all the elements are optimized for maximizing computational throughput.”
  • 20. OSG in numbers: 2M core hours and 1 PB per day on 120 US sites. 60% of the core hours are used by the LHC experiments (ATLAS & CMS)
  • 21. Submit locally and run globally: here is my work, and here are the resources (local cluster or money) that I bring to the table
  • 22. HTCondor uses a two-phase matchmaking process: first it allocates a resource to a requestor, then it selects a task to be delegated to that resource
  • 23. Match! SchedD: “I am S and am looking for a resource.” StartD: “I am D and I am willing to offer you a resource.” [Diagram labels: work items Wi, W3; MM]
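The two phases described on slides 22 and 23 can be sketched as follows. This is a toy model, not HTCondor's implementation or API: the class names, the memory-only requirement, and the priority rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Resource:
    """Stands in for a resource offered by a StartD (hypothetical model)."""
    name: str
    memory_gb: float


@dataclass
class Task:
    name: str
    memory_gb: float
    priority: int = 0


@dataclass
class Requestor:
    """Stands in for a SchedD holding queued work items (hypothetical model)."""
    name: str
    tasks: List[Task] = field(default_factory=list)

    def wants(self, resource: Resource) -> bool:
        # The requestor can use the resource if at least one task fits on it.
        return any(t.memory_gb <= resource.memory_gb for t in self.tasks)

    def select_task(self, resource: Resource) -> Optional[Task]:
        # Phase 2: the requestor, not the matchmaker, picks the task to delegate.
        fitting = [t for t in self.tasks if t.memory_gb <= resource.memory_gb]
        if not fitting:
            return None
        chosen = max(fitting, key=lambda t: t.priority)
        self.tasks.remove(chosen)
        return chosen


def allocate(resource: Resource, requestors: List[Requestor]) -> Optional[Requestor]:
    # Phase 1: the matchmaker allocates the resource to a requestor that can use it.
    for requestor in requestors:
        if requestor.wants(resource):
            return requestor
    return None
```

Separating the phases means the matchmaker only needs to know that a requestor has suitable work, while the choice of which task runs stays with the requestor.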
  • 24. Since the HTCondor SchedD can also submit (via grid CEs or SSH) jobs to remote batch systems we can do the following -
  • 25. [Diagram labels: Local, User Code/DAGMan, HTCondor, SchedD, MM; OSG Factory, Factory, Front End, SchedD, MM; Remote, Grid CE (x3), LSF, PBS, HTCondor, StartD, G-app, C-app]
  • 26. The OSG GlideIn factory uses the SchedD as a resource provisioning agent on behalf of the (local) SchedD. It decides when, from where and for how long to keep an acquired resource.
  • 27. Since the HTCondor SchedD can also manage VMs on remote clouds (e.g. AWS & Spot), the OSG factory can also do the following -
  • 29. This (natural) potential of adding AWS resources to the OSG triggered the following exploratory efforts by ATLAS (John Hover from BNL) and CMS (Dan Bradley from UW-Madison)
  • 30. Benchmarked a variety of EC2 instance types with a standard HEP benchmark (HepSpec06)
  • 32. Budget – 10k x 10HS06, one week • Instances: 200 cc2.8xlarge in us-west-2, expected cost $8.5k • Instances: 1199 m3.xlarge in us-east-1, expected cost $12k • Output transfer: 168TB, expected cost $13k • Total: $33k ($2.0/kHS06-hour) Assumptions: • Need 15% extra time due to instance termination • Transfer out 0.01GB/HS06-hour (no direct connect) (Dan Bradley)
  • 33. Two Trial Runs of Cmsprod
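The figures on the budget slide can be cross-checked with a few lines of arithmetic. The sketch below assumes "10k x 10HS06" means 10,000 cores rated at 10 HS06 each and that 1 TB is counted as 1,000 GB; both readings are assumptions, and the function names are illustrative only.

```python
CORES = 10_000                    # "10k" on the slide (assumed to be cores)
HS06_PER_CORE = 10                # "x 10HS06" (assumed per-core rating)
HOURS = 7 * 24                    # one week
TRANSFER_GB_PER_HS06_HOUR = 0.01  # stated assumption on the slide
TOTAL_COST_USD = 33_000           # total from the slide


def hs06_hours() -> int:
    """Total benchmark-weighted compute for the week."""
    return CORES * HS06_PER_CORE * HOURS


def transfer_tb() -> float:
    """Output volume implied by the transfer assumption (1 TB = 1,000 GB assumed)."""
    return hs06_hours() * TRANSFER_GB_PER_HS06_HOUR / 1000


def cost_per_khs06_hour() -> float:
    """Total cost divided by thousands of HS06-hours."""
    return TOTAL_COST_USD / (hs06_hours() / 1000)
```

Under these assumptions the transfer volume comes out at 168 TB and the unit cost at just under $2.0/kHS06-hour, matching the slide.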
  • 34. 1. 3 cores for one month 2. 100 cores for one week • Attached EC2 VMs to T2_US_Wisconsin – Output → Wisconsin SE – Cmsprod Glideins → Wisconsin CE – Cmssoft → cvmfs – Frontier and cvmfs caches → Wisconsin squids (2) (Dan Bradley)
  • 35. Simple Purchasing Strategy • Bid $0.03/core-hour – price is typically about half that – (Note: this bid does not include bandwidth cost) • Used mix of m1.medium and m1.large instances – m1.medium: 1 core, 3.75GB RAM – m1.large: 2 cores, 7.5GB RAM • Used us-east-1 region (N. Virginia) – No preference for zone within region (there are 3) (Dan Bradley)
  • 36. Results: Cost • Total cost: $0.035/T2-core-hour – (for equivalent work done/hour in T2_US_Wisconsin) – In terms of HS06: $2.6/kHS06-hour • 55% of cost was for the machine – Price: $0.0131/core-hour • 45% of cost was for data transfer – Price: $0.12/GB out (input is currently free) – jobs produced 0.1GB/hour • (this likely included merge jobs – not smart to run them in cloud!) – At higher volumes, price/GB is lower • e.g. at 100TB/month, price is $0.07/GB (Dan Bradley)
  • 38. Elastic Cluster: Components. Static HTCondor central manager • Standalone, used only for Cloud work. AutoPyFactory (APF) configured with two queues • One observes a Panda queue; when jobs are activated, it submits pilots to the local cluster Condor queue. • Another observes the local Condor pool. When jobs are Idle, it submits WN VMs to IaaS (up to some limit). When WNs are Unclaimed, it shuts them down. Worker Node VMs • Generic Condor startds connect back to the local Condor cluster. All VMs are identical, don’t need public IPs, and don't need to know about each other. • CVMFS software access. Panda Site • Associated with static BNL SE, LFC, etc.
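The behaviour of the second queue (launch worker-node VMs when jobs are idle, shut them down when they are unclaimed) amounts to a small scaling rule. The function below is a hypothetical sketch of such a rule, not AutoPyFactory's code or configuration; it assumes one job per VM, and all names are invented for illustration.

```python
def vm_scaling_decision(idle_jobs: int, unclaimed_vms: int,
                        running_vms: int, max_vms: int) -> dict:
    """Decide how many worker-node VMs to launch and how many to shut down.

    Assumes one job per VM (an illustrative simplification).
    """
    # Room left under the configured limit ("up to some limit" on the slide).
    headroom = max(0, max_vms - running_vms)
    # Unclaimed VMs can absorb idle jobs before any new VM is needed.
    needed = max(0, idle_jobs - unclaimed_vms)
    launch = min(needed, headroom)
    # Unclaimed VMs beyond what the idle jobs could use are shut down.
    shutdown = max(0, unclaimed_vms - idle_jobs)
    return {"launch": launch, "shutdown": shutdown}
```

For example, with 10 idle jobs, no unclaimed VMs, 95 running VMs and a limit of 100, this rule launches 5 VMs and shuts none down.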
  • 39. Condor Scaling 1. RACF received a $50K grant from AWS: great opportunity to test: • Condor scaling to thousands of nodes over WAN • Empirically determine costs. Naïve approach: • Single Condor host (schedd, collector, etc.) • Single process for each daemon • Password authentication • Condor Connection Broker (CCB). Result: maxed out at ~3,000 nodes • Collector load causing timeouts of schedd daemon • CCB overload? • Network connections exceeding open file limits • Collector duty cycle -> .99
  • 40. Condor Scaling 2. Refined approach: • Tune OS limits: 1M open files, 65K max processes • Split schedd from (collector, negotiator, CCB) • Run 20 collector processes; startds randomly choose one • Enable collector reporting: sub-collectors report to a non-public collector • Enable shared port daemon on all nodes: multiplexes TCP connections, resulting in dozens of connections rather than thousands • Enable session auth, so that connections after the first bypass the password auth check. Result: • Smooth operations up to 5,000 startds, even with large bursts • No disruption of schedd operation on other host • Collector duty cycle ~.35. Substantial headroom left. Switching to 7-slot startds would get us to ~35,000 slots, with marginal additional load.
  • 41. Condor Scaling 3. Overall results: • Ran ~5,000 nodes for several weeks • Production simulation jobs. Stageout to BNL. • Spent approximately $13K. Only $750 was for data transfer • Moderate failure rate due to spot terminations • Actual spot price paid very close to baseline, e.g. still less than $0.01/hr for m1.small • No solid statistics on efficiency/cost yet, beyond a rough appearance of "competitive"
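Two of the numbers in the refined approach are easy to illustrate: how random choice spreads startds across 20 collector processes, and how 7-slot startds would reach roughly 35,000 slots. The sketch below is a simulation with invented function names, not HTCondor configuration, and the seeded random generator is only there to make the run repeatable.

```python
import random
from collections import Counter


def assign_collectors(num_startds: int, num_collectors: int, seed: int = 0) -> Counter:
    """Simulate each startd picking one collector uniformly at random."""
    rng = random.Random(seed)
    return Counter(rng.randrange(num_collectors) for _ in range(num_startds))


def total_slots(num_startds: int, slots_per_startd: int) -> int:
    """Slots available if every startd advertises the same number of slots."""
    return num_startds * slots_per_startd


# 5,000 startds spread over 20 collectors: each collector sees a small share.
loads = assign_collectors(5000, 20)
busiest = max(loads.values())
```

Each collector handles on the order of 250 startds instead of all 5,000, and `total_slots(5000, 7)` gives the 35,000 slots mentioned on the slide.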
  • 42. Clean “Separation” of the StartD from a HTCondor pool
  • 43. • Spot Instance reclaimed by AWS due to increase in Spot Price – detect “shutdown” signal and make good use of the time until “unplugged” • On demand instances released by owner when replaced by a Spot Instance – bring computation(s) to a “safe” state and maximize return on investment
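One way to "make good use of the time" after a shutdown notice is to finish the item in progress and hand back whatever is left. The sketch below assumes the notice arrives as a SIGTERM delivered to the worker process; the slide does not say how the signal is delivered, and the class and method names are hypothetical.

```python
import signal


class GracefulWorker:
    """Processes items until a stop is requested, then returns unfinished work."""

    def __init__(self):
        self.stop_requested = False
        self.completed = []

    def request_stop(self, signum=None, frame=None):
        # Signal handler: only record the request; the loop reacts to it.
        self.stop_requested = True

    def install(self):
        # Assumption: the shutdown notice is delivered as SIGTERM.
        signal.signal(signal.SIGTERM, self.request_stop)

    def process(self, item):
        # Placeholder for the real computation.
        pass

    def run(self, work_items):
        pending = list(work_items)
        # Stop taking new items once a shutdown has been requested.
        while pending and not self.stop_requested:
            item = pending.pop(0)
            self.process(item)
            self.completed.append(item)
        # Unfinished items can be requeued elsewhere (the "safe" state).
        return pending
```

The item in progress always completes before the loop exits, so no partially processed work is lost, and the returned list tells the submitter what still has to run.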
  • 44. Who else is using this approach?
  • 46. ESA Gaia Mission Overview • ESA’s Gaia is an ambitious mission to chart a three-dimensional map of the Milky Way Galaxy in order to reveal the composition, formation and evolution of our galaxy. • Gaia will repeatedly analyze and record the positions and magnitude of approximately one billion stars over the course of several years. • 1 billion stars x 80 observations x 10 readouts = ~1 x 10^12 samples. • 1ms processing time/sample = more than 30 years of processing
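The Gaia estimate can be reproduced directly. The sketch below uses the slide's rounded sample count of 1 x 10^12 and a 365.25-day year; the year length and the function names are assumptions made for the calculation.

```python
STARS = 1_000_000_000
OBSERVATIONS_PER_STAR = 80
READOUTS_PER_OBSERVATION = 10
SECONDS_PER_SAMPLE = 1e-3                   # 1 ms per sample
SECONDS_PER_YEAR = 365.25 * 24 * 3600       # assumed year length


def exact_samples() -> int:
    """Product of the three factors on the slide (8 x 10^11)."""
    return STARS * OBSERVATIONS_PER_STAR * READOUTS_PER_OBSERVATION


def serial_years(samples: float) -> float:
    """Years needed to process all samples one after another."""
    return samples * SECONDS_PER_SAMPLE / SECONDS_PER_YEAR


# The slide rounds the sample count up to ~1 x 10^12.
years_rounded = serial_years(1e12)
```

With the rounded count the serial processing time is a little under 32 years, consistent with "more than 30 years" and a clear case for spreading the work over many machines.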
  • 47. Multiwavelength Atlas of the Galactic Plane • Collaboration between AWS, Caltech/IPAC and USC/ISI • All images are publicly accessible via direct download and VAO APIs • 16 wavelength infrared atlas spanning 1µm to 70µm • Datasets from GLIMPSE and MIPSGAL, 2MASS, MSX, WISE • Spatial sampling of 1 arcsec with ±180° longitude and ±20° latitude • Mosaics generated by Montage (http://montage.ipac.caltech.edu) running on HTCondor
  • 49. Please give us your feedback on this presentation BDT402 As a thank you, we will select prize winners daily for completed surveys!