High Throughput Computing, AWS and the God Particle:
Finding New Sub-Atomic Particles on the AWS Cloud
Jamie Kinney (Sr. Manager Scientific Computing, AWS)
Miron Livny (Professor of Computer Science, University of Wisconsin)
November 13, 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Jamie Kinney
First, Some Background
Amazon EC2 Instance Types
Standard (m1,m3)
Micro (t1)
High Memory (m2)
High CPU (c1)
Cluster Compute
Intel Nehalem (cc1.4xlarge)
Intel Sandy Bridge E5-2670 (cc2.8xlarge)
Sandy Bridge, NUMA, 240GB RAM (cr1.8xlarge)
NVIDIA GRID GPU a.k.a. “Kepler” (g2.2xlarge)
2TB of SSD 120,000 IOPS (hi1.4xlarge)
48 TB of ephemeral storage (hs1.8xlarge)
Multiple Purchase Models
• Free Tier: Get started on AWS with free usage & no commitment. For POCs and getting started.
• On Demand: Pay for compute capacity by the hour with no long-term commitments. For spiky workloads, or to define needs.
• Reserved: Make a low, one-time payment and receive a significant discount on the hourly charge. For committed utilization.
• Spot: Bid for unused capacity, charged at a Spot Price which fluctuates based on supply and demand. For time-insensitive or transient workloads.
• Dedicated: Launch instances within Amazon VPC that run on hardware dedicated to a single customer. For highly sensitive or compliance related workloads.
Amazon EC2 Spot Instances
• Priced to deliver up to 92% discount off the On-Demand Instance price
– $2.40/hour vs. $0.253/hour* for cc2.8xlarge in us-west-2
• Elastic
• Potential to get capacity not otherwise available
• Minimum Commitment (1 hour)
• Caveat - potential for interruption

* as of November 8th
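The price comparison above can be turned into a discount figure with one line of arithmetic. This is a minimal sketch that uses only the two cc2.8xlarge prices quoted on the slide; the variable names are illustrative.

```python
# Prices quoted on the slide for cc2.8xlarge in us-west-2 (USD per hour).
on_demand_price = 2.40
spot_price = 0.253

# Fraction saved by paying the Spot price instead of the On-Demand price.
discount = 1 - spot_price / on_demand_price
print(f"Spot discount: {discount:.1%}")
```

For these two prices the discount works out to a little under 90%, which sits below the "up to 92%" ceiling quoted for Spot in general.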
Miron Livny
Armed with 5σ significance delivered by more than 6K scientists
from the ATLAS and CMS experiments, the Director General of
CERN, Rolf Heuer, asked on July 4, 2012:

“I think we have it, do you agree?”
“We have now found the missing cornerstone of particle physics.
We have a discovery. We have observed a new particle that is
consistent with a Higgs boson.”
“only possible because of the extraordinary performance of the
accelerators, experiments and the computing grid.”
High Energy Physics has been
a perfect (and challenging!)
example of High Throughput
Computing – an endless
stream of independent but
interrelated jobs
In 1996 I introduced the distinction between High
Performance Computing (HPC) and High
Throughput Computing (HTC) in a seminar at the NASA
Goddard Flight Center and a month later at the European
Laboratory for Particle Physics (CERN).
High Throughput Computing
is a 24-7-365 activity and
therefore requires
automation
FLOPY ≠ (60*60*24*7*52)*FLOPS
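The inequality says that the floating-point operations delivered over a year (FLOPY) are not simply the peak rate multiplied by the seconds in a year. The sketch below illustrates the gap; the peak rate and the sustained fraction are made-up values chosen only for illustration, and the function names are hypothetical.

```python
# Seconds in a year, exactly as written on the slide: 60*60*24*7*52.
SECONDS_PER_YEAR = 60 * 60 * 24 * 7 * 52

def naive_flopy(peak_flops):
    """Upper bound: assumes the resource computes at peak rate all year."""
    return SECONDS_PER_YEAR * peak_flops

def delivered_flopy(peak_flops, sustained_fraction):
    """Operations delivered when only a fraction of peak is sustained."""
    return naive_flopy(peak_flops) * sustained_fraction

peak = 1e12        # hypothetical 1 TFLOPS resource
sustained = 0.6    # hypothetical: 60% of peak sustained over the year
print(naive_flopy(peak), delivered_flopy(peak, sustained))
```

Any downtime, idle period or failed job lowers the sustained fraction, which is why the slide ties throughput to automation rather than to peak speed.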
HTCondors
“The members of the Open Science Grid (OSG) are united by a
commitment to promote the adoption and to advance the state of
the art of distributed high throughput computing (DHTC)
– shared utilization of autonomous resources where all the
elements are optimized for maximizing computational
throughput.”
OSG in numbers: 2M core hours and 1 PB per
day on 120 US sites. 60% of the core hours are
used by the LHC experiments (ATLAS & CMS)
Submit Locally and run
Globally
Here is my work and here are the resources
(local cluster or money) that I bring to
the table
HTCondor uses a two phase
matchmaking process to first
allocate a resource to a
requestor and then to select a
task to be delegated to the
resource
[Diagram: the MM declares "Match!" between a SchedD ("I am S and am looking for a resource"), which holds work items Wi including W3, and a StartD ("I am D and I am willing to offer you a resource").]
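The two phases described above can be sketched as a toy model. This is not HTCondor's API or its ClassAd language: the classes, the single memory requirement and the first-fit policy are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    memory_gb: float

@dataclass
class Requestor:            # stands in for a SchedD
    name: str
    queue: list = field(default_factory=list)

@dataclass
class Resource:             # stands in for a StartD
    name: str
    memory_gb: float

def allocate(requestors, resource):
    """Phase 1: give the resource to the first requestor that has
    at least one queued task small enough to fit on it."""
    for requestor in requestors:
        if any(t.memory_gb <= resource.memory_gb for t in requestor.queue):
            return requestor
    return None

def select_task(requestor, resource):
    """Phase 2: the requestor itself picks which queued task to delegate."""
    for task in requestor.queue:
        if task.memory_gb <= resource.memory_gb:
            requestor.queue.remove(task)
            return task
    return None

s = Requestor("S", [Task("W1", 16.0), Task("W2", 8.0), Task("W3", 2.0)])
d = Resource("D", memory_gb=4.0)

owner = allocate([s], d)        # phase 1: resource D is allocated to S
task = select_task(owner, d)    # phase 2: S chooses a task that fits D
print(owner.name, task.name)
```

Keeping the second decision inside the requestor means the matchmaker never needs to know about individual tasks, only about who may use which resource.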
Since the HTCondor SchedD
can also submit (via grid CEs
or SSH) jobs to remote batch
systems we can do the
following -
[Diagram: Local side: User Code/DAGMan, HTCondor SchedD and MM. OSG Factory: Front End, SchedD and MM. Remote side: Grid CEs in front of LSF, PBS and HTCondor batch systems, where G-app StartDs run the C-app.]
The OSG GlideIn factory uses the
SchedD as a resource provisioning agent
on behalf of the (local) SchedD. It
decides when, from where and for how
long to keep an acquired resource.
Since the HTCondor SchedD
can also manage VMs on
remote clouds (e.g. AWS &
Spot), the OSG factory can
also do the following -
[Diagram: Local side: User Code/DAGMan, HTCondor SchedD and MM. OSG Cloud Factory: Front End, SchedD and MM. Remote side: VMs on EC2, OpSt and Spot, each running an HTCondor StartD that runs the C-app.]
This (natural) potential of adding
AWS resources to the OSG triggered
the following exploratory efforts by
ATLAS (John Hover from BNL) and
CMS (Dan Bradley from UW-Madison)
Benchmarked a variety of EC2
instance types with a standard
HEP benchmark (HepSpec06)
Machine       HS06   HS06 stddev   Cores   HS06/core   $/kHS06-hour (spot)   $/kHS06-hour (on-demand)
m1.medium     10     1.3           1       10          1.3                   13
m1.large      20     2.4           2       10          1.3                   13
m1.xlarge     39     7.0           4       10          1.3                   14
m2.xlarge     28     1.1           2       14          1.3                   16
m2.2xlarge    55     0.4           4       14          1.3                   17
m2.4xlarge    98     2.5           8       12          1.4                   18
m3.xlarge     48     0.7           4       12          1.2                   12
m3.2xlarge    91     1.8           8       11          1.3                   13
cc1.4xlarge   139    0.3           16      9           1.5                   9.3
cc2.8xlarge   285    8.1           32      9           1.0                   8.4


Prices and benchmarks in us-east-1 zone (N. Virginia), Nov 2012.
Dan Bradley
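The derived columns of the benchmark table follow from the measured HS06 score, the core count and the hourly price. The sketch below reproduces the cc2.8xlarge row; it assumes the $2.40/hour On-Demand price quoted earlier in the deck also applied when the benchmark was taken, and the function names are illustrative.

```python
def hs06_per_core(hs06, cores):
    """Benchmark score normalised by the number of cores."""
    return hs06 / cores

def dollars_per_khs06_hour(price_per_hour, hs06):
    """Hourly price of 1,000 HS06 units of compute capacity."""
    return price_per_hour / hs06 * 1000

# cc2.8xlarge figures from the benchmark table.
cc2_hs06, cc2_cores = 285, 32
cc2_on_demand_price = 2.40   # USD/hour, assumed to match the earlier slide

per_core = hs06_per_core(cc2_hs06, cc2_cores)
cost = dollars_per_khs06_hour(cc2_on_demand_price, cc2_hs06)
print(round(per_core), round(cost, 1))
```

The same two functions applied to any other row give a quick consistency check between the HS06, Cores and price columns.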

Budget – 10k x 10HS06, one week

Item                                Quantity   Expected Cost
Instances: cc2.8xlarge, us-west-2   200        $8.5k
Instances: m3.xlarge, us-east-1     1199       $12k
Output Transfer                     168TB      $13k
Total                                          $33k ($2.0/kHS06-hour)

Assumptions:
• Need 15% extra time due to instance termination
• Transfer out 0.01GB/HS06-hour (no direct connect)
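The budget figures can be cross-checked with a few lines of arithmetic. This sketch reads "10k x 10HS06" as 10,000 slots of 10 HS06 each, which is an interpretation, and takes the per-instance HS06 scores (285 for cc2.8xlarge, 48 for m3.xlarge) from the benchmark table; the variable names are illustrative.

```python
HOURS_PER_WEEK = 7 * 24

target_hs06 = 10_000 * 10            # assumed reading of "10k x 10HS06"
overhead = 0.15                      # extra time lost to instance termination
transfer_gb_per_hs06_hour = 0.01     # output volume assumption from the slide
total_cost_usd = 33_000              # total from the budget

# Capacity needed once the 15% overhead is included.
required_hs06 = target_hs06 * (1 + overhead)

# Capacity of the proposed fleet, using HS06 scores from the benchmark table.
fleet_hs06 = 200 * 285 + 1199 * 48

# Output volume for one week, in TB (1 TB taken as 1000 GB).
output_tb = transfer_gb_per_hs06_hour * target_hs06 * HOURS_PER_WEEK / 1000

# Overall cost per 1,000 HS06 per hour of useful work.
cost_per_khs06_hour = total_cost_usd / (target_hs06 / 1000 * HOURS_PER_WEEK)

print(required_hs06, fleet_hs06, output_tb, round(cost_per_khs06_hour, 1))
```

Under these assumptions the fleet capacity lands within half a percent of the required capacity, and the transfer volume and unit cost match the 168TB and $2.0/kHS06-hour shown in the budget.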
Two Trial Runs of Cmsprod
1. 3 cores for one month
2. 100 cores for one week
• Attached EC2 VMs to T2_US_Wisconsin
– Output → Wisconsin SE
– Cmsprod Glideins → Wisconsin CE
– Cmssoft → cvmfs
– Frontier and cvmfs caches → Wisconsin squids (2)
Simple Purchasing Strategy
• Bid $0.03/core-hour
– price is typically about half that
– (Note: this bid does not include bandwidth cost)

• Used mix of m1.medium and m1.large instances
– m1.medium: 1 core, 3.75GB RAM
– m1.large: 2 cores, 7.5GB RAM

• Used us-east-1 region (N. Virginia)
– No preference for zone within region (there are 3)
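Because Spot bids are placed per instance while the strategy is stated per core-hour, the bid has to be scaled by the core count of each instance type. A minimal sketch, using only the figures on this slide (the dictionary and variable names are illustrative):

```python
BID_PER_CORE_HOUR = 0.03   # USD, from the slide

# Core counts of the two instance types used in the trial.
CORES = {"m1.medium": 1, "m1.large": 2}

# Bid to place for each instance type, in USD per instance-hour.
bids = {name: BID_PER_CORE_HOUR * cores for name, cores in CORES.items()}

# The slide notes that the price actually paid is typically about half the bid.
typical_price = {name: bid / 2 for name, bid in bids.items()}

for name in CORES:
    print(f"{name}: bid ${bids[name]:.3f}/h, typical ${typical_price[name]:.3f}/h")
```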
Results: Cost
• Total cost: $0.035/T2-core-hour
– (for equivalent work done/hour in T2_US_Wisconsin)
– In terms of HS06: $2.6/kHS06-hour

• 55% of cost was for the machine
– Price: $0.0131/core-hour

• 45% of cost was for data transfer
– Price: $0.12/GB out (input is currently free)
– jobs produced 0.1GB/hour
• (this likely included merge jobs – not smart to run them in cloud!)

– At higher volumes, price/GB is lower
• e.g. at 100TB/month, price is $0.07/GB
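The sensitivity of the total to the transfer price can be sketched as follows. It assumes the 0.1GB/hour output rate applies per core-hour, which the slide does not state explicitly, and it ignores the conversion between EC2 core-hours and T2-core-hours; all names are illustrative.

```python
machine_price = 0.0131        # USD per core-hour, from the slide
output_gb_per_hour = 0.1      # assumed to be per core-hour
price_low_volume = 0.12       # USD per GB out
price_high_volume = 0.07      # USD per GB out at 100TB/month

def transfer_share(price_per_gb):
    """Fraction of the per-core-hour cost that goes to data transfer."""
    transfer_cost = output_gb_per_hour * price_per_gb
    return transfer_cost / (machine_price + transfer_cost)

transfer_cost_low = output_gb_per_hour * price_low_volume
transfer_cost_high = output_gb_per_hour * price_high_volume
share_low = transfer_share(price_low_volume)
share_high = transfer_share(price_high_volume)
print(f"{share_low:.0%} of cost at $0.12/GB, {share_high:.0%} at $0.07/GB")
```

Under these assumptions the transfer share at $0.12/GB comes out close to the 45% reported on the slide, and the volume discount lowers it noticeably.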

Scalability (and stability)
Elastic Cluster: Components
Static HTCondor central manager
• Standalone, used only for Cloud work
AutoPyFactory (APF) configured with two queues
• One observes a Panda queue; when jobs are activated, it submits pilots to the local cluster Condor queue.
• Another observes the local Condor pool. When jobs are Idle, it submits WN VMs to IaaS (up to some limit). When WNs are Unclaimed, it shuts them down.
Worker Node VMs
• Generic Condor startds connect back to the local Condor cluster. All VMs are identical, don't need public IPs, and don't need to know about each other.
• CVMFS software access
Panda Site:
• Associated with static BNL SE, LFC, etc.
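The behaviour of the second APF queue (grow the pool while jobs are Idle, shrink it when worker nodes are Unclaimed) can be sketched as a single decision function. This is not AutoPyFactory's actual code or configuration: the function name, its parameters and the one-VM-per-idle-job policy are assumptions for illustration.

```python
def scale_decision(idle_jobs, running_vms, unclaimed_vms, max_vms):
    """Return (vms_to_launch, vms_to_shut_down) for one polling cycle.

    Assumes one worker-node VM is requested per idle job, capped so the
    pool never exceeds max_vms.
    """
    if idle_jobs > 0:
        headroom = max(max_vms - running_vms, 0)
        return min(idle_jobs, headroom), 0
    # No work is waiting: release every worker that nobody has claimed.
    return 0, unclaimed_vms

print(scale_decision(idle_jobs=10, running_vms=95, unclaimed_vms=0, max_vms=100))
print(scale_decision(idle_jobs=0, running_vms=50, unclaimed_vms=7, max_vms=100))
```

A real factory would also have to account for VMs that are still booting, otherwise the same idle jobs trigger new requests on every cycle.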
Condor Scaling 1
RACF received a $50K grant from AWS: Great opportunity to test:
• Condor scaling to thousands of nodes over WAN
• Empirically determine costs

Naïve Approach:
• Single Condor host (schedd, collector, etc.)
• Single process for each daemon
• Password authentication
• Condor Connection Broker (CCB)

Result: Maxed out at ~3,000 nodes
• Collector load causing timeouts of schedd daemon
• CCB overload?
• Network connections exceeding open file limits
• Collector duty cycle -> .99
Condor Scaling 2
Refined approach:
• Tune OS limits: 1M open files, 65K max processes
• Split schedd from (collector, negotiator, CCB)
• Run 20 collector processes. Startds randomly choose one. Enable collector reporting: sub-collectors report to a non-public collector.
• Enable shared port daemon on all nodes: multiplexes TCP connections. Results in dozens of connections rather than thousands.
• Enable session auth, so that connections after the first bypass password auth check.

Result:
• Smooth operations up to 5,000 startds, even with large bursts
• No disruption of schedd operation on other host
• Collector duty cycle ~.35. Substantial headroom left. Switching to 7-slot startds would get us to ~35,000 slots, with marginal additional load.
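The effect of spreading startds over several collectors, and the slot count quoted above, can be illustrated with a small simulation. It models only the random choice of a collector, not any HTCondor behaviour, and the seed is arbitrary.

```python
import random
from collections import Counter

NUM_COLLECTORS = 20      # collector processes, from the slide
NUM_STARTDS = 5000       # startds handled smoothly, from the slide
SLOTS_PER_STARTD = 7     # proposed 7-slot startds, from the slide

# Each startd picks one collector uniformly at random.
rng = random.Random(0)
load = Counter(rng.randrange(NUM_COLLECTORS) for _ in range(NUM_STARTDS))

mean_load = NUM_STARTDS / NUM_COLLECTORS
total_slots = NUM_STARTDS * SLOTS_PER_STARTD

print(f"mean startds per collector: {mean_load:.0f}")
print(f"busiest collector: {max(load.values())} startds")
print(f"total slots with 7-slot startds: {total_slots}")
```

Each collector sees about 250 startds on average instead of 5,000, and moving to 7-slot startds multiplies the slot count without adding startd connections.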
Condor Scaling 3
Overall results:
• Ran ~5,000 nodes for several weeks
• Production simulation jobs. Stageout to BNL.
• Spent approximately $13K. Only $750 was for data transfer
• Moderate failure rate due to spot terminations.
• Actual spot price paid very close to baseline, e.g. still less than $0.01/hr for m1.small
• No solid statistics on efficiency/cost yet, beyond a rough appearance of "competitive"
Clean “Separation” of the
StartD from an HTCondor pool
• Spot Instance reclaimed by AWS due to
increase in Spot Price – detect “shutdown”
signal and make good use of the time until
“unplugged”
• On demand instances released by owner when
replaced by a Spot Instance – bring
computation(s) to a “safe” state and maximize
return on investment
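Both cases above come down to reacting to a shutdown notice and leaving the computation in a recoverable state. The sketch below shows one way a worker process might do that; it assumes the notice reaches the process as SIGTERM, which is an assumption and not something the slides specify, and the class and function names are hypothetical.

```python
import signal

class GracefulWorker:
    """Processes work items one at a time and stops cleanly on request."""

    def __init__(self, work_items):
        self.pending = list(work_items)
        self.done = []
        self.stop_requested = False

    def request_stop(self, signum=None, frame=None):
        # Signal-handler signature; only sets a flag so the loop can exit
        # between work items instead of in the middle of one.
        self.stop_requested = True

    def run(self):
        while self.pending and not self.stop_requested:
            item = self.pending.pop(0)
            self.done.append(item)      # placeholder for the real computation
        # The returned state is what a real worker would checkpoint or report.
        return {"done": list(self.done), "pending": list(self.pending)}

def install_handler(worker):
    """Route SIGTERM to the worker (call from the main thread)."""
    signal.signal(signal.SIGTERM, worker.request_stop)
```

A caller would create the worker, call `install_handler(worker)` and then `worker.run()`; whatever is still listed under "pending" when the loop exits can be handed back to the pool for rescheduling.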
Who else is using this
approach?
ESA Gaia Mission Overview
• ESA’s Gaia is an ambitious mission to chart a three-dimensional map of the Milky Way Galaxy in order to reveal the composition, formation and evolution of our galaxy.
• Gaia will repeatedly analyze and record the positions and magnitude of approximately one billion stars over the course of several years.
• 1 billion stars x 80 observations x 10 readouts = ~1 x 10^12 samples.
• 1ms processing time/sample = more than 30 years of processing
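The sample count and the processing-time estimate can be checked directly. This sketch uses only the figures on the slide, plus a 365.25-day year for the conversion; the variable names are illustrative.

```python
stars = 1_000_000_000
observations_per_star = 80
readouts_per_observation = 10

# Exact product of the three factors on the slide.
samples = stars * observations_per_star * readouts_per_observation

# The slide rounds this up to ~1 x 10^12 samples.
rounded_samples = 10**12

seconds_per_sample = 0.001                    # 1 ms per sample
seconds_per_year = 365.25 * 24 * 3600

years = rounded_samples * seconds_per_sample / seconds_per_year
print(f"{samples:.1e} samples, about {years:.1f} years on one core")
```

The exact product is 8 x 10^11; the "more than 30 years" figure follows from the rounded 10^12 samples processed serially at 1 ms each.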
Multiwavelength Atlas of the Galactic Plane

• Collaboration between AWS, Caltech/IPAC and USC/ISI
• All images are publicly accessible via direct download and VAO APIs
• 16 wavelength infrared atlas spanning 1µm to 70µm
• Datasets from GLIMPSE and MIPSGAL, 2MASS, MSX, WISE
• Spatial sampling of 1 arcsec with ±180° longitude and ±20° latitude
• Mosaics generated by Montage (http://montage.ipac.caltech.edu) running on HTCondor
Please give us your feedback on this
presentation

BDT402
As a thank you, we will select prize
winners daily for completed surveys!

AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
Amazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Recently uploaded

HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
AstuteBusiness
 
AWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptxAWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptx
HarisZaheer8
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
Miro Wengner
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
Alex Pruden
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Wask
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
Postman
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
Hiike
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
alexjohnson7307
 
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Tatiana Kojar
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
LucaBarbaro3
 
FREE A4 Cyber Security Awareness Posters-Social Engineering part 3
FREE A4 Cyber Security Awareness  Posters-Social Engineering part 3FREE A4 Cyber Security Awareness  Posters-Social Engineering part 3
FREE A4 Cyber Security Awareness Posters-Social Engineering part 3
Data Hops
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
saastr
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
Javier Junquera
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 

Recently uploaded (20)

HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
 
AWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptxAWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptx
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
 
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
 
FREE A4 Cyber Security Awareness Posters-Social Engineering part 3
FREE A4 Cyber Security Awareness  Posters-Social Engineering part 3FREE A4 Cyber Security Awareness  Posters-Social Engineering part 3
FREE A4 Cyber Security Awareness Posters-Social Engineering part 3
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 

Finding New Sub-Atomic Particles on the AWS Cloud (BDT402) | AWS re:Invent 2013

  • 1. High Throughput Computing, AWS and the God Particle: Finding New Sub-Atomic Particles on the AWS Cloud Jamie Kinney (Sr. Manager Scientific Computing, AWS) Miron Livny (Professor of Computer Science, University of Wisconsin) November 13, 2013 © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
  • 5. Standard (m1,m3) Micro (t1) High Memory (m2) High CPU (c1)
  • 6. Cluster Compute Intel Nehalem (cc1.4xlarge) Intel Sandy Bridge E5-2670 (cc2.8xlarge) Sandy Bridge, NUMA, 240GB RAM (cr1.4xlarge) NVIDIA GRID GPU a.k.a. “Kepler” (g2.2xlarge) 2TB of SSD 120,000 IOPS (hi1.4xlarge) 48 TB of ephemeral storage (hs1.8xlarge)
  • 7. Multiple Purchase Models
      – Free Tier: Get started on AWS with free usage and no commitment. For POCs and getting started.
      – On Demand: Pay for compute capacity by the hour with no long-term commitments. For spiky workloads, or to define needs.
      – Reserved: Make a low, one-time payment and receive a significant discount on the hourly charge. For committed utilization.
      – Spot: Bid for unused capacity, charged at a Spot Price which fluctuates based on supply and demand. For time-insensitive or transient workloads.
      – Dedicated: Launch instances within Amazon VPC that run on hardware dedicated to a single customer. For highly sensitive or compliance-related workloads.
  • 8. Amazon EC2 Spot Instances
  • 9. Amazon EC2 Spot Instances
  • 10. Amazon EC2 Spot Instances
      • Priced to deliver up to 92% discount off of On-Demand Instance
        – $2.40/hour vs. $0.253/hour* for cc2.8xlarge in us-west-2
      • Elastic
      • Potential to get capacity not otherwise available
      • Minimum Commitment (1 hour)
      • Caveat - potential for interruption
      * as of November 8th
  • 12. Armed with 5σ significance delivered by more than 6K scientists from the ATLAS and CMS experiments, the Director General of CERN, Rolf Heuer, asked on July 4, 2012: “I think we have it, do you agree?” “We have now found the missing cornerstone of particle physics. We have a discovery. We have observed a new particle that is consistent with a Higgs boson.” “only possible because of the extraordinary performance of the accelerators, experiments and the computing grid.”
  • 15. High Energy Physics has been a perfect (and challenging!) example of High Throughput Computing – an endless stream of independent but interrelated jobs
  • 16. In 1996 I introduced the distinction between High Performance Computing (HPC) and High Throughput Computing (HTC) in a seminar at the NASA Goddard Flight Center and, a month later, at the European Laboratory for Particle Physics (CERN).
  • 17. High Throughput Computing is a 24-7-365 activity and therefore requires automation FLOPY ≠ (60*60*24*7*52)*FLOPS
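One way to read the inequality on slide 17: the operations a system actually delivers over a year (FLOPY) are not simply its peak rate multiplied by the number of seconds in a year, because the system has to stay busy and healthy the whole time. The following is a minimal sketch of that arithmetic. The 1 TFLOPS rate and the 70% utilization figure are illustrative assumptions, not numbers from the talk.

```python
# Seconds in a year exactly as written on the slide:
# 60 s * 60 min * 24 h * 7 days * 52 weeks.
SECONDS_PER_YEAR = 60 * 60 * 24 * 7 * 52


def peak_flopy(flops: float) -> float:
    """Upper bound on yearly operations if the machine never idled or failed."""
    return SECONDS_PER_YEAR * flops


def sustained_flopy(flops: float, utilization: float) -> float:
    """Yearly operations delivered at a given average utilization (0..1)."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return peak_flopy(flops) * utilization


if __name__ == "__main__":
    rate = 1e12  # hypothetical 1 TFLOPS machine (assumption)
    print(f"seconds per year: {SECONDS_PER_YEAR}")
    print(f"peak FLOPY:       {peak_flopy(rate):.3e}")
    print(f"FLOPY at 70%:     {sustained_flopy(rate, 0.70):.3e}")
```

The gap between the two printed FLOPY values is what automation is meant to close.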
  • 19. “The members of the Open Science Grid (OSG) are united by a commitment to promote the adoption and to advance the state of the art of distributed high throughput computing (DHTC) – shared utilization of autonomous resources where all the elements are optimized for maximizing computational throughput.”
  • 20. OSG in numbers: 2M core hours and 1 PB per day on 120 US sites. 60% of the core hours are used by the LHC experiments (ATLAS & CMS)
  • 21. Submit Locally and run Globally: here is my work, and here are the resources (local cluster or money) that I bring to the table
  • 22. HTCondor uses a two phase matchmaking process to first allocate a resource to a requestor and then to select a task to be delegated to the resource
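The two phases described on slide 22 can be illustrated with a toy model. This is a sketch of the idea only and not HTCondor's implementation: the `Resource`, `Task`, and `Requestor` classes, the memory-based matching rule, and the priority-based task choice are all hypothetical stand-ins for the real ClassAd machinery.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Resource:
    name: str
    memory_gb: float


@dataclass
class Task:
    name: str
    memory_gb: float
    priority: int = 0


@dataclass
class Requestor:
    name: str
    min_memory_gb: float  # what the requestor asks the matchmaker for
    tasks: List[Task] = field(default_factory=list)


def allocate(resources: List[Resource],
             requestors: List[Requestor]) -> Dict[str, Resource]:
    """Phase 1: the matchmaker gives each requestor one suitable free resource."""
    free = list(resources)
    allocations: Dict[str, Resource] = {}
    for req in requestors:
        for res in free:
            if res.memory_gb >= req.min_memory_gb:
                allocations[req.name] = res
                free.remove(res)
                break  # stop scanning once this requestor is served
    return allocations


def select_task(requestor: Requestor, resource: Resource) -> Optional[Task]:
    """Phase 2: the requestor decides which of its own tasks to delegate."""
    fitting = [t for t in requestor.tasks if t.memory_gb <= resource.memory_gb]
    if not fitting:
        return None
    return max(fitting, key=lambda t: t.priority)


if __name__ == "__main__":
    pool = [Resource("slot1", 2.0), Resource("slot2", 8.0)]
    s = Requestor("S", min_memory_gb=4.0, tasks=[
        Task("w1", 16.0, priority=5),
        Task("w2", 6.0, priority=1),
        Task("w3", 4.0, priority=3),
    ])
    matches = allocate(pool, [s])
    chosen = select_task(s, matches["S"])
    print(matches["S"].name, chosen.name if chosen else None)
```

Keeping the task choice in the second phase means the requestor can decide late, once it knows exactly which resource it was given.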
  • 23. [Diagram: HTCondor matchmaking. The SchedD, holding work items Wi, says "I am S and am looking for a resource"; the StartD says "I am D and I am willing to offer you a resource"; the matchmaker (MM) answers "Match!"]
  • 24. Since the HTCondor SchedD can also submit (via grid CEs or SSH) jobs to remote batch systems we can do the following -
  • 25. [Architecture diagram. Labels: Local (User Code/DAGMan, HTCondor SchedD, MM); OSG Factory (HTCondor SchedD) and Front End; Grid CEs in front of LSF, PBS, and HTCondor; Remote StartDs running G-app and C-app]
  • 26. The OSG GlideIn factory uses the SchedD as a resource provisioning agent on behalf of the (local) SchedD. It decides when, from where, and for how long to keep an acquired resource.
  • 27. Since the HTCondor SchedD can also manage VMs on remote clouds (e.g. AWS & Spot), the OSG factory can also do the following -
  • 29. This (natural) potential of adding AWS resources to the OSG triggered the following exploratory efforts by ATLAS (John Hover from BNL) and CMS (Dan Bradley from UW-Madison)
  • 30. Benchmarked a variety of EC2 instance types with a standard HEP benchmark (HepSpec06)
  • 32. Budget – 10k x 10HS06, one week
      – Instances, cc2.8xlarge, us-west-2: quantity 200, expected cost $8.5k
      – Instances, m3.xlarge, us-east-1: quantity 1199, expected cost $12k
      – Output transfer: quantity 168TB, expected cost $13k
      – Total: $33k
      Assumptions:
      • Need 15% extra time due to instance termination
      • Transfer out 0.01GB/HS06-hour (no direct connect)
      Unit cost: $2.0/kHS06-hour
      (Dan Bradley)
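The budget figures on slide 32 can be cross-checked with a few lines of arithmetic. This sketch assumes that "10k x 10HS06" means 100,000 HS06 of capacity held for a full week, and that TB means decimal terabytes. The line-item costs are the rounded values from the slide, so the total only agrees with the slide's $33k to within rounding.

```python
HS06_TARGET = 10_000 * 10          # "10k x 10HS06" read as 100,000 HS06 (assumption)
HOURS = 7 * 24                     # one week
TRANSFER_GB_PER_HS06_HOUR = 0.01   # from the slide's assumptions

# Expected costs in USD, as rounded on the slide.
COSTS = {
    "cc2.8xlarge, us-west-2": 8_500,
    "m3.xlarge, us-east-1": 12_000,
    "output transfer": 13_000,
}

hs06_hours = HS06_TARGET * HOURS
output_tb = hs06_hours * TRANSFER_GB_PER_HS06_HOUR / 1000  # decimal TB
total_usd = sum(COSTS.values())
usd_per_khs06_hour = total_usd / (hs06_hours / 1000)

if __name__ == "__main__":
    print(f"HS06-hours:         {hs06_hours:,}")
    print(f"output volume (TB): {output_tb:.0f}")
    print(f"total (USD):        {total_usd:,}")
    print(f"USD per kHS06-hour: {usd_per_khs06_hour:.2f}")
```

Under these assumptions the output volume and the unit cost both land on the slide's figures of 168TB and about $2.0/kHS06-hour.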
  • 33. Two Trial Runs of Cmsprod
  • 34.
      1. 3 cores for one month
      2. 100 cores for one week
      • Attached EC2 VMs to T2_US_Wisconsin
        – Output → Wisconsin SE
        – Cmsprod Glideins → Wisconsin CE
        – Cmssoft → cvmfs
        – Frontier and cvmfs caches → Wisconsin squids (2)
      (Dan Bradley)
  • 35. Simple Purchasing Strategy
      • Bid $0.03/core-hour
        – price is typically about half that
        – (Note: this bid does not include bandwidth cost)
      • Used mix of m1.medium and m1.large instances
        – m1.medium: 1 core, 3.75GB RAM
        – m1.large: 2 cores, 7.5GB RAM
      • Used us-east-1 region (N. Virginia)
        – No preference for zone within region (there are 3)
      (Dan Bradley)
  • 36. Results: Cost
      • Total cost: $0.035/T2-core-hour
        – (for equivalent work done/hour in T2_US_Wisconsin)
        – In terms of HS06: $2.6/kHS06-hour
      • 55% of cost was for the machine
        – Price: $0.0131/core-hour
      • 45% of cost was for data transfer
        – Price: $0.12/GB out (input is currently free)
        – jobs produced 0.1GB/hour
          • (this likely included merge jobs – not smart to run them in cloud!)
        – At higher volumes, price/GB is lower
          • e.g. at 100TB/month, price is $0.07/GB
      (Dan Bradley)
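The machine versus data-transfer split on slide 36 can be approximated from the per-unit prices it quotes. This is a rough sketch that assumes the 0.1GB/hour output rate applies per EC2 core-hour. The slide's 0.1GB/hour is itself rounded, so the computed split comes out close to, but not exactly at, the reported 55%/45%.

```python
def cost_per_core_hour(machine_usd: float, gb_out_per_hour: float,
                       usd_per_gb: float) -> dict:
    """Break one EC2 core-hour into machine and data-transfer cost."""
    transfer_usd = gb_out_per_hour * usd_per_gb
    total = machine_usd + transfer_usd
    return {
        "machine_usd": machine_usd,
        "transfer_usd": transfer_usd,
        "total_usd": total,
        "machine_share": machine_usd / total,
        "transfer_share": transfer_usd / total,
    }


if __name__ == "__main__":
    # Prices quoted on the slide.
    observed = cost_per_core_hour(0.0131, 0.1, 0.12)
    # Same workload at the higher-volume transfer price of $0.07/GB.
    at_volume = cost_per_core_hour(0.0131, 0.1, 0.07)
    for label, c in (("observed", observed), ("at 100TB/month", at_volume)):
        print(f"{label}: total ${c['total_usd']:.4f}/core-hour, "
              f"machine {c['machine_share']:.0%}, "
              f"transfer {c['transfer_share']:.0%}")
```

The second call shows how the balance shifts toward machine cost once the lower transfer price applies.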
  • 38. Elastic Cluster: Components
      Static HTCondor central manager
      • Standalone, used only for Cloud work
      AutoPyFactory (APF) configured with two queues
      • One observes a Panda queue; when jobs are activated, it submits pilots to the local cluster Condor queue.
      • Another observes the local Condor pool. When jobs are Idle, it submits WN VMs to IaaS (up to some limit). When WNs are Unclaimed, it shuts them down.
      Worker Node VMs
      • Generic Condor startds connect back to the local Condor cluster. All VMs are identical, don't need public IPs, and don't need to know about each other.
      • CVMFS software access
      Panda Site
      • Associated with static BNL SE, LFC, etc.
  • 39. Condor Scaling 1
      RACF received a $50K grant from AWS. Great opportunity to test:
      • Condor scaling to thousands of nodes over WAN
      • Empirically determine costs
      Naïve approach:
      • Single Condor host (schedd, collector, etc.)
      • Single process for each daemon
      • Password authentication
      • Condor Connection Broker (CCB)
      Result: maxed out at ~3,000 nodes
      • Collector load causing timeouts of schedd daemon
      • CCB overload?
      • Network connections exceeding open file limits
      • Collector duty cycle -> .99
  • 40. Condor Scaling 2
      Refined approach:
      • Tune OS limits: 1M open files, 65K max processes
      • Split schedd from (collector, negotiator, CCB)
      • Run 20 collector processes. Startds randomly choose one. Enable collector reporting: sub-collectors report to a non-public collector.
      • Enable shared port daemon on all nodes: multiplexes TCP connections. Results in dozens of connections rather than thousands.
      • Enable session auth, so that connections after the first bypass the password auth check.
      Result:
      • Smooth operations up to 5,000 startds, even with large bursts
      • No disruption of schedd operation on other host
      • Collector duty cycle ~.35. Substantial headroom left. Switching to 7-slot startds would get us to ~35,000 slots, with marginal additional load.
  • 41. Condor Scaling 3
      Overall results:
      • Ran ~5,000 nodes for several weeks
      • Production simulation jobs. Stageout to BNL.
      • Spent approximately $13K. Only $750 was for data transfer
      • Moderate failure rate due to spot terminations.
      • Actual spot price paid very close to baseline, e.g. still less than $0.01/hr for m1.small
      • No solid statistics on efficiency/cost yet, beyond a rough appearance of "competitive"
  • 42. Clean “Separation” of the StartD from an HTCondor pool
  • 43.
      • Spot Instance reclaimed by AWS due to increase in Spot Price – detect “shutdown” signal and make good use of the time until “unplugged”
      • On demand instances released by owner when replaced by a Spot Instance – bring computation(s) to a “safe” state and maximize return on investment
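The "detect the shutdown signal, then reach a safe state" pattern from slide 43 might look like the sketch below. The slides do not say how the signal is detected, so using SIGTERM here is an assumption, and the `process` and `checkpoint` callables are hypothetical placeholders for real job logic. This is not how the HTCondor StartD handles it.

```python
import signal
import threading

# Set once the instance is known to be going away.
shutdown_requested = threading.Event()


def _on_shutdown(signum, frame):
    """Signal handler: only record the request; the work loop reacts to it."""
    shutdown_requested.set()


def install_handlers():
    # Assumption: the OS shutdown delivers SIGTERM to this process.
    # Must be called from the main thread.
    signal.signal(signal.SIGTERM, _on_shutdown)


def run(work_items, process, checkpoint):
    """Process items until finished or until shutdown is requested.

    Unprocessed items are handed to `checkpoint` so they can be resumed
    elsewhere, and are also returned to the caller.
    """
    remaining = list(work_items)
    while remaining and not shutdown_requested.is_set():
        process(remaining.pop(0))
    if remaining:
        checkpoint(remaining)
    return remaining


if __name__ == "__main__":
    install_handlers()
    leftover = run(range(5), process=print, checkpoint=lambda items: None)
    print("left unprocessed:", leftover)
```

Checking the flag between work items keeps each item atomic, so whatever has been processed is complete and the remainder is the "safe" state to hand back.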
  • 44. Who else is using this approach?
  • 46. ESA Gaia Mission Overview
      • ESA’s Gaia is an ambitious mission to chart a three-dimensional map of the Milky Way Galaxy in order to reveal the composition, formation and evolution of our galaxy.
      • Gaia will repeatedly analyze and record the positions and magnitude of approximately one billion stars over the course of several years.
      • 1 billion stars x 80 observations x 10 readouts = ~1 x 10^12 samples.
      • 1ms processing time/sample = more than 30 years of processing
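The Gaia estimate on slide 46 is easy to reproduce. This sketch assumes the processing is done serially on a single core and uses a 365.25-day year. It shows both the exact product of the three factors and the rounded 10^12 figure the slide uses for its "more than 30 years".

```python
STARS = 1_000_000_000
OBSERVATIONS_PER_STAR = 80
READOUTS_PER_OBSERVATION = 10
SECONDS_PER_SAMPLE = 0.001            # 1 ms per sample
SECONDS_PER_YEAR = 365.25 * 24 * 3600

exact_samples = STARS * OBSERVATIONS_PER_STAR * READOUTS_PER_OBSERVATION
rounded_samples = 1e12                # the slide's "~1 x 10^12"


def serial_years(samples: float) -> float:
    """Years needed to process all samples one after another on one core."""
    return samples * SECONDS_PER_SAMPLE / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"exact samples:   {exact_samples:.1e}")
    print(f"years (exact):   {serial_years(exact_samples):.1f}")
    print(f"years (rounded): {serial_years(rounded_samples):.1f}")
```

Because the samples are independent, the same total divides almost linearly across cores, which is what makes the workload a natural fit for high throughput computing.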
  • 47. Multiwavelength Atlas of the Galactic Plane
      • Collaboration between AWS, Caltech/IPAC and USC/ISI
      • All images are publicly accessible via direct download and VAO APIs
      • 16 wavelength infrared atlas spanning 1µm to 70µm
      • Datasets from GLIMPSE and MIPSGAL, 2MASS, MSX, WISE
      • Spatial sampling of 1 arcsec with ±180° longitude and ±20° latitude
      • Mosaics generated by Montage (http://montage.ipac.caltech.edu) running on HTCondor