Finding New Sub-Atomic Particles on the AWS Cloud (BDT402) | AWS re:Invent 2013

This session will describe how members of the US Large Hadron Collider (LHC) community have benchmarked the usage of Amazon Elastic Compute Cloud (Amazon EC2) resources to simulate events observed by experiments at the European Organization for Nuclear Research (CERN). Miron Livny from the University of Wisconsin-Madison, who has been collaborating with the US-LHC community for more than a decade, will detail the process for benchmarking high-throughput computing (HTC) applications running across multiple AWS regions using the open-source HTCondor distributed computing software. The presentation will also outline the different ways that AWS and HTCondor can help meet the needs of compute-intensive applications from other scientific disciplines.


1. High Throughput Computing, AWS and the God Particle: Finding New Sub-Atomic Particles on the AWS Cloud. Jamie Kinney (Sr. Manager Scientific Computing, AWS), Miron Livny (Professor of Computer Science, University of Wisconsin). November 13, 2013. © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
2. Jamie Kinney
3. First, Some Background
4. Amazon EC2 Instance Types
5. Standard (m1, m3); Micro (t1); High Memory (m2); High CPU (c1)
6. Cluster Compute: Intel Nehalem (cc1.4xlarge); Intel Sandy Bridge E5-2670 (cc2.8xlarge); Sandy Bridge, NUMA, 240 GB RAM (cr1.4xlarge); NVIDIA GRID GPU a.k.a. "Kepler" (g2.2xlarge); 2 TB of SSD, 120,000 IOPS (hi1.4xlarge); 48 TB of ephemeral storage (hs1.8xlarge)
7. Multiple Purchase Models
   • Free Tier: Get started on AWS with free usage & no commitment (for POCs and getting started)
   • On Demand: Pay for compute capacity by the hour with no long-term commitments (for spiky workloads, or to define needs)
   • Reserved: Make a low, one-time payment and receive a significant discount on the hourly charge (for committed utilization)
   • Spot: Bid for unused capacity, charged at a Spot Price which fluctuates based on supply and demand (for time-insensitive or transient workloads)
   • Dedicated: Launch instances within Amazon VPC that run on hardware dedicated to a single customer (for highly sensitive or compliance-related workloads)
8. Amazon EC2 Spot Instances
9. Amazon EC2 Spot Instances
10. Amazon EC2 Spot Instances
   • Priced to deliver up to 92% discount off of On-Demand Instances
     – $2.40/hour vs. $0.253/hour* for cc2.8xlarge in us-west-2
   • Elastic
   • Potential to get capacity not otherwise available
   • Minimum commitment (1 hour)
   • Caveat: potential for interruption
   * as of November 8th
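For illustration, the quoted figure can be checked with a couple of lines of Python. The prices are the ones on the slide (a November 2013 snapshot), so the exact percentage will move as Spot prices move.

```python
# Spot vs. on-demand arithmetic for cc2.8xlarge, using the prices quoted above.
on_demand_per_hour = 2.40   # $/hour, on-demand, us-west-2
spot_per_hour = 0.253       # $/hour, Spot, us-west-2 (as of November 8, 2013)

discount = 1 - spot_per_hour / on_demand_per_hour
print(f"Spot discount vs. on-demand: {discount:.1%}")  # ~89% here; "up to 92%" at lower Spot prices
```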
11. Miron Livny
12. Armed with 5σ significance delivered by more than 6K scientists from the ATLAS and CMS experiments, the Director General of CERN, Rolf Heuer, asked on July 4, 2012: "I think we have it, do you agree?" "We have now found the missing cornerstone of particle physics. We have a discovery. We have observed a new particle that is consistent with a Higgs boson." "only possible because of the extraordinary performance of the accelerators, experiments and the computing grid."
13. High Energy Physics has been a perfect (and challenging!) example of High Throughput Computing – an endless stream of independent but interrelated jobs
14. In 1996 I introduced the distinction between High Performance Computing (HPC) and High Throughput Computing (HTC) in a seminar at the NASA Goddard Space Flight Center and, a month later, at the European Laboratory for Particle Physics (CERN).
15. High Throughput Computing is a 24-7-365 activity and therefore requires automation. FLOPY ≠ (60*60*24*7*52)*FLOPS
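The inequality is the heart of the HTC argument: the floating point operations you actually harvest in a year (FLOPY) are not simply peak FLOPS times the seconds in a year. A purely illustrative sketch (the peak rate and utilization figure below are hypothetical):

```python
# FLOPY vs. FLOPS: peak rate times seconds-per-year is an upper bound, not a result.
SECONDS_PER_YEAR = 60 * 60 * 24 * 7 * 52     # the slide's definition of a year

peak_flops = 1e12                            # hypothetical 1 TFLOPS resource
upper_bound_flopy = peak_flops * SECONDS_PER_YEAR

sustained_utilization = 0.6                  # hypothetical fraction of the year kept busy
delivered_flopy = sustained_utilization * upper_bound_flopy

print(f"upper bound: {upper_bound_flopy:.2e} FLOPY")
print(f"delivered:   {delivered_flopy:.2e} FLOPY (the gap is what automation must close)")
```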
16. HTCondors
17. "The members of the Open Science Grid (OSG) are united by a commitment to promote the adoption and to advance the state of the art of distributed high throughput computing (DHTC) – shared utilization of autonomous resources where all the elements are optimized for maximizing computational throughput."
18. OSG in numbers: 2M core hours and 1 PB per day on 120 US sites. 60% of the core hours are used by the LHC experiments (ATLAS & CMS)
19. Submit Locally and Run Globally: here is my work, and here are the resources (local cluster or money) that I bring to the table
20. HTCondor uses a two-phase matchmaking process to first allocate a resource to a requestor and then to select a task to be delegated to the resource.
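To make the two phases concrete, here is a small self-contained Python sketch in the spirit of ClassAd matchmaking. It is illustrative only: the ads, requirement functions and waiting jobs are invented, and real HTCondor evaluates ClassAd expressions rather than Python callables.

```python
# Phase 1: the matchmaker pairs a resource offer (StartD ad) with a resource
# request (SchedD ad). Phase 2: the SchedD picks which waiting job (Wi) to
# delegate to the matched resource.

def phase1_match(requests, offers):
    """Return (request, offer) pairs whose constraints are mutually satisfied."""
    matches = []
    for req in requests:
        for offer in offers:
            if req["requirements"](offer) and offer["requirements"](req):
                matches.append((req, offer))
                break
    return matches

def phase2_select_job(jobs, offer):
    """The SchedD selects a waiting job that fits the matched resource."""
    for job in jobs:
        if job["memory_mb"] <= offer["memory_mb"]:
            return job
    return None

# Hypothetical ads, just to exercise the two phases.
schedd_request = {"name": "S", "requirements": lambda o: o["memory_mb"] >= 2048}
startd_offer = {"name": "D", "memory_mb": 4096, "requirements": lambda r: True}
waiting_jobs = [{"id": "W1", "memory_mb": 8192}, {"id": "W3", "memory_mb": 1024}]

for req, offer in phase1_match([schedd_request], [startd_offer]):
    job = phase2_select_job(waiting_jobs, offer)
    print(f"Match! {req['name']} gets {offer['name']}, delegates job {job['id']}")
```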
21. [Diagram: the SchedD, holding a queue of waiting work items (Wi), tells the matchmaker (MM) "I am S and am looking for a resource"; a StartD tells the MM "I am D and I am willing to offer you a resource"; the MM announces a Match, and the SchedD delegates one of the waiting items (e.g. W3) to the matched resource.]
22. Since the HTCondor SchedD can also submit (via grid CEs or SSH) jobs to remote batch systems, we can do the following -
23. [Diagram: the local side (User Code/DAGMan, HTCondor SchedD and MM) hands work to the OSG Factory Front End; the factory SchedD submits glideins through Grid CEs to remote batch systems (LSF, PBS); on the remote side the glidein (G-app) starts a StartD that joins the local HTCondor pool and runs the user's application (C-app).]
24. The OSG GlideIn factory uses the SchedD as a resource provisioning agent on behalf of the (local) SchedD. It decides when, from where, and for how long to keep an acquired resource.
25. Since the HTCondor SchedD can also manage VMs on remote clouds (e.g. AWS & Spot), the OSG factory can also do the following -
26. [Diagram: the same local pool and factory front end, but the OSG Cloud Factory SchedD now provisions VMs on EC2, OpenStack (OpSt) and the Spot market; each VM boots a StartD that joins the local HTCondor pool and runs the user's application (C-app).]
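The provisioning step pictured above can be sketched as an HTCondor EC2 grid-universe submission; the snippet below writes a minimal submit description from Python and hands it to condor_submit. Everything specific (AMI ID, credential file paths, endpoint, Spot bid) is a placeholder, and the attribute names should be checked against the HTCondor manual for the version in use; this illustrates the mechanism, not the ATLAS/CMS production configuration.

```python
# Sketch: ask a local SchedD to start a Spot-priced EC2 VM through HTCondor's
# EC2 grid universe. Every identifier below (AMI, credential file paths, bid)
# is a placeholder; verify the attribute names against your HTCondor manual.
import subprocess
import textwrap

submit_description = textwrap.dedent("""\
    # EC2 grid-universe submit description (illustrative placeholders only)
    universe              = grid
    grid_resource         = ec2 https://ec2.us-east-1.amazonaws.com
    executable            = worker-vm
    ec2_ami_id            = ami-00000000
    ec2_instance_type     = m1.medium
    ec2_access_key_id     = /home/osg/.ec2/access_key
    ec2_secret_access_key = /home/osg/.ec2/secret_key
    ec2_spot_price        = 0.03
    queue 1
""")

with open("ec2_vm.submit", "w") as handle:
    handle.write(submit_description)

# Hand the description to the local SchedD.
subprocess.run(["condor_submit", "ec2_vm.submit"], check=True)
```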
27. This (natural) potential of adding AWS resources to the OSG triggered the following exploratory efforts by ATLAS (John Hover from BNL) and CMS (Dan Bradley from UW-Madison)
28. Benchmarked a variety of EC2 instance types with a standard HEP benchmark (HepSpec06)
29. HepSpec06 results:

   Machine        HS06   HS06 stddev   Cores   HS06/core   $/kHS06-hour (spot)   $/kHS06-hour (on-demand)
   m1.medium        10   1.3             1     10          1.3                   13
   m1.large         20   2.4             2     10          1.3                   13
   m1.xlarge        39   7.0             4     10          1.3                   14
   m2.xlarge        28   1.1             2     14          1.3                   16
   m2.2xlarge       55   0.4             4     14          1.3                   17
   m2.4xlarge       98   2.5             8     12          1.4                   18
   m3.xlarge        48   0.7             4     12          1.2                   12
   m3.2xlarge       91   1.8             8     11          1.3                   13
   cc1.4xlarge     139   0.3            16      9          1.5                   9.3
   cc2.8xlarge     285   8.1            32      9          1.0                   8.4

   Prices and benchmarks in us-east-1 zone (N. Virginia), Nov 2012. (Dan Bradley)
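The cost metric in the table follows directly from an instance's hourly price and its measured HS06 rating. The sketch below reproduces the cc2.8xlarge figures using the prices quoted on slide 10; the table's Spot figure was computed from November 2012 us-east-1 Spot prices, so it differs slightly.

```python
# Reproduce the $/kHS06-hour metric for cc2.8xlarge from earlier slide prices.
hs06 = 285                 # measured HepSpec06 rating for cc2.8xlarge (table above)
on_demand = 2.40           # $/hour (slide 10)
spot = 0.253               # $/hour (slide 10, us-west-2, Nov 2013 snapshot)

def cost_per_khs06_hour(price_per_hour, hs06_rating):
    return price_per_hour / (hs06_rating / 1000.0)

print(f"on-demand: ${cost_per_khs06_hour(on_demand, hs06):.1f}/kHS06-hour")  # ~8.4, as tabulated
print(f"spot:      ${cost_per_khs06_hour(spot, hs06):.1f}/kHS06-hour")       # ~0.9 vs. the table's 1.0
```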
30. Budget – 10k x 10 HS06, one week

   Item                                Quantity   Expected Cost
   Instances: cc2.8xlarge, us-west-2   200        $8.5k
   Instances: m3.xlarge, us-east-1     1199       $12k
   Output transfer                     168 TB     $13k
   Total                                          $33k

   Assumptions: need 15% extra time due to instance termination; transfer out 0.01 GB/HS06-hour (no Direct Connect). Overall: $2.0/kHS06-hour. (Dan Bradley)
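As a sanity check, the quantities and the $2.0/kHS06-hour figure in the table are mutually consistent:

```python
# Consistency check of the one-week budget: 10k job slots of 10 HS06 each,
# with ~15% head-room for Spot terminations (all numbers from the tables above).
target_hs06 = 10_000 * 10                    # 100k HS06 of sustained capacity
headroom = 1.15                              # 15% extra time for instance termination
hours = 7 * 24                               # one week

provided_hs06 = 200 * 285 + 1199 * 48        # cc2.8xlarge + m3.xlarge HS06 ratings
print(f"provided ~{provided_hs06} HS06 vs. needed ~{target_hs06 * headroom:.0f} HS06")

total_cost = 8_500 + 12_000 + 13_000         # instances + output transfer (~$33k on the slide)
khs06_hours = target_hs06 * hours / 1000     # 16,800 kHS06-hours of useful work
print(f"~${total_cost / khs06_hours:.1f}/kHS06-hour")   # ~2.0, as quoted
```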
31. Two Trial Runs of Cmsprod
32. 1) 3 cores for one month; 2) 100 cores for one week
   • Attached EC2 VMs to T2_US_Wisconsin
     – Output → Wisconsin SE
     – Cmsprod Glideins → Wisconsin CE
     – Cmssoft → cvmfs
     – Frontier and cvmfs caches → Wisconsin squids (2)
   (Dan Bradley)
33. Simple Purchasing Strategy
   • Bid $0.03/core-hour
     – price is typically about half that
     – (Note: this bid does not include bandwidth cost)
   • Used mix of m1.medium and m1.large instances
     – m1.medium: 1 core, 3.75 GB RAM
     – m1.large: 2 cores, 7.5 GB RAM
   • Used us-east-1 region (N. Virginia)
     – No preference for zone within region (there are 3)
   (Dan Bradley)
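A minimal sketch of this bidding strategy, written with boto3 (the talk predates boto3, so treat this as a modern re-expression rather than the tooling actually used; the AMI ID and batch size are placeholders):

```python
# Bid a flat $0.03 per core-hour: 1-core m1.medium at $0.03, 2-core m1.large at $0.06.
import boto3

BID_PER_CORE_HOUR = 0.03
CORES = {"m1.medium": 1, "m1.large": 2}       # 3.75 GB and 7.5 GB RAM respectively

ec2 = boto3.client("ec2", region_name="us-east-1")   # no preference among the zones

for instance_type, cores in CORES.items():
    ec2.request_spot_instances(
        SpotPrice=f"{BID_PER_CORE_HOUR * cores:.2f}",  # bid scales with core count
        InstanceCount=5,                               # arbitrary batch size for the sketch
        Type="one-time",
        LaunchSpecification={
            "ImageId": "ami-00000000",                 # placeholder worker-node image
            "InstanceType": instance_type,
        },
    )
```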
34. Results: Cost
   • Total cost: $0.035/T2-core-hour
     – (for equivalent work done/hour in T2_US_Wisconsin)
     – In terms of HS06: $2.6/kHS06-hour
   • 55% of cost was for the machine
     – Price: $0.0131/core-hour
   • 45% of cost was for data transfer
     – Price: $0.12/GB out (input is currently free)
     – Jobs produced 0.1 GB/hour (this likely included merge jobs – not smart to run them in the cloud!)
     – At higher volumes, price/GB is lower, e.g. at 100 TB/month the price is $0.07/GB
   (Dan Bradley)
35. Scalability (and stability)
36. Elastic Cluster: Components
   • Static HTCondor central manager
     – Standalone, used only for Cloud work
   • AutoPyFactory (APF) configured with two queues
     – One observes a Panda queue; when jobs are activated, it submits pilots to the local cluster Condor queue.
     – Another observes the local Condor pool; when jobs are Idle, it submits WN VMs to IaaS (up to some limit). When WNs are Unclaimed, it shuts them down.
   • Worker Node VMs
     – Generic Condor startds connect back to the local Condor cluster. All VMs are identical, don't need public IPs, and don't need to know about each other.
     – CVMFS software access
   • Panda Site: associated with static BNL SE, LFC, etc.
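The second APF queue described above is essentially a control loop; the skeleton below is an illustrative Python rendering of that loop (the helper callables, cap and poll interval are invented), not AutoPyFactory code.

```python
# Illustrative skeleton of the elastic part: watch the local Condor pool, boot
# worker-node VMs while jobs sit Idle, and retire VMs that sit Unclaimed.
# The poll_* / start_vms / shutdown_vms helpers stand in for condor_q /
# condor_status queries and cloud API calls.
import time

MAX_VMS = 5000          # cap on provisioned worker nodes (hypothetical)
POLL_SECONDS = 300

def provisioning_loop(poll_idle_jobs, poll_unclaimed_vms, poll_running_vms,
                      start_vms, shutdown_vms):
    while True:
        idle = poll_idle_jobs()                 # e.g. parse condor_q output
        unclaimed = poll_unclaimed_vms()        # e.g. parse condor_status output
        running = poll_running_vms()

        if idle > 0 and running < MAX_VMS:
            start_vms(min(idle, MAX_VMS - running))   # submit WN VMs to the IaaS
        if unclaimed:
            shutdown_vms(unclaimed)                   # retire idle workers
        time.sleep(POLL_SECONDS)
```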
37. Condor Scaling 1
   RACF received a $50K grant from AWS: great opportunity to test:
   • Condor scaling to thousands of nodes over WAN
   • Empirically determine costs
   Naïve approach:
   • Single Condor host (schedd, collector, etc.)
   • Single process for each daemon
   • Password authentication
   • Condor Connection Broker (CCB)
   Result: maxed out at ~3,000 nodes
   • Collector load causing timeouts of schedd daemon
   • CCB overload?
   • Network connections exceeding open file limits
   • Collector duty cycle -> .99
38. Condor Scaling 2
   Refined approach:
   • Tune OS limits: 1M open files, 65K max processes
   • Split schedd from (collector, negotiator, CCB)
   • Run 20 collector processes; startds randomly choose one. Enable collector reporting: sub-collectors report to a non-public collector
   • Enable shared port daemon on all nodes: multiplexes TCP connections, resulting in dozens of connections rather than thousands
   • Enable session auth, so that connections after the first bypass the password auth check
   Result:
   • Smooth operations up to 5,000 startds, even with large bursts
   • No disruption of schedd operation on the other host
   • Collector duty cycle ~.35, with substantial headroom left. Switching to 7-slot startds would get us to ~35,000 slots, with marginal additional load.
39. Condor Scaling 3
   Overall results:
   • Ran ~5,000 nodes for several weeks
   • Production simulation jobs. Stageout to BNL.
   • Spent approximately $13K; only $750 was for data transfer
   • Moderate failure rate due to spot terminations. Actual spot price paid very close to baseline, e.g. still less than $0.01/hr for m1.small
   • No solid statistics on efficiency/cost yet, beyond a rough appearance of "competitive"
40. Clean "Separation" of the StartD from an HTCondor pool
41. • Spot Instance reclaimed by AWS due to increase in Spot Price – detect "shutdown" signal and make good use of the time until "unplugged"
   • On-demand instances released by owner when replaced by a Spot Instance – bring computation(s) to a "safe" state and maximize return on investment
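One generic way to "make good use of the time until unplugged" is to trap the shutdown signal on the worker and drain gracefully. The sketch below shows that pattern in plain Python; it is not the HTCondor-internal mechanism, which retires the StartD's slots.

```python
# Generic worker-side pattern: on SIGTERM (e.g. the instance being reclaimed),
# stop taking new work, checkpoint the current task, and exit cleanly.
import signal

shutting_down = False

def on_sigterm(signum, frame):
    global shutting_down
    shutting_down = True          # finish or checkpoint current work, take no more

signal.signal(signal.SIGTERM, on_sigterm)

def run_tasks(task_queue, checkpoint):
    for task in task_queue:
        if shutting_down:
            checkpoint(task)      # bring the computation to a "safe" state
            break
        task()
```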
42. Who else is using this approach?
43. ESA Gaia Mission Overview
   • ESA's Gaia is an ambitious mission to chart a three-dimensional map of the Milky Way Galaxy in order to reveal the composition, formation and evolution of our galaxy.
   • Gaia will repeatedly analyze and record the positions and magnitude of approximately one billion stars over the course of several years.
   • 1 billion stars x 80 observations x 10 readouts = ~1 x 10^12 samples
   • 1 ms processing time/sample = more than 30 years of processing
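The slide's arithmetic, written out (the product is about 8x10^11 samples, which the slide rounds to ~10^12):

```python
# The slide's back-of-the-envelope arithmetic.
samples = 1e9 * 80 * 10                      # ~8e11, rounded on the slide to ~1e12
seconds = 1e12 * 1e-3                        # 1 ms of processing per (rounded) sample
years = seconds / (60 * 60 * 24 * 365)
print(f"{samples:.1e} samples, roughly {years:.0f} years of single-core processing")
```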
44. Multiwavelength Atlas of the Galactic Plane
   • Collaboration between AWS, Caltech/IPAC and USC/ISI
   • All images are publicly accessible via direct download and VAO APIs
   • 16-wavelength infrared atlas spanning 1µm to 70µm
   • Datasets from GLIMPSE and MIPSGAL, 2MASS, MSX, WISE
   • Spatial sampling of 1 arcsec with ±180° longitude and ±20° latitude
   • Mosaics generated by Montage (http://montage.ipac.caltech.edu) running on HTCondor
45. Please give us your feedback on this presentation (BDT402). As a thank you, we will select prize winners daily for completed surveys!
