SlideShare a Scribd company logo
1 of 13
On-demand Computing for Scientific
Workflows using Amazon Web Services
Claudio Pontili Supervisors:
Gabriele Garzoglio
Steven Timm
Grid and Cloud Services Department, Fermilab
Presentation Outline
• Introduction: why do we want to use Commercial Cloud?
• GlideinWMS: a simple way to access resources
• Running up to 1000 simultaneously jobs on Amazon Web
Services and FermiCloud
• Task A: Moving software and data to Commercial Cloud
• Task B: Adding and removing HTCondor to GlideinWMS
using AWS
• Task C: Hybrid Cloud solution AWS+Fermicloud
• Conclusions
11/12/2014Claudio Pontili | AWS On-Demand2
Introduction: Fermilab Experiment Schedule
• Measurements at all
frontiers – Electroweak
physics, neutrino
oscillations, muon g-2,
dark energy, dark matter
• 8 major experiments in 3
frontiers running
simultaneously in 2016
• Sharing both beam and
computing resources
• Impressive breadth of
experiments at FNAL
3 11/12/2014Claudio Pontili | AWS On-Demand
FY16
Introduction: Computing requirements for
experiments
• higher intensity and higher precision
measurements are driving request
for more computing resources than
previous “small” experiments
• beam simulations to optimize
experiments - make every particle
count
• detector design studies - cost
effectiveness and sensitivity
projections
• higher bandwidth DAQ and greater
detector granularity
• event generation and detector
response simulation
• reconstruction and analysis
algorithms
4 11/12/2014Claudio Pontili | AWS On-Demand
Introduction: Slots Fermilab, OSG, & Clouds
• Current full capacity of FermiGrid  ~30k slots
• Full capacity of OSG (Open Science Grid)  ~85k slots
• Additional OSG opportunistic slots  15k – 30k
• Additional per-pay slots at commercial Clouds
11/12/2014Claudio Pontili | AWS On-Demand
5
Federation via GlideinWMS – Grid and Cloud Bursting
11/12/2014Claudio Pontili | AWS On-Demand6
Fermi
Cloud
Job Queue
FrontEnd
GlideinWMS
Pilot Factory
Gcloud
KISTI
Amazon
AWS
Jobsub_submit
FermiGrid
OSG
Unified submit tool for grid and cloud
using HTCondor
Users
Pilots
Jobs
Running AWS NovA Jobs as function of time, Oct 23. 2014
11/12/2014Claudio Pontili | AWS On-Demand7
0
200
400
600
800
1000
1200 2.21
2.28
2.35
2.42
2.49
2.56
3.03
3.10
3.17
3.24
3.31
3.38
3.45
3.52
3.59
4.06
4.13
4.20
4.27
4.34
4.41
4.48
4.55
5.02
5.09
5.16
5.23
5.30
5.37
5.44
5.51
5.58
6.05
6.12
6.19
6.26
6.33
6.40
6.47
6.54
7.01
7.08
7.15
7.22
7.29
7.36
7.43
7.50
Jobs
Time
• 3300 jobs
• 525 m3.large
• Total cost
$449
• Prices are
getting down
1 hour
Task A: Moving software and data to commercial cloud
using Scalable Squid Servers
• Need to transport software and data to the Cloud
• Auto-scalable squid servers, deploying and destroying in 30
seconds using CloudFormation script
11/12/2014Claudio Pontili | AWS On-Demand8
Task B: Auto-Scaling GlideinWMS adding/removing
HTCondor using Amazon Web Services
• New resources are made available through the WMS
(HTCondor)
• The system is designed to scale by adding servers
11/12/2014Claudio Pontili | AWS On-Demand9
• Problem: the
submission system is
a stateful service
– Easy to scale up
– Hard to scale down
• Solution? Manage
lifecycle of each
server using AWS
Hooks and Standby
(released July 30th
2014)
Pending
Pending:Wait
(Lifecycle hook)
InService
Standby
Terminating:Wait
(Lifecycle Hook)
Terminated
State diagram
• At least 7 different Amazon services
• 2 different programming languages (Java and bash scripting)
• Role based authentication: auto-generating and auto-rotating
logins and passwords
Task B: Auto-Scaling HTCondor using AWS - 2
11/12/2014Claudio Pontili | AWS On-Demand10
Custom
Metrics
(Idle and
running
jobs)
AWS
CLI
S3 Logs and Init
Scripts
Auto Scaling Group and
ELB controlled by Custom
Metrics
Scale Up
Event
Instance Launched
Lifecycle Hook
Hook
queueSNS
Custom Action:
looking for standby
instance instead of
creating a new one
Java
Instance attached to Auto
Scaling Group and ELB
Scale
Down
Event
Instance Removed from
the Auto Scaling Group
and ELB
Standby Instance (it ll be
terminated after 5 days)
Role Based
Authentication
Permissions
Queue
SNS
Custom Action:
finding the right VM
to terminate and
changing ASG state to
standby
Java
• No change for the final user!!! He just wants to deploy his job
in the same way and to computed it as soon as possible
• Now we can handle spikes of traffic using commercial cloud
• We pay AWS only during spikes
Task C: Hybrid cloud – Fermicloud and Amazon Web
Service
11/12/2014Claudio Pontili | AWS On-Demand11
Conclusions
• Experiments have an increased need for computing
resources with an increased diversity of requirements
• Managing these needs (especially peak demand) is a major
focus
• Experiments are being enabled to use a diverse set of
resources: Local, Grid, and Cloud
• Need to demonstrate sustainability and cost effectiveness of
the commercial Cloud solution for physics use cases (e.g.
NOvA MonteCarlo)
• This work demonstrated the scaling of on-demand services in
support of scientific workflows using native Amazon Services
11/12/2014Claudio Pontili | AWS On-Demand12
Thank you
• Questions?
11/12/2014Claudio Pontili | AWS On-Demand13

More Related Content

What's hot

Top 10 Cloud Trends for 2018 and Actions You Can Take Now
Top 10 Cloud Trends for 2018 and Actions You Can Take NowTop 10 Cloud Trends for 2018 and Actions You Can Take Now
Top 10 Cloud Trends for 2018 and Actions You Can Take NowRightScale
 
Machine learning in the physical world by Kip Larson from AWS IoT
Machine learning in the physical world by  Kip Larson from AWS IoTMachine learning in the physical world by  Kip Larson from AWS IoT
Machine learning in the physical world by Kip Larson from AWS IoTBill Liu
 
(BDT311) MegaRun: Behind the 156,000 Core HPC Run on AWS and Experience of On...
(BDT311) MegaRun: Behind the 156,000 Core HPC Run on AWS and Experience of On...(BDT311) MegaRun: Behind the 156,000 Core HPC Run on AWS and Experience of On...
(BDT311) MegaRun: Behind the 156,000 Core HPC Run on AWS and Experience of On...Amazon Web Services
 
AWS Sydney Summit 2013 - Understanding your AWS Storage Options
AWS Sydney Summit 2013 - Understanding your AWS Storage OptionsAWS Sydney Summit 2013 - Understanding your AWS Storage Options
AWS Sydney Summit 2013 - Understanding your AWS Storage OptionsAmazon Web Services
 
Improve Page Render Time with Amazon Cloudfront
Improve Page Render Time with Amazon CloudfrontImprove Page Render Time with Amazon Cloudfront
Improve Page Render Time with Amazon CloudfrontPolyvore
 
Comparison of AWS, GCP & Azure web solutions
Comparison of AWS, GCP & Azure web solutionsComparison of AWS, GCP & Azure web solutions
Comparison of AWS, GCP & Azure web solutionsbasit raza
 
Save 60% of Kubernetes storage costs on AWS & others with OpenEBS
Save 60% of Kubernetes storage costs on AWS & others with OpenEBSSave 60% of Kubernetes storage costs on AWS & others with OpenEBS
Save 60% of Kubernetes storage costs on AWS & others with OpenEBSMayaData Inc
 
Workshop: Deploy a Deep Learning Framework on Amazon ECS
Workshop: Deploy a Deep Learning Framework on Amazon ECSWorkshop: Deploy a Deep Learning Framework on Amazon ECS
Workshop: Deploy a Deep Learning Framework on Amazon ECSAmazon Web Services
 
From AWS to GCP, TABLEAPP Architecture Story
From AWS to GCP, TABLEAPP Architecture StoryFrom AWS to GCP, TABLEAPP Architecture Story
From AWS to GCP, TABLEAPP Architecture StoryYen-Wen Chen
 
9 Ways to Reduce Cloud Storage Costs
9 Ways to Reduce Cloud Storage Costs9 Ways to Reduce Cloud Storage Costs
9 Ways to Reduce Cloud Storage CostsRightScale
 
New AWS Services for Bioinformatics
New AWS Services for BioinformaticsNew AWS Services for Bioinformatics
New AWS Services for BioinformaticsLynn Langit
 
Got a Multi-Cloud Strategy? How RightScale CMP Helps
Got a Multi-Cloud Strategy? How RightScale CMP HelpsGot a Multi-Cloud Strategy? How RightScale CMP Helps
Got a Multi-Cloud Strategy? How RightScale CMP HelpsRightScale
 
From Docker Straight to AWS
From Docker Straight to AWSFrom Docker Straight to AWS
From Docker Straight to AWSDevOps.com
 
IronSource Atom - Redshift - Lessons Learned
IronSource Atom -  Redshift - Lessons LearnedIronSource Atom -  Redshift - Lessons Learned
IronSource Atom - Redshift - Lessons LearnedIdan Tohami
 
Webinar: Using Litmus Chaos Engineering and AI for auto incident detection
Webinar: Using Litmus Chaos Engineering and AI for auto incident detectionWebinar: Using Litmus Chaos Engineering and AI for auto incident detection
Webinar: Using Litmus Chaos Engineering and AI for auto incident detectionMayaData Inc
 
Building big data applications on AWS by Ran Tessler
Building big data applications on AWS by Ran TesslerBuilding big data applications on AWS by Ran Tessler
Building big data applications on AWS by Ran TesslerIdan Tohami
 
Killing technical debt and reducing costs with Docker
Killing technical debt and reducing costs with DockerKilling technical debt and reducing costs with Docker
Killing technical debt and reducing costs with DockerCatalin Jora
 
BDT303 Data Science with Elastic MapReduce - AWS re: Invent 2012
BDT303 Data Science with Elastic MapReduce - AWS re: Invent 2012BDT303 Data Science with Elastic MapReduce - AWS re: Invent 2012
BDT303 Data Science with Elastic MapReduce - AWS re: Invent 2012Amazon Web Services
 
AWS Re:Invent - Optimizing Costs with AWS
AWS Re:Invent -  Optimizing Costs with AWSAWS Re:Invent -  Optimizing Costs with AWS
AWS Re:Invent - Optimizing Costs with AWSCoburn Watson
 

What's hot (20)

Top 10 Cloud Trends for 2018 and Actions You Can Take Now
Top 10 Cloud Trends for 2018 and Actions You Can Take NowTop 10 Cloud Trends for 2018 and Actions You Can Take Now
Top 10 Cloud Trends for 2018 and Actions You Can Take Now
 
Machine learning in the physical world by Kip Larson from AWS IoT
Machine learning in the physical world by  Kip Larson from AWS IoTMachine learning in the physical world by  Kip Larson from AWS IoT
Machine learning in the physical world by Kip Larson from AWS IoT
 
Aws ri vs cost saving plan
Aws ri vs cost saving planAws ri vs cost saving plan
Aws ri vs cost saving plan
 
(BDT311) MegaRun: Behind the 156,000 Core HPC Run on AWS and Experience of On...
(BDT311) MegaRun: Behind the 156,000 Core HPC Run on AWS and Experience of On...(BDT311) MegaRun: Behind the 156,000 Core HPC Run on AWS and Experience of On...
(BDT311) MegaRun: Behind the 156,000 Core HPC Run on AWS and Experience of On...
 
AWS Sydney Summit 2013 - Understanding your AWS Storage Options
AWS Sydney Summit 2013 - Understanding your AWS Storage OptionsAWS Sydney Summit 2013 - Understanding your AWS Storage Options
AWS Sydney Summit 2013 - Understanding your AWS Storage Options
 
Improve Page Render Time with Amazon Cloudfront
Improve Page Render Time with Amazon CloudfrontImprove Page Render Time with Amazon Cloudfront
Improve Page Render Time with Amazon Cloudfront
 
Comparison of AWS, GCP & Azure web solutions
Comparison of AWS, GCP & Azure web solutionsComparison of AWS, GCP & Azure web solutions
Comparison of AWS, GCP & Azure web solutions
 
Save 60% of Kubernetes storage costs on AWS & others with OpenEBS
Save 60% of Kubernetes storage costs on AWS & others with OpenEBSSave 60% of Kubernetes storage costs on AWS & others with OpenEBS
Save 60% of Kubernetes storage costs on AWS & others with OpenEBS
 
Workshop: Deploy a Deep Learning Framework on Amazon ECS
Workshop: Deploy a Deep Learning Framework on Amazon ECSWorkshop: Deploy a Deep Learning Framework on Amazon ECS
Workshop: Deploy a Deep Learning Framework on Amazon ECS
 
From AWS to GCP, TABLEAPP Architecture Story
From AWS to GCP, TABLEAPP Architecture StoryFrom AWS to GCP, TABLEAPP Architecture Story
From AWS to GCP, TABLEAPP Architecture Story
 
9 Ways to Reduce Cloud Storage Costs
9 Ways to Reduce Cloud Storage Costs9 Ways to Reduce Cloud Storage Costs
9 Ways to Reduce Cloud Storage Costs
 
New AWS Services for Bioinformatics
New AWS Services for BioinformaticsNew AWS Services for Bioinformatics
New AWS Services for Bioinformatics
 
Got a Multi-Cloud Strategy? How RightScale CMP Helps
Got a Multi-Cloud Strategy? How RightScale CMP HelpsGot a Multi-Cloud Strategy? How RightScale CMP Helps
Got a Multi-Cloud Strategy? How RightScale CMP Helps
 
From Docker Straight to AWS
From Docker Straight to AWSFrom Docker Straight to AWS
From Docker Straight to AWS
 
IronSource Atom - Redshift - Lessons Learned
IronSource Atom -  Redshift - Lessons LearnedIronSource Atom -  Redshift - Lessons Learned
IronSource Atom - Redshift - Lessons Learned
 
Webinar: Using Litmus Chaos Engineering and AI for auto incident detection
Webinar: Using Litmus Chaos Engineering and AI for auto incident detectionWebinar: Using Litmus Chaos Engineering and AI for auto incident detection
Webinar: Using Litmus Chaos Engineering and AI for auto incident detection
 
Building big data applications on AWS by Ran Tessler
Building big data applications on AWS by Ran TesslerBuilding big data applications on AWS by Ran Tessler
Building big data applications on AWS by Ran Tessler
 
Killing technical debt and reducing costs with Docker
Killing technical debt and reducing costs with DockerKilling technical debt and reducing costs with Docker
Killing technical debt and reducing costs with Docker
 
BDT303 Data Science with Elastic MapReduce - AWS re: Invent 2012
BDT303 Data Science with Elastic MapReduce - AWS re: Invent 2012BDT303 Data Science with Elastic MapReduce - AWS re: Invent 2012
BDT303 Data Science with Elastic MapReduce - AWS re: Invent 2012
 
AWS Re:Invent - Optimizing Costs with AWS
AWS Re:Invent -  Optimizing Costs with AWSAWS Re:Invent -  Optimizing Costs with AWS
AWS Re:Invent - Optimizing Costs with AWS
 

Viewers also liked

Cognosante: MITA 3.0 SS-A Methodology Demonstration
Cognosante: MITA 3.0 SS-A Methodology DemonstrationCognosante: MITA 3.0 SS-A Methodology Demonstration
Cognosante: MITA 3.0 SS-A Methodology DemonstrationCognosante
 
Health Information Analytics: Data Governance, Data Quality and Data Standards
Health Information Analytics:  Data Governance, Data Quality and Data StandardsHealth Information Analytics:  Data Governance, Data Quality and Data Standards
Health Information Analytics: Data Governance, Data Quality and Data StandardsFrank Wang
 
AWS re:Invent 2016: Building HPC Clusters as Code in the (Almost) Infinite Cl...
AWS re:Invent 2016: Building HPC Clusters as Code in the (Almost) Infinite Cl...AWS re:Invent 2016: Building HPC Clusters as Code in the (Almost) Infinite Cl...
AWS re:Invent 2016: Building HPC Clusters as Code in the (Almost) Infinite Cl...Amazon Web Services
 
MITA Beyond MMIS Presentation
MITA Beyond MMIS PresentationMITA Beyond MMIS Presentation
MITA Beyond MMIS PresentationREMilk
 

Viewers also liked (6)

Cognosante: MITA 3.0 SS-A Methodology Demonstration
Cognosante: MITA 3.0 SS-A Methodology DemonstrationCognosante: MITA 3.0 SS-A Methodology Demonstration
Cognosante: MITA 3.0 SS-A Methodology Demonstration
 
HPC in AWS - Technical Workshop
HPC in AWS - Technical WorkshopHPC in AWS - Technical Workshop
HPC in AWS - Technical Workshop
 
Health Information Analytics: Data Governance, Data Quality and Data Standards
Health Information Analytics:  Data Governance, Data Quality and Data StandardsHealth Information Analytics:  Data Governance, Data Quality and Data Standards
Health Information Analytics: Data Governance, Data Quality and Data Standards
 
AWS re:Invent 2016: Building HPC Clusters as Code in the (Almost) Infinite Cl...
AWS re:Invent 2016: Building HPC Clusters as Code in the (Almost) Infinite Cl...AWS re:Invent 2016: Building HPC Clusters as Code in the (Almost) Infinite Cl...
AWS re:Invent 2016: Building HPC Clusters as Code in the (Almost) Infinite Cl...
 
HPC in the Cloud
HPC in the CloudHPC in the Cloud
HPC in the Cloud
 
MITA Beyond MMIS Presentation
MITA Beyond MMIS PresentationMITA Beyond MMIS Presentation
MITA Beyond MMIS Presentation
 

Similar to Fermilab aws on demand

Cloud comparison - AWS vs Azure vs Google
Cloud comparison - AWS vs Azure vs GoogleCloud comparison - AWS vs Azure vs Google
Cloud comparison - AWS vs Azure vs GooglePatrick Pierson
 
FaaS or not to FaaS. Visible and invisible benefits of the Serverless paradig...
FaaS or not to FaaS. Visible and invisible benefits of the Serverless paradig...FaaS or not to FaaS. Visible and invisible benefits of the Serverless paradig...
FaaS or not to FaaS. Visible and invisible benefits of the Serverless paradig...Vadym Kazulkin
 
Risk Management and Particle Accelerators: Innovating with New Compute Platfo...
Risk Management and Particle Accelerators: Innovating with New Compute Platfo...Risk Management and Particle Accelerators: Innovating with New Compute Platfo...
Risk Management and Particle Accelerators: Innovating with New Compute Platfo...Amazon Web Services
 
Training AWS: Module 9 - CloudWatch
Training AWS: Module 9 - CloudWatchTraining AWS: Module 9 - CloudWatch
Training AWS: Module 9 - CloudWatchBùi Quang Lâm
 
AWS Summit Paris - Keynote Slides
AWS Summit Paris - Keynote SlidesAWS Summit Paris - Keynote Slides
AWS Summit Paris - Keynote SlidesAmazon Web Services
 
From AWS to Series A in 5 Easy Pieces
From AWS to Series A in 5 Easy PiecesFrom AWS to Series A in 5 Easy Pieces
From AWS to Series A in 5 Easy PiecesAmazon Web Services
 
Aws Architecture Fundamentals
Aws Architecture FundamentalsAws Architecture Fundamentals
Aws Architecture Fundamentals2nd Watch
 
AWS Partner Webcast - Improving Your AWS Cost Efficiency with Cloudability
AWS Partner Webcast - Improving Your AWS Cost Efficiency with CloudabilityAWS Partner Webcast - Improving Your AWS Cost Efficiency with Cloudability
AWS Partner Webcast - Improving Your AWS Cost Efficiency with CloudabilityAmazon Web Services
 
Consul connect
Consul connectConsul connect
Consul connectjabizz
 
Create Secure Test and Dev Environments in the Cloud
Create Secure Test and Dev Environments in the CloudCreate Secure Test and Dev Environments in the Cloud
Create Secure Test and Dev Environments in the CloudRightScale
 
The Fermilab HEPCloud Facility
The Fermilab HEPCloud FacilityThe Fermilab HEPCloud Facility
The Fermilab HEPCloud FacilityClaudio Pontili
 
What is Cloud Computing with AWS?
What is Cloud Computing with AWS?What is Cloud Computing with AWS?
What is Cloud Computing with AWS?Amazon Web Services
 
AWS Enterprise Summit - 클라우드 네이티브 신규 애플리케이션 구축하기 - 정윤진
AWS Enterprise Summit - 클라우드 네이티브 신규 애플리케이션 구축하기 - 정윤진AWS Enterprise Summit - 클라우드 네이티브 신규 애플리케이션 구축하기 - 정윤진
AWS Enterprise Summit - 클라우드 네이티브 신규 애플리케이션 구축하기 - 정윤진Amazon Web Services Korea
 
Improving Availability & Lowering Costs with Auto Scaling & Amazon EC2 (CPN20...
Improving Availability & Lowering Costs with Auto Scaling & Amazon EC2 (CPN20...Improving Availability & Lowering Costs with Auto Scaling & Amazon EC2 (CPN20...
Improving Availability & Lowering Costs with Auto Scaling & Amazon EC2 (CPN20...Amazon Web Services
 
Feedback on AWS re:invent 2016
Feedback on AWS re:invent 2016Feedback on AWS re:invent 2016
Feedback on AWS re:invent 2016Laurent Bernaille
 
Feedback on AWS re:invent 2016
Feedback on AWS re:invent 2016Feedback on AWS re:invent 2016
Feedback on AWS re:invent 2016AwsReinventSlides
 
Gluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
Gluecon 2013 - NetflixOSS Cloud Native Tutorial IntroductionGluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
Gluecon 2013 - NetflixOSS Cloud Native Tutorial IntroductionAdrian Cockcroft
 

Similar to Fermilab aws on demand (20)

Self-Service Supercomputing
Self-Service SupercomputingSelf-Service Supercomputing
Self-Service Supercomputing
 
Cloud comparison - AWS vs Azure vs Google
Cloud comparison - AWS vs Azure vs GoogleCloud comparison - AWS vs Azure vs Google
Cloud comparison - AWS vs Azure vs Google
 
FaaS or not to FaaS. Visible and invisible benefits of the Serverless paradig...
FaaS or not to FaaS. Visible and invisible benefits of the Serverless paradig...FaaS or not to FaaS. Visible and invisible benefits of the Serverless paradig...
FaaS or not to FaaS. Visible and invisible benefits of the Serverless paradig...
 
Risk Management and Particle Accelerators: Innovating with New Compute Platfo...
Risk Management and Particle Accelerators: Innovating with New Compute Platfo...Risk Management and Particle Accelerators: Innovating with New Compute Platfo...
Risk Management and Particle Accelerators: Innovating with New Compute Platfo...
 
Training AWS: Module 9 - CloudWatch
Training AWS: Module 9 - CloudWatchTraining AWS: Module 9 - CloudWatch
Training AWS: Module 9 - CloudWatch
 
AWS Summit Paris - Keynote Slides
AWS Summit Paris - Keynote SlidesAWS Summit Paris - Keynote Slides
AWS Summit Paris - Keynote Slides
 
From AWS to Series A in 5 Easy Pieces
From AWS to Series A in 5 Easy PiecesFrom AWS to Series A in 5 Easy Pieces
From AWS to Series A in 5 Easy Pieces
 
Aws Architecture Fundamentals
Aws Architecture FundamentalsAws Architecture Fundamentals
Aws Architecture Fundamentals
 
AWS Partner Webcast - Improving Your AWS Cost Efficiency with Cloudability
AWS Partner Webcast - Improving Your AWS Cost Efficiency with CloudabilityAWS Partner Webcast - Improving Your AWS Cost Efficiency with Cloudability
AWS Partner Webcast - Improving Your AWS Cost Efficiency with Cloudability
 
Consul connect
Consul connectConsul connect
Consul connect
 
Create Secure Test and Dev Environments in the Cloud
Create Secure Test and Dev Environments in the CloudCreate Secure Test and Dev Environments in the Cloud
Create Secure Test and Dev Environments in the Cloud
 
The Fermilab HEPCloud Facility
The Fermilab HEPCloud FacilityThe Fermilab HEPCloud Facility
The Fermilab HEPCloud Facility
 
Consul connect
Consul connectConsul connect
Consul connect
 
Cloud computing benefits
Cloud computing benefitsCloud computing benefits
Cloud computing benefits
 
What is Cloud Computing with AWS?
What is Cloud Computing with AWS?What is Cloud Computing with AWS?
What is Cloud Computing with AWS?
 
AWS Enterprise Summit - 클라우드 네이티브 신규 애플리케이션 구축하기 - 정윤진
AWS Enterprise Summit - 클라우드 네이티브 신규 애플리케이션 구축하기 - 정윤진AWS Enterprise Summit - 클라우드 네이티브 신규 애플리케이션 구축하기 - 정윤진
AWS Enterprise Summit - 클라우드 네이티브 신규 애플리케이션 구축하기 - 정윤진
 
Improving Availability & Lowering Costs with Auto Scaling & Amazon EC2 (CPN20...
Improving Availability & Lowering Costs with Auto Scaling & Amazon EC2 (CPN20...Improving Availability & Lowering Costs with Auto Scaling & Amazon EC2 (CPN20...
Improving Availability & Lowering Costs with Auto Scaling & Amazon EC2 (CPN20...
 
Feedback on AWS re:invent 2016
Feedback on AWS re:invent 2016Feedback on AWS re:invent 2016
Feedback on AWS re:invent 2016
 
Feedback on AWS re:invent 2016
Feedback on AWS re:invent 2016Feedback on AWS re:invent 2016
Feedback on AWS re:invent 2016
 
Gluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
Gluecon 2013 - NetflixOSS Cloud Native Tutorial IntroductionGluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
Gluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
 

More from Claudio Pontili

beSharp a serverless approach to big data on aws
beSharp a serverless approach to big data on awsbeSharp a serverless approach to big data on aws
beSharp a serverless approach to big data on awsClaudio Pontili
 
Infrastructure as code for enterprises
Infrastructure as code for enterprisesInfrastructure as code for enterprises
Infrastructure as code for enterprisesClaudio Pontili
 
(New)SQL on AWS: Aurora serverless
(New)SQL on AWS: Aurora serverless(New)SQL on AWS: Aurora serverless
(New)SQL on AWS: Aurora serverlessClaudio Pontili
 
Big problems Big Data, simple solutions
Big problems Big Data, simple solutionsBig problems Big Data, simple solutions
Big problems Big Data, simple solutionsClaudio Pontili
 
Seminario Cloud computing Ordine di latina - Caso d'uso realizzazione sito wo...
Seminario Cloud computing Ordine di latina - Caso d'uso realizzazione sito wo...Seminario Cloud computing Ordine di latina - Caso d'uso realizzazione sito wo...
Seminario Cloud computing Ordine di latina - Caso d'uso realizzazione sito wo...Claudio Pontili
 
Seminario Cloud computing Ordine di latina - L'offerta di Amazon Web Services
Seminario Cloud computing Ordine di latina - L'offerta di Amazon Web ServicesSeminario Cloud computing Ordine di latina - L'offerta di Amazon Web Services
Seminario Cloud computing Ordine di latina - L'offerta di Amazon Web ServicesClaudio Pontili
 
Seminario Cloud computing Ordine di latina - cloud computing
Seminario Cloud computing Ordine di latina - cloud computingSeminario Cloud computing Ordine di latina - cloud computing
Seminario Cloud computing Ordine di latina - cloud computingClaudio Pontili
 
GARANTE DELLA PRIVACY - Cloud computing proteggere i dati per non cadere dall...
GARANTE DELLA PRIVACY - Cloud computing proteggere i dati per non cadere dall...GARANTE DELLA PRIVACY - Cloud computing proteggere i dati per non cadere dall...
GARANTE DELLA PRIVACY - Cloud computing proteggere i dati per non cadere dall...Claudio Pontili
 

More from Claudio Pontili (8)

beSharp a serverless approach to big data on aws
beSharp a serverless approach to big data on awsbeSharp a serverless approach to big data on aws
beSharp a serverless approach to big data on aws
 
Infrastructure as code for enterprises
Infrastructure as code for enterprisesInfrastructure as code for enterprises
Infrastructure as code for enterprises
 
(New)SQL on AWS: Aurora serverless
(New)SQL on AWS: Aurora serverless(New)SQL on AWS: Aurora serverless
(New)SQL on AWS: Aurora serverless
 
Big problems Big Data, simple solutions
Big problems Big Data, simple solutionsBig problems Big Data, simple solutions
Big problems Big Data, simple solutions
 
Seminario Cloud computing Ordine di latina - Caso d'uso realizzazione sito wo...
Seminario Cloud computing Ordine di latina - Caso d'uso realizzazione sito wo...Seminario Cloud computing Ordine di latina - Caso d'uso realizzazione sito wo...
Seminario Cloud computing Ordine di latina - Caso d'uso realizzazione sito wo...
 
Seminario Cloud computing Ordine di latina - L'offerta di Amazon Web Services
Seminario Cloud computing Ordine di latina - L'offerta di Amazon Web ServicesSeminario Cloud computing Ordine di latina - L'offerta di Amazon Web Services
Seminario Cloud computing Ordine di latina - L'offerta di Amazon Web Services
 
Seminario Cloud computing Ordine di latina - cloud computing
Seminario Cloud computing Ordine di latina - cloud computingSeminario Cloud computing Ordine di latina - cloud computing
Seminario Cloud computing Ordine di latina - cloud computing
 
GARANTE DELLA PRIVACY - Cloud computing proteggere i dati per non cadere dall...
GARANTE DELLA PRIVACY - Cloud computing proteggere i dati per non cadere dall...GARANTE DELLA PRIVACY - Cloud computing proteggere i dati per non cadere dall...
GARANTE DELLA PRIVACY - Cloud computing proteggere i dati per non cadere dall...
 

Recently uploaded

What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxwendy cai
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...asadnawaz62
 
DATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage exampleDATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage examplePragyanshuParadkar1
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfAsst.prof M.Gokilavani
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...Chandu841456
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfAsst.prof M.Gokilavani
 
pipeline in computer architecture design
pipeline in computer architecture  designpipeline in computer architecture  design
pipeline in computer architecture designssuser87fa0c1
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
 
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx959SahilShah
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxk795866
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxbritheesh05
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfme23b1001
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girlsssuser7cb4ff
 

Recently uploaded (20)

What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
DATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage exampleDATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage example
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
 
young call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Serviceyoung call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Service
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 
pipeline in computer architecture design
pipeline in computer architecture  designpipeline in computer architecture  design
pipeline in computer architecture design
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptx
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptx
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdf
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girls
 

Fermilab aws on demand

  • 1. On-demand Computing for Scientific Workflows using Amazon Web Services Claudio Pontili Supervisors: Gabriele Garzoglio Steven Timm Grid and Cloud Services Department, Fermilab
  • 2. Presentation Outline • Introduction: why do we want to use Commercial Cloud? • GlideinWMS: a simple way to access resources • Running up to 1000 simultaneously jobs on Amazon Web Services and FermiCloud • Task A: Moving software and data to Commercial Cloud • Task B: Adding and removing HTCondor to GlideinWMS using AWS • Task C: Hybrid Cloud solution AWS+Fermicloud • Conclusions 11/12/2014Claudio Pontili | AWS On-Demand2
  • 3. Introduction: Fermilab Experiment Schedule • Measurements at all frontiers – Electroweak physics, neutrino oscillations, muon g-2, dark energy, dark matter • 8 major experiments in 3 frontiers running simultaneously in 2016 • Sharing both beam and computing resources • Impressive breadth of experiments at FNAL 3 11/12/2014Claudio Pontili | AWS On-Demand FY16
  • 4. Introduction: Computing requirements for experiments • higher intensity and higher precision measurements are driving request for more computing resources than previous “small” experiments • beam simulations to optimize experiments - make every particle count • detector design studies - cost effectiveness and sensitivity projections • higher bandwidth DAQ and greater detector granularity • event generation and detector response simulation • reconstruction and analysis algorithms 4 11/12/2014Claudio Pontili | AWS On-Demand
  • 5. Introduction: Slots Fermilab, OSG, & Clouds • Current full capacity of FermiGrid  ~30k slots • Full capacity of OSG (Open Science Grid)  ~85k slots • Additional OSG opportunistic slots  15k – 30k • Additional per-pay slots at commercial Clouds 11/12/2014Claudio Pontili | AWS On-Demand 5
  • 6. Federation via GlideinWMS – Grid and Cloud Bursting 11/12/2014Claudio Pontili | AWS On-Demand6 Fermi Cloud Job Queue FrontEnd GlideinWMS Pilot Factory Gcloud KISTI Amazon AWS Jobsub_submit FermiGrid OSG Unified submit tool for grid and cloud using HTCondor Users Pilots Jobs
  • 7. Running AWS NovA Jobs as function of time, Oct 23. 2014 11/12/2014Claudio Pontili | AWS On-Demand7 0 200 400 600 800 1000 1200 2.21 2.28 2.35 2.42 2.49 2.56 3.03 3.10 3.17 3.24 3.31 3.38 3.45 3.52 3.59 4.06 4.13 4.20 4.27 4.34 4.41 4.48 4.55 5.02 5.09 5.16 5.23 5.30 5.37 5.44 5.51 5.58 6.05 6.12 6.19 6.26 6.33 6.40 6.47 6.54 7.01 7.08 7.15 7.22 7.29 7.36 7.43 7.50 Jobs Time • 3300 jobs • 525 m3.large • Total cost $449 • Prices are getting down 1 hour
  • 8. Task A: Moving software and data to commercial cloud using Scalable Squid Servers • Need to transport software and data to the Cloud • Auto-scalable squid servers, deploying and destroying in 30 seconds using CloudFormation script 11/12/2014Claudio Pontili | AWS On-Demand8
  • 9. Task B: Auto-Scaling GlideinWMS adding/removing HTCondor using Amazon Web Services • New resources are made available through the WMS (HTCondor) • The system is designed to scale by adding servers 11/12/2014Claudio Pontili | AWS On-Demand9 • Problem: the submission system is a stateful service – Easy to scale up – Hard to scale down • Solution? Manage lifecycle of each server using AWS Hooks and Standby (released July 30th 2014) Pending Pending:Wait (Lifecycle hook) InService Standby Terminating:Wait (Lifecycle Hook) Terminated State diagram
  • 10. • At least 7 different Amazon services • 2 different programming languages (Java and bash scripting) • Role based authentication: auto-generating and auto-rotating logins and passwords Task B: Auto-Scaling HTCondor using AWS - 2 11/12/2014Claudio Pontili | AWS On-Demand10 Custom Metrics (Idle and running jobs) AWS CLI S3 Logs and Init Scripts Auto Scaling Group and ELB controlled by Custom Metrics Scale Up Event Instance Launched Lifecycle Hook Hook queueSNS Custom Action: looking for standby instance instead of creating a new one Java Instance attached to Auto Scaling Group and ELB Scale Down Event Instance Removed from the Auto Scaling Group and ELB Standby Instance (it ll be terminated after 5 days) Role Based Authentication Permissions Queue SNS Custom Action: finding the right VM to terminate and changing ASG state to standby Java
  • 11. • No change for the final user!!! He just wants to deploy his job in the same way and to computed it as soon as possible • Now we can handle spikes of traffic using commercial cloud • We pay AWS only during spikes Task C: Hybrid cloud – Fermicloud and Amazon Web Service 11/12/2014Claudio Pontili | AWS On-Demand11
  • 12. Conclusions • Experiments have an increased need for computing resources with an increased diversity of requirements • Managing these needs (especially peak demand) is a major focus • Experiments are being enabled to use a diverse set of resources: Local, Grid, and Cloud • Need to demonstrate sustainability and cost effectiveness of the commercial Cloud solution for physics use cases (e.g. NOvA MonteCarlo) • This work demonstrated the scaling of on-demand services in support of scientific workflows using native Amazon Services 11/12/2014Claudio Pontili | AWS On-Demand12
  • 13. Thank you • Questions? 11/12/2014Claudio Pontili | AWS On-Demand13