SlideShare a Scribd company logo
1 of 25
Download to read offline
Vas Vasiliadis
vas@uchicago.edu
February 28, 2024
Reliable, Remote Computation at all Scales
Why do we need remote computing?
Workloads
• Interactive, real-time
workloads
• Machine learning training
and inference
• Components may best be
executed in different places
Users
• Diverse backgrounds
and expertise
• Different user interfaces
(e.g., notebooks)
Resources
• Hardware specialization
• Specialization leads to
distribution
Why do researchers “hate” remote computing?
• Remote computing is notoriously complicated
– Authentication
– Network connections
– Configuring/managing jobs
– Interacting with resources (waiting in queues, scaling nodes)
– Configuring execution environment
– Getting results back again
• Researchers need to overcome the same obstacles
every time they move to a new resource
3
How can we bridge this gap?
Borrow page from data management playbook
è“Fire-and-forget” computation
èUniform access interface
èFederated access control
èFast networks making compute resource “local”
How can we bridge this gap?
Move closer to researchers’ environments
èResearchers primarily work in high level languages
èFunctions are a natural unit of computation
Function-as-a-Service (FaaS)!
• Researchers work in familiar language, e.g., Python…
• …using familiar interfaces, e.g., Jupyter
FaaS as offered by cloud providers
6
• Single provider, single
location to submit and
manage tasks
• Homogenous execution
environment
• Transparent and elastic
execution (of even very
small tasks)
• Integrated with
cloud provider data
management
Cloud
Provider 1
Cloud
Provider 2
FaaS as an interface to the advanced computing
ecosystem
7
• Multiple provider,
multiple location, single
interface
• Homogenous execution
environment
• Transparent and elastic
execution
• Integrated with
institution-wide data
management
Globus Compute helps bridge the gap
Managed, federated
Function-as-a-Service for
reliably, scaleably and
securely executing functions
on remote endpoints from
laptops to supercomputers
The Globus Compute model
• Compute service — Highly available cloud-hosted
service for managed, fire-and-forget function execution
• Compute endpoint — Abstracts access to compute
resources, from edge device to supercomputer
• Compute SDK — Python interface for interacting with the
service, suing the familiar (to many) Globus look and feel
• Security — Leverages Globus Auth; Compute endpoints
are resource servers, authN and access via OAuth tokens
9
User interaction with Globus Compute
10
A B
You request a function be
executed on endpoints A and B
1
2 Globus Compute manages
the reliable and secure
execution on these endpoints
3
Globus Compute returns results or
stores them until requested
Compute
Service
You register your code
(as a Python function)
0
A compute
resource
Another
compute
resource
Globus Compute transforms any computing
resource into a function serving endpoint
• Python pip installable agent
• Elastic resource provisioning
from local, cluster, or cloud
system (via Parsl)
• Parallel execution using local
fork or via common schedulers
– Slurm, PBS, LSF, Cobalt, K8s
11
Compute
Service
Executing functions with Globus Compute
• Users invoke functions as tasks
– Register Python function
– Pass input arguments
– Select endpoint(s)
• Service stores tasks in the cloud
• Endpoints fetch waiting tasks (when
online), run tasks, and return results
• Results stored in the cloud and on
Globus storage endpoints
• Users retrieve results asynchronously
12
Compute
Service
Common Use Cases
13
Use case 1: fire-and-forget execution
Executing a bag of tasks, e.g., running simulations with
different parameters, executing ML inferences, on one
or more remote computers directly from your
environment, e.g., Jupyter on your laptop
• Fire-and-forget execution managed by Globus Compute;
tasks/results cached until endpoint/client are online
• Portability across different systems—optionally making use of
specialized hardware
• Elastic scaling to provision resources as needed (from HPC and
cloud systems)
ML-based drug screening
15
National Virtual Biotechnology Laboratory, arXiv:2006.02431
Use case 2: automated analysis of data
Constructing and running automated analysis
pipelines with data processing steps that need to be
executed in different locations
• Automatically process data as they are acquired (event- and
workflow-based)
• Integrate with data movement and other actions—human and
machine
• Execute functions across the computing continuum e.g., near
instrument, in data center, on specialized hardware
Enabling serial crystallography at scale
• Serially image chips with
000’s of embedded crystals
• Quality control first 1,000 to
report failures
• Analyze batches of images as
they are collected
• Report statistics and images
during experiment
• Return structure to scientist
Darren Sherrell, Gyorgy Babnigg, Andrzej Joachimiak
Globus
Flows
How remote computing enables this use case
Image
processing
Local workstation
Carbon!
Check
threshold
Data publication
Transfer
Transfer
raw files
Transfer
Move results
to repo
Compute
Analyze
images
Compute
Visualize
Compute
Gather
metadata
Share
Set access
controls
Compute
Launch QA
job
Search
Ingest to
index
Supercomputer
GPU Cluster
Globus enables “smart” instruments
19
aining of Deep Neural Networks
ources
7sec
19sec
5sec
31sec.
cycle time
Seems too easy? Let’s take a look…
• Run function on my laptop …test, debug
• Scale (a little) on a cloud VM (or K8s cluster)
• Scale (a bit more) on my campus cluster
• …run complete model on DOE supercomputer :--(
20
Common pattern of computing in a flow
Transfer raw
instrument
images
Run a compute
job to process
image files
Transfer Compute
Move processed
images to sharing
system/repository
Set access
controls for
sharing data
Share
Transfer
1 2 3 4
EC2
Instance
“Instrument”
EC2
Instance
Compute
Resource
Our environment Registered
Function
Monitor
script
transfer control
access
result files
0 trigger
flow run
transfer
raw files
1
invoke image
processing function 2
set
permissions
4
transfer
result files
3
Data
Endpoint
Data
Endpoint
Data
Endpoint
ALCF
Eagle
Compute
Endpoint
Globus Compute current state
• Service is running in production
– Tried and tested by 10MMs of invocations on 100Ks endpoints
• Globus Computer Agent supports single user
– Akin to Globus Connect Personal
• Endpoints visible via Globus web app
• Function registration and invocation via SDK
23
Compute service will evolve rapidly
• Multi-user compute endpoints
– Akin to Globus Connect Server
• Native integration with transfer for stage in and stage
out of data for compute tasks
– Can be done today via Globus Flows
• Expanding compute service interfaces in the webapp
for administrators and users
Support resources
• Globus documentation: docs.globus.org
• YouTube channel: youtube.com/GlobusOnline
• Helpdesk: support@globus.org
• Mailing Lists: globus.org/mailing-lists
• Customer engagement team (office hours)
• Professional services team (advisory, custom work)

More Related Content

Similar to Reliable, Remote Computation at All Scales

Monitoring kubernetes across data center and cloud
Monitoring kubernetes across data center and cloudMonitoring kubernetes across data center and cloud
Monitoring kubernetes across data center and cloudDatadog
 
re:Invent 2013-foster-madduri
re:Invent 2013-foster-maddurire:Invent 2013-foster-madduri
re:Invent 2013-foster-madduriRavi Madduri
 
Scaling collaborative data science with Globus and Jupyter
Scaling collaborative data science with Globus and JupyterScaling collaborative data science with Globus and Jupyter
Scaling collaborative data science with Globus and JupyterIan Foster
 
Sanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansSanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansPeter Clapham
 
Big Process for Big Data @ NASA
Big Process for Big Data @ NASABig Process for Big Data @ NASA
Big Process for Big Data @ NASAIan Foster
 
Взгляд на облака с точки зрения HPC
Взгляд на облака с точки зрения HPCВзгляд на облака с точки зрения HPC
Взгляд на облака с точки зрения HPCOlga Lavrentieva
 
Taming Big Data!
Taming Big Data!Taming Big Data!
Taming Big Data!Ian Foster
 
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...Ilkay Altintas, Ph.D.
 
Openstack For Beginners
Openstack For BeginnersOpenstack For Beginners
Openstack For Beginnerscpallares
 
StarlingX - Project Onboarding
StarlingX - Project OnboardingStarlingX - Project Onboarding
StarlingX - Project OnboardingShuquan Huang
 
Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm
Genomic Computation at Scale with Serverless, StackStorm and Docker SwarmGenomic Computation at Scale with Serverless, StackStorm and Docker Swarm
Genomic Computation at Scale with Serverless, StackStorm and Docker SwarmDmitri Zimine
 
RAMSES: Robust Analytic Models for Science at Extreme Scales
RAMSES: Robust Analytic Models for Science at Extreme ScalesRAMSES: Robust Analytic Models for Science at Extreme Scales
RAMSES: Robust Analytic Models for Science at Extreme ScalesIan Foster
 
Other distributed systems
Other distributed systemsOther distributed systems
Other distributed systemsSri Prasanna
 
DOE Magellan OpenStack user story
DOE Magellan OpenStack user storyDOE Magellan OpenStack user story
DOE Magellan OpenStack user storylaurabeckcahoon
 
Overview of Scientific Workflows - Why Use Them?
Overview of Scientific Workflows - Why Use Them?Overview of Scientific Workflows - Why Use Them?
Overview of Scientific Workflows - Why Use Them?inside-BigData.com
 
Globus Labs: Forging the Next Frontier
Globus Labs: Forging the Next FrontierGlobus Labs: Forging the Next Frontier
Globus Labs: Forging the Next FrontierGlobus
 
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & AlluxioAlluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & AlluxioAlluxio, Inc.
 
Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21JDA Labs MTL
 

Similar to Reliable, Remote Computation at All Scales (20)

Brad stack - Digital Health and Well-Being Festival
Brad stack - Digital Health and Well-Being Festival Brad stack - Digital Health and Well-Being Festival
Brad stack - Digital Health and Well-Being Festival
 
Monitoring kubernetes across data center and cloud
Monitoring kubernetes across data center and cloudMonitoring kubernetes across data center and cloud
Monitoring kubernetes across data center and cloud
 
re:Invent 2013-foster-madduri
re:Invent 2013-foster-maddurire:Invent 2013-foster-madduri
re:Invent 2013-foster-madduri
 
Scaling collaborative data science with Globus and Jupyter
Scaling collaborative data science with Globus and JupyterScaling collaborative data science with Globus and Jupyter
Scaling collaborative data science with Globus and Jupyter
 
Sanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansSanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticians
 
Flexible compute
Flexible computeFlexible compute
Flexible compute
 
Big Process for Big Data @ NASA
Big Process for Big Data @ NASABig Process for Big Data @ NASA
Big Process for Big Data @ NASA
 
Взгляд на облака с точки зрения HPC
Взгляд на облака с точки зрения HPCВзгляд на облака с точки зрения HPC
Взгляд на облака с точки зрения HPC
 
Taming Big Data!
Taming Big Data!Taming Big Data!
Taming Big Data!
 
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
 
Openstack For Beginners
Openstack For BeginnersOpenstack For Beginners
Openstack For Beginners
 
StarlingX - Project Onboarding
StarlingX - Project OnboardingStarlingX - Project Onboarding
StarlingX - Project Onboarding
 
Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm
Genomic Computation at Scale with Serverless, StackStorm and Docker SwarmGenomic Computation at Scale with Serverless, StackStorm and Docker Swarm
Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm
 
RAMSES: Robust Analytic Models for Science at Extreme Scales
RAMSES: Robust Analytic Models for Science at Extreme ScalesRAMSES: Robust Analytic Models for Science at Extreme Scales
RAMSES: Robust Analytic Models for Science at Extreme Scales
 
Other distributed systems
Other distributed systemsOther distributed systems
Other distributed systems
 
DOE Magellan OpenStack user story
DOE Magellan OpenStack user storyDOE Magellan OpenStack user story
DOE Magellan OpenStack user story
 
Overview of Scientific Workflows - Why Use Them?
Overview of Scientific Workflows - Why Use Them?Overview of Scientific Workflows - Why Use Them?
Overview of Scientific Workflows - Why Use Them?
 
Globus Labs: Forging the Next Frontier
Globus Labs: Forging the Next FrontierGlobus Labs: Forging the Next Frontier
Globus Labs: Forging the Next Frontier
 
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & AlluxioAlluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
 
Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21
 

More from Globus

Advanced Globus System Administration Topics
Advanced Globus System Administration TopicsAdvanced Globus System Administration Topics
Advanced Globus System Administration TopicsGlobus
 
Instrument Data Automation: The Life of a Flow
Instrument Data Automation: The Life of a FlowInstrument Data Automation: The Life of a Flow
Instrument Data Automation: The Life of a FlowGlobus
 
Building Research Applications with Globus PaaS
Building Research Applications with Globus PaaSBuilding Research Applications with Globus PaaS
Building Research Applications with Globus PaaSGlobus
 
Best Practices for Data Sharing Using Globus
Best Practices for Data Sharing Using GlobusBest Practices for Data Sharing Using Globus
Best Practices for Data Sharing Using GlobusGlobus
 
An Introduction to Globus for Researchers
An Introduction to Globus for ResearchersAn Introduction to Globus for Researchers
An Introduction to Globus for ResearchersGlobus
 
Introduction to Research Automation with Globus
Introduction to Research Automation with GlobusIntroduction to Research Automation with Globus
Introduction to Research Automation with GlobusGlobus
 
Globus for System Administrators
Globus for System AdministratorsGlobus for System Administrators
Globus for System AdministratorsGlobus
 
Introduction to Globus for System Administrators
Introduction to Globus for System AdministratorsIntroduction to Globus for System Administrators
Introduction to Globus for System AdministratorsGlobus
 
Introduction to Data Transfer and Sharing for Researchers
Introduction to Data Transfer and Sharing for ResearchersIntroduction to Data Transfer and Sharing for Researchers
Introduction to Data Transfer and Sharing for ResearchersGlobus
 
Introduction to the Globus Platform for Developers
Introduction to the Globus Platform for DevelopersIntroduction to the Globus Platform for Developers
Introduction to the Globus Platform for DevelopersGlobus
 
Introduction to the Command Line Interface (CLI)
Introduction to the Command Line Interface (CLI)Introduction to the Command Line Interface (CLI)
Introduction to the Command Line Interface (CLI)Globus
 
Automating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and ComputeAutomating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and ComputeGlobus
 
Automating Research Data Flows and Introduction to the Globus Platform
Automating Research Data Flows and Introduction to the Globus PlatformAutomating Research Data Flows and Introduction to the Globus Platform
Automating Research Data Flows and Introduction to the Globus PlatformGlobus
 
Advanced Globus System Administration
Advanced Globus System AdministrationAdvanced Globus System Administration
Advanced Globus System AdministrationGlobus
 
Introduction to Globus for System Administrators
Introduction to Globus for System AdministratorsIntroduction to Globus for System Administrators
Introduction to Globus for System AdministratorsGlobus
 
Introduction to Globus for New Users
Introduction to Globus for New UsersIntroduction to Globus for New Users
Introduction to Globus for New UsersGlobus
 
Working with Globus Platform Services and Portals
Working with Globus Platform Services and PortalsWorking with Globus Platform Services and Portals
Working with Globus Platform Services and PortalsGlobus
 
Globus Automation
Globus AutomationGlobus Automation
Globus AutomationGlobus
 
Advanced Globus System Administration
Advanced Globus System AdministrationAdvanced Globus System Administration
Advanced Globus System AdministrationGlobus
 
Introduction to Globus
Introduction to GlobusIntroduction to Globus
Introduction to GlobusGlobus
 

More from Globus (20)

Advanced Globus System Administration Topics
Advanced Globus System Administration TopicsAdvanced Globus System Administration Topics
Advanced Globus System Administration Topics
 
Instrument Data Automation: The Life of a Flow
Instrument Data Automation: The Life of a FlowInstrument Data Automation: The Life of a Flow
Instrument Data Automation: The Life of a Flow
 
Building Research Applications with Globus PaaS
Building Research Applications with Globus PaaSBuilding Research Applications with Globus PaaS
Building Research Applications with Globus PaaS
 
Best Practices for Data Sharing Using Globus
Best Practices for Data Sharing Using GlobusBest Practices for Data Sharing Using Globus
Best Practices for Data Sharing Using Globus
 
An Introduction to Globus for Researchers
An Introduction to Globus for ResearchersAn Introduction to Globus for Researchers
An Introduction to Globus for Researchers
 
Introduction to Research Automation with Globus
Introduction to Research Automation with GlobusIntroduction to Research Automation with Globus
Introduction to Research Automation with Globus
 
Globus for System Administrators
Globus for System AdministratorsGlobus for System Administrators
Globus for System Administrators
 
Introduction to Globus for System Administrators
Introduction to Globus for System AdministratorsIntroduction to Globus for System Administrators
Introduction to Globus for System Administrators
 
Introduction to Data Transfer and Sharing for Researchers
Introduction to Data Transfer and Sharing for ResearchersIntroduction to Data Transfer and Sharing for Researchers
Introduction to Data Transfer and Sharing for Researchers
 
Introduction to the Globus Platform for Developers
Introduction to the Globus Platform for DevelopersIntroduction to the Globus Platform for Developers
Introduction to the Globus Platform for Developers
 
Introduction to the Command Line Interface (CLI)
Introduction to the Command Line Interface (CLI)Introduction to the Command Line Interface (CLI)
Introduction to the Command Line Interface (CLI)
 
Automating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and ComputeAutomating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and Compute
 
Automating Research Data Flows and Introduction to the Globus Platform
Automating Research Data Flows and Introduction to the Globus PlatformAutomating Research Data Flows and Introduction to the Globus Platform
Automating Research Data Flows and Introduction to the Globus Platform
 
Advanced Globus System Administration
Advanced Globus System AdministrationAdvanced Globus System Administration
Advanced Globus System Administration
 
Introduction to Globus for System Administrators
Introduction to Globus for System AdministratorsIntroduction to Globus for System Administrators
Introduction to Globus for System Administrators
 
Introduction to Globus for New Users
Introduction to Globus for New UsersIntroduction to Globus for New Users
Introduction to Globus for New Users
 
Working with Globus Platform Services and Portals
Working with Globus Platform Services and PortalsWorking with Globus Platform Services and Portals
Working with Globus Platform Services and Portals
 
Globus Automation
Globus AutomationGlobus Automation
Globus Automation
 
Advanced Globus System Administration
Advanced Globus System AdministrationAdvanced Globus System Administration
Advanced Globus System Administration
 
Introduction to Globus
Introduction to GlobusIntroduction to Globus
Introduction to Globus
 

Recently uploaded

Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Angel Borroy López
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Natan Silnitsky
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Hr365.us smith
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesŁukasz Chruściel
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfMarharyta Nedzelska
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...OnePlan Solutions
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
Best Web Development Agency- Idiosys USA.pdf
Best Web Development Agency- Idiosys USA.pdfBest Web Development Agency- Idiosys USA.pdf
Best Web Development Agency- Idiosys USA.pdfIdiosysTechnologies1
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commercemanigoyal112
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作qr0udbr0
 

Recently uploaded (20)

Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
Advantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your BusinessAdvantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your Business
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdf
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
Best Web Development Agency- Idiosys USA.pdf
Best Web Development Agency- Idiosys USA.pdfBest Web Development Agency- Idiosys USA.pdf
Best Web Development Agency- Idiosys USA.pdf
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commerce
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作
 

Reliable, Remote Computation at All Scales

  • 1. Vas Vasiliadis vas@uchicago.edu February 28, 2024 Reliable, Remote Computation at all Scales
  • 2. Why do we need remote computing? Workloads • Interactive, real-time workloads • Machine learning training and inference • Components may best be executed in different places Users • Diverse backgrounds and expertise • Different user interfaces (e.g., notebooks) Resources • Hardware specialization • Specialization leads to distribution
  • 3. Why do researchers “hate” remote computing? • Remote computing is notoriously complicated – Authentication – Network connections – Configuring/managing jobs – Interacting with resources (waiting in queues, scaling nodes) – Configuring execution environment – Getting results back again • Researchers need to overcome the same obstacles every time they move to a new resource 3
  • 4. How can we bridge this gap? Borrow page from data management playbook è“Fire-and-forget” computation èUniform access interface èFederated access control èFast networks making compute resource “local”
  • 5. How can we bridge this gap? Move closer to researchers’ environments èResearchers primarily work in high level languages èFunctions are a natural unit of computation Function-as-a-Service (FaaS)! • Researchers work in familiar language, e.g., Python… • …using familiar interfaces, e.g., Jupyter
  • 6. FaaS as offered by cloud providers 6 • Single provider, single location to submit and manage tasks • Homogenous execution environment • Transparent and elastic execution (of even very small tasks) • Integrated with cloud provider data management Cloud Provider 1 Cloud Provider 2
  • 7. FaaS as an interface to the advanced computing ecosystem 7 • Multiple provider, multiple location, single interface • Homogenous execution environment • Transparent and elastic execution • Integrated with institution-wide data management
  • 8. Globus Compute helps bridge the gap Managed, federated Function-as-a-Service for reliably, scaleably and securely executing functions on remote endpoints from laptops to supercomputers
  • 9. The Globus Compute model • Compute service — Highly available cloud-hosted service for managed, fire-and-forget function execution • Compute endpoint — Abstracts access to compute resources, from edge device to supercomputer • Compute SDK — Python interface for interacting with the service, suing the familiar (to many) Globus look and feel • Security — Leverages Globus Auth; Compute endpoints are resource servers, authN and access via OAuth tokens 9
  • 10. User interaction with Globus Compute 10 A B You request a function be executed on endpoints A and B 1 2 Globus Compute manages the reliable and secure execution on these endpoints 3 Globus Compute returns results or stores them until requested Compute Service You register your code (as a Python function) 0 A compute resource Another compute resource
  • 11. Globus Compute transforms any computing resource into a function serving endpoint • Python pip installable agent • Elastic resource provisioning from local, cluster, or cloud system (via Parsl) • Parallel execution using local fork or via common schedulers – Slurm, PBS, LSF, Cobalt, K8s 11 Compute Service
  • 12. Executing functions with Globus Compute • Users invoke functions as tasks – Register Python function – Pass input arguments – Select endpoint(s) • Service stores tasks in the cloud • Endpoints fetch waiting tasks (when online), run tasks, and return results • Results stored in the cloud and on Globus storage endpoints • Users retrieve results asynchronously 12 Compute Service
  • 14. Use case 1: fire-and-forget execution Executing a bag of tasks, e.g., running simulations with different parameters, executing ML inferences, on one or more remote computers directly from your environment, e.g., Jupyter on your laptop • Fire-and-forget execution managed by Globus Compute; tasks/results cached until endpoint/client are online • Portability across different systems—optionally making use of specialized hardware • Elastic scaling to provision resources as needed (from HPC and cloud systems)
  • 15. ML-based drug screening 15 National Virtual Biotechnology Laboratory, arXiv:2006.02431
  • 16. Use case 2: automated analysis of data Constructing and running automated analysis pipelines with data processing steps that need to be executed in different locations • Automatically process data as they are acquired (event- and workflow-based) • Integrate with data movement and other actions—human and machine • Execute functions across the computing continuum e.g., near instrument, in data center, on specialized hardware
  • 17. Enabling serial crystallography at scale • Serially image chips with 000’s of embedded crystals • Quality control first 1,000 to report failures • Analyze batches of images as they are collected • Report statistics and images during experiment • Return structure to scientist Darren Sherrell, Gyorgy Babnigg, Andrzej Joachimiak
  • 18. Globus Flows How remote computing enables this use case Image processing Local workstation Carbon! Check threshold Data publication Transfer Transfer raw files Transfer Move results to repo Compute Analyze images Compute Visualize Compute Gather metadata Share Set access controls Compute Launch QA job Search Ingest to index Supercomputer GPU Cluster
  • 19. Globus enables “smart” instruments 19 aining of Deep Neural Networks ources 7sec 19sec 5sec 31sec. cycle time
  • 20. Seems too easy? Let’s take a look… • Run function on my laptop …test, debug • Scale (a little) on a cloud VM (or K8s cluster) • Scale (a bit more) on my campus cluster • …run complete model on DOE supercomputer :--( 20
  • 21. Common pattern of computing in a flow Transfer raw instrument images Run a compute job to process image files Transfer Compute Move processed images to sharing system/repository Set access controls for sharing data Share Transfer 1 2 3 4
  • 22. EC2 Instance “Instrument” EC2 Instance Compute Resource Our environment Registered Function Monitor script transfer control access result files 0 trigger flow run transfer raw files 1 invoke image processing function 2 set permissions 4 transfer result files 3 Data Endpoint Data Endpoint Data Endpoint ALCF Eagle Compute Endpoint
  • 23. Globus Compute current state • Service is running in production – Tried and tested by 10MMs of invocations on 100Ks endpoints • Globus Computer Agent supports single user – Akin to Globus Connect Personal • Endpoints visible via Globus web app • Function registration and invocation via SDK 23
  • 24. Compute service will evolve rapidly • Multi-user compute endpoints – Akin to Globus Connect Server • Native integration with transfer for stage in and stage out of data for compute tasks – Can be done today via Globus Flows • Expanding compute service interfaces in the webapp for administrators and users
  • 25. Support resources • Globus documentation: docs.globus.org • YouTube channel: youtube.com/GlobusOnline • Helpdesk: support@globus.org • Mailing Lists: globus.org/mailing-lists • Customer engagement team (office hours) • Professional services team (advisory, custom work)