State of the Globus World
Rachana Ananthakrishnan
Ian Foster
Globus: a hybrid model for research IT
Standards-compliant security fabric
Compute
Facility
On-prem & cloud
storage
Laptop/desktop
Institutional
resources
Instrument
facility/Lab Laptop/desktop Custom services
Global
management &
orchestration
Hosted, persistent, scalable, resilient services
Local agents
with plug-in
Action
Provider
Globus
Compute
Globus
Connect
33M+ compute tasks
241K+ flow runs
60K+ connected storage
15K+ guest collections
542K+ registered users/apps
192M+ documents indexed
New Globus users
Data ecosystem
• Web-addressable data (HTTP/S access)
• Secure, reliable, and managed transfer
• Collaborative data sharing, with fine-grained access control
• Consistent UX across diverse storage systems
Data collections
Globus transfer users
Globus Connect Server deployments
Some of the updates for data ecosystem
• Mapped collections management
– update owners
– delete protected by default
• Facilitate collection lifecycle management
– Mapped collection administrators can modify guest collection metadata and delete roles
and permissions
– Time of last use of collection
– CLI tools to manage collections based on creation/last use time
• IPv6 only and dual stack support for data transfer and sharing
– Globus Transfer, Auth, and Groups services; GCS and GCP
• Expanded set of Linux distributions supported for Globus Connect Server
– 11 distros, and 26 versions
• ARM AArch64 supported for Globus Connect Personal
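The lifecycle bullets above mention managing collections by creation or last-use time. As a rough illustration of that kind of filter, here is a minimal Python sketch. The `Collection` record, its field names, and the 180-day threshold are hypothetical; they are not taken from the Globus CLI or API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Collection:
    # Hypothetical record; field names are illustrative, not a Globus schema.
    name: str
    created: datetime
    last_used: Optional[datetime]  # None means the collection was never used


def stale_collections(collections, now, max_idle_days=180):
    """Return collections idle for more than max_idle_days.

    A collection that was never used is measured from its creation time.
    """
    cutoff = now - timedelta(days=max_idle_days)
    stale = []
    for c in collections:
        reference = c.last_used or c.created
        if reference < cutoff:
            stale.append(c)
    return stale


# Fixed "now" so the example is reproducible.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
inventory = [
    Collection("instrument-raw", now - timedelta(days=400), now - timedelta(days=3)),
    Collection("old-share", now - timedelta(days=900), now - timedelta(days=365)),
    Collection("never-used", now - timedelta(days=200), None),
]
for c in stale_collections(inventory, now):
    print(c.name)
```

An administrator would review such a list before deleting anything; the sketch only selects candidates.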
Continue to grow supported storage systems
Connector highlights
• OneDrive: update to use preferred metadata checksums
• HPSS: error reporting and caching improvements
• Google Drive: skip duplicate files
• Google/Dropbox/Box/MS: support use of any user account without the need for mapping
• Grow S3-compatible storage system partners: Storj
Protected data management
Protected data management updates
• Support use of Timers with protected data
• Expiration time on permission with guest collections
• Administrative controls on permission expiration
policy
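One way to picture how a permission expiration time might interact with an administrative expiration policy is sketched below. The slides do not say how the policy is enforced. Clamping a requested expiration to an administrator-set maximum lifetime is an assumption made for illustration, and all function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def effective_expiration(requested: Optional[datetime],
                         granted_at: datetime,
                         max_lifetime: Optional[timedelta]) -> Optional[datetime]:
    """Clamp a requested expiration to an administrator-set maximum lifetime.

    max_lifetime=None means no administrative limit (assumed behavior);
    requested=None means the grantor asked for no expiration.
    """
    if max_lifetime is None:
        return requested
    latest_allowed = granted_at + max_lifetime
    if requested is None or requested > latest_allowed:
        return latest_allowed
    return requested


def is_active(expires_at: Optional[datetime], now: datetime) -> bool:
    """A permission with no expiration is always active."""
    return expires_at is None or now < expires_at


granted = datetime(2024, 1, 1, tzinfo=timezone.utc)
expires = effective_expiration(
    requested=granted + timedelta(days=365),
    granted_at=granted,
    max_lifetime=timedelta(days=90),
)
print(expires.isoformat())
print(is_active(expires, granted + timedelta(days=30)))
print(is_active(expires, granted + timedelta(days=120)))
```

A real service could equally reject an over-long request instead of shortening it; the sketch picks one option to keep the example small.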
Compliance
Increased contractual
requirements around
information security
and privacy.
We are hiring!
Governance, Risk and
Compliance Lead
Compute ecosystem
• Programmatic access to compute resources
• Reliable and managed execution
• Consistent user interface across diverse execution systems
• Fine-grained access control
Compute endpoint
User interaction with Globus Compute
1. You request a function be executed on endpoints A and B
2. Globus Compute manages the reliable and secure execution on these endpoints
3. Globus Compute returns results or stores them until requested
Diagram labels: Compute Service; endpoint A on a compute resource; endpoint B on another compute resource
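The three-step interaction above (request execution on endpoints A and B, let the service manage execution, collect results) can be sketched in Python against the standard `concurrent.futures.Executor` interface. To my understanding the Globus Compute SDK's `Executor` follows that interface, but that is an assumption here. The sketch uses a local thread pool as a stand-in so it runs without credentials, and the helper `run_on_endpoints` and the endpoint names are hypothetical.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Any, Callable, Dict


def run_on_endpoints(make_executor: Callable[[str], Executor],
                     endpoint_ids, fn, *args) -> Dict[str, Any]:
    """Submit fn(*args) once per endpoint and collect results by endpoint id."""
    executors = {eid: make_executor(eid) for eid in endpoint_ids}
    try:
        # Step 1: request execution of the function on each endpoint.
        futures = {eid: ex.submit(fn, *args) for eid, ex in executors.items()}
        # Steps 2 and 3: execution is managed elsewhere; block until results return.
        return {eid: fut.result() for eid, fut in futures.items()}
    finally:
        for ex in executors.values():
            ex.shutdown()


def add(a, b):
    return a + b


def local_executor(endpoint_id: str) -> Executor:
    # Local stand-in: ignores the endpoint id and runs in a thread.
    return ThreadPoolExecutor(max_workers=1)


results = run_on_endpoints(local_executor, ["endpoint-A", "endpoint-B"], add, 2, 3)
print(results)
```

With the Globus Compute SDK installed and real endpoint UUIDs, the factory could plausibly be `lambda eid: globus_compute_sdk.Executor(endpoint_id=eid)`; check the SDK documentation before relying on that signature.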
Globus Compute Multiuser Endpoint
• Deployed and operated by administrators
– Launches processes as user’s local account
• Preconfigured templates for local site options and
policies
• Same AuthN & AuthZ as Globus Connect Server
– Domain based authentication policy
– Authorization via mapping of user to local account
Launches an endpoint process for user
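The two access-control bullets above (a domain-based authentication policy, then authorization by mapping the user to a local account) can be illustrated with a small Python sketch. The allowed domain, the "strip the domain" rule, and the function name are hypothetical; real deployments configure identity mapping through their own site-specific rules.

```python
from typing import Optional

# Hypothetical site policy: identity domains permitted to use the endpoint.
ALLOWED_DOMAINS = {"example.edu"}


def map_identity(username: str, allowed_domains=ALLOWED_DOMAINS) -> Optional[str]:
    """Map 'user@domain' to a local account name, or None if not permitted."""
    local, sep, domain = username.rpartition("@")
    if not sep or not local:
        return None  # not of the form user@domain
    if domain.lower() not in allowed_domains:
        return None  # fails the domain-based authentication policy
    return local  # assumed rule: local account equals the identity's user part


print(map_identity("alice@example.edu"))
print(map_identity("mallory@other.org"))
```

Only identities that pass both checks would have an endpoint process launched under the returned local account.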
Globus Compute Multiuser Endpoint
Diagram labels: Compute Service; Globus Compute Multiuser Endpoint (Identity Mapping, Configuration Templates); launches an endpoint process for user; User Endpoint Process (as local user) with Globus Compute Engine, one per user; Node 1, Node 2, ... Node N
Globus Compute Multiuser Endpoint
Learn more at the tutorial tomorrow
Automation ecosystem
• Event-driven invocation of actions on
diverse services
• Reliable and managed orchestration
• Extensible to support custom service
APIs
• Delegated execution and monitoring
Flows
Flows highlights
• Better error handling for consents and authentication
• Discovery of flows and runs via search service
• Improvements to guided start of run
• In-depth flow validation prior to deploy
$ globus flows validate definition.json
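To give a feel for what checking a flow definition before deployment can involve, here is a minimal Python sketch of a local structural pre-check. It is not what `globus flows validate` does internally. It only assumes that a definition has a `StartAt` key and a `States` mapping whose entries may name a `Next` state, and the helper name `check_flow` is hypothetical.

```python
def check_flow(definition: dict) -> list:
    """Return a list of structural problems found in a flow definition.

    Only two things are checked: StartAt names an existing state, and every
    Next names an existing state.
    """
    problems = []
    states = definition.get("States", {})
    start = definition.get("StartAt")
    if start not in states:
        problems.append(f"StartAt refers to unknown state: {start!r}")
    for name, state in states.items():
        nxt = state.get("Next")
        if nxt is not None and nxt not in states:
            problems.append(f"{name}: Next refers to unknown state: {nxt!r}")
    return problems


flow = {
    "StartAt": "First",
    "States": {
        "First": {"Type": "Pass", "Next": "Second"},
        "Second": {"Type": "Pass", "End": True},
    },
}
broken = {
    "StartAt": "First",
    "States": {"First": {"Type": "Pass", "Next": "Missing"}},
}

print(check_flow(flow))
print(check_flow(broken))
```

A check like this catches dangling references early; the service-side validation named on the slide is the authority on whether a definition is deployable.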
Python SDK/CLI
• Globus Connect Server commands
– endpoint, collection management, guest collection creation
• Timer pause/resume
• Flows run management
docs.globus.org/cli/reference/changelog/
globus-sdk-python.readthedocs.io/en/stable/changelog.html
JavaScript (JS) SDK
• Simplify integration with web applications, JS runtimes
• Support for all the services in the Globus platform
• Globus web app (app.globus.org) uses the JS SDK
github.com/globus/globus-sdk-javascript
What are we seeing the community invest in?
• (Secure) Data distribution/publication
• FAIR data/ML ready data/…
• Migration across storage systems
• Dealing with data from instruments/experiments
• Applications discovering and using “for purpose”
resources
• Managed run of a compute campaign (a bag of tasks)
• Offering accessible user interfaces for complex
capabilities
Some of our focus areas…
• Harmonize terminology and model on data ecosystem
• Connector enhancements
• Increased limits for transfer, driven by automation
• MPI support for compute
• Web interfaces for compute task management for users and admins
• Expanded policy support for search indices
• Additional services for use with protected data
• …
Lowering barriers for authoring flows
globus.github.io/flows-ide/
• Schema validation
• Visualization of
the flow definition
• Integration with
action provider
schema
• Leverage
validation tools
Building portals/science gateways/applications
Platform APIs
SDKs, CLI, Helper pages
Globus Django Portal Framework
Sample portal
Globus Static Portal Framework
Globus Static Portal Framework
• Single Page Application (SPA) portals
• No-code solution
– Globus-provided generators for common use cases
– Customizable configuration of portal using JSON
• Served from any static content hosting solution
– E.g. GitHub Pages, AWS S3
• Pre-built continuous integration and deployment (CI/CD)
– Using GitHub Actions
1. Register the portal with Globus
Register an application with Globus Auth, so the portal has its own identity.
2. Create new repository from template
Template repository contains:
• Configuration template for specific use case
• Configuration of GitHub Actions to use generator and automatically deploy using GitHub Pages
• Dependabot configuration to manage dependencies
Globus-provided template repositories
3. Configure the project repository to use GitHub Actions
Configure the repository's Pages to be deployed using GitHub Actions
4. Update configuration to customize
Configuration:
- Client id
- Portal features:
  - Title
  - Privacy Policy
  - Terms of Service
  - Tagline
- Data served:
  - Collection id
  - Path
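The configuration items listed above could be represented as a JSON document along the following lines. This is a sketch only: the key names, the placeholder UUIDs and URLs, and the choice of which fields count as required are all assumptions for illustration and do not reproduce the actual schema used by the Globus template repositories.

```python
import json

# Hypothetical portal configuration; key names are illustrative only.
config_text = """
{
  "client_id": "00000000-0000-0000-0000-000000000000",
  "features": {
    "title": "Example Data Portal",
    "tagline": "Browse and download example datasets",
    "privacy_policy": "https://example.org/privacy",
    "terms_of_service": "https://example.org/terms"
  },
  "data": {
    "collection_id": "11111111-1111-1111-1111-111111111111",
    "path": "/public/"
  }
}
"""

# Assumed minimum set of fields a portal needs before it can be deployed.
REQUIRED_PATHS = [
    ("client_id",),
    ("features", "title"),
    ("data", "collection_id"),
    ("data", "path"),
]


def missing_fields(config: dict) -> list:
    """Return dotted paths of required fields absent from the configuration."""
    missing = []
    for path in REQUIRED_PATHS:
        node = config
        for key in path:
            if not isinstance(node, dict) or key not in node:
                missing.append(".".join(path))
                break
            node = node[key]
    return missing


config = json.loads(config_text)
print(missing_fields(config))
print(missing_fields({"client_id": "x"}))
```

Running a check like this in the CI step before the build would surface an incomplete configuration before anything is published.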
5. Portal is automatically deployed
• Uses GitHub Actions to build
• Deployed using GitHub Pages
6. And kept updated
• Dependency updates are managed via Dependabot
Sustaining and growing Globus
Subscriber growth
Subscribers by Subscription Type
Self-managed subscription management
Subscription groups to
manage roles and privileges
Group policy and
membership managed
by the institution
Please update default
text in your subscription
group description!
Engage with the Globus team
Globus Discuss
community mailing list
Our Mission
Increase the efficiency and
effectiveness of researchers
engaged in data-driven
science and scholarship
through sustainable software
The continued evolution of the scientific method
https://doi.org/10.1038/s41524-022-00765-z
Timeline: 1600s, 1950s, 2000s, 2020s
• Empirical Science (1st Paradigm): observations, experimentation
• Theoretical Science (2nd Paradigm): scientific laws in physics, biology, chemistry, etc.
• Computational Science (3rd Paradigm): simulations, molecular dynamics, mechanistic models
• Big Data-driven Science (4th Paradigm): big data, machine learning; patterns, anomalies; visualization
• Accelerated Discovery: scientific knowledge at scale, AI-generated hypotheses, autonomous testing
Increasing automation, connectivity, and scale
Accelerating discovery using AI, HPC, and robotics
Accelerated Scientific Method (https://doi.org/10.1038/s41524-022-00765-z):
• Extraction, integration and reasoning with knowledge at scale
• Tools help identify new questions based on needs and gaps in knowledge
• Machine representation of knowledge leads to new hypotheses and questions
• Generative models automatically propose new hypotheses that expand the discovery space
• Robotic labs automate experimentation and bridge digital models and physical testing
• Pattern and anomaly detection integrated with simulation and experiment to extract insights
Access & integrate data, computing, instruments, services
Anywhere, any time; securely, reliably, rapidly, scalably
AI: Open science foundation model(s)
• Science and Engineering Datasets: Mathematics, Biology, Materials, Chemistry, Particle Physics, Nuclear Physics, Computer Science, Climate, Medicine, Cosmology, Fusion Energy, Accelerators, Reactors, Energy Systems, Manufacturing
• Text and Code Corpora: General Text, Media, News, Humanities, History, Law, Digital Libraries, OSTI Archive, Scientific Journals, arXiv, Code repositories, Data.gov, PubMed, Agency Archives
• Open Science Foundation Model Training
• Tuned and Adapted Downstream Models
• Downstream Scientific Tasks: Autonomous Experiments, Scientific Discovery, Digital Twins, Inverse Design, Code Optimization, Accelerated Simulations
• Co-Design
AuroraGPT: A foundation model for open science
• General-purpose scientific LLM: broadly trained on general corpora; scientific papers and texts; structured science data
• Explore pathways towards a “Scientific Assistant”
• Built with international partners
• Multilingual: English, 日本語, French, German, Spanish, Italian, …
• Multimodal: Images, tables, equations, proofs, time-series, graphs, fields, sequences, …
A founding member of: Trillion Parameter Consortium
AuroraGPT dFMs UX-LLM
Hybrid AI models
(Community of Experts
information flows)
Accelerated discovery processes
For example: A peptide expert (prototyped with PubMed and ChatGPT)
Arvind Ramanathan, Priyanka Setty, et al.
• Query PubMed for ChatGPT feedstock: retrieve abstracts A from PubMed that reference the specified peptide
• Use ChatGPT to build hypotheses using retrieval-augmented generation, e.g., “Given A, on which organism is {peptide} acting?”
• We want a model with deep expertise regarding peptides and related topics
• We want to be able to make millions of such requests
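The retrieval-augmented pattern described above (retrieve abstracts A that mention a peptide, then ask a question given A) can be sketched in a few lines of Python. This is a minimal illustration, not the prototype's code. The in-memory corpus stands in for PubMed, the abstracts and the name `peptide-X` are placeholders, retrieval is a plain substring match, and no language model is called; the sketch stops at building the prompt.

```python
def retrieve(abstracts, peptide):
    """Return abstracts that mention the peptide (case-insensitive substring match)."""
    needle = peptide.lower()
    return [a for a in abstracts if needle in a.lower()]


def build_prompt(abstracts, peptide):
    """Assemble retrieved abstracts (the set A) and the question into one prompt."""
    context = "\n".join(f"[{i}] {a}" for i, a in enumerate(abstracts, start=1))
    question = f"Given A, on which organism is {peptide} acting?"
    return f"A (retrieved abstracts):\n{context}\n\n{question}"


# Placeholder corpus; these are not real abstracts.
corpus = [
    "Placeholder abstract one, which mentions peptide-X in passing.",
    "Placeholder abstract two, about an unrelated topic.",
    "Placeholder abstract three, also discussing Peptide-X.",
]

hits = retrieve(corpus, "peptide-X")
prompt = build_prompt(hits, "peptide-X")
print(prompt)
```

Making millions of such requests, as the slide asks, is mostly a matter of running this retrieve-then-prompt step as many independent tasks, which is why the agents are run on HPC/AI resources.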
Peptide agent may be used with other agents to identify antimicrobial peptides
Agents: PMC Agent, BC-BRC Agent, UniProt Agent
Workflow labels:
• Set of peptides as input
• Query PubMed for ChatGPT feedstock
• Align proteins, predict structure, rank results
• Evaluate structures and filter results
• Candidates for experimental evaluation
• Self-driving lab performs experiments
• Generate additional experiments?
Agents run on HPC/AI resources
Argonne Autonomous Research Laboratories (AARL)
• AARL-P: Rapid Prototyping Lab, Bldg 240
• AARL-C: Polybot, CNM User Facility, Building 440
• AARL-X: APS Sector 8-ID
• AARL-A: Airfree, Building 200
• AARL-B: Biology, BSL-2, Bldg 350
• Access & integrate data, computing,
instruments, and services
• Anywhere, any time;
securely, reliably, rapidly, scalably

GlobusWorld 2024 Opening Keynote session
