We are on the verge of a global communications revolution based on ubiquitous high-speed 5G, 6G, and free-space optics technologies. The resulting global communications fabric can enable new ultra-collaborative research modalities that pool sensors, data, and computation with unprecedented flexibility and focus. But realizing these modalities requires new services to overcome the tremendous friction currently associated with any actions that traverse institutional boundaries. The solution, I argue, is new global science services to mediate between user intent and infrastructure realities. I describe our experiences building and operating such services and the principles that we have identified as needed for successful deployment and operations.
2. Compare climate models to
understand the earth system
>100 models, >20 countries
Coupled Model Intercomparison Project
(CMIP): Standard protocol for studying
general circulation model output
Earth System Grid Federation
3. Foundation model(s)
trained on 110 million
viral sequences
Finetune model
on SARS-CoV-2
ORFs
TRAINING Periodically
retrain on
new variants
Trained
foundation
model(s)
PREDICTION WORKFLOW
DETECTION
WORKFLOW
Semantic
similarity score
(embeddings)
Sequence log
likelihood
score
Immune
Escape
Fitness
Evaluation
OPENFOLD
PPI
interaction
(MD
simulations)
Epitope
alteration
Variant of
Concern score
Diffusion model to
derive hierarchy of
gene organization
Generated SARS-
CoV-2 genomes
Link field data on pathogen variants
with AI for pandemic surveillance
Detection
targets
New viral
genomes
M. Zvyagin et al., https://www.biorxiv.org/content/10.1101/2022.10.10.511571v1
Field
Labs
1.5 x 1021
flops
4. ources
Cerebras CS-2
1102 1102
1102 1102
Model (re)training on local GPU:
1102 seconds
@ LCLS
Connect scientific instruments to remote computers
to create smart instruments
Z. Liu et al., https://doi.org/10.48550/arXiv.2105.13967
5. ources
Z. Liu et al., https://doi.org/10.48550/arXiv.2105.13967
7 seconds
7 + 19 + 5 = 31 s
35x faster
5 seconds
19 seconds
Cerebras CS-2
@ LCLS
Connect scientific instruments to remote computers
to create smart instruments
7. http://dx.doi.org/10.1016/j.wear.2016.04.019
Friction first characterized by Leonardo di Vinci in 1493
Integration of distributed resources, a sine qua non,
is impeded by numerous sources of friction
“Whereas computational friction expresses the struggle involved in transforming
data information and knowledge … data friction expresses a more primitive
form of resistance -- the costs in time, energy, and attention required simply to
collect, check, store, move, receive, and access data. Whenever data travel …
data friction impedes their movement” (Edwards, 2010, p. 84).
10. Free-space optics
”Henceforth, space for itself, and time for itself, are doomed to fade
away into mere shadows, and only a kind of union of the two will
preserve an independent reality.” – Hermann Minkowski
A computing continuum
Aalyria’s “Spacetime” [originally “Minkowski”] platform
Increasingly, telecommunications is not the problem
11. 1) Act on resources regardless of location and interface
Widely deployed local agents
provide a global footprint for actions
2) Execute remote actions reliably
Cloud-hosted managed research acceleration
services buffer against inevitable failures
3) Manage who is trusted to perform what actions, where and when
Distributed authentication with delegation
enables secure management of privileges
But it remains unduly difficult to:
Friction: Varying interfaces, behaviors; reliability; security
12. 1) Act on resources regardless of location and interface
Widely deployed local agents
provide a global footprint for actions
2) Execute remote actions reliably
Cloud-hosted managed research acceleration
services buffer against inevitable failures
3) Manage who is trusted to perform what actions, where and when
Distributed authentication with delegation
enables secure management of privileges
But it remains unduly difficult to:
Friction: Great variety of failures
Friction: Varying interfaces, behaviors; reliability; security
13. 1) Act on resources regardless of location and interface
Widely deployed local agents
provide a global footprint for actions
2) Execute remote actions reliably
Cloud-hosted managed research acceleration
services buffer against inevitable failures
3) Manage who is trusted to perform what actions, where and when
Distributed authentication with delegation
enables secure management of privileges
But it remains unduly difficult to:
Friction: Varying credentials, authentication protocols, authorization policies;
need to act on behalf of others
Friction: Varying interfaces, behaviors; reliability; security
Friction: Great variety of failures, scalability, usability
15. Need: 1) Act anywhere
Adapter interface
Plug-in
Plug-in
Adapter interface
Plug-in
Adapter interface
Plug-in
Plug-in
Globus
Connect
Globus
Compute
Lightweight
agents abstract
local details
Globus
Connect
Globus
Compute
Our approach:
• HTTPS, GridFTP for universal, fast access
• Portable agents for broad deployment
• Modularity to target many systems
• Integration with secure delegation
• Integration with hosted supervision
16. Need: 2) Reliable execution of (sets of) actions
Past approaches:
• Workflow systems
• Distributed file systems
• Eventing, consistency protocols
• Reliable RPC, replication
• Cloud
Challenges: Complexity, fragility, scalability, reach
17. Need: 2) Reliable execution of (sets of) actions
Hosted research supervision services
Cloud-hosted, persistent, scalable, resilient, secure
Other
services
Transfer
Copy data
AB
Compute
Run f(x)@C
actions
Search
Read & write
catalogs
Other
Globus
services
Flows
Sequences
of actions
Non-Globus
services
Hosted
research
supervision
services
Our approach:
• Cloud-hosted, replicated supervision
• Simple retry-based protocols
• Reduce endpoint complexity
• High assurance for sensitive data
• Integration with secure delegation
18. Need: 3) Control who can perform what actions, when & where
Hosted research supervision services
Cloud-hosted, persistent, scalable, resilient, secure
Other
services
Transfer
Copy data
AB
Compute
Run f(x)@C
actions
Search
Read & write
catalogs
Other
Globus
services
Flows
Sequences
of actions
Non-Globus
services
Past approaches:
• Passwords, PKI, Kerberos
• Grid Security Infrastructure
• OAuth,
• Specialized delegation protocols
Challenges: Multi-site, dynamic computing; complexity, usability
19. Globus Auth
trust fabric
OAuth + delegation
Hosted research supervision services
Cloud-hosted, persistent, scalable, resilient, secure
Other
services
Transfer
Copy data
AB
Compute
Run f(x)@C
actions
Search
Read & write
catalogs
Other
Globus
services
Flows
Sequences
of actions
Non-Globus
services
Federated auth &
secure managed
delegation
Need: 3) Control who can perform what actions, when & where
Our approach:
• Secure delegation
• Scoped credentials
• Leverage OAuth2
• Brower compatibility
• IdP federation
20. (5) “Run F()”
F()
@A
node A
F()
Auth
(3) “Run F()
on node A”
Flows (2) “Provide access
token for funcX”
(4) “Provide
dependent token
for F() on node A”
(6) F() runs
Consent
Dependent
token
Token hierarchy
(1) “Run flow”
with consent
Access
token
Cloud-hosted
services
Globus Auth: A managed research acceleration service
providing distributed authorization with delegation
Not shown:
Token refresh
Who do I trust to act on
my behalf, when, and for
what purpose?
Leverage OAuth for:
• Security via scoped
credentials
• Usability via browser
compatibility
S. Tuecke et al., https://doi.org/10.1109/eScience.2016.7870901, R. Chard et al., https://doi.org/10.48550/arXiv.2208.09513
1700 identity providers
1.3 B access tokens
2.7 M consents
Globus
Compute
21. 1) Act on resources regardless of location and interface
Widely deployed local agents
provide a global footprint for actions
2) Execute remote actions reliably
Cloud-hosted managed research acceleration
services buffer against inevitable failures
3) Manage who is trusted to perform what actions, where and when
Distributed authentication with delegation
enables secure management of privileges
In total: Global services enable low-friction global science
22. Globus platform
Operated by UChicago for researchers worldwide
Made possible by the support of 200+ subscribers
25. Collins Australian Clear
School Atlas, 1964
Wellington, New Zealand
to Madrid, Spain:
19,855 km
Circumference of the earth:
40,075 km
Semi-circumference:
20,037 km
26. https://dashboard.globus.org/esgf As of May 4, 2022
1.5 GB/s
4 to 6 GB/s
7.5 PB transferred between mid-Feb and May 4, ‘22
17,347,671 directories and 28,907,532 files
27. Cumulative data transferred over time
GPFS Errors &
Unreadable Files
7.5 PB transferred between mid-Feb and May 4, ‘22
17,347,671 directories and 28,907,532 files
28. def F(in_args):
# do something
return results
gcc.register_function(F)
$ pip install globus-compute-endpoint
$ globus-compute-endpoint configure myep
$ globus-compute-endpoint start myep
Globus Compute: A hosted research supervision service
that implements a universal computing fabric
https://funcx.org
Z. Li et al., https://doi.org/10.48550/arXiv.2209.11631
F(ep, arg)
F(ep, arg)
F(ep, “A”)
f = gcc.run(“A”,
endpoint_id=ep,
function_id=F)
R = gcc.get_result(f)
Deploy Globus Compute agents
Register functions
Run functions
Globus
Compute
31. “Globus Flows”: Hosted research supervision services for flow
specification, execution, and management
R. Chard et al., https://doi.org/10.48550/arXiv.2208.09513
GLOBUS
COMPUTE
AGENT
32. ources
Request
Z. Liu et al., https://doi.org/10.48550/arXiv.2105.13967
7 seconds
7 + 19 + 5 = 31 s
35x faster
5 seconds
19 seconds
Cerebras CS-2
@ LCLS
Flows allow us to create smart instruments
33. ources
Request
Z. Liu et al., https://doi.org/10.48550/arXiv.2105.13967
7 seconds
7 + 19 + 5 = 31 s
35x faster
5 seconds
19 seconds
Cerebras CS-2
@ LCLS
Flows allow us to create smart instruments
ModelTrain
34. New applications mean new computing workloads
Globus Flows can invoke
arbitrary functions via
Globus Compute
Functions may be
executed in various
locations: at a beamline,
local server, cluster, cloud
34
R. Vescovi et al., https://doi.org/10.1016/j.patter.2022.100606
Seven flows in use at the
Advanced Photon Source
35. New applications mean new computing workloads
Globus Flows can invoke
arbitrary functions via
Globus Compute
Functions may be
executed in various
locations: at a beamline,
local server, cluster, cloud
35
R. Vescovi et al., https://doi.org/10.1016/j.patter.2022.100606
Execution times at the
Argonne Leadership
Computing Facility
37. Continuum-aware programming
Compute fabric
Trust fabric
Access data
anywhere
Compute
anywhere
Ensure only authorized operations
occur in trusted locations
Develop efficient, reliable,
reusable applications
New methods and tools to program the continuum
Data fabric
38. Continuum-aware programming
Compute fabric
Trust fabric
Globus
Compute
Globus
Transfer
Globus
Automation
Services
Globus
Auth
https://braid-project.org
https://globus.org
Data fabric
New methods and tools to program the continuum
39. Foucault’s pendulum, Paris Panthéon
Problems we have solved, in part
• Authorization (“enterprise grade
security, consumer-grade usability”)
• Universal connectivity
• Location-independent computing
• Programming
New opportunities and challenges
• New devices and applications
• Massive performance and scale
• Energy as a ubiquitous constraint
• Global data space architecture
• Programming
Utility (‘60s) … xSP (’80s) … Grid (‘90s) … Cloud (2010) … Edge … Continuum …
The pendulum swings between centralization and decentralization
40. There remain important questions to answer
• What will continuum networks look like in practice?
• Energy vs. accuracy vs. cost vs. … tradeoffs
• Where will we place computing and storage?
• What new instruments will be created?
• What new applications and new science will be enabled?
• What new abstractions, services, and tools do we need?
• Do we need new continuum-aware algorithm design methods?
• What will be economic foundation of this new computing fabric?
• (How) Will we integrate quantum sensors, networks, computers?
41. Thank you for your attention!
Argonne National Laboratory and the University of Chicago – and students and staff
Federal agencies for continued support: DOE, NSF, NIH, NIST
Wonderful colleagues: Rachana Ananthakrishnan, Ben Blaiszik, Kyle Chard, Ryan Chard,
Carl Kesselman, Arvind Ramanathan, Rick Stevens, Vas Vasiliadis, Logan Ward, & many more
To learn more about our work: https://labs.globus.org https://globus.org
Ask questions or share your thoughts: foster@anl.gov
Experiment with tools:
https://braid-project.org
Thanks to:
https://doi.org/10.1016/j.patter.2022.100606
Editor's Notes
We are on the verge of a global communications revolution based on ubiquitous high-speed 5G, 6G, and free-space optics technologies. The resulting global communications fabric can enable new ultra-collaborative research modalities that pool sensors, data, and computation with unprecedented flexibility and focus. But realizing these modalities requires new services to overcome the tremendous friction currently associated with any actions that traverse institutional boundaries. The solution, I argue, is new global science services to mediate between user intent and infrastructure realities. I describe our experiences building and operating such services and the principles that we have identified as needed for successful deployment and operations.
https://clivebest.com/blog/?p=8788
Or, a quite different example, but just as important, linking field data on SARS-Cov-2 genome data to large-scale AI models.
Another example, involving work by Zhengchun Liu and others at Argonne and SLAC.
High energy diffraction microscopy experiments at the Linac Coherent Light Source use a deep neural network, running on a GPU, to extract information from high-rate measurements of microstructure evolution in materials.
The GPU is not powerful enough, however, for the periodic retraining required to respond to structural deformation.
Dispatching requests to a Cerebras system at Argonne reduces this time to 31 seconds: 12 seconds to move data and the new model; 19 seconds for training.
So 35x faster
Another example: recently funded CROCUS urban integrated field laboratory
Combining multiple sensors plus simulations to generate both short-term flood warnings and long-term flood risk warnings
Chicago Tribune, September 25, 2022
One aspect of this new world is that the large white spaces in our maps, in which little or no communication is possible, will be filled in.
5G and 6G are one reason.
Another technology that may be even more transformative is coherent light free-space optics: that is, space lasers.
This image depicts such a device, on the right, and the “SpaceTime” platform that ah-Leer-eeh-ah, a Google spinout, is developing to provide global communication.
The original name for their platform was Minkowski, after Herman Minkowski
Hermann Minkowski, a colleague of Einstein’s, observed about the special theory of relativity that it defined a space-time continuum
Another essential building block, this one operational for several years, is the Globus Auth identity and access management platform service.
We need flows to perform actions on instruments, computers, repositories, and networks at remote locations.
To do this securely, we provide consent and tokens mechanisms for controlling what agents acting on a users behalf can do.
Supported by over 200 subscribers world wide.
As a second example of work intended to produce better information faster, I will describe the work of the Globus team on science services to accelerate complex tasks, for example, by making fast and reliable data transfer a one-click operation.
The Globus system, developed and operated by a wonderful team at the University of Chicago, uses software running on the Amazon cloud to manage data transfers, and more, with extreme reliability and high performance: what we may call a managed research acceleration service.
I created this animation of Globus transfers from late 2010 to today.
The x-axis is the great circle distance between source and destination, from 0 to 20,000 kilometers.
The y-axis the transfer size, from 100MB to several petabytes.
Each dot represents a transfer. I also track total exabytes and gigafiles.
I label a few source-destination pairs
You’ll see a large gap around 5000 km. It turns out that the sizes of the continents and oceans means that few communications occur over that distance.
You may also see a point to the far right at 19855 kilometers.
Can any two places be that far apart?
It turns out those dots represent transfers between Madrid and Wellington,
And those two cities are as about as far apart on the globe as you can get, as shown in this map from a childhood atlas explaining the concept of ”antipodes.”
Another managed research acceleration service, Globus Compute (formerly funcX), led by Kyle Chard, …
Diaspora
A wonderful team at the University of Chicago led by Rachana Ananthakrishnan has developed Globus Automation Services to manage the execution of such flows.
As a cloud-hosted service, they can manage flows that run for seconds or months.
Dispatching requests to a Cerebras system at Argonne reduces this time to 31 seconds: 12 seconds to move data and the new model; 19 seconds for training.
So 35x faster
Dispatching requests to a Cerebras system at Argonne reduces this time to 31 seconds: 12 seconds to move data and the new model; 19 seconds for training.
So 35x faster
I’ll mention that these new applications represent a new source of increasingly different workloads.
Here, for example, I show for thousands of instances of seven synchrotron light source flows the average duration of each action.
Durations range from seconds to hours
Globus services ensure reliable execution.
I’ll mention that these new applications represent a new source of increasingly different workloads.
Here, for example, I show for thousands of instances of seven synchrotron light source flows the average duration of each action.
Durations range from seconds to hours
Globus services ensure reliable execution.
We have identified the following key requirements. Surely not a complete list, but all essential.
Others include methods for managing flows, computing anywhere, accessing data anywhere, and delegating credentials.
As with Globus Transfer, we make extensive use of cloud-hosted research acceleration services,
Which I remind you combine cloud hosted management logic for high reliability with local agents for global footprint
We are exploring new programming abstractions and managed research acceleration services*
You might ask, how are the problems and solutions that I have presented here different from those deployed 10, 20, or 30 years ago?
Dan Reed has observed that computing differs from other sciences in that the questions often stay the same while the answers change.
Far more reliable and performant networks, and effective containerization, allow us to revisit old answers.
And new concerns, like energy, result in new questions.
Indeed, our early experiences programing the continuum suggest numerous other questions.