Using Globus to Streamline
Research at Scale
Vas Vasiliadis
vas@uchicago.edu
8 July 2022
Globus Automation Capabilities
Timer Service
Scheduled and recurring transfers
(a.k.a. Globus cron)
Command Line Interface
Ad hoc scripting and integration
Globus Flows service
Comprehensive task (data and
compute) orchestration with
human-in-the-loop interactions
Three perspectives
• Researcher: ease of use, scalability
• Administrator: visibility, access control
• Builder
Globus Timer Service
Use case: Data replication
• For backup: initiated by the user or by a system backup process
• Automated transfer of data from science instrument
Recurring transfers with sync option: copy /ingest daily @ 3:30am
The Globus Timer service
• Scheduled/recurring file transfers
• Supports all Globus transfer and sync options
• Service accessible via web app and CLI
• Example: NIH – hpc.nih.gov/storage/globus_cron.html
Timer options
in the Globus
web app
Using the Globus Timer service
$ globus-timer session {login, logout, whoami}
$ globus-timer job transfer \
    --name example-job \
    --label "Timer Transfer Job" \
    --interval 28800 \
    --start '2020-01-01T12:34:56' \
    --source-endpoint ddb59aef-6d04-11e5-ba46-22000b92c6ec \
    --dest-endpoint ddb59af0-6d04-11e5-ba46-22000b92c6ec \
    --item ~/file1.txt ~/new_file1.txt false \
    --item ~/file2.txt ~/new_file2.txt false
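Assuming `--interval` is given in seconds, 28800 means the job repeats every 8 hours. A minimal sketch, using only the Python standard library, of when such a job would fire from the start time shown above. The helper name `next_runs` is hypothetical and is not part of any Globus tool.

```python
from datetime import datetime, timedelta


def next_runs(start: str, interval_seconds: int, count: int) -> list[str]:
    """Return the first `count` scheduled run times as ISO 8601 strings."""
    first = datetime.fromisoformat(start)
    step = timedelta(seconds=interval_seconds)
    # Run 0 happens at the start time; each later run is one interval further on.
    return [(first + i * step).isoformat() for i in range(count)]


# Values taken from the timer job example: start time and an interval of 28800.
runs = next_runs("2020-01-01T12:34:56", 28800, 4)
for run in runs:
    print(run)
```

Three intervals of 28800 seconds add up to one day, so the schedule returns to the same clock time every fourth run.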
Scheduled transfers
using Globus
timers
Globus Command Line
Interface (CLI)
Globus Command Line Interface
Open source, uses
the Python SDK
UUIDs everywhere
• UUIDs for endpoint, task, user identity, groups…
• Use search/list options
• Use get-identities to map an identity username to its UUID
$ globus endpoint search 'Tutorial Endpoint 1'
$ globus task list
$ globus get-identities vas@globusid.org
bfc122a3-af43-43e1-8a41-d36f28a2bc0a
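Because so many commands take UUIDs, a script can check a value before passing it on. A minimal sketch using the standard `uuid` module; the helper name `is_uuid` is hypothetical. It also rejects UUIDs copied from slides where the hyphens were turned into en dashes.

```python
import uuid


def is_uuid(value: str) -> bool:
    """Return True if `value` is a UUID in canonical hyphenated form."""
    try:
        # Round-trip through uuid.UUID so only the canonical spelling passes.
        return str(uuid.UUID(value)) == value.lower()
    except ValueError:
        return False


print(is_uuid("bfc122a3-af43-43e1-8a41-d36f28a2bc0a"))  # identity UUID from the example above
print(is_uuid("vas@globusid.org"))                      # a username, not a UUID
print(is_uuid("ddb59aef–6d04–11e5–ba46–22000b92c6ec"))  # en dashes instead of hyphens
```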
Step 1: Transfer files
$ export src=<source_collection_UUID>
$ export dst=<destination_collection_UUID>
$ globus transfer --recursive $src:/~/carousel \
    $dst:/globusworkshop
$ globus task show <transfer_task_UUID>
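When the transfer is driven from a script rather than typed by hand, it can help to build the argument list in one place. A minimal sketch: `build_transfer_command` is a hypothetical helper, the UUID strings are placeholders, and only the `--recursive` flag shown above is used. The command is assembled but not executed here.

```python
import shlex


def build_transfer_command(src, dst, src_path, dst_path, recursive=False):
    """Build the argv list for a `globus transfer` call without running it."""
    cmd = ["globus", "transfer"]
    if recursive:
        cmd.append("--recursive")
    # Source and destination are written as <collection_UUID>:<path>.
    cmd += [f"{src}:{src_path}", f"{dst}:{dst_path}"]
    return cmd


cmd = build_transfer_command(
    "SRC_UUID", "DST_UUID", "/~/carousel", "/globusworkshop", recursive=True
)
# shlex.join produces a shell-safe string for logging or review.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would run it, assuming the CLI is installed and logged in.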
Step 2: Set permissions
• Set and manage permissions on guest collection
• Requires access manager role
$ export share=<guest_collection_UUID>
$ globus endpoint permission create --permissions r \
    --identity demodoc@globusid.org $share:/globusworkshop/
$ globus endpoint permission list $share
$ globus endpoint permission delete $share <perm_UUID>
Parsing CLI output
• Default output is text; for JSON output use --format json
$ globus endpoint search --filter-scope my-endpoints
$ globus endpoint search --filter-scope my-endpoints --format json
• Extract specific attributes using --jmespath <expression>
$ globus endpoint search --filter-scope my-endpoints --jmespath 'DATA[].[id, display_name]'
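The same extraction can be done in a script on the `--format json` output. A minimal sketch with the standard `json` module that mirrors the expression `DATA[].[id, display_name]`. The sample document is made up for illustration: only the `DATA`, `id` and `display_name` names come from the expression above, and `owner_string` and all values are assumptions.

```python
import json

# Hypothetical stand-in for the output of `... --format json`.
sample_output = """
{"DATA": [
  {"id": "00000000-0000-0000-0000-000000000001",
   "display_name": "Example Collection A",
   "owner_string": "user@example.org"},
  {"id": "00000000-0000-0000-0000-000000000002",
   "display_name": "Example Collection B",
   "owner_string": "user@example.org"}
]}
"""


def id_and_name(cli_json: str) -> list[list]:
    """Return [id, display_name] for every entry under DATA."""
    doc = json.loads(cli_json)
    # .get() yields None for a missing key, as JMESPath yields null.
    return [[item.get("id"), item.get("display_name")] for item in doc["DATA"]]


pairs = id_and_name(sample_output)
for collection_id, name in pairs:
    print(collection_id, name)
```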
Using the CLI for automation requires…
• Evergreen auth → native app grant w/refresh tokens
• Guest collections
• Delegated permissions management
Globus Auth: Native apps
• Client that cannot keep a secret: CLI, mobile, Jupyter
notebooks, …
• Register with Globus Auth → special callback URL
• Native App grant is a variation on the Authorization
Code grant
Native App/Refresh Tokens Sample Code
github.com/globus/native-app-examples
• ./example_copy_paste.py
– User copies and pastes code to the app
• ./example_copy_paste_refresh_token.py
– Stores refresh token locally, uses it to get new access tokens
• See README for installation
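The idea of storing a refresh token locally can be sketched with the standard library alone. This is not the code in the repository above: the helper names, the file name, and the token field names (`access_token`, `refresh_token`, `expires_at_seconds`) are assumptions made for illustration, and the token values are fake.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Optional


def save_tokens(path: Path, tokens: dict) -> None:
    """Write the token dict as JSON, then restrict the file to its owner."""
    path.write_text(json.dumps(tokens))
    path.chmod(0o600)


def load_tokens(path: Path) -> Optional[dict]:
    """Return the stored tokens, or None if nothing has been saved yet."""
    if not path.exists():
        return None
    return json.loads(path.read_text())


def needs_refresh(tokens: dict, now: Optional[float] = None, margin: int = 60) -> bool:
    """True when the access token expires within `margin` seconds of `now`."""
    now = time.time() if now is None else now
    return tokens["expires_at_seconds"] <= now + margin


# Demo with made-up token values in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    token_file = Path(tmp) / "tokens.json"
    missing = load_tokens(token_file)  # nothing saved yet
    save_tokens(token_file, {
        "access_token": "fake-access-token",
        "refresh_token": "fake-refresh-token",
        "expires_at_seconds": 2000,
    })
    restored = load_tokens(token_file)

print(needs_refresh(restored, now=1000))
print(needs_refresh(restored, now=1990))
```

When `needs_refresh` returns True, the stored refresh token would be exchanged for a new access token; that exchange is left out of this sketch.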
Automation services ecosystem
GET /provider_url/
POST /provider_url/run
GET /provider_url/action_id/status
POST /provider_url/action_id/cancel
POST /provider_url/action_id/release
Create Action
Providers
Define and
deploy flows
{ "StartAt": "ToProject",
  "States": {
    "ToProject": { … },
    "SetPermission": { … },
    "ProcessData": { … } … }}
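A flow definition of this shape can be sanity-checked before it is deployed. A minimal sketch: `check_flow` is a hypothetical helper, and the state bodies below are placeholder `Pass` states standing in for the contents the slide elides, not the real definitions. It checks only that `StartAt` and every `Next` name an existing state.

```python
def check_flow(definition: dict) -> list[str]:
    """Return a list of problems found in a state-machine style flow definition."""
    problems = []
    states = definition.get("States", {})
    start = definition.get("StartAt")
    if start not in states:
        problems.append(f"StartAt names unknown state: {start!r}")
    for name, state in states.items():
        target = state.get("Next")
        # A state without Next is treated as terminal and is not checked further.
        if target is not None and target not in states:
            problems.append(f"{name}: Next names unknown state: {target!r}")
    return problems


# Placeholder states only: the real state bodies are not shown on the slide.
flow = {
    "StartAt": "ToProject",
    "States": {
        "ToProject": {"Type": "Pass", "Next": "SetPermission"},
        "SetPermission": {"Type": "Pass", "Next": "ProcessData"},
        "ProcessData": {"Type": "Pass", "End": True},
    },
}
problems = check_flow(flow)
print(problems)
```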
Run flows
Globus Flows Service
Managed automation of tasks
• Flows: A platform service for defining, applying, and
sharing distributed research automation flows
• Flows comprise Actions
• Action Providers: Called by Flows to perform tasks
• Triggers*: Start flows based on events
* In development
Automation with Globus Flows
• Built on AWS Step Functions
– Simple JSON-based state machine
language
– Conditions, loops, fault tolerance, etc.
– Propagates state through the flow
• Standardized API for integrating
custom event and action services
– Actions: synchronous or asynchronous
– Custom Web forms prompt for user input
• Actions secured with Globus Auth
Run flows: Guided input
Label
Notify user
Timeout
Dynamic forms generated
from input schema
Managing runs at scale
Globus-provided flows
Running a
Globus Flow
Developing
Globus Flows
jupyter.demo.globus.org
Extending the ecosystem: Action providers
• Action Provider is a
service endpoint
– Run
– Status
– Cancel
– Release
– Resume
• Action Provider Toolkit:
action-provider-tools.readthedocs.io/en/latest
Action providers shown in the diagram (Globus-provided and custom-built): Search, Transfer, Notification, ACLs, Identifier, Delete, Ingest, User Form, Describe, Xtract, funcX, Web Form
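The run, status, cancel and release operations can be illustrated with a toy in-memory model. This is a sketch of the lifecycle only, not the Action Provider Toolkit and not a real service: the class name is hypothetical, the status names `ACTIVE` and `FAILED` and the rule that release requires a finished action are assumptions, there is no HTTP or Globus Auth, and resume is omitted.

```python
import uuid


class InMemoryActionProvider:
    """Toy action provider that tracks actions in a dict (no HTTP, no auth)."""

    def __init__(self) -> None:
        self._actions: dict[str, dict] = {}

    def run(self, body: dict) -> dict:
        """Start a new action and return a copy of its initial status document."""
        action_id = str(uuid.uuid4())
        self._actions[action_id] = {
            "action_id": action_id,
            "status": "ACTIVE",
            "details": dict(body),
        }
        return dict(self._actions[action_id])

    def status(self, action_id: str) -> dict:
        """Return a copy of the current status document."""
        return dict(self._actions[action_id])

    def cancel(self, action_id: str) -> dict:
        """Stop an action that is still running; finished actions are unchanged."""
        action = self._actions[action_id]
        if action["status"] == "ACTIVE":
            action["status"] = "FAILED"
        return dict(action)

    def release(self, action_id: str) -> dict:
        """Forget a finished action; refuse while it is still running."""
        if self._actions[action_id]["status"] == "ACTIVE":
            raise RuntimeError("cannot release an active action")
        return self._actions.pop(action_id)


provider = InMemoryActionProvider()
started = provider.run({"message": "hello"})
cancelled = provider.cancel(started["action_id"])
released = provider.release(started["action_id"])
print(started["status"], cancelled["status"])
```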
