Using Globus to Streamline Research at Scale

We provide an overview of the various Globus capabilities that can be used to automate data flows, with particular emphasis on managing data from instruments such as next-generation sequencers and cryo-electron microscopes. This session introduces the Globus command line interface (CLI) for integrating Globus tasks into scripts, and the Globus Flows service for more robust automation (including workflows that require a human in the loop).

Presented at a workshop at KU Leuven on July 8, 2022.

  1. Using Globus to Streamline Research at Scale. Vas Vasiliadis, vas@uchicago.edu. 8 July 2022.
  2. Globus Automation Capabilities
     • Timer Service: scheduled and recurring transfers (a.k.a. Globus cron)
     • Command Line Interface: ad hoc scripting and integration
     • Globus Flows service: comprehensive task (data and compute) orchestration with human-in-the-loop interactions
  3. Three perspectives
     • Researcher: ease of use, scalability
     • Administrator: visibility, access control
     • Builder
  4. Globus Timer Service
  5. Use case: Data replication
     • For backup: initiated by user or system backup
     • Automated transfer of data from science instrument
     [Diagram labels: Recurring transfers with sync option; Copy; /ingest; Daily @ 3:30am]
  6. The Globus Timer service
     • Scheduled/recurring file transfers
     • Supports all Globus transfer and sync options
     • Service accessible via web app and CLI
     • Example (NIH): hpc.nih.gov/storage/globus_cron.html
  7. Timer options in the Globus web app
  8. Using the Globus Timer service
     $ globus-timer session {login, logout, whoami}
     $ globus-timer job transfer \
         --name example-job \
         --label "Timer Transfer Job" \
         --interval 28800 \
         --start '2020-01-01T12:34:56' \
         --source-endpoint ddb59aef-6d04-11e5-ba46-22000b92c6ec \
         --dest-endpoint ddb59af0-6d04-11e5-ba46-22000b92c6ec \
         --item ~/file1.txt ~/new_file1.txt false \
         --item ~/file2.txt ~/new_file2.txt false
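To make the schedule in the command above concrete, here is a minimal Python sketch that lists the first few run times implied by its `--start` and `--interval` values. It assumes the interval is expressed in seconds, so 28800 corresponds to eight hours; the helper name `run_times` is made up for illustration and is not part of any Globus tool.

```python
from datetime import datetime, timedelta


def run_times(start: str, interval_seconds: int, count: int) -> list[str]:
    """Return the first `count` scheduled run times as ISO 8601 strings."""
    first = datetime.fromisoformat(start)
    step = timedelta(seconds=interval_seconds)
    return [(first + i * step).isoformat() for i in range(count)]


# Values copied from the slide; the interval is assumed to be in seconds.
schedule = run_times("2020-01-01T12:34:56", 28800, 4)
for when in schedule:
    print(when)
```

With these values the job would run three times a day, at the same three clock times each day.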
  9. Scheduled transfers using Globus timers
  10. Globus Command Line Interface (CLI)
  11. Globus Command Line Interface: open source, uses the Python SDK
  12. UUIDs everywhere
      • UUIDs for endpoint, task, user identity, groups…
      • Use search/list options
      • Use get-identities to map an identity username to its UUID
      $ globus endpoint search 'Tutorial Endpoint 1'
      $ globus task list
      $ globus get-identities vas@globusid.org
      bfc122a3-af43-43e1-8a41-d36f28a2bc0a
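Because scripts end up passing both usernames and UUIDs around, it can help to check which one a value is before using it. The following is a small sketch using only the Python standard library; the helper name `is_uuid` is hypothetical. Note that `uuid.UUID` is lenient and also accepts forms without hyphens.

```python
import uuid


def is_uuid(value: str) -> bool:
    """Return True if `value` parses as a UUID, False otherwise."""
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True


# Both values are taken from the slide above.
print(is_uuid("bfc122a3-af43-43e1-8a41-d36f28a2bc0a"))  # identity UUID
print(is_uuid("vas@globusid.org"))                      # identity username
```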
  13. Step 1: Transfer files
      $ export src=<source_collection_UUID>
      $ export dst=<destination_collection_UUID>
      $ globus transfer --recursive $src:/~/carousel $dst:/globusworkshop
      $ globus task show <transfer_task_UUID>
  14. Step 2: Set permissions
      • Set and manage permissions on guest collection
      • Requires access manager role
      $ export share=<guest_collection_UUID>
      $ globus endpoint permission create --permissions r --identity demodoc@globusid.org $share:/globusworkshop/
      $ globus endpoint permission list $share
      $ globus endpoint permission delete $share <perm_UUID>
  15. Parsing CLI output
      • Default output is text; for JSON output use --format json
      $ globus endpoint search --filter-scope my-endpoints
      $ globus endpoint search --filter-scope my-endpoints --format json
      • Extract specific attributes using --jmespath <expression>
      $ globus endpoint search --filter-scope my-endpoints --jmespath 'DATA[].[id, display_name]'
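When the JSON output is consumed from a script rather than filtered with `--jmespath`, the same extraction can be done in a few lines of Python. This is a sketch only: the sample document below is invented, and its shape (a top-level `DATA` list whose items carry `id` and `display_name`) is inferred from the JMESPath expression on the slide rather than copied from real output.

```python
import json

# Hypothetical sample standing in for `--format json` output;
# the IDs, names, and the extra "description" field are made up.
sample_output = """
{
  "DATA": [
    {"id": "11111111-1111-1111-1111-111111111111",
     "display_name": "Lab storage",
     "description": "example only"},
    {"id": "22222222-2222-2222-2222-222222222222",
     "display_name": "Sequencer ingest",
     "description": "example only"}
  ]
}
"""


def id_and_name(raw: str) -> list[list[str]]:
    """Mirror the JMESPath expression DATA[].[id, display_name]."""
    doc = json.loads(raw)
    return [[item.get("id"), item.get("display_name")] for item in doc["DATA"]]


pairs = id_and_name(sample_output)
for endpoint_id, name in pairs:
    print(endpoint_id, name)
```

In a real script the string would come from the CLI's standard output instead of a literal.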
  16. Using the CLI for automation requires…
      • Evergreen auth → native app grant w/refresh tokens
      • Guest collections
      • Delegated permissions management
  17. Globus Auth: Native apps
      • Client that cannot keep a secret: CLI, mobile, Jupyter notebooks, …
      • Register with Globus Auth → special callback URL
      • Native App grant is a variation on the Authorization Code grant
  18. Native App/Refresh Tokens Sample Code
      github.com/globus/native-app-examples
      • ./example_copy_paste.py: user copies and pastes code to the app
      • ./example_copy_paste_refresh_token.py: stores refresh token locally, uses it to get new access tokens
      • See README for installation
  19. Automation services ecosystem
      • Create Action Providers
        GET /provider_url/
        POST /provider_url/run
        GET /provider_url/action_id/status
        GET /provider_url/action_id/cancel
        GET /provider_url/action_id/status
      • Define and deploy flows
        { "StartAt": "ToProject", "States": { "ToProject": { … }, "SetPermission": { … }, "ProcessData": { … } … } }
      • Run flows
  20. Globus Flows Service
  21. Managed automation of tasks
      • Flows: a platform service for defining, applying, and sharing distributed research automation flows
      • Flows comprise Actions
      • Action Providers: called by Flows to perform tasks
      • Triggers*: start flows based on events
      * In development
  22. Automation with Globus Flows
      • Built on AWS Step Functions
        – Simple JSON-based state machine language
        – Conditions, loops, fault tolerance, etc.
        – Propagates state through the flow
      • Standardized API for integrating custom event and action services
        – Actions: synchronous or asynchronous
        – Custom Web forms prompt for user input
      • Actions secured with Globus Auth
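As a rough illustration of the JSON-based state machine idea, the sketch below writes a three-state definition as a Python dict and walks it from `StartAt` to the end state. The state names come from the ecosystem slide; the `Type`, `ActionUrl`, `Next`, and `End` fields are filled in as assumptions, the URLs are placeholders, and the helper `state_order` is hypothetical. It handles only straight-line flows, not conditions or loops.

```python
import json

# State names are from the slides; field values are illustrative assumptions.
flow_definition = {
    "StartAt": "ToProject",
    "States": {
        "ToProject": {
            "Type": "Action",
            "ActionUrl": "https://example.org/transfer",  # placeholder URL
            "Next": "SetPermission",
        },
        "SetPermission": {
            "Type": "Action",
            "ActionUrl": "https://example.org/permission",  # placeholder URL
            "Next": "ProcessData",
        },
        "ProcessData": {
            "Type": "Action",
            "ActionUrl": "https://example.org/process",  # placeholder URL
            "End": True,
        },
    },
}


def state_order(definition: dict) -> list[str]:
    """Follow Next links from StartAt and return the states in run order.

    Raises ValueError on a link to an undefined state, a missing Next,
    or a cycle. Branching states are out of scope for this sketch.
    """
    states = definition["States"]
    order = []
    current = definition["StartAt"]
    while True:
        if current not in states:
            raise ValueError(f"undefined state: {current}")
        if current in order:
            raise ValueError(f"cycle at state: {current}")
        order.append(current)
        state = states[current]
        if state.get("End"):
            return order
        if "Next" not in state:
            raise ValueError(f"state has neither Next nor End: {current}")
        current = state["Next"]


print(" -> ".join(state_order(flow_definition)))
print(json.dumps(flow_definition, indent=2))
```

A check like this can catch a misspelled state name before a definition is deployed.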
  23. Run flows: Guided input
      [Screenshot callouts: Label; Notify user; Timeout; Dynamic forms generated from input schema]
  24. Managing runs at scale
  25. Globus-provided flows
  26. Running a Globus Flow
  27. Developing Globus Flows: jupyter.demo.globus.org
  28. Extending the ecosystem: Action providers
      • Action Provider is a service endpoint
        – Run
        – Status
        – Cancel
        – Release
        – Resume
      • Action Provider Toolkit: action-provider-tools.readthedocs.io/en/latest
      [Diagram labels: Search, Transfer, Notification, ACLs, Identifier, Delete, Ingest, User Form, Describe, Xtract, funcX, Web Form; grouped as Globus provided and custom built]
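To give a feel for the operations listed above, here is a toy in-memory stand-in written in plain Python. It is not the Action Provider Toolkit and makes no network calls; the class name, the status strings, and the rule that an active action cannot be released are all assumptions chosen for illustration. Resume is left out because the slides do not describe its behavior.

```python
import uuid


class ToyActionProvider:
    """In-memory sketch of run, status, cancel, and release."""

    def __init__(self):
        self._actions = {}

    def run(self, body: dict) -> str:
        """Start an action and return its identifier."""
        action_id = str(uuid.uuid4())
        self._actions[action_id] = {"status": "ACTIVE", "body": body}
        return action_id

    def status(self, action_id: str) -> str:
        """Report the current status of an action."""
        return self._actions[action_id]["status"]

    def cancel(self, action_id: str) -> str:
        """Stop an action if it is still active; return the resulting status."""
        action = self._actions[action_id]
        if action["status"] == "ACTIVE":
            action["status"] = "CANCELLED"  # illustrative status string
        return action["status"]

    def release(self, action_id: str) -> None:
        """Forget a finished action; refuse while it is still active."""
        if self._actions[action_id]["status"] == "ACTIVE":
            raise RuntimeError("cannot release an action that is still active")
        del self._actions[action_id]


provider = ToyActionProvider()
action_id = provider.run({"message": "hello"})
print(provider.status(action_id))
provider.cancel(action_id)
print(provider.status(action_id))
provider.release(action_id)
```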
