
Automating Research Data Flows and an Introduction to the Globus Platform


We introduce the Globus approaches available for automating data flows, including the command line interface (CLI), the Globus Timer service and the Globus Flows service. We use a Jupyter notebook to demonstrate automation of file transfers and permissions management on shared datasets. We also provide a brief introduction to the Globus platform-as-a-service for developers, with emphasis on understanding the security model, and demonstrate how to access Globus services via APIs for integration with custom research applications.

Presented at a workshop at Oak Ridge National Laboratory on June 23, 2022.


  1. Automating Research Data Flows and an Introduction to the Globus Platform
     Greg Nawrocki, greg@globus.org
     June 23, 2022
  2. Globus Automation Capabilities
     • Timer service: scheduled and recurring transfers (a.k.a. Globus cron)
     • Command Line Interface: ad hoc scripting and integration
     • Globus Flows service: comprehensive task (data and compute) orchestration with human-in-the-loop interactions
  3. Globus Auth: Foundational IAM service
     • Brokers authentication and authorization among…
       – End users
       – Identity providers: enterprise, external (federated identities)
       – Services: resource servers with REST APIs
       – Apps: web, mobile, desktop, command line clients
       – Services acting as clients to other services
     • Supports a high-assurance service for use with protected data (e.g. HIPAA-protected data)
  4. Securing Apps with Globus Auth
     • Native App (with refresh tokens to extend expiration)
       – Authentication as user identity
       – User visits an authentication URL and comes back with an auth code, which is exchanged for tokens
       – Clients can’t keep a secret: tokens in plain text
       – Examples: Jupyter Notebook examples / Timer service
     • Auth Code Grant (Templated App)
       – Authentication as user identity
       – Browser redirect to Globus Auth, auth code returned (no manual copy)
       – Tokens stored securely
       – Examples: CLI / Jupyter Hub secured with Globus Auth
     • Confidential Client
       – Authentication as application
       – Client ID and secret stored securely
       – Custom apps
       – developers.globus.org: Client ID / Secret / Client Identity Username
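The Native App step of "visit an authentication URL, come back with an auth code" can be sketched with the standard library alone. This is a minimal sketch, not the SDK's implementation: it assumes the Globus Auth authorize endpoint at `https://auth.globus.org/v2/oauth2/authorize`, the PKCE S256 method from RFC 7636, and a placeholder client ID that is purely hypothetical. It only builds the URL; exchanging the returned code for tokens is not shown.

```python
import base64
import hashlib
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

AUTHORIZE_URL = "https://auth.globus.org/v2/oauth2/authorize"
# Hypothetical placeholder; a real client ID comes from developers.globus.org.
CLIENT_ID = "00000000-0000-0000-0000-000000000000"
# Assumed redirect page that displays the auth code for manual copy.
REDIRECT_URI = "https://auth.globus.org/v2/web/auth-code"
TRANSFER_SCOPE = "urn:globus:auth:scope:transfer.api.globus.org:all"


def make_pkce_pair():
    """Return (verifier, challenge) for the S256 PKCE method (RFC 7636)."""
    verifier = secrets.token_urlsafe(64)
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return verifier, challenge


def build_authorize_url(challenge, state="_default"):
    """Build the URL the user opens in a browser to log in and consent."""
    params = {
        "client_id": CLIENT_ID,
        "redirect_uri": REDIRECT_URI,
        "scope": TRANSFER_SCOPE,
        "state": state,
        "response_type": "code",
        "code_challenge": challenge,
        "code_challenge_method": "S256",
        "access_type": "offline",  # request refresh tokens
    }
    return AUTHORIZE_URL + "?" + urlencode(params)


verifier, challenge = make_pkce_pair()
url = build_authorize_url(challenge)
print(url)
```

The verifier stays on the client and is sent later with the auth code, which is why a native app can complete the exchange without holding a client secret.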
  5. Globus Timer Service
  6. Use case: Data replication
     • For backup: initiated by user or system back up
     • Automated transfer of data from science instrument
     • Recurring transfers with sync option
     • Diagram labels: Copy /ingest; Daily @ 3:30am
  7. The Globus Timer service
     • Scheduled/recurring file transfers
     • Supports all Globus transfer and sync options
     • Service accessible via web app and CLI
     • Example: NIH – hpc.nih.gov/storage/globus_cron.html
  8. Using the Globus Timer service
     $ globus-timer session {login, logout, whoami}
     $ globus-timer job transfer \
         --name example-job \
         --label "Timer Transfer Job" \
         --interval 28800 \
         --start '2020-01-01T12:34:56' \
         --source-endpoint ddb59aef-6d04-11e5-ba46-22000b92c6ec \
         --dest-endpoint ddb59af0-6d04-11e5-ba46-22000b92c6ec \
         --item ~/file1.txt ~/new_file1.txt false \
         --item ~/file2.txt ~/new_file2.txt false
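To see what `--start` and `--interval` imply, the short sketch below lists the first few run times for the values on the slide. It assumes the interval is in seconds and that the first run happens at the start time; the function name `scheduled_runs` is hypothetical and this is not how the Timer service itself is implemented.

```python
from datetime import datetime, timedelta


def scheduled_runs(start_iso, interval_seconds, count):
    """Return the first `count` run times of a recurring job.

    Assumes the first run is at the start time and each later run
    follows after a fixed interval given in seconds.
    """
    start = datetime.fromisoformat(start_iso)
    step = timedelta(seconds=interval_seconds)
    return [start + i * step for i in range(count)]


runs = scheduled_runs("2020-01-01T12:34:56", 28800, 4)
for run in runs:
    print(run.isoformat())
```

An interval of 28800 seconds is eight hours, so under these assumptions the job would run three times a day.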
  9. Timer options in the Globus web app
  10. Globus Command Line Interface (CLI)
  11. Globus Command Line Interface
      • Open source, uses the Python SDK
      • Because of this correspondence, the CLI is an excellent tool for getting the gist of how the SDK functions
      • Very scriptable!
  12. Globus CLI
      • It’s a standalone application distributed by Globus
        – https://docs.globus.org/cli/
        – https://github.com/globus/globus-cli
      • Easy install and updates
      • Command “globus login” gets access tokens
      • All interactions with the service use the tokens
        – The CLI is acting as you (your identity)
      • Command “globus logout” deletes those tokens
  13. CLI Basics
      • “globus” is the executable
        $ globus endpoint search 'Globus Tutorial'
        $ globus task list
        $ globus get-identities greg@globus.org --verbose
      • Getting help / list of commands
        – globus list-commands
        – globus --help
        – https://docs.globus.org/cli/examples/
      • UUIDs for endpoint, task, user identity, groups…
      • Can query to discover the UUIDs
        – Use search / list / get options
  14. Use case: Sharing out data
      • Researcher initiates transfer request; or requested automatically by script, science gateway
      • Globus transfers files reliably, securely
      • Researcher selects files to share, selects user or group, and sets access permissions
      • Globus controls access to shared files on existing storage; no need to move files to cloud storage!
      • Collaborator logs in to Globus and accesses shared files; no local account required; download via Globus
      • Diagram labels: Instrument, Compute Facility, Personal Computer, Transfer, Share
  15. Step 1: Transfer
      • Using a batch transfer
        – Transfer tasks have one source/destination, but can have any number of files
        – Provide input source-dest pairs via local file
        – File may have embedded comments
      $ export ep1=ddb59aef-6d04-11e5-ba46-22000b92c6ec
      $ export ep2=af7bda53-6d04-11e5-ba46-22000b92c6ec
      $ globus transfer $ep1:/share/godata/ $ep2:/~/automation-example-data-share/outbound/Vas \
          --label "Files to share" \
          --batch /Users/gregnawrocki/Greg/Globus/Demos/ornl20220623/files.txt
      # this is the contents of files.txt:
      # a list of source paths followed by destination paths
      file1.txt file1.txt
      # file2.txt file2.txt
      # inline-comments are also allowed
      file3.txt file3.txt
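The batch file format (source and destination pairs, with full-line and inline `#` comments) can be illustrated with a small parser. This is a sketch under assumptions: it uses shell-like splitting via `shlex`, the sample contents and the name `parse_batch` are hypothetical, and any extra per-line options are ignored rather than interpreted.

```python
import shlex

# Hypothetical batch file contents in the style of files.txt on the slide.
BATCH_TEXT = """\
# a list of source paths followed by destination paths
file1.txt file1.txt
file2.txt renamed2.txt  # inline comment

"my data/file3.txt" file3.txt
"""


def parse_batch(text):
    """Return a list of (source, destination) pairs from batch-file text.

    Blank lines and comments are skipped; quoting follows shell rules.
    Tokens after the first two on a line are ignored in this sketch.
    """
    pairs = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        tokens = shlex.split(line, comments=True)
        if not tokens:
            continue
        if len(tokens) < 2:
            raise ValueError(f"line {lineno}: expected source and destination")
        pairs.append((tokens[0], tokens[1]))
    return pairs


pairs = parse_batch(BATCH_TEXT)
for src, dst in pairs:
    print(f"{src} -> {dst}")
```

Each path in the file is relative to the source and destination given on the command line, which is why one transfer task can carry any number of file pairs.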
  16. Step 2: Share - Set permissions
      • Set and manage permissions on guest collection
      • Requires access manager role
      $ globus endpoint search 'automation-example-data-share'
      $ export share=bf4445c4-ecfd-11ec-aed7-6f7c2b57b05c
      $ globus endpoint permission create --permissions r --identity vas@uchicago.edu $share:/Vas/
  17. Useful submission commands
      • Safe resubmissions
        – Applies to all tasks (transfer and delete)
        – Get a task UUID, use that in submission
        – $ globus task generate-submission-id
        – --submission-id option in transfer
        – Useful for lazy branching or when dealing with unreliable networks
      • Task wait: useful for scripting conditionals on transfer task status
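Why a submission ID makes resubmission safe can be shown with a toy model. The sketch below is not the Globus API: `FakeTransferService`, `submit_with_retry` and `lossy_send` are hypothetical names, and the service is an in-memory stand-in that simply returns the existing task when it sees a submission ID again.

```python
import uuid


class FakeTransferService:
    """In-memory stand-in for a task service; NOT the Globus API."""

    def __init__(self):
        self.tasks = {}  # submission_id -> task record

    def generate_submission_id(self):
        return str(uuid.uuid4())

    def submit(self, submission_id, label):
        # A repeated submission_id returns the existing task
        # instead of creating a duplicate.
        if submission_id not in self.tasks:
            self.tasks[submission_id] = {"task_id": str(uuid.uuid4()), "label": label}
        return self.tasks[submission_id]


def submit_with_retry(service, label, send, attempts=3):
    """Obtain one submission ID, then retry the send with that same ID."""
    submission_id = service.generate_submission_id()
    last_error = None
    for _ in range(attempts):
        try:
            return send(service, submission_id, label)
        except ConnectionError as err:
            last_error = err  # retry with the SAME submission_id
    raise last_error


calls = {"n": 0}


def lossy_send(service, submission_id, label):
    """Simulate a network where the first reply is lost after the request lands."""
    task = service.submit(submission_id, label)
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("response lost")
    return task


service = FakeTransferService()
task = submit_with_retry(service, "Files to share", lossy_send)
print(len(service.tasks), task["label"])
```

The request is sent twice, but only one task exists afterwards, which is the property that matters on an unreliable network.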
  18. Parsing CLI output
      $ globus endpoint search --filter-scope my-endpoints
      $ globus endpoint search --filter-scope my-endpoints --format json
      $ globus endpoint search --filter-scope my-endpoints --jmespath 'DATA[].[id, display_name]'
      • Default output is text; for JSON output use --format json
      • Extract specific attributes using --jmespath <expression>
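When a script consumes the JSON output directly, the same projection as the JMESPath expression `DATA[].[id, display_name]` can be written in plain Python. This is a sketch: the sample document is hypothetical (it only assumes a top-level `DATA` list of objects), and `project` is an illustrative helper, not part of the CLI or SDK.

```python
import json

# Hypothetical sample shaped like JSON output with a top-level DATA list.
SAMPLE = """
{"DATA": [
  {"id": "ddb59aef-6d04-11e5-ba46-22000b92c6ec",
   "display_name": "Example Endpoint 1",
   "owner_string": "someone@example.org"},
  {"id": "ddb59af0-6d04-11e5-ba46-22000b92c6ec",
   "display_name": "Example Endpoint 2",
   "owner_string": "someone@example.org"}
]}
"""


def project(document, fields):
    """Plain-Python equivalent of the JMESPath expression DATA[].[f1, f2, ...].

    Missing keys become None, mirroring JMESPath's null.
    """
    return [[item.get(field) for field in fields] for item in document["DATA"]]


rows = project(json.loads(SAMPLE), ["id", "display_name"])
for endpoint_id, name in rows:
    print(endpoint_id, name)
```

In a real script the input would come from the CLI's `--format json` output rather than an embedded string.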
  19. Managing notifications
      • Turn off emails sent for tasks
      • Useful when an application manages tasks for a user
      • Disable notifications with the --notify option
        --notify off (all notifications)
        --notify succeeded|failed|inactive (select notifications)
  20. Globus Flows Service
  21. Managed automation of tasks
      • Flows: A platform service for defining, applying, and sharing distributed research automation flows
      • Flows comprise Actions
      • Action Providers: Called by Flows to perform tasks
      • Triggers*: Start flows based on events
      * In development
  22. Automation services ecosystem
      • Create Action Providers
        GET /provider_url/
        POST /provider_url/run
        GET /provider_url/action_id/status
        GET /provider_url/action_id/cancel
        GET /provider_url/action_id/status
      • Define and deploy flows
        { "StartAt": "ToProject", "States": { "ToProject": { … }, "SetPermission": { … }, "ProcessData": { … } … }}
      • Run flows
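The shape of a flow definition, a `StartAt` name plus a `States` map, can be explored with a small walker. The state bodies below are hypothetical placeholders, since the slide elides them; they use only `Type`, `Next` and `End` in the style of the Amazon States Language, and a deployable flow would need more (for example, which action each state calls). `linear_order` is an illustrative helper.

```python
import json

# Hypothetical state bodies; the slide elides them with "…".
flow_definition = {
    "StartAt": "ToProject",
    "States": {
        "ToProject": {"Type": "Action", "Next": "SetPermission"},
        "SetPermission": {"Type": "Action", "Next": "ProcessData"},
        "ProcessData": {"Type": "Action", "End": True},
    },
}


def linear_order(definition):
    """Follow Next links from StartAt and return the state names in order.

    Raises ValueError on an unknown state, a cycle, or a state that
    has neither Next nor End. Branching states are not handled.
    """
    states = definition["States"]
    order = []
    name = definition["StartAt"]
    while True:
        if name not in states:
            raise ValueError(f"unknown state: {name}")
        if name in order:
            raise ValueError(f"cycle at state: {name}")
        order.append(name)
        state = states[name]
        if state.get("End"):
            return order
        if "Next" not in state:
            raise ValueError(f"state {name} has neither Next nor End")
        name = state["Next"]


order = linear_order(flow_definition)
print(" -> ".join(order))
print(json.dumps(flow_definition, indent=2))
```

A check like this catches dangling `Next` references before a definition is deployed.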
  23. Automation with Globus Flows
      • Built on AWS Step Functions
        – Simple JSON-based state machine language
        – Conditions, loops, fault tolerance, etc.
        – Propagates state through the flow
      • Standardized API for integrating custom event and action services
        – Actions: synchronous or asynchronous
        – Custom Web forms prompt for user input
      • Actions secured with Globus Auth
  24. Globus-provided flows
  25. Run flows: Guided input
      • Screenshot labels: Label, Notify user, Timeout
      • Dynamic forms generated from input schema
  26. Managing runs at scale
  27. Developing Globus Flows
      • jupyter.demo.globus.org
  28. Extending the ecosystem: Action providers
      • Action Provider is a service endpoint
        – Run
        – Status
        – Cancel
        – Release
        – Resume
      • Action Provider Toolkit: action-provider-tools.readthedocs.io/en/latest
      • Diagram labels (Globus provided and custom built): Search, Transfer, Notification, ACLs, Identifier, Delete, Ingest, User Form, Describe, Xtract, funcX, Web Form
  29. The Globus Platform APIs and the SDK
  30. Custom portals? Science Gateways? Unique workflows? Our open REST APIs and Python SDK empower you to create an integrated ecosystem of research data services and applications.
  31. Data centric applications leveraging Globus
  32. App Access to Collections
      • Globus Transfer
        – Authentication with access tokens
        – Individual: Globus login to get tokens
        – Application: Apps are people too!
          o developers.globus.org - Client ID / Secret / Client Identity Username
      • Collection access
        – GCSv4 Mapped Collections – user consent
          o Activating endpoint means binding a credential to an endpoint for login
        – GCSv5 Mapped Collections (no user certificates, OAuth tokens and consents)
          o https://docs.globus.org/globus-connect-server/v5.4/use-client-credentials/
          o Request the data_access scope (per collection) to be able to access the collection.
          o The storage gateway must permit identities from the 'clients.auth.globus.org' identity domain
          o Identity Mapping Policy that maps the 'UUID@clients.auth.globus.org' identity to a valid local user
        – Guest Collections
          o Guest Collections auto-activate - need to do this before API calls to endpoints
          o Use Guest Collections whenever possible
        – Remember to set your ACLs (WebApp) Automation
  33. https://developers.globus.org
  34. Globus APIs
      • APIs: Auth, Groups, Transfer, Search, Timer, Flows, GCS Manager
      • Globus Web App consumes public Transfer API
      • Resource named by URL (standard REST approach)
      • Globus APIs use JSON for documents
      • docs.globus.org/api/transfer
  35. Globus Python SDK
      • Python client library for the Globus REST APIs
      • Largely direct mapping to REST API
      • globus_sdk.TransferClient class handles connection management, security, framing, marshaling
      • globus-sdk-python.readthedocs.io/en/stable/
      • globus.github.io/globus-sdk-python
  36. TransferClient higher-level calls
      • One method for each API resource and HTTP verb
      • Largely direct mapping to REST API
      endpoint_search(filter_fulltext=None, filter_scope=None, num_results=25, **params)
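To make the "direct mapping to REST API" concrete, the sketch below builds (but does not send) the HTTP request that an endpoint search corresponds to. It assumes the Transfer API base URL `https://transfer.api.globus.org/v0.10` and an `endpoint_search` resource taking `filter_fulltext` and `filter_scope` query parameters; the token is a hypothetical placeholder and `build_endpoint_search` is an illustrative helper, not an SDK function.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed base URL of the Transfer REST API.
TRANSFER_BASE = "https://transfer.api.globus.org/v0.10"


def build_endpoint_search(access_token, filter_fulltext=None, filter_scope=None):
    """Build a GET request for endpoint search, authorized with a bearer token."""
    params = {}
    if filter_fulltext is not None:
        params["filter_fulltext"] = filter_fulltext
    if filter_scope is not None:
        params["filter_scope"] = filter_scope
    url = f"{TRANSFER_BASE}/endpoint_search"
    if params:
        url += "?" + urlencode(params)
    headers = {"Authorization": f"Bearer {access_token}"}
    return Request(url, headers=headers, method="GET")


# Hypothetical token; a real one comes from a Globus Auth login.
req = build_endpoint_search("HYPOTHETICAL_TOKEN", filter_fulltext="Globus Tutorial")
print(req.get_method(), req.full_url)
```

The SDK's `TransferClient` wraps this kind of request and adds the connection handling and response parsing that the sketch leaves out.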
  37. Synchronous Tasks
      • Endpoint search (with scopes)
      • List directory contents (ls)
      • Make directory (mkdir)
      • Rename
      • Note:
        – Path encoding & UTF gotchas
        – Don’t forget to auto-activate first
  38. Asynchronous Tasks
      • Transfer
        – Sync level option
      • Delete
      • Get submission_id, followed by submit
        – Once and only once submission
      • Use task id to “follow up”
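Following up on an asynchronous task usually means polling its status until it finishes. The sketch below shows that loop with the status lookup injected as a callable, so it runs without any network access. It assumes terminal statuses named SUCCEEDED and FAILED; `wait_for_task` and the task ID are hypothetical, and the demonstration feeds in canned statuses.

```python
import time

# Assumed terminal task statuses.
TERMINAL = {"SUCCEEDED", "FAILED"}


def wait_for_task(get_status, task_id, timeout=60.0, poll_interval=1.0, sleep=time.sleep):
    """Poll get_status(task_id) until a terminal status or the timeout.

    Returns the last status seen, which is non-terminal if the timeout
    expired first. `sleep` is injectable so tests need not wait.
    """
    waited = 0.0
    while True:
        status = get_status(task_id)
        if status in TERMINAL or waited >= timeout:
            return status
        sleep(poll_interval)
        waited += poll_interval


# Demonstration with canned statuses instead of a real service.
statuses = iter(["ACTIVE", "ACTIVE", "SUCCEEDED"])
result = wait_for_task(
    lambda _task_id: next(statuses),
    "hypothetical-task-id",
    sleep=lambda _seconds: None,
)
print(result)
```

A script can branch on the returned status, for example setting permissions only after the transfer has succeeded.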
  39. The Globus API / SDK with a Jupyter Notebook in a Jupyter Hub – Auth Code Grant
      • Diagram labels: login; { "tokens": … }; REST APIs; Bearer a45cd…
  40. Walkthrough API with our Jupyter Hub
      • https://jupyter.demo.globus.org
        – Sign in with Globus
        – Verify the consents
        – Start My Server (this will take about a minute)
        – Open folder: globus-jupyter-notebooks
        – Run Platform_Introduction_JupyterHub_Auth.ipynb
      • If you mess it up and want to “go back to the beginning”
        – Just stop and restart the server
      • If you want to use the notebook outside of our hub
        – https://github.com/globus/globus-jupyter-notebooks
        – Authentication is a manual cut and paste of exchanging the authorization code for an access token (Native App)
  41. Support Resources
      • Globus Documentation: docs.globus.org
      • Globus Timer: globus.org/blog/scheduled-and-recurring-transfers-now-available-globus-web-app
      • Globus CLI: docs.globus.org/cli/
      • Globus Automation Services and Flows: docs.globus.org/globus-automation-services/
      • YouTube Channel: youtube.com/user/GlobusOnline
  42. Developer References
      • Globus API / SDK Documentation
        – Transfer API: docs.globus.org/api/transfer/
        – SDK: globus-sdk-python.readthedocs.io/en/stable/
      • Globus GitHub: github.com/globus/
        – Jupyter Notebooks
          o Standalone notebooks and hub integrations that walk through much of the functionality of our SDK
          o https://github.com/globus/globus-jupyter-notebooks
        – Automation Examples
          o Shell scripted CLI and Python module examples of common research data management use cases
          o https://github.com/globus/automation-examples
