
Simple Data Automation with Globus (GlobusWorld Tour West)


  1. Data Automation Programming: Teaching Globus Some New Tricks
     Greg Nawrocki, greg@globus.org
     September 15, 2021
  2. Custom portals? Science gateways? Unique workflows? Our Command Line Interface, Timer Service, open REST APIs, and Python SDK empower you to create an integrated ecosystem of research data services and applications.
  3. PaaS Security Challenges – Globus Auth
     • How to provide:
       – Login to apps
         o Web apps (Jupyter Notebook, portals), mobile, desktop, command line
       – Protection for all REST API communications
         o App → Globus service (Jupyter Notebook, portals)
         o App → non-Globus service (portals)
         o Service → service (portals)
     • While:
       – Not introducing even more identities
         o Providing a platform to consolidate those identities
       – Providing a least-privilege security model (consents)
       – Being agnostic to programming language and framework
       – Being web friendly
       – Making it easy for users and developers
  4. Securing Apps with Globus Auth
     • Native App (with refresh tokens, which extend expiration)
       – Authentication as user identity
       – The user visits an authentication URL and comes back with an auth code, which is exchanged for tokens
       – Clients can’t keep a secret: tokens are held in plain text
       – Examples: Jupyter Notebook examples, Timer Service
     • Auth Code Grant
       – Authentication as user identity
       – Browser redirect to Globus Auth; auth code returned (no manual copy)
       – Tokens stored securely
       – Example: Jupyter hub secured with Globus Auth
     • Confidential Client
       – Authentication as application
       – Client ID and secret stored securely
       – Example: custom apps
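The first step of the Native App flow described above (send the user to an authentication URL, get an auth code back) can be sketched with the standard library alone. This is a minimal sketch, not the Globus SDK's implementation: the authorize URL, redirect URI, and Transfer scope string are assumptions based on the public Globus Auth conventions, `CLIENT_ID` is a placeholder rather than a registered app, and PKCE (RFC 7636) is shown because a native client cannot keep a secret.

```python
import base64
import hashlib
import secrets
from urllib.parse import urlencode

# Assumed values for illustration only; CLIENT_ID is a placeholder, not a real app.
AUTHORIZE_URL = "https://auth.globus.org/v2/oauth2/authorize"
REDIRECT_URI = "https://auth.globus.org/v2/web/auth-code"
TRANSFER_SCOPE = "urn:globus:auth:scope:transfer.api.globus.org:all"
CLIENT_ID = "00000000-0000-0000-0000-000000000000"


def make_pkce_pair():
    """Return (verifier, challenge) using the PKCE S256 method."""
    verifier = secrets.token_urlsafe(64)
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    # Base64url without padding, as PKCE requires.
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return verifier, challenge


def build_authorize_url(client_id, challenge, refresh_tokens=True):
    """Build the URL the user opens in a browser to log in and consent."""
    params = {
        "client_id": client_id,
        "redirect_uri": REDIRECT_URI,
        "scope": TRANSFER_SCOPE,
        "response_type": "code",
        "code_challenge": challenge,
        "code_challenge_method": "S256",
        # "offline" requests refresh tokens in addition to access tokens.
        "access_type": "offline" if refresh_tokens else "online",
    }
    return AUTHORIZE_URL + "?" + urlencode(params)


verifier, challenge = make_pkce_pair()
url = build_authorize_url(CLIENT_ID, challenge)
print(url)
```

The verifier stays on the client and is sent later, together with the pasted auth code, when exchanging the code for tokens; that exchange is not shown here.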
  5. Globus Command Line Interface
     Open source, and built on the Python SDK. Because of this correspondence, the CLI is an excellent tool for getting the gist of how the SDK functions. Great in shell scripts.
  6. Globus CLI
     • Easy to install and update
       – https://docs.globus.org/cli/
       – https://docs.globus.org/cli/examples/
       – https://github.com/globus/globus-cli
     • All interactions with Transfer and Auth are at the identity level
       – Command “globus login” gets access tokens and refresh tokens
         o Stores the tokens locally (~/.globus.cfg)
         o Tokens for the Globus Auth and Transfer services
       – Command “globus logout” deletes those tokens
       – Command “globus whoami” reveals the logged-in identity
  7. CLI Basics
     • “globus” is the executable
         $ globus endpoint search 'Globus Tutorial'
         $ globus task list
         $ globus get-identities greg@globus.org --verbose
     • Getting help / list of commands
       – globus list-commands
       – globus --help
     • UUIDs identify endpoints, tasks, user identities, groups…
     • Can query to discover the UUIDs
       – Use the search / list / get options
  8. The Globus CLI – Simple tasks
     • List endpoint contents
         $ globus ls ddb59af0-6d04-11e5-ba46-22000b92c6ec
         $ globus ls ddb59aef-6d04-11e5-ba46-22000b92c6ec:/share/godata/
     • Single file transfer
         $ globus transfer ddb59aef-6d04-11e5-ba46-22000b92c6ec:/share/godata/file3.txt ddb59af0-6d04-11e5-ba46-22000b92c6ec:/~/file3.txt
  9. The Globus CLI – Simple tasks
     • Recursive transfer
         $ globus transfer --recursive ddb59aef-6d04-11e5-ba46-22000b92c6ec:/share/godata/ ddb59af0-6d04-11e5-ba46-22000b92c6ec:/~/
     • Delete
         $ globus delete ddb59af0-6d04-11e5-ba46-22000b92c6ec:/~/file3.txt
  10. Batch Transfers
      • A transfer task has one source endpoint and one destination endpoint, but can include any number of files
      • Provide the source-destination path pairs via a local file
      • The file may have embedded comments
          $ globus transfer ddb59aef-6d04-11e5-ba46-22000b92c6ec:/share/godata/ ddb59af0-6d04-11e5-ba46-22000b92c6ec:/~/ --batch --label 'CLI Batch' < files.txt
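A batch input file like the `files.txt` above can be generated from a script. The sketch below assumes each line holds one source path and one destination path separated by whitespace, that `#` starts a comment, and that paths containing spaces need shell-style quoting; the file names are made up for illustration, and the exact format should be checked against `globus transfer --help`.

```python
import shlex


def batch_lines(pairs, comment=None):
    """Render (source_path, dest_path) pairs as text for a batch input file."""
    lines = []
    if comment:
        lines.append("# " + comment)
    for src, dst in pairs:
        # shlex.quote only adds quotes when a path needs them (e.g. spaces).
        lines.append(f"{shlex.quote(src)} {shlex.quote(dst)}")
    return "\n".join(lines) + "\n"


# Hypothetical file names, taken relative to the endpoint paths on the command line.
pairs = [
    ("file1.txt", "file1.txt"),
    ("file2.txt", "copies/file2.txt"),
    ("my notes.txt", "my notes.txt"),
]
text = batch_lines(pairs, comment="files staged from /share/godata/")
print(text, end="")
```

Writing `text` to `files.txt` and redirecting it into the command shown on the slide submits all three files as a single task.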
  11. Parsing CLI output
          $ globus endpoint search --filter-scope my-endpoints
          $ globus endpoint search --filter-scope my-endpoints --format json
          $ globus endpoint search --filter-scope my-endpoints --jmespath 'DATA[].[id, display_name]'
      • Default output is text; for JSON output use --format json
      • Extract specific attributes using --jmespath <expression>
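When the JSON output is consumed by a script, the same projection as the JMESPath expression `DATA[].[id, display_name]` takes a few lines of Python. The sample document below is an illustrative stand-in, not captured CLI output: only the `DATA`, `id`, and `display_name` names come from the slide, and the values are invented.

```python
import json

# Illustrative stand-in for `globus endpoint search ... --format json` output.
sample_output = """
{"DATA": [
  {"id": "11111111-1111-1111-1111-111111111111", "display_name": "Lab Storage"},
  {"id": "22222222-2222-2222-2222-222222222222", "display_name": "My Laptop"}
]}
"""


def id_and_name(cli_json):
    """Return [id, display_name] pairs, like the JMESPath 'DATA[].[id, display_name]'."""
    doc = json.loads(cli_json)
    return [[ep["id"], ep["display_name"]] for ep in doc["DATA"]]


rows = id_and_name(sample_output)
for endpoint_id, name in rows:
    print(endpoint_id, name)
```

In a real script the string would come from running the CLI, for example with `subprocess.run(..., capture_output=True, text=True)`.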
  12. Managing notifications
      • Turn off the emails sent for tasks
      • Useful when an application manages tasks for a user
      • Control notifications with the --notify option
          --notify off (disables all notifications)
          --notify succeeded|failed|inactive (selects which notifications are sent)
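An application that submits tasks on a user's behalf might build the `--notify` argument programmatically. This sketch assumes that several events are passed as one comma-separated value; that detail is not stated on the slide and should be confirmed with `globus transfer --help`. The helper name `notify_args` is hypothetical.

```python
ALLOWED_EVENTS = ("succeeded", "failed", "inactive")


def notify_args(events):
    """Return CLI arguments for --notify; `events` is "off" or a list of events."""
    if events == "off":
        return ["--notify", "off"]
    if isinstance(events, str):
        events = [events]
    unknown = set(events) - set(ALLOWED_EVENTS)
    if unknown:
        raise ValueError(f"unknown notification events: {sorted(unknown)}")
    if not events:
        raise ValueError("no events given; use 'off' to disable notifications")
    # Assumption: multiple events are joined with commas.
    chosen = [e for e in ALLOWED_EVENTS if e in events]
    return ["--notify", ",".join(chosen)]


SRC = "ddb59aef-6d04-11e5-ba46-22000b92c6ec:/share/godata/file3.txt"
DST = "ddb59af0-6d04-11e5-ba46-22000b92c6ec:/~/file3.txt"
command = ["globus", "transfer", SRC, DST, *notify_args(["failed"])]
print(command)
```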
  13. Other CLI Commands
      • globus endpoint permission
        – Manage access control rules
        – CLI-based portal
      • globus endpoint role
        – Manage endpoint roles
        – Delegate roles to other identities
      • globus task
        – show
        – cancel
  14. Automation with the CLI
      • Interactions are as the user, both for data access and for Globus services
        – Use “globus login” to get tokens
      • Collection access
        – Mapped Collections
          o Use the --skip-activation-check option to submit the task even if the endpoint is not activated at submit time
        – Guest Collections
          o Guest Collections / Shared Endpoints auto-activate
          o Use Guest Collections whenever possible
      • Reference
        – Basic Data Automation with the Globus Command Line Interface (CLI)
          o https://www.youtube.com/watch?v=qIQTC6YOvrE
  15. The Globus Timer Service
      • For scheduling recurring Globus transfers using Globus Automate
        – Backups
        – Synchronizations
      • Doc: https://pypi.org/project/globus-timer-cli/
      • Service with a CLI
        – Simple installation (pip install)
        – Authentication as user identity
          o Native app flow: browser redirect to Globus Auth, then copy back the auth code
          o Authentication information is cached thereafter, so the authentication process is needed only on first use of the CLI
  16. Using the Globus Timer Service
      • globus-timer session {login, logout, whoami}
      • Example transfer job:
          globus-timer job transfer \
            --name example-job \
            --label "Timer Transfer Job" \
            --interval 28800 \
            --start '2020-01-01T12:34:56' \
            --source-endpoint ddb59aef-6d04-11e5-ba46-22000b92c6ec \
            --dest-endpoint ddb59af0-6d04-11e5-ba46-22000b92c6ec \
            --item ~/file1.txt ~/new_file1.txt false \
            --item ~/file2.txt ~/new_file2.txt false
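The `--interval` value in the job above is a number of seconds: 28800 s is 8 hours (8 × 3600). A script that schedules jobs can derive the interval and the start timestamp instead of hard-coding them. The sketch below only assembles the argument list using the flags shown on the slide; it does not run the command.

```python
import shlex
from datetime import datetime, timedelta

SOURCE = "ddb59aef-6d04-11e5-ba46-22000b92c6ec"
DEST = "ddb59af0-6d04-11e5-ba46-22000b92c6ec"

# Eight hours expressed in seconds.
interval = int(timedelta(hours=8).total_seconds())
# ISO 8601 timestamp without a timezone, matching the slide's --start value.
start = datetime(2020, 1, 1, 12, 34, 56).isoformat()

# (source path, destination path, recursive) triples, as on the slide.
items = [
    ("~/file1.txt", "~/new_file1.txt", False),
    ("~/file2.txt", "~/new_file2.txt", False),
]

argv = [
    "globus-timer", "job", "transfer",
    "--name", "example-job",
    "--label", "Timer Transfer Job",
    "--interval", str(interval),
    "--start", start,
    "--source-endpoint", SOURCE,
    "--dest-endpoint", DEST,
]
for src, dst, recursive in items:
    argv += ["--item", src, dst, str(recursive).lower()]

# shlex.join quotes each argument, so the local shell will not expand "~".
print(shlex.join(argv))
```

Passing `argv` to `subprocess.run` would submit the job once `globus-timer session login` has been completed.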
  17. Using the Globus Timer Service
      • --items-file transfer_items.csv
      • Other options, just like in the web app
          --sync-level (how the timer behaves if files exist)
          --verify-checksum
          --encrypt-data
          --preserve-timestamp
          --stop-after-runs
          --stop-after-date
      • globus-timer job transfer --help
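For jobs with many items, the file passed to `--items-file` can be written with Python's `csv` module. The column layout here is an assumption: it mirrors the source, destination, recursive triple of the `--item` option, which the slide does not confirm for the CSV form, so it should be checked against `globus-timer job transfer --help` before use.

```python
import csv
import io


def items_csv(items):
    """Render (source, destination, recursive) triples as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for src, dst, recursive in items:
        # Assumed column order: source path, destination path, true/false.
        writer.writerow([src, dst, str(recursive).lower()])
    return buffer.getvalue()


items = [
    ("~/file1.txt", "~/new_file1.txt", False),
    ("~/file2.txt", "~/new_file2.txt", False),
]
csv_text = items_csv(items)
print(csv_text, end="")
```

Saving `csv_text` as `transfer_items.csv` gives a file with the name used on the slide.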
  18. Monitoring and Deleting Jobs
      • globus-timer job list
      • globus-timer job status <job_id> [--verbose]
      • globus-timer job delete <job_id>
  19. Data-centric applications leveraging Globus
  20. Globus Transfer API
      • The Globus Web App consumes the public Transfer API
      • Resources are named by URL (standard REST approach)
        – Query params allow refinement (e.g., subset of fields)
      • Globus APIs use JSON for documents and resource representations
      • Requests are authorized via an OAuth2 access token
        – Authorization: Bearer asdflkqhafsdafeawk
      • Docs: docs.globus.org/api/transfer
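The points above (resource named by URL, refinement through query params, bearer-token authorization) can be illustrated by building a request object without sending it. This is a sketch under assumptions: the base URL and the `endpoint_search` resource path are taken from the public Transfer API conventions rather than from the slide, the `fields` parameter is assumed to apply to this resource, and the token is the slide's dummy value.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed base URL for the Transfer API; check docs.globus.org/api/transfer.
BASE_URL = "https://transfer.api.globusonline.org/v0.10"


def endpoint_search_request(access_token, **query):
    """Build (but do not send) a GET request for the endpoint_search resource."""
    url = f"{BASE_URL}/endpoint_search?{urlencode(query)}"
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Accept": "application/json",
    }
    return Request(url, headers=headers)


req = endpoint_search_request(
    "asdflkqhafsdafeawk",        # dummy token from the slide
    filter_scope="my-endpoints",
    fields="id,display_name",    # assumed: restrict the returned fields
)
print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))
```

With a real access token, `urllib.request.urlopen(req)` would send the request and return the JSON document.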
  21. Globus Python SDK
      • Python client library for the Globus Auth and Transfer REST APIs
      • The TransferClient class handles connection management, security, framing, and marshaling
        – Largely a direct mapping to the REST API
        – One method for each API resource and HTTP verb
      • Nice high-level wrapper to the API; manages low-level API housekeeping tasks
      • Docs: https://globus-sdk-python.readthedocs.io/en/stable/ and globus.github.io/globus-sdk-python
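As an illustration of the "one method per resource" mapping, the sketch below wraps the SDK's `TransferClient.operation_ls` call, which corresponds to the directory listing resource. To keep the example runnable without credentials or the `globus_sdk` package, it is exercised against `FakeTransferClient`, a hypothetical stand-in that is not part of the SDK and returns invented entries.

```python
def list_entries(transfer_client, endpoint_id, path):
    """Return (name, type) pairs for a directory via operation_ls."""
    listing = transfer_client.operation_ls(endpoint_id, path=path)
    return [(entry["name"], entry["type"]) for entry in listing]


class FakeTransferClient:
    """Hypothetical stand-in for globus_sdk.TransferClient (illustration only)."""

    def operation_ls(self, endpoint_id, path=None):
        # Invented entries shaped like directory listing items.
        return [
            {"name": "file1.txt", "type": "file"},
            {"name": "docs", "type": "dir"},
        ]


entries = list_entries(
    FakeTransferClient(),
    "ddb59aef-6d04-11e5-ba46-22000b92c6ec",
    "/share/godata/",
)
print(entries)
```

With the real SDK, the client would be created as `globus_sdk.TransferClient(authorizer=globus_sdk.AccessTokenAuthorizer(token))` and passed to the same function.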
  22. Endpoint Activation
      • Activating an endpoint means binding a credential to the endpoint for login
      • Mapped Collections require login via the web app
      • Auto-activate
        – Globus Connect Personal and Guest Collections use a Globus-provided credential
        – Must auto-activate before making any API calls to endpoints
  23. Synchronous Tasks
      • Endpoint search (with scopes)
      • List directory contents (ls)
      • Make directory (mkdir)
      • Rename
      • Note:
        – Path encoding and UTF gotchas
        – Don’t forget to auto-activate first
  24. Asynchronous Tasks
      • Transfer
        – Sync level option
      • Delete
      • Get a submission_id, then submit
        – Guarantees once-and-only-once submission
      • Use the task ID to “follow up”
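The submission_id pattern above can be sketched as follows: fetch one ID, embed it in the transfer document, and reuse the same ID on every retry so that a dropped connection cannot create a duplicate task. The method names `get_submission_id` and `submit_transfer` match the SDK's TransferClient, and the document fields follow the Transfer API's transfer document as best understood here; `FlakyFakeClient` is a hypothetical stand-in, and `ConnectionError` stands in for whatever network error a real client raises.

```python
import time


def build_transfer_document(submission_id, source_endpoint, destination_endpoint, items):
    """Build a transfer document from (source_path, dest_path, recursive) triples."""
    return {
        "DATA_TYPE": "transfer",
        "submission_id": submission_id,
        "source_endpoint": source_endpoint,
        "destination_endpoint": destination_endpoint,
        "DATA": [
            {
                "DATA_TYPE": "transfer_item",
                "source_path": src,
                "destination_path": dst,
                "recursive": recursive,
            }
            for src, dst, recursive in items
        ],
    }


def submit_once(client, source_endpoint, destination_endpoint, items,
                attempts=3, retry_delay=1.0):
    """Submit a transfer, retrying with the SAME submission_id; return the task ID."""
    submission_id = client.get_submission_id()["value"]
    doc = build_transfer_document(submission_id, source_endpoint,
                                  destination_endpoint, items)
    for attempt in range(attempts):
        try:
            return client.submit_transfer(doc)["task_id"]
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            time.sleep(retry_delay)


class FlakyFakeClient:
    """Hypothetical stand-in: the first submit fails, the second succeeds."""

    def __init__(self):
        self.submitted = []

    def get_submission_id(self):
        return {"value": "sub-0001"}

    def submit_transfer(self, doc):
        self.submitted.append(doc["submission_id"])
        if len(self.submitted) == 1:
            raise ConnectionError("simulated dropped connection")
        return {"task_id": "task-0001"}


fake = FlakyFakeClient()
task_id = submit_once(
    fake,
    "ddb59aef-6d04-11e5-ba46-22000b92c6ec",
    "ddb59af0-6d04-11e5-ba46-22000b92c6ec",
    [("/share/godata/file3.txt", "/~/file3.txt", False)],
    retry_delay=0,
)
print(task_id, fake.submitted)
```

The returned task ID is what a caller would use to follow up, for example with the SDK's `task_wait` or `get_task` methods.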
  25. The Globus API / SDK with a Jupyter Notebook in a Jupyter Hub
      [Diagram; recoverable labels: login, { “tokens”:… }, REST APIs, Bearer a45cd…]
  26. API walkthrough with our Jupyter Hub
      • https://jupyter.demo.globus.org
        – Sign in with Globus
        – Verify the consents
        – Start My Server (this will take about a minute)
        – Open folder: globus-jupyter-notebooks
        – Run Platform_Introduction_JupyterHub_Auth.ipynb
      • If you mess it up and want to “go back to the beginning”
        – Just stop and restart the server
      • If you want to use the notebook outside of our hub
        – https://github.com/globus/globus-jupyter-notebooks
        – Authentication is a manual cut and paste: you exchange the authorization code for an access token (Native App)
  27. Automation Examples
      • Simple code examples for various use cases using Globus
        – https://github.com/globus/automation-examples
        – Syncing a directory
          o Bash script that calls the Globus CLI, and a Python module that can be run as a script or imported as a module
        – Staging data in a shared directory
          o Bash / Python
        – Removing directories after files are transferred
          o Python script
  28. Support resources
      • Globus documentation: docs.globus.org
      • GitHub: https://github.com/globus
      • YouTube channel: youtube.com/user/GlobusOnline
