Automating Research Data Workflows (GlobusWorld Tour - STFC)

Globus
Automating Research Data Workflows
Vas Vasiliadis
vas@uchicago.edu
STFC – January 10, 2019
Data replication
• For backup: initiated by user or system back up
• Automated transfer of data from science instrument
2
Recurring transfers
with sync option
Copy /ingest
Daily @ 3:30am
Staging data with compute jobs
• Stage data in or out as part of the job
• Transfer task is submitted when the job is run
– Endpoint may not be currently activated
• Alternative approaches
1. User adds directives to job submission script
2. Application manages data staging on user’s behalf
Application driven automation
• Application (e.g. portal, science gateway) submits a
transfer of compute results as the user
• Application monitors transfer, and initiates additional
processing and/or backup of data
Relevant Platform
Capabilities
Globus Auth: Native apps
• Client that cannot keep a secret, e.g…
– Command line, desktop apps
– Mobile apps
– Jupyter notebooks
• Native app is registered with Globus Auth
– Not a confidential client
• Native App Grant is used
– Variation on the Authorization Code Grant
• Globus SDK:
– To get tokens: NativeAppAuthClient
– To use tokens: AccessTokenAuthorizer
6
Refresh tokens
• Common use cases
– Portal checking transfer status when user is not logged in
– Running command line app from script
• Refresh tokens issued to client, in particular scope
• Client uses refresh token to get access token
– Confidential client: client_id and client_secret required
– Native app: client_secret not required
• Refresh token good for 6 months after last use
• Consent rescindment revokes resource token
8
Refresh tokens
9
Native App
(Client)
App/Service
(Resource Server)
Globus Auth
(Authorization Server)
1. Run
application
2. URL to
authenticate
Browser
3. Authenticate and consent
4. Auth code
5. Register
auth code
6. Exchange code,
request refresh tokens
7. Access
tokens and refresh tokens
9. Exchange refresh token
for new access tokens
8. Store refresh tokens
10. Access tokens
11. Authenticate with access
tokens to invoke service as user
Native App/Refresh Tokens Sample Code
github.com/globus/native-app-examples
• ./example_copy_paste.py
– User copies and pastes code to the app
• ./example_copy_paste_refresh_token.py
– Stores refresh token locally, uses it to get new access tokens
• See README for installation
10
On your EC2 instance in ~globus/native-app-examples
Automation via the
Globus CLI
The Globus CLI
• Installation: docs.globus.org/cli/installation
• Logging On (remember the consents?)
– globus login / logout
• Getting help / list of commands
– globus --help
– globus list-commands
• Doing something
– It’s all about the UUIDs
– Don’t forget the file paths!
Globus Command Line Interface (CLI)
• Native application, with easy installation/update
• Open source, uses Python SDK
• globus login – get access and refresh tokens
– Tokens stored locally in ~/.globus.cfg
• Service (transfer/auth) invocation uses tokens
• globus logout – delete tokens
docs.globus.org/cli/examples
Also available on your EC2 instance
UUIDs everywhere
• UUIDs for endpoint, task, user identity, groups…
• Use search/list options
• get-identities for identity username to UUID
$ globus endpoint search 'Globus Tutorial'
$ globus task list
$ globus get-identities vas@globus.org 14bf3755-
6267-42f2-9e9c-ad324de4a1fb
Batch Transfers
• Transfer tasks have one source/destination, but can have
any number of files
• Provide input source-dest pairs via local file
• e.g. move files listed in files.txt from $ep1 to $ep2
$ ep1=ddb59aef-6d04-11e5-ba46-22000b92c6ec
$ ep2=ddb59af0-6d04-11e5-ba46-22000b92c6ec
$ globus transfer $ep1:/share/godata/ $ep2:/~/ --
batch --label 'CLI Batch' < files.txt
Parsing CLI output
• Default output is text; for JSON output use --format json
$ globus endpoint search --filter-scope my-endpoints
$ globus endpoint search --filter-scope my-endpoints --
format json
• Extract specific attributes using --jmespath <expression>
$ globus endpoint search --filter-scope my-endpoints --
jmespath 'DATA[].[id, display_name]'
Managing notifications
• Turn off emails sent for tasks
• Useful when an application manages tasks for a user
• Disable notifications with the --notify option
--notify off (all notifications)
--notify succeeded|failed|inactive (select notifications)
Permission management
• Set and manage permissions on shared endpoint
• Requires access manager role
$ share=<shared_endpoint_UUID>
$ globus endpoint permission create --permissions r --
identity tuecke@globus.org $share:/NCARTest/
$ globus endpoint permission list $share
$ globus endpoint permission delete $share <perm_UUID>
Automation with CLI
• Script transfers data repeatedly via task manager/cron
– Interactions are as user: both for data access and to Globus
services
• CLI commands used in job submission script
– CLI is installed on head node
– User runs globus login; tokens stored in user’s home directory
– Tokens accessible when job runs and submits stage in/out tasks
– Use --skip-activation-check to submit task even if endpoint is
not activated at submit time
Automation with portals
• Portal needs to act as the user
• User grants “offline” access to the portal
– Portal gets and stores refresh tokens for each user
– Uses client id/secret + refresh tokens to get new access tokens
– Portal maintains state about transfers being managed (task id)
Useful submission options
• Safe resubmissions
– Applies to all tasks (transfer and delete)
– Get a task UUID and use it in submission
– $ globus task generate-submission-id
– Use --submission-id option in transfer command
• Task wait
– Useful for scripting conditional on transfer task status
Walkthrough
• Sample script that uses sync option to transfer files
github.com/globus/automation-
examples/blob/master/cli-sync.sh
• Same task, via an app that uses Python SDK
github.com/globus/automation-
examples/blob/master/globus_folder_sync.py
Support resources
• Globus documentation: docs.globus.org
• Sample code: github.com/globus
• Helpdesk and issue escalation: support@globus.org
• Customer engagement team
• Globus professional services team
– Assist with portal/gateway/app architecture and design
– Develop custom applications that leverage the Globus platform
– Advise on customized deployment and integration scenarios
Join the Globus community
• Access the service: globus.org/login
• Create a personal endpoint: globus.org/app/endpoints/create-gcp
• Documentation: docs.globus.org
• Engage: globus.org/mailing-lists
• Subscribe: globus.org/subscriptions
• Need help? support@globus.org
• Follow us: @globusonline
1 of 23

Recommended

Automating Research Data Workflows (GlobusWorld Tour - UCSD) by
Automating Research Data Workflows (GlobusWorld Tour - UCSD)Automating Research Data Workflows (GlobusWorld Tour - UCSD)
Automating Research Data Workflows (GlobusWorld Tour - UCSD)Globus
29 views26 slides
Automating Research Data Workflows (GlobusWorld Tour - Columbia University) by
Automating Research Data Workflows (GlobusWorld Tour - Columbia University)Automating Research Data Workflows (GlobusWorld Tour - Columbia University)
Automating Research Data Workflows (GlobusWorld Tour - Columbia University)Globus
120 views22 slides
Tutorial: Automating Research Data Workflows by
Tutorial: Automating Research Data WorkflowsTutorial: Automating Research Data Workflows
Tutorial: Automating Research Data WorkflowsGlobus
135 views23 slides
Automating Data Flows with the Globus CLI (GlobusWorld Tour - UMich) by
Automating Data Flows with the Globus CLI (GlobusWorld Tour - UMich)Automating Data Flows with the Globus CLI (GlobusWorld Tour - UMich)
Automating Data Flows with the Globus CLI (GlobusWorld Tour - UMich)Globus
231 views23 slides
Catalyst MVC by
Catalyst MVCCatalyst MVC
Catalyst MVCSheeju Alex
881 views31 slides
The Patterns of Distributed Logging and Containers by
The Patterns of Distributed Logging and ContainersThe Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersSATOSHI TAGOMORI
24.9K views43 slides

More Related Content

What's hot

Distributed Eventing in OSGi by
Distributed Eventing in OSGiDistributed Eventing in OSGi
Distributed Eventing in OSGiCarsten Ziegeler
1.9K views38 slides
Dive into Fluentd plugin v0.12 by
Dive into Fluentd plugin v0.12Dive into Fluentd plugin v0.12
Dive into Fluentd plugin v0.12N Masahiro
23.5K views72 slides
Data Analytics Service Company and Its Ruby Usage by
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageSATOSHI TAGOMORI
21K views51 slides
Embuk internals by
Embuk internalsEmbuk internals
Embuk internalsSadayuki Furuhashi
3.2K views9 slides
nginx + ansible로 점검모드 만들기 by
nginx + ansible로 점검모드 만들기nginx + ansible로 점검모드 만들기
nginx + ansible로 점검모드 만들기June Kim
217 views24 slides
Stephan Ewen - Running Flink Everywhere by
Stephan Ewen - Running Flink EverywhereStephan Ewen - Running Flink Everywhere
Stephan Ewen - Running Flink EverywhereFlink Forward
2.7K views28 slides

What's hot(20)

Dive into Fluentd plugin v0.12 by N Masahiro
Dive into Fluentd plugin v0.12Dive into Fluentd plugin v0.12
Dive into Fluentd plugin v0.12
N Masahiro23.5K views
Data Analytics Service Company and Its Ruby Usage by SATOSHI TAGOMORI
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby Usage
SATOSHI TAGOMORI21K views
nginx + ansible로 점검모드 만들기 by June Kim
nginx + ansible로 점검모드 만들기nginx + ansible로 점검모드 만들기
nginx + ansible로 점검모드 만들기
June Kim217 views
Stephan Ewen - Running Flink Everywhere by Flink Forward
Stephan Ewen - Running Flink EverywhereStephan Ewen - Running Flink Everywhere
Stephan Ewen - Running Flink Everywhere
Flink Forward2.7K views
Robust Operations of Kafka Streams by confluent
Robust Operations of Kafka StreamsRobust Operations of Kafka Streams
Robust Operations of Kafka Streams
confluent3.5K views
Globus Endpoint Administration (GlobusWorld Tour - STFC) by Globus
Globus Endpoint Administration (GlobusWorld Tour - STFC)Globus Endpoint Administration (GlobusWorld Tour - STFC)
Globus Endpoint Administration (GlobusWorld Tour - STFC)
Globus 230 views
Managing Infrastructure as Code by Allan Shone
Managing Infrastructure as CodeManaging Infrastructure as Code
Managing Infrastructure as Code
Allan Shone691 views
Schema tools-and-trics-and-quick-intro-to-clojure-spec-22.6.2016 by Metosin Oy
Schema tools-and-trics-and-quick-intro-to-clojure-spec-22.6.2016Schema tools-and-trics-and-quick-intro-to-clojure-spec-22.6.2016
Schema tools-and-trics-and-quick-intro-to-clojure-spec-22.6.2016
Metosin Oy2.1K views
JRuby with Java Code in Data Processing World by SATOSHI TAGOMORI
JRuby with Java Code in Data Processing WorldJRuby with Java Code in Data Processing World
JRuby with Java Code in Data Processing World
SATOSHI TAGOMORI7.4K views
Say YES to Premature Optimizations by Maude Lemaire
Say YES to Premature OptimizationsSay YES to Premature Optimizations
Say YES to Premature Optimizations
Maude Lemaire142 views
Fluentd v1.0 in a nutshell by N Masahiro
Fluentd v1.0 in a nutshellFluentd v1.0 in a nutshell
Fluentd v1.0 in a nutshell
N Masahiro16K views
Http programming in play by Knoldus Inc.
Http programming in playHttp programming in play
Http programming in play
Knoldus Inc.4.2K views

Similar to Automating Research Data Workflows (GlobusWorld Tour - STFC)

Automating Research Data Flows with Globus (CHPC 2019 - South Africa) by
Automating Research Data Flows with Globus (CHPC 2019 - South Africa)Automating Research Data Flows with Globus (CHPC 2019 - South Africa)
Automating Research Data Flows with Globus (CHPC 2019 - South Africa)Globus
156 views25 slides
Automating Research Data Flows with the Globus Command Line Interface (CLI) by
Automating Research Data Flows with the Globus Command Line Interface (CLI)Automating Research Data Flows with the Globus Command Line Interface (CLI)
Automating Research Data Flows with the Globus Command Line Interface (CLI)Globus
237 views23 slides
GlobusWorld 2021 Tutorial: The Globus CLI, Platform and SDK by
GlobusWorld 2021 Tutorial: The Globus CLI, Platform and SDKGlobusWorld 2021 Tutorial: The Globus CLI, Platform and SDK
GlobusWorld 2021 Tutorial: The Globus CLI, Platform and SDKGlobus
176 views32 slides
Using Globus to Streamline Research at Scale by
Using Globus to Streamline Research at ScaleUsing Globus to Streamline Research at Scale
Using Globus to Streamline Research at ScaleGlobus
30 views28 slides
Automating Research Data Flows and an Introduction to the Globus Platform by
Automating Research Data Flows and an Introduction to the Globus PlatformAutomating Research Data Flows and an Introduction to the Globus Platform
Automating Research Data Flows and an Introduction to the Globus PlatformGlobus
132 views42 slides
Automating Research Data Flows and Introduction to the Globus Platform by
Automating Research Data Flows and Introduction to the Globus PlatformAutomating Research Data Flows and Introduction to the Globus Platform
Automating Research Data Flows and Introduction to the Globus PlatformGlobus
50 views41 slides

Similar to Automating Research Data Workflows (GlobusWorld Tour - STFC)(20)

Automating Research Data Flows with Globus (CHPC 2019 - South Africa) by Globus
Automating Research Data Flows with Globus (CHPC 2019 - South Africa)Automating Research Data Flows with Globus (CHPC 2019 - South Africa)
Automating Research Data Flows with Globus (CHPC 2019 - South Africa)
Globus 156 views
Automating Research Data Flows with the Globus Command Line Interface (CLI) by Globus
Automating Research Data Flows with the Globus Command Line Interface (CLI)Automating Research Data Flows with the Globus Command Line Interface (CLI)
Automating Research Data Flows with the Globus Command Line Interface (CLI)
Globus 237 views
GlobusWorld 2021 Tutorial: The Globus CLI, Platform and SDK by Globus
GlobusWorld 2021 Tutorial: The Globus CLI, Platform and SDKGlobusWorld 2021 Tutorial: The Globus CLI, Platform and SDK
GlobusWorld 2021 Tutorial: The Globus CLI, Platform and SDK
Globus 176 views
Using Globus to Streamline Research at Scale by Globus
Using Globus to Streamline Research at ScaleUsing Globus to Streamline Research at Scale
Using Globus to Streamline Research at Scale
Globus 30 views
Automating Research Data Flows and an Introduction to the Globus Platform by Globus
Automating Research Data Flows and an Introduction to the Globus PlatformAutomating Research Data Flows and an Introduction to the Globus Platform
Automating Research Data Flows and an Introduction to the Globus Platform
Globus 132 views
Automating Research Data Flows and Introduction to the Globus Platform by Globus
Automating Research Data Flows and Introduction to the Globus PlatformAutomating Research Data Flows and Introduction to the Globus Platform
Automating Research Data Flows and Introduction to the Globus Platform
Globus 50 views
Globus Command Line Interface (APS Workshop) by Globus
Globus Command Line Interface (APS Workshop)Globus Command Line Interface (APS Workshop)
Globus Command Line Interface (APS Workshop)
Globus 100 views
Simple Data Automation with Globus (GlobusWorld Tour West) by Globus
Simple Data Automation with Globus (GlobusWorld Tour West)Simple Data Automation with Globus (GlobusWorld Tour West)
Simple Data Automation with Globus (GlobusWorld Tour West)
Globus 101 views
Globus Automation by Globus
Globus AutomationGlobus Automation
Globus Automation
Globus 23 views
Globus Platform Overview by Globus
Globus Platform OverviewGlobus Platform Overview
Globus Platform Overview
Globus 260 views
Tutorial: Leveraging Globus in your Research Applications by Globus
Tutorial: Leveraging Globus in your Research ApplicationsTutorial: Leveraging Globus in your Research Applications
Tutorial: Leveraging Globus in your Research Applications
Globus 292 views
Introduction to the Globus Platform (GlobusWorld Tour - UMich) by Globus
Introduction to the Globus Platform (GlobusWorld Tour - UMich)Introduction to the Globus Platform (GlobusWorld Tour - UMich)
Introduction to the Globus Platform (GlobusWorld Tour - UMich)
Globus 136 views
Introduction to the Globus Platform (APS Workshop) by Globus
Introduction to the Globus Platform (APS Workshop)Introduction to the Globus Platform (APS Workshop)
Introduction to the Globus Platform (APS Workshop)
Globus 97 views
Data Publication and Discovery with Globus by Globus
Data Publication and Discovery with GlobusData Publication and Discovery with Globus
Data Publication and Discovery with Globus
Globus 266 views
Leveraging the Globus Platform in Web Applications (CHPC 2019 - South Africa) by Globus
Leveraging the Globus Platform in Web Applications (CHPC 2019 - South Africa)Leveraging the Globus Platform in Web Applications (CHPC 2019 - South Africa)
Leveraging the Globus Platform in Web Applications (CHPC 2019 - South Africa)
Globus 106 views
Automating Research Data Management with Globus by Globus
Automating Research Data Management with GlobusAutomating Research Data Management with Globus
Automating Research Data Management with Globus
Globus 250 views
Gateways 2020 Tutorial - Large Scale Data Transfer with Globus by Globus
Gateways 2020 Tutorial - Large Scale Data Transfer with GlobusGateways 2020 Tutorial - Large Scale Data Transfer with Globus
Gateways 2020 Tutorial - Large Scale Data Transfer with Globus
Globus 137 views
Automating Research Data with Globus Flows and Compute by Globus
Automating Research Data with Globus Flows and ComputeAutomating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and Compute
Globus 6 views
Jupyter + Globus: The Foundation for Interactive Data Science by Globus
Jupyter + Globus: The Foundation for Interactive Data ScienceJupyter + Globus: The Foundation for Interactive Data Science
Jupyter + Globus: The Foundation for Interactive Data Science
Globus 423 views
Leveraging the Globus Platform (GlobusWorld Tour - Columbia University) by Globus
Leveraging the Globus Platform (GlobusWorld Tour - Columbia University)Leveraging the Globus Platform (GlobusWorld Tour - Columbia University)
Leveraging the Globus Platform (GlobusWorld Tour - Columbia University)
Globus 84 views

More from Globus

Introduction to Globus for System Administrators by
Introduction to Globus for System AdministratorsIntroduction to Globus for System Administrators
Introduction to Globus for System AdministratorsGlobus
11 views55 slides
Introduction to Data Transfer and Sharing for Researchers by
Introduction to Data Transfer and Sharing for ResearchersIntroduction to Data Transfer and Sharing for Researchers
Introduction to Data Transfer and Sharing for ResearchersGlobus
4 views33 slides
Introduction to the Globus Platform for Developers by
Introduction to the Globus Platform for DevelopersIntroduction to the Globus Platform for Developers
Introduction to the Globus Platform for DevelopersGlobus
4 views28 slides
Introduction to the Command Line Interface (CLI) by
Introduction to the Command Line Interface (CLI)Introduction to the Command Line Interface (CLI)
Introduction to the Command Line Interface (CLI)Globus
13 views12 slides
Advanced Globus System Administration by
Advanced Globus System AdministrationAdvanced Globus System Administration
Advanced Globus System AdministrationGlobus
26 views29 slides
Introduction to Globus for System Administrators by
Introduction to Globus for System AdministratorsIntroduction to Globus for System Administrators
Introduction to Globus for System AdministratorsGlobus
94 views54 slides

More from Globus (20)

Introduction to Globus for System Administrators by Globus
Introduction to Globus for System AdministratorsIntroduction to Globus for System Administrators
Introduction to Globus for System Administrators
Globus 11 views
Introduction to Data Transfer and Sharing for Researchers by Globus
Introduction to Data Transfer and Sharing for ResearchersIntroduction to Data Transfer and Sharing for Researchers
Introduction to Data Transfer and Sharing for Researchers
Globus 4 views
Introduction to the Globus Platform for Developers by Globus
Introduction to the Globus Platform for DevelopersIntroduction to the Globus Platform for Developers
Introduction to the Globus Platform for Developers
Globus 4 views
Introduction to the Command Line Interface (CLI) by Globus
Introduction to the Command Line Interface (CLI)Introduction to the Command Line Interface (CLI)
Introduction to the Command Line Interface (CLI)
Globus 13 views
Advanced Globus System Administration by Globus
Advanced Globus System AdministrationAdvanced Globus System Administration
Advanced Globus System Administration
Globus 26 views
Introduction to Globus for System Administrators by Globus
Introduction to Globus for System AdministratorsIntroduction to Globus for System Administrators
Introduction to Globus for System Administrators
Globus 94 views
Introduction to Globus for New Users by Globus
Introduction to Globus for New UsersIntroduction to Globus for New Users
Introduction to Globus for New Users
Globus 55 views
Working with Globus Platform Services and Portals by Globus
Working with Globus Platform Services and PortalsWorking with Globus Platform Services and Portals
Working with Globus Platform Services and Portals
Globus 28 views
Advanced Globus System Administration by Globus
Advanced Globus System AdministrationAdvanced Globus System Administration
Advanced Globus System Administration
Globus 21 views
Introduction to Globus by Globus
Introduction to GlobusIntroduction to Globus
Introduction to Globus
Globus 43 views
Introduction to Globus for System Administrators by Globus
Introduction to Globus for System AdministratorsIntroduction to Globus for System Administrators
Introduction to Globus for System Administrators
Globus 27 views
Working with Globus Platform Services by Globus
Working with Globus Platform ServicesWorking with Globus Platform Services
Working with Globus Platform Services
Globus 42 views
Advanced Globus System Administration by Globus
Advanced Globus System AdministrationAdvanced Globus System Administration
Advanced Globus System Administration
Globus 29 views
Introduction to Globus for System Administrators by Globus
Introduction to Globus for System AdministratorsIntroduction to Globus for System Administrators
Introduction to Globus for System Administrators
Globus 145 views
Introduction to Globus for Researchers by Globus
Introduction to Globus for ResearchersIntroduction to Globus for Researchers
Introduction to Globus for Researchers
Globus 89 views
Introduction to Globus for New Users by Globus
Introduction to Globus for New UsersIntroduction to Globus for New Users
Introduction to Globus for New Users
Globus 58 views
Globus Endpoint Migration and Advanced Administration Topics by Globus
Globus Endpoint Migration and Advanced Administration TopicsGlobus Endpoint Migration and Advanced Administration Topics
Globus Endpoint Migration and Advanced Administration Topics
Globus 55 views
Globus for System Administrators by Globus
Globus for System AdministratorsGlobus for System Administrators
Globus for System Administrators
Globus 68 views
Building Data Portals and Science Gateways with Globus by Globus
Building Data Portals and Science Gateways with GlobusBuilding Data Portals and Science Gateways with Globus
Building Data Portals and Science Gateways with Globus
Globus 133 views
Moemoea nui Aotearoa: Challenges and Strategies in Data Lifecycle Management ... by Globus
Moemoea nui Aotearoa: Challenges and Strategies in Data Lifecycle Management ...Moemoea nui Aotearoa: Challenges and Strategies in Data Lifecycle Management ...
Moemoea nui Aotearoa: Challenges and Strategies in Data Lifecycle Management ...
Globus 151 views

Recently uploaded

Infomatica-MDM.pptx by
Infomatica-MDM.pptxInfomatica-MDM.pptx
Infomatica-MDM.pptxKapil Rangwani
11 views16 slides
shivam tiwari.pptx by
shivam tiwari.pptxshivam tiwari.pptx
shivam tiwari.pptxAanyaMishra4
5 views14 slides
Chapter 3b- Process Communication (1) (1)(1) (1).pptx by
Chapter 3b- Process Communication (1) (1)(1) (1).pptxChapter 3b- Process Communication (1) (1)(1) (1).pptx
Chapter 3b- Process Communication (1) (1)(1) (1).pptxayeshabaig2004
7 views30 slides
Cross-network in Google Analytics 4.pdf by
Cross-network in Google Analytics 4.pdfCross-network in Google Analytics 4.pdf
Cross-network in Google Analytics 4.pdfGA4 Tutorials
6 views7 slides
MOSORE_BRESCIA by
MOSORE_BRESCIAMOSORE_BRESCIA
MOSORE_BRESCIAFederico Karagulian
5 views8 slides
SAP-TCodes.pdf by
SAP-TCodes.pdfSAP-TCodes.pdf
SAP-TCodes.pdfmustafaghulam8181
10 views285 slides

Recently uploaded(20)

Chapter 3b- Process Communication (1) (1)(1) (1).pptx by ayeshabaig2004
Chapter 3b- Process Communication (1) (1)(1) (1).pptxChapter 3b- Process Communication (1) (1)(1) (1).pptx
Chapter 3b- Process Communication (1) (1)(1) (1).pptx
ayeshabaig20047 views
Cross-network in Google Analytics 4.pdf by GA4 Tutorials
Cross-network in Google Analytics 4.pdfCross-network in Google Analytics 4.pdf
Cross-network in Google Analytics 4.pdf
GA4 Tutorials6 views
OECD-Persol Holdings Workshop on Advancing Employee Well-being in Business an... by StatsCommunications
OECD-Persol Holdings Workshop on Advancing Employee Well-being in Business an...OECD-Persol Holdings Workshop on Advancing Employee Well-being in Business an...
OECD-Persol Holdings Workshop on Advancing Employee Well-being in Business an...
[DSC Europe 23] Danijela Horak - The Innovator’s Dilemma: to Build or Not to ... by DataScienceConferenc1
[DSC Europe 23] Danijela Horak - The Innovator’s Dilemma: to Build or Not to ...[DSC Europe 23] Danijela Horak - The Innovator’s Dilemma: to Build or Not to ...
[DSC Europe 23] Danijela Horak - The Innovator’s Dilemma: to Build or Not to ...
UNEP FI CRS Climate Risk Results.pptx by pekka28
UNEP FI CRS Climate Risk Results.pptxUNEP FI CRS Climate Risk Results.pptx
UNEP FI CRS Climate Risk Results.pptx
pekka2811 views
CRM stick or twist.pptx by info828217
CRM stick or twist.pptxCRM stick or twist.pptx
CRM stick or twist.pptx
info82821711 views
SUPER STORE SQL PROJECT.pptx by khan888620
SUPER STORE SQL PROJECT.pptxSUPER STORE SQL PROJECT.pptx
SUPER STORE SQL PROJECT.pptx
khan88862013 views
Survey on Factuality in LLM's.pptx by NeethaSherra1
Survey on Factuality in LLM's.pptxSurvey on Factuality in LLM's.pptx
Survey on Factuality in LLM's.pptx
NeethaSherra17 views
Data Journeys Hard Talk workshop final.pptx by info828217
Data Journeys Hard Talk workshop final.pptxData Journeys Hard Talk workshop final.pptx
Data Journeys Hard Talk workshop final.pptx
info82821710 views
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M... by DataScienceConferenc1
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation by DataScienceConferenc1
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation
Ukraine Infographic_22NOV2023_v2.pdf by AnastosiyaGurin
Ukraine Infographic_22NOV2023_v2.pdfUkraine Infographic_22NOV2023_v2.pdf
Ukraine Infographic_22NOV2023_v2.pdf
AnastosiyaGurin1.4K views
[DSC Europe 23] Predrag Ilic & Simeon Rilling - From Data Lakes to Data Mesh ... by DataScienceConferenc1
[DSC Europe 23] Predrag Ilic & Simeon Rilling - From Data Lakes to Data Mesh ...[DSC Europe 23] Predrag Ilic & Simeon Rilling - From Data Lakes to Data Mesh ...
[DSC Europe 23] Predrag Ilic & Simeon Rilling - From Data Lakes to Data Mesh ...
[DSC Europe 23] Zsolt Feleki - Machine Translation should we trust it.pptx by DataScienceConferenc1
[DSC Europe 23] Zsolt Feleki - Machine Translation should we trust it.pptx[DSC Europe 23] Zsolt Feleki - Machine Translation should we trust it.pptx
[DSC Europe 23] Zsolt Feleki - Machine Translation should we trust it.pptx

Automating Research Data Workflows (GlobusWorld Tour - STFC)

  • 1. Automating Research Data Workflows Vas Vasiliadis vas@uchicago.edu STFC – January 10, 2019
  • 2. Data replication • For backup: initiated by user or system back up • Automated transfer of data from science instrument 2 Recurring transfers with sync option Copy /ingest Daily @ 3:30am
  • 3. Staging data with compute jobs • Stage data in or out as part of the job • Transfer task is submitted when the job is run – Endpoint may not be currently activated • Alternative approaches 1. User adds directives to job submission script 2. Application manages data staging on user’s behalf
  • 4. Application driven automation • Application (e.g. portal, science gateway) submits a transfer of compute results as the user • Application monitors transfer, and initiates additional processing and/or backup of data
  • 6. Globus Auth: Native apps • Client that cannot keep a secret, e.g… – Command line, desktop apps – Mobile apps – Jupyter notebooks • Native app is registered with Globus Auth – Not a confidential client • Native App Grant is used – Variation on the Authorization Code Grant • Globus SDK: – To get tokens: NativeAppAuthClient – To use tokens: AccessTokenAuthorizer 6
  • 7. Refresh tokens • Common use cases – Portal checking transfer status when user is not logged in – Running command line app from script • Refresh tokens issued to client, in particular scope • Client uses refresh token to get access token – Confidential client: client_id and client_secret required – Native app: client_secret not required • Refresh token good for 6 months after last use • Consent rescindment revokes resource token 8
  • 8. Refresh tokens 9 Native App (Client) App/Service (Resource Server) Globus Auth (Authorization Server) 1. Run application 2. URL to authenticate Browser 3. Authenticate and consent 4. Auth code 5. Register auth code 6. Exchange code, request refresh tokens 7. Access tokens and refresh tokens 9. Exchange refresh token for new access tokens 8. Store refresh tokens 10. Access tokens 11. Authenticate with access tokens to invoke service as user
  • 9. Native App/Refresh Tokens Sample Code github.com/globus/native-app-examples • ./example_copy_paste.py – User copies and pastes code to the app • ./example_copy_paste_refresh_token.py – Stores refresh token locally, uses it to get new access tokens • See README for installation 10 On your EC2 instance in ~globus/native-app-examples
  • 11. The Globus CLI • Installation: docs.globus.org/cli/installation • Logging On (remember the consents?) – globus login / logout • Getting help / list of commands – globus --help – globus list-commands • Doing something – It’s all about the UUIDs – Don’t forget the file paths!
  • 12. Globus Command Line Interface (CLI) • Native application, with easy installation/update • Open source, uses Python SDK • globus login – get access and refresh tokens – Tokens stored locally in ~/.globus.cfg • Service (transfer/auth) invocation uses tokens • globus logout – delete tokens docs.globus.org/cli/examples Also available on your EC2 instance
  • 13. UUIDs everywhere • UUIDs for endpoint, task, user identity, groups… • Use search/list options • get-identities for identity username to UUID $ globus endpoint search 'Globus Tutorial' $ globus task list $ globus get-identities vas@globus.org 14bf3755- 6267-42f2-9e9c-ad324de4a1fb
  • 14. Batch Transfers • Transfer tasks have one source/destination, but can have any number of files • Provide input source-dest pairs via local file • e.g. move files listed in files.txt from $ep1 to $ep2 $ ep1=ddb59aef-6d04-11e5-ba46-22000b92c6ec $ ep2=ddb59af0-6d04-11e5-ba46-22000b92c6ec $ globus transfer $ep1:/share/godata/ $ep2:/~/ -- batch --label 'CLI Batch' < files.txt
  • 15. Parsing CLI output • Default output is text; for JSON output use --format json $ globus endpoint search --filter-scope my-endpoints $ globus endpoint search --filter-scope my-endpoints -- format json • Extract specific attributes using --jmespath <expression> $ globus endpoint search --filter-scope my-endpoints -- jmespath 'DATA[].[id, display_name]'
  • 16. Managing notifications • Turn off emails sent for tasks • Useful when an application manages tasks for a user • Disable notifications with the --notify option --notify off (all notifications) --notify succeeded|failed|inactive (select notifications)
  • 17. Permission management • Set and manage permissions on shared endpoint • Requires access manager role $ share=<shared_endpoint_UUID> $ globus endpoint permission create --permissions r -- identity tuecke@globus.org $share:/NCARTest/ $ globus endpoint permission list $share $ globus endpoint permission delete $share <perm_UUID>
  • 18. Automation with CLI • Script transfers data repeatedly via task manager/cron – Interactions are as user: both for data access and to Globus services • CLI commands used in job submission script – CLI is installed on head node – User runs globus login; tokens stored in user’s home directory – Tokens accessible when job runs and submits stage in/out tasks – Use --skip-activation-check to submit task even if endpoint is not activated at submit time
  • 19. Automation with portals • Portal needs to act as the user • User grants “offline” access to the portal – Portal gets and stores refresh tokens for each user – Uses client id/secret + refresh tokens to get new access tokens – Portal maintains state about transfers being managed (task id)
  • 20. Useful submission options • Safe resubmissions – Applies to all tasks (transfer and delete) – Get a task UUID and use it in submission – $ globus task generate-submission-id – Use --submission-id option in transfer command • Task wait – Useful for scripting conditional on transfer task status
  • 21. Walkthrough • Sample script that uses sync option to transfer files github.com/globus/automation- examples/blob/master/cli-sync.sh • Same task, via an app that uses Python SDK github.com/globus/automation- examples/blob/master/globus_folder_sync.py
  • 22. Support resources • Globus documentation: docs.globus.org • Sample code: github.com/globus • Helpdesk and issue escalation: support@globus.org • Customer engagement team • Globus professional services team – Assist with portal/gateway/app architecture and design – Develop custom applications that leverage the Globus platform – Advise on customized deployment and integration scenarios
  • 23. Join the Globus community • Access the service: globus.org/login • Create a personal endpoint: globus.org/app/endpoints/create-gcp • Documentation: docs.globus.org • Engage: globus.org/mailing-lists • Subscribe: globus.org/subscriptions • Need help? support@globus.org • Follow us: @globusonline