SlideShare a Scribd company logo
Automating Research Data Workflows
Vas Vasiliadis
vas@uchicago.edu
UCSD – May 8, 2019
Data replication
• For backup: initiated by user or system back up
• Automated transfer of data from science instrument
2
Recurring transfers
with sync option
Copy /ingest
Daily @ 3:30am
Staging data with compute jobs
• Stage data in or out as part of the job
• Transfer task is submitted when the job is run
– Endpoint may not be currently activated
• Alternative approaches
1. User adds directives to job submission script
2. Application manages data staging on user’s behalf
Application driven automation
• Application (e.g. portal, science gateway) submits a
transfer of compute results as the user
• Application monitors transfer, and initiates additional
processing and/or backup of data
Relevant Platform
Capabilities
Globus Auth: Native apps
• Client that cannot keep a secret, e.g…
– Command line, desktop apps
– Mobile apps
– Jupyter notebooks
• Native app is registered with Globus Auth
– Not a confidential client
• Native App Grant is used
– Variation on the Authorization Code Grant
• Globus SDK:
– To get tokens: NativeAppAuthClient
– To use tokens: AccessTokenAuthorizer
6
Browser
Native App grant
7
Native App
(Client)
1. Run
application
2. URL to
authenticate
3. Authenticate and
consent
4. Auth code
5. Register
auth code
6. Exchange
code
7. Access tokens
8. Authenticate with access
tokens to invoke transfer
service as user App/Service
(Resource Server)
Globus Auth
(Authorization Server)
Refresh tokens
• Common use cases
– Portal checking transfer status when user is not logged in
– Running command line app from script
• Refresh tokens issued to client, in particular scope
• Client uses refresh token to get access token
– Confidential client: client_id and client_secret required
– Native app: client_secret not required
• Refresh token good for 6 months after last use
• Consent rescindment revokes all tokens
8
Refresh tokens
9
Native App
(Client)
App/Service
(Resource Server)
Globus Auth
(Authorization Server)
1. Run
application
2. URL to
authenticate
Browser
3. Authenticate and consent
4. Auth code
5. Register
auth code
6. Exchange code,
request refresh tokens
7. Access
tokens and refresh tokens
9. Exchange refresh token
for new access tokens
8. Store refresh tokens
10. Access tokens
11. Authenticate with access
tokens to invoke service as user
Native App/Refresh Tokens Sample Code
github.com/globus/native-app-examples
• ./example_copy_paste.py
– User copies and pastes code to the app
• ./example_copy_paste_refresh_token.py
– Stores refresh token locally, uses it to get new access tokens
• See README for installation
10
On your EC2 instance in ~globus/native-app-examples
Automation via the
Globus CLI
Globus Command Line Interface (CLI)
• Native application, with easy installation/update
• Open source, uses Python SDK
• globus login – get access and refresh tokens
– Tokens stored locally in ~/.globus.cfg
• Service (transfer/auth) invocation uses tokens
• globus logout – delete tokens
docs.globus.org/cli/examples
Also available on your EC2 instance
The Globus CLI
• Installation: docs.globus.org/cli/installation
• Authenticating to the Globus service (consents!)
– globus login / logout
• Getting help
– globus --help
– globus list-commands
• Doing something
– It’s all about the UUIDs
– Don’t forget the file paths!
UUIDs everywhere
• UUIDs for endpoint, task, user identity, groups…
• Use search/list options
• get-identities for identity username to UUID
$ globus endpoint search 'Globus Tutorial'
$ globus task list
$ globus get-identities vas@globus.org 14bf3755-
6267-42f2-9e9c-ad324de4a1fb
Batch Transfers
• Transfer tasks have one source/destination, but can have
any number of files
• Provide input source-dest pairs via local file
• e.g. move files listed in files.txt from $ep1 to $ep2
$ ep1=ddb59aef-6d04-11e5-ba46-22000b92c6ec
$ ep2=ddb59af0-6d04-11e5-ba46-22000b92c6ec
$ globus transfer $ep1:/share/godata/ $ep2:/~/ --
batch --label 'CLI Batch' < files.txt
Useful submission options
• Safe resubmissions
– Applies to all tasks (transfer and delete)
– Get a task UUID and use it in submission
– $ globus task generate-submission-id
– Use --submission-id option in transfer command
• Task wait
– Useful for scripting conditional on transfer task status
Parsing CLI output
• Default output is text; for JSON output use --format json
$ globus endpoint search --filter-scope my-endpoints
$ globus endpoint search --filter-scope my-endpoints --
format json
• Extract specific attributes using --jmespath <expression>
$ globus endpoint search --filter-scope my-endpoints --
jmespath 'DATA[].[id, display_name]'
Managing notifications
• Turn off emails sent for tasks
• Useful when an application manages tasks for a user
• Disable notifications with the --notify option
--notify off (all notifications)
--notify succeeded|failed|inactive (select notifications)
Permission management
• Set and manage permissions on shared endpoint
• Requires access manager role
$ share=<shared_endpoint_UUID>
$ globus endpoint permission create --permissions r --
identity tuecke@globus.org $share:/NCARTest/
$ globus endpoint permission list $share
$ globus endpoint permission delete $share <perm_UUID>
Example: Recurring transfers
• Submit CLI transfer(s) via task manager/cron
• Useful for periodic sync, backup
• Interactions are as user: both for data access and to
Globus services
Example: Job submission data staging
• CLI installed on head node
• User runs globus login; tokens stored in user’s
home directory
• Tokens accessible when job runs and submits stage
in/out tasks
• Use --skip-activation-check when submitting task
– Task accepted even if endpoint is not activated at submit time
– Task held until endpoint is activated
Example: Automation with portals
• Portal needs to act as the user
• User grants “offline” access to the portal
– Portal gets and stores refresh tokens for each user
– Uses client id/secret + refresh tokens to get new access tokens
– Portal maintains state about transfers being managed (task id)
Automation Examples
• Syncing a directory
– bash script; calls the Globus CLI
– Python module; run as script or import as module
• Staging data in a shared directory
– bash and Python variants
• Removing directories after files are transferred
– Python script
23
github.com/globus/automation-examples
Walkthrough
• Sample script that uses sync option to transfer files
github.com/globus/automation-
examples/blob/master/cli-sync.sh
• Same task, via an app that uses Python SDK
github.com/globus/automation-
examples/blob/master/globus_folder_sync.py
Support resources
• Globus documentation: docs.globus.org
• Sample code: github.com/globus
• Helpdesk and issue escalation: support@globus.org
• Customer engagement team
• Globus professional services team
– Assist with portal/gateway/app architecture and design
– Develop custom applications that leverage the Globus platform
– Advise on customized deployment and integration scenarios
Join the Globus community
• Access the service: globus.org/login
• Create a personal endpoint: globus.org/app/endpoints/create-gcp
• Documentation: docs.globus.org
• Engage: globus.org/mailing-lists
• Subscribe: globus.org/subscriptions
• Need help? support@globus.org
• Follow us: @globusonline

More Related Content

What's hot

Spark Streaming Info
Spark Streaming InfoSpark Streaming Info
Spark Streaming Info
Doug Chang
 
High Concurrency Architecture and Laravel Performance Tuning
High Concurrency Architecture and Laravel Performance TuningHigh Concurrency Architecture and Laravel Performance Tuning
High Concurrency Architecture and Laravel Performance Tuning
Albert Chen
 

What's hot (20)

Elasticsearch features and ecosystem
Elasticsearch features and ecosystemElasticsearch features and ecosystem
Elasticsearch features and ecosystem
 
Redesigning Apache Flink's Distributed Architecture @ Flink Forward 2017
Redesigning Apache Flink's Distributed Architecture @ Flink Forward 2017Redesigning Apache Flink's Distributed Architecture @ Flink Forward 2017
Redesigning Apache Flink's Distributed Architecture @ Flink Forward 2017
 
Monitoring a Kubernetes-backed microservice architecture with Prometheus
Monitoring a Kubernetes-backed microservice architecture with PrometheusMonitoring a Kubernetes-backed microservice architecture with Prometheus
Monitoring a Kubernetes-backed microservice architecture with Prometheus
 
Monitoring infrastructure with prometheus
Monitoring infrastructure with prometheusMonitoring infrastructure with prometheus
Monitoring infrastructure with prometheus
 
Spark Streaming Info
Spark Streaming InfoSpark Streaming Info
Spark Streaming Info
 
Virtual Flink Forward 2020: Build your next-generation stream platform based ...
Virtual Flink Forward 2020: Build your next-generation stream platform based ...Virtual Flink Forward 2020: Build your next-generation stream platform based ...
Virtual Flink Forward 2020: Build your next-generation stream platform based ...
 
Flume and HBase
Flume and HBase Flume and HBase
Flume and HBase
 
Time to-live: How to Perform Automatic State Cleanup in Apache Flink - Andrey...
Time to-live: How to Perform Automatic State Cleanup in Apache Flink - Andrey...Time to-live: How to Perform Automatic State Cleanup in Apache Flink - Andrey...
Time to-live: How to Perform Automatic State Cleanup in Apache Flink - Andrey...
 
Fluentd v0.14 Plugin API Details
Fluentd v0.14 Plugin API DetailsFluentd v0.14 Plugin API Details
Fluentd v0.14 Plugin API Details
 
Archivematica Technical Training Diagnostics Guide (September 2018)
Archivematica Technical Training Diagnostics Guide (September 2018)Archivematica Technical Training Diagnostics Guide (September 2018)
Archivematica Technical Training Diagnostics Guide (September 2018)
 
How fluentd fits into the modern software landscape
How fluentd fits into the modern software landscapeHow fluentd fits into the modern software landscape
How fluentd fits into the modern software landscape
 
DevOps Odessa #TechTalks 21.01.2020
DevOps Odessa #TechTalks 21.01.2020DevOps Odessa #TechTalks 21.01.2020
DevOps Odessa #TechTalks 21.01.2020
 
Flink Forward Berlin 2017: Patrick Lucas - Flink in Containerland
Flink Forward Berlin 2017: Patrick Lucas - Flink in ContainerlandFlink Forward Berlin 2017: Patrick Lucas - Flink in Containerland
Flink Forward Berlin 2017: Patrick Lucas - Flink in Containerland
 
Deep Dive Into Kafka Streams (and the Distributed Stream Processing Engine) (...
Deep Dive Into Kafka Streams (and the Distributed Stream Processing Engine) (...Deep Dive Into Kafka Streams (and the Distributed Stream Processing Engine) (...
Deep Dive Into Kafka Streams (and the Distributed Stream Processing Engine) (...
 
pgWALSync
pgWALSyncpgWALSync
pgWALSync
 
Don't Wait! Develop Responsive Applications with Java EE7 Instead
Don't Wait! Develop Responsive Applications with Java EE7 InsteadDon't Wait! Develop Responsive Applications with Java EE7 Instead
Don't Wait! Develop Responsive Applications with Java EE7 Instead
 
Flink Forward San Francisco 2018: Steven Wu - "Scaling Flink in Cloud"
Flink Forward San Francisco 2018: Steven Wu - "Scaling Flink in Cloud" Flink Forward San Francisco 2018: Steven Wu - "Scaling Flink in Cloud"
Flink Forward San Francisco 2018: Steven Wu - "Scaling Flink in Cloud"
 
Kafka Summit SF 2017 - Exactly-once Stream Processing with Kafka Streams
Kafka Summit SF 2017 - Exactly-once Stream Processing with Kafka StreamsKafka Summit SF 2017 - Exactly-once Stream Processing with Kafka Streams
Kafka Summit SF 2017 - Exactly-once Stream Processing with Kafka Streams
 
High Concurrency Architecture and Laravel Performance Tuning
High Concurrency Architecture and Laravel Performance TuningHigh Concurrency Architecture and Laravel Performance Tuning
High Concurrency Architecture and Laravel Performance Tuning
 
Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...
Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...
Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...
 

Similar to Automating Research Data Workflows (GlobusWorld Tour - UCSD)

Similar to Automating Research Data Workflows (GlobusWorld Tour - UCSD) (20)

Automating Research Data Flows with Globus (CHPC 2019 - South Africa)
Automating Research Data Flows with Globus (CHPC 2019 - South Africa)Automating Research Data Flows with Globus (CHPC 2019 - South Africa)
Automating Research Data Flows with Globus (CHPC 2019 - South Africa)
 
Automating Research Data Flows with the Globus Command Line Interface (CLI)
Automating Research Data Flows with the Globus Command Line Interface (CLI)Automating Research Data Flows with the Globus Command Line Interface (CLI)
Automating Research Data Flows with the Globus Command Line Interface (CLI)
 
GlobusWorld 2021 Tutorial: The Globus CLI, Platform and SDK
GlobusWorld 2021 Tutorial: The Globus CLI, Platform and SDKGlobusWorld 2021 Tutorial: The Globus CLI, Platform and SDK
GlobusWorld 2021 Tutorial: The Globus CLI, Platform and SDK
 
Globus Command Line Interface (APS Workshop)
Globus Command Line Interface (APS Workshop)Globus Command Line Interface (APS Workshop)
Globus Command Line Interface (APS Workshop)
 
Automating Research Data Flows and an Introduction to the Globus Platform
Automating Research Data Flows and an Introduction to the Globus PlatformAutomating Research Data Flows and an Introduction to the Globus Platform
Automating Research Data Flows and an Introduction to the Globus Platform
 
Using Globus to Streamline Research at Scale
Using Globus to Streamline Research at ScaleUsing Globus to Streamline Research at Scale
Using Globus to Streamline Research at Scale
 
Automating Research Data Flows and Introduction to the Globus Platform
Automating Research Data Flows and Introduction to the Globus PlatformAutomating Research Data Flows and Introduction to the Globus Platform
Automating Research Data Flows and Introduction to the Globus Platform
 
Globus Automation
Globus AutomationGlobus Automation
Globus Automation
 
Simple Data Automation with Globus (GlobusWorld Tour West)
Simple Data Automation with Globus (GlobusWorld Tour West)Simple Data Automation with Globus (GlobusWorld Tour West)
Simple Data Automation with Globus (GlobusWorld Tour West)
 
Globus Platform Overview
Globus Platform OverviewGlobus Platform Overview
Globus Platform Overview
 
Data Publication and Discovery with Globus
Data Publication and Discovery with GlobusData Publication and Discovery with Globus
Data Publication and Discovery with Globus
 
Tutorial: Leveraging Globus in your Research Applications
Tutorial: Leveraging Globus in your Research ApplicationsTutorial: Leveraging Globus in your Research Applications
Tutorial: Leveraging Globus in your Research Applications
 
Introduction to the Globus Platform (GlobusWorld Tour - UMich)
Introduction to the Globus Platform (GlobusWorld Tour - UMich)Introduction to the Globus Platform (GlobusWorld Tour - UMich)
Introduction to the Globus Platform (GlobusWorld Tour - UMich)
 
Gateways 2020 Tutorial - Large Scale Data Transfer with Globus
Gateways 2020 Tutorial - Large Scale Data Transfer with GlobusGateways 2020 Tutorial - Large Scale Data Transfer with Globus
Gateways 2020 Tutorial - Large Scale Data Transfer with Globus
 
Introduction to the Globus Platform (APS Workshop)
Introduction to the Globus Platform (APS Workshop)Introduction to the Globus Platform (APS Workshop)
Introduction to the Globus Platform (APS Workshop)
 
Leveraging the Globus Platform in Web Applications (CHPC 2019 - South Africa)
Leveraging the Globus Platform in Web Applications (CHPC 2019 - South Africa)Leveraging the Globus Platform in Web Applications (CHPC 2019 - South Africa)
Leveraging the Globus Platform in Web Applications (CHPC 2019 - South Africa)
 
Jupyter + Globus: The Foundation for Interactive Data Science
Jupyter + Globus: The Foundation for Interactive Data ScienceJupyter + Globus: The Foundation for Interactive Data Science
Jupyter + Globus: The Foundation for Interactive Data Science
 
Leveraging the Globus Platform (GlobusWorld Tour - Columbia University)
Leveraging the Globus Platform (GlobusWorld Tour - Columbia University)Leveraging the Globus Platform (GlobusWorld Tour - Columbia University)
Leveraging the Globus Platform (GlobusWorld Tour - Columbia University)
 
Give your little scripts big wings: Using cron in the cloud with Amazon Simp...
Give your little scripts big wings:  Using cron in the cloud with Amazon Simp...Give your little scripts big wings:  Using cron in the cloud with Amazon Simp...
Give your little scripts big wings: Using cron in the cloud with Amazon Simp...
 
Automating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and ComputeAutomating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and Compute
 

More from Globus

Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Globus
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
Globus
 
Extending Globus into a Site-wide Automated Data Infrastructure.pdf
Extending Globus into a Site-wide Automated Data Infrastructure.pdfExtending Globus into a Site-wide Automated Data Infrastructure.pdf
Extending Globus into a Site-wide Automated Data Infrastructure.pdf
Globus
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Globus
 

More from Globus (20)

Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
 
The Department of Energy's Integrated Research Infrastructure (IRI)
The Department of Energy's Integrated Research Infrastructure (IRI)The Department of Energy's Integrated Research Infrastructure (IRI)
The Department of Energy's Integrated Research Infrastructure (IRI)
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
 
Enhancing Performance with Globus and the Science DMZ
Enhancing Performance with Globus and the Science DMZEnhancing Performance with Globus and the Science DMZ
Enhancing Performance with Globus and the Science DMZ
 
Extending Globus into a Site-wide Automated Data Infrastructure.pdf
Extending Globus into a Site-wide Automated Data Infrastructure.pdfExtending Globus into a Site-wide Automated Data Infrastructure.pdf
Extending Globus into a Site-wide Automated Data Infrastructure.pdf
 
Globus at the United States Geological Survey
Globus at the United States Geological SurveyGlobus at the United States Geological Survey
Globus at the United States Geological Survey
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
 
Globus Compute with Integrated Research Infrastructure (IRI) workflows
Globus Compute with Integrated Research Infrastructure (IRI) workflowsGlobus Compute with Integrated Research Infrastructure (IRI) workflows
Globus Compute with Integrated Research Infrastructure (IRI) workflows
 
Reactive Documents and Computational Pipelines - Bridging the Gap
Reactive Documents and Computational Pipelines - Bridging the GapReactive Documents and Computational Pipelines - Bridging the Gap
Reactive Documents and Computational Pipelines - Bridging the Gap
 

Recently uploaded

一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
Professional Data Engineer Certification Exam Guide  _  Learn  _  Google Clou...
Professional Data Engineer Certification Exam Guide  _  Learn  _  Google Clou...Professional Data Engineer Certification Exam Guide  _  Learn  _  Google Clou...
Professional Data Engineer Certification Exam Guide  _  Learn  _  Google Clou...
Domenico Conte
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
StarCompliance.io
 
Introduction-to-Cybersecurit57hhfcbbcxxx
Introduction-to-Cybersecurit57hhfcbbcxxxIntroduction-to-Cybersecurit57hhfcbbcxxx
Introduction-to-Cybersecurit57hhfcbbcxxx
zahraomer517
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
nscud
 

Recently uploaded (20)

一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
Professional Data Engineer Certification Exam Guide  _  Learn  _  Google Clou...
Professional Data Engineer Certification Exam Guide  _  Learn  _  Google Clou...Professional Data Engineer Certification Exam Guide  _  Learn  _  Google Clou...
Professional Data Engineer Certification Exam Guide  _  Learn  _  Google Clou...
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
Uber Ride Supply Demand Gap Analysis Report
Uber Ride Supply Demand Gap Analysis ReportUber Ride Supply Demand Gap Analysis Report
Uber Ride Supply Demand Gap Analysis Report
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
How can I successfully sell my pi coins in Philippines?
How can I successfully sell my pi coins in Philippines?How can I successfully sell my pi coins in Philippines?
How can I successfully sell my pi coins in Philippines?
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
 
Introduction-to-Cybersecurit57hhfcbbcxxx
Introduction-to-Cybersecurit57hhfcbbcxxxIntroduction-to-Cybersecurit57hhfcbbcxxx
Introduction-to-Cybersecurit57hhfcbbcxxx
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 

Automating Research Data Workflows (GlobusWorld Tour - UCSD)

  • 1. Automating Research Data Workflows Vas Vasiliadis vas@uchicago.edu UCSD – May 8, 2019
  • 2. Data replication • For backup: initiated by user or system back up • Automated transfer of data from science instrument 2 Recurring transfers with sync option Copy /ingest Daily @ 3:30am
  • 3. Staging data with compute jobs • Stage data in or out as part of the job • Transfer task is submitted when the job is run – Endpoint may not be currently activated • Alternative approaches 1. User adds directives to job submission script 2. Application manages data staging on user’s behalf
  • 4. Application driven automation • Application (e.g. portal, science gateway) submits a transfer of compute results as the user • Application monitors transfer, and initiates additional processing and/or backup of data
  • 6. Globus Auth: Native apps • Client that cannot keep a secret, e.g… – Command line, desktop apps – Mobile apps – Jupyter notebooks • Native app is registered with Globus Auth – Not a confidential client • Native App Grant is used – Variation on the Authorization Code Grant • Globus SDK: – To get tokens: NativeAppAuthClient – To use tokens: AccessTokenAuthorizer 6
  • 7. Browser Native App grant 7 Native App (Client) 1. Run application 2. URL to authenticate 3. Authenticate and consent 4. Auth code 5. Register auth code 6. Exchange code 7. Access tokens 8. Authenticate with access tokens to invoke transfer service as user App/Service (Resource Server) Globus Auth (Authorization Server)
  • 8. Refresh tokens • Common use cases – Portal checking transfer status when user is not logged in – Running command line app from script • Refresh tokens issued to client, in particular scope • Client uses refresh token to get access token – Confidential client: client_id and client_secret required – Native app: client_secret not required • Refresh token good for 6 months after last use • Consent rescindment revokes all tokens 8
  • 9. Refresh tokens 9 Native App (Client) App/Service (Resource Server) Globus Auth (Authorization Server) 1. Run application 2. URL to authenticate Browser 3. Authenticate and consent 4. Auth code 5. Register auth code 6. Exchange code, request refresh tokens 7. Access tokens and refresh tokens 9. Exchange refresh token for new access tokens 8. Store refresh tokens 10. Access tokens 11. Authenticate with access tokens to invoke service as user
  • 10. Native App/Refresh Tokens Sample Code github.com/globus/native-app-examples • ./example_copy_paste.py – User copies and pastes code to the app • ./example_copy_paste_refresh_token.py – Stores refresh token locally, uses it to get new access tokens • See README for installation 10 On your EC2 instance in ~globus/native-app-examples
  • 12. Globus Command Line Interface (CLI) • Native application, with easy installation/update • Open source, uses Python SDK • globus login – get access and refresh tokens – Tokens stored locally in ~/.globus.cfg • Service (transfer/auth) invocation uses tokens • globus logout – delete tokens docs.globus.org/cli/examples Also available on your EC2 instance
  • 13. The Globus CLI • Installation: docs.globus.org/cli/installation • Authenticating to the Globus service (consents!) – globus login / logout • Getting help – globus --help – globus list-commands • Doing something – It’s all about the UUIDs – Don’t forget the file paths!
  • 14. UUIDs everywhere • UUIDs for endpoint, task, user identity, groups… • Use search/list options • get-identities for identity username to UUID $ globus endpoint search 'Globus Tutorial' $ globus task list $ globus get-identities vas@globus.org 14bf3755- 6267-42f2-9e9c-ad324de4a1fb
  • 15. Batch Transfers • Transfer tasks have one source/destination, but can have any number of files • Provide input source-dest pairs via local file • e.g. move files listed in files.txt from $ep1 to $ep2 $ ep1=ddb59aef-6d04-11e5-ba46-22000b92c6ec $ ep2=ddb59af0-6d04-11e5-ba46-22000b92c6ec $ globus transfer $ep1:/share/godata/ $ep2:/~/ -- batch --label 'CLI Batch' < files.txt
  • 16. Useful submission options • Safe resubmissions – Applies to all tasks (transfer and delete) – Get a task UUID and use it in submission – $ globus task generate-submission-id – Use --submission-id option in transfer command • Task wait – Useful for scripting conditional on transfer task status
  • 17. Parsing CLI output • Default output is text; for JSON output use --format json $ globus endpoint search --filter-scope my-endpoints $ globus endpoint search --filter-scope my-endpoints -- format json • Extract specific attributes using --jmespath <expression> $ globus endpoint search --filter-scope my-endpoints -- jmespath 'DATA[].[id, display_name]'
  • 18. Managing notifications • Turn off emails sent for tasks • Useful when an application manages tasks for a user • Disable notifications with the --notify option --notify off (all notifications) --notify succeeded|failed|inactive (select notifications)
  • 19. Permission management • Set and manage permissions on shared endpoint • Requires access manager role $ share=<shared_endpoint_UUID> $ globus endpoint permission create --permissions r -- identity tuecke@globus.org $share:/NCARTest/ $ globus endpoint permission list $share $ globus endpoint permission delete $share <perm_UUID>
  • 20. Example: Recurring transfers • Submit CLI transfer(s) via task manager/cron • Useful for periodic sync, backup • Interactions are as user: both for data access and to Globus services
  • 21. Example: Job submission data staging • CLI installed on head node • User runs globus login; tokens stored in user’s home directory • Tokens accessible when job runs and submits stage in/out tasks • Use --skip-activation-check when submitting task – Task accepted even if endpoint is not activated at submit time – Task held until endpoint is activated
  • 22. Example: Automation with portals • Portal needs to act as the user • User grants “offline” access to the portal – Portal gets and stores refresh tokens for each user – Uses client id/secret + refresh tokens to get new access tokens – Portal maintains state about transfers being managed (task id)
  • 23. Automation Examples • Syncing a directory – bash script; calls the Globus CLI – Python module; run as script or import as module • Staging data in a shared directory – bash and Python variants • Removing directories after files are transferred – Python script 23 github.com/globus/automation-examples
  • 24. Walkthrough • Sample script that uses sync option to transfer files github.com/globus/automation- examples/blob/master/cli-sync.sh • Same task, via an app that uses Python SDK github.com/globus/automation- examples/blob/master/globus_folder_sync.py
  • 25. Support resources • Globus documentation: docs.globus.org • Sample code: github.com/globus • Helpdesk and issue escalation: support@globus.org • Customer engagement team • Globus professional services team – Assist with portal/gateway/app architecture and design – Develop custom applications that leverage the Globus platform – Advise on customized deployment and integration scenarios
  • 26. Join the Globus community • Access the service: globus.org/login • Create a personal endpoint: globus.org/app/endpoints/create-gcp • Documentation: docs.globus.org • Engage: globus.org/mailing-lists • Subscribe: globus.org/subscriptions • Need help? support@globus.org • Follow us: @globusonline