Automating Research Data with Globus
Flows and Compute
Greg Nawrocki
greg@globus.org
nawrocki@uchicago.edu
nawrocki@anl.gov
Washington University in St. Louis
September 20 & 21, 2022
Case Western Reserve University
October 23 – 24, 2023
Topics
• Globus Flows overview
• Automating data management
–Run an existing Flow
–Build a Flow then run it
• Globus Compute overview
• Automating end-to-end research flows
Globus Platform and Automation Capabilities
Timer Service
The Globus WebApp supports
recurring and scheduled transfers.
(a.k.a. Globus cron)
Command Line Interface
The CLI provides an interface to Globus
services from the shell and is suited to
both interactive and scripting use cases.
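For example, a one-off transfer between the Globus Tutorial Endpoints from a shell (a sketch; the label is arbitrary):
globus login
globus transfer \
    "ddb59aef-6d04-11e5-ba46-22000b92c6ec:/share/godata/" \
    "ddb59af0-6d04-11e5-ba46-22000b92c6ec:/~/godata/" \
    --recursive --label "CLI tutorial transfer"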
Globus API / SDK
Our open REST APIs and Python SDK
empower you to create an integrated
ecosystem of research data services
and applications. Harness the power of
the Globus platform so you can focus
on building your application.
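A minimal sketch of submitting a transfer with the Python SDK (assumes you already hold a transfer access token; the collection UUIDs are placeholders):
import globus_sdk

tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(TRANSFER_ACCESS_TOKEN))
task_data = globus_sdk.TransferData(
    source_endpoint=SRC_COLLECTION_UUID,
    destination_endpoint=DST_COLLECTION_UUID,
    label="SDK tutorial transfer")
task_data.add_item("/share/godata/", "/~/godata/", recursive=True)
task = tc.submit_transfer(task_data)
print(f"Task ID: {task['task_id']}")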
Automation using Globus Flows
Available to all Globus Subscribers
• Managed, secure (Globus Auth), reliable
task orchestration
• Support for heterogeneous resources
• Extensible and authorable event-driven execution model
– Flow Definition (JSON)
– Input Schema (JSON)
– Deployment
• Extensible via custom actions
Managed automation of tasks
• Flows: A platform service for defining, applying, and
sharing distributed research automation flows
• Flows comprise Actions
• Action Providers: Called by Flows to perform tasks
• Triggers*: Start flows based on events (*in development)
An example flow of Actions:
1. Transfer – transfer raw files
2. Compute – launch analysis job (correct, classify, …)
3. Compute – extract metadata
4. Search – ingest to index
5. Transfer – move final files to repo
6. Share – set access controls
Globus Flows service implementation
• Built on AWS Step Functions
– Simple state machine language
– Conditions, loops, fault tolerance, etc.
– Propagates state through the flow
• Standardized API for integrating
custom event and action services
– Actions: synchronous or asynchronous
– Custom Web forms prompt for user input
• Actions secured with Globus Auth
AWS Step Functions + Globus Auth
Automation services ecosystem
• Create Action Providers – each provider exposes a standard interface:
  GET  /provider_url/
  POST /provider_url/run
  GET  /provider_url/action_id/status
  POST /provider_url/action_id/cancel
• Define and deploy flows:
  { "StartAt": "ToProject",
    "States" : {
      "ToProject" : { … },
      "SetPermission" : { … },
      "ProcessData" : { … } … }}
• Run flows
Flow lifecycle: Write once, run many
• Define using JSON/YAML
• Deploy to Flows service
• Set access policy for visibility and execution
• Run (debug) and monitor
• …and run again!
Let’s take a look…
Globus-provided flows
A simple, rather contrived, use case – two Actions:
1. Transfer – transfer files to intermediate storage
2. Transfer – transfer files to final storage
Ex. 1: Run an existing flow using the web app
• Navigate to app.globus.org/flows
• Find the flow named "Two Stage Globus Transfer" and click "Start"
• Consent to allow the flow access to your account
• Source
– Collection: Globus Tutorial Endpoint 1
– Path: /share/godata/
• Intermediate
– Collection and path of your choice
– You can even use the collection you created yesterday in the admin tutorial
• Destination
– Collection: Globus Tutorial Endpoint 2
– Path: /~/
• Add appropriate labels and tags
• Start Run!
• Click “View Run Details” and “Event Log” to monitor progress
Let’s get real…
A simple, and very common, use case – two Actions:
1. Transfer – transfer raw instrument images
2. Share – set access controls for sharing data
Let’s build it!
jupyter.demo.globus.org
globus-jupyter-notebooks
Automation_Using_Globus_Flows.ipynb
https://globus-automate-client.readthedocs.io/en/latest/authoring_flows.html
Example Flow
• Uses Globus defined Action Providers
  https://globus-automate-client.readthedocs.io/en/latest/globus_action_providers.html
• transfer
  – Uses the Globus Transfer Task API to perform a transfer of data from one Globus Collection to another.
• set_permission
  – Uses the Globus Transfer ACL API to set or manage permissions on a folder or file.
Initial Housekeeping
import sys
import os
import time
import json
import uuid
import pickle
import base64
import globus_sdk
from globus_automate_client import FlowsClient
# ID of this tutorial notebook as registered with Globus Auth
CLIENT_ID = 'f794186b-f330-4595-b6c6-9c9d3e903e47'
• Things we need in place for this Notebook to run and access
the Globus SDK and Globus Flows client.
Initial Housekeeping
# Feel free to replace the collection UUIDs below with those of your own collections
# "Globus Tutorial Endpoint 1"
source_collection = "ddb59aef-6d04-11e5-ba46-22000b92c6ec"
# "Globus Tutorials on ALCF Eagle"
destination_collection = "a6f165fa-aee2-4fe5-95f3-97429c28bf82"
# "Tutorial Users" group
my_collaborators = "50b6a29c-63ac-11e4-8062-22000ab68755"
Authentication and Authorization
• All interactions between users and services on the Globus
automation platform are governed by the Globus Auth service.
• Consent must be given by the user for each interaction taking place on their
behalf.
• When executing a flow.
• When deploying a new flow on the Globus Flow service.
• This Notebook in our JupyterHub.
• Access to the Flow service is already granted to you by virtue of authenticating to the
JupyterHub running this notebook – the tokens are already in place.
• If you're running this notebook in your own environment you will need to manually log
into Globus Auth and get tokens using a native app authorization flow (see the
`Platform_Introduction_Native_App_Auth` notebook for an example of how to initiate
this flow).
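• A minimal sketch of that native app flow outside JupyterHub (assumes your own registered CLIENT_ID; FlowsScopes is the globus_sdk v3 scope helper):
import globus_sdk
from globus_sdk.scopes import FlowsScopes

native = globus_sdk.NativeAppAuthClient(CLIENT_ID)
native.oauth2_start_flow(requested_scopes=FlowsScopes.manage_flows)
print(f"Log in at: {native.oauth2_get_authorize_url()}")
auth_code = input("Enter the authorization code: ")
tokens = native.oauth2_exchange_code_for_tokens(auth_code).by_resource_server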
The Globus Flows Service in a Jupyter Notebook
[Diagram: the notebook logs in via Globus Auth, receives tokens, and then calls the Globus Flows service REST APIs with a Bearer token.]
# Get Globus Auth token data from the JupyterHub environment
tokens = pickle.loads(base64.b64decode(os.getenv('GLOBUS_DATA')))['tokens']
# Introspect tokens
print(json.dumps(tokens, indent=2))
Authentication and Authorization
# Create a variable for storing flow scope tokens. Each newly deployed flow
# scope needs to be authorized separately, and will have its own set of
# tokens. Save each of these tokens by scope.
saved_flow_scopes = {}

# Add a callback to the flows client for fetching scopes.
# It will draw scopes from `saved_flow_scopes`
def get_flow_authorizer(flow_url, flow_scope, client_id):
    return globus_sdk.AccessTokenAuthorizer(
        access_token=saved_flow_scopes[flow_scope]['access_token'])

# Set up the Flows client, using tokens from our JupyterHub login to access
# the Globus Flows service, and set the `get_flow_authorizer` callback for
# any new flows we authorize.
flows_authorizer = globus_sdk.AccessTokenAuthorizer(
    access_token=tokens['flows.globus.org']['access_token'])
flows_client = FlowsClient.new_client(
    CLIENT_ID, get_flow_authorizer, flows_authorizer)
• Once you’ve got the tokens the authentication magic happens.
Fetch User Identity
# Create an Auth client so we can look up identities
auth_authorizer = globus_sdk.AccessTokenAuthorizer(
    access_token=tokens['auth.globus.org']['access_token'])
ac = globus_sdk.AuthClient(authorizer=auth_authorizer)
# Get the user's primary identity
primary_identity = ac.oauth2_userinfo()
identity_id = primary_identity['sub']
print(f"Username: {primary_identity['preferred_username']} (ID: {identity_id})")
print(f"Notifications will be sent to: {primary_identity['email']}")
• When transferring files to the destination collection we will put them in a uniquely named directory: <identity_id>-shared-files
• Fetch our user ID for this purpose.
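• For illustration, the destination path in the flow input can then be built as follows (hypothetical variable name):
destination_path = f"/{identity_id}-shared-files/"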
Authoring a Flow
• Define a Flow
  – Flows are composed of State Types
  – The Action Type is what we will highlight in this example
• Define a Schema
  – The user inputs needed for this Flow
• Deploy the Flow
  – The FlowsClient makes that easy!
Authoring a Flow – Define a Flow

# Define flow
flow_definition = {
    "Comment": "Transfer files to a guest collection and set access permissions",
    "StartAt": "TransferFiles",
    "States": {

• Top Level Fields
  • From the Amazon States Language playbook
  • Can include:
    – Comment
    – StartAt – the first State in the machine
    – States – the State definitions
State Types
• Supported States from the Amazon States Language playbook
  • Pass – passes input to output; performs no work
  • Choice – adds branching logic to a state machine
  • Wait – delays the machine from continuing for a specified time
  • Fail – terminates the machine as a failed run
• Globus Defined States
  • Action – references the Action Providers; the heart of our example
  • ExpressionEval – a method of evaluating an expression to create parameter values for passing to an Action
    – Combines the Action and Pass State Types, providing the ability to compute results for Parameters (Action) and the simple storage of the new values (Pass)
    – Useful for determining a value to be tested in a Choice State, or to compute a "final" value seen in the output of the Flow upon completion
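• For instance, a minimal Choice state (an illustrative sketch; the state name "CheckRecursive" is hypothetical) that branches on a boolean from the flow input:
"CheckRecursive": {
    "Type": "Choice",
    "Choices": [
        {
            "Variable": "$.input.recursive_tx",
            "BooleanEquals": True,
            "Next": "TransferFiles"
        }
    ],
    "Default": "SetPermission"
}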
The Action State Type – by way of example
"TransferFiles":
"Comment": "Transfer to a guest collection",
"Type": "Action",
"ActionUrl": "https://actions.automate.globus.org/transfer/transfer",
• Name the State – “TransferFiles”
• Comment
• Self explanatory
• Type : Action (required)
• ActionUrl (required)
• The base URL of the Action (Service Endpoint). As defined by the Action Interface.
The Action State Type – by way of example
"Parameters": {
"source_endpoint_id.$": "$.input.source.id",
"destination_endpoint_id.$": "$.input.destination.id",
"transfer_items": [
{
"source_path.$": "$.input.source.path",
"destination_path.$": "$.input.destination.path",
"recursive.$": "$.input.recursive_tx"
}
]
},
• Each Action Provider (optionally) defines its own set of properties/inputs.
• Input to the Action can be referenced by either "InputPath" or "Parameters".
• In this example the parameters are referenced from the input schema (we'll see that soon).
The Action State Type – by way of example
"ResultPath": "$.TransferFiles",
"WaitTime": 60,
"Next": "SetPermission",
},
• "ResultPath": a Reference Path indicating where the output of the Action will be placed in the state of the Flow run-time.
• "WaitTime" (optional, default value 300 – five minutes): the maximum amount of time to wait for the Action to complete (or abort), in seconds.
• "Next" or "End" (mutually exclusive, one required): these indicate how the Flow should proceed after the Action state.
  – "Next" indicates the name of the following state of the flow.
  – "End" with a value "True" indicates that the Flow is complete after this state completes.
The Action State Type – another example
"SetPermission": {
"Comment": "Grant read permission on the data to a Globus user or group",
"Type": "Action",
"ActionUrl": "https://actions.automate.globus.org/transfer/set_permission",
"Parameters": {
"endpoint_id.$": "$.input.destination.id",
"path.$": "$.input.destination.path",
"operation": "CREATE",
"permissions": "r", # read-only access
"principal_type.$": "$.input.principal_type", # 'group' or 'identity'
"principal.$": "$.input.principal_identifier"
},
"ResultPath": "$.SetPermission",
"End": True
}
}
}
The Action State Type – wrap up
• The examples above are not exhaustive – for more information on the Action State Type:
  https://globus-automate-client.readthedocs.io/en/latest/authoring_flows.html#action-state-type
• Globus Action Providers
  https://globus-automate-client.readthedocs.io/en/latest/globus_action_providers.html
• Roll your own Action Providers
  https://action-provider-tools.readthedocs.io/en/latest/
Authoring a Flow – Define a Schema
• All Flows require schemas to validate user input.
• Yay! More JSON!
# Define input schema
input_schema = {
"required": [
"input"
],
"properties": {
"input": {
"type": "object",
"required": [
"source",
"destination",
"recursive_tx",
"principal_identifier",
"principal_type"
],
• User input we need for this Flow
– source
o Globus Collection containing the
source data
– destination
o Globus Guest collection that will be
the destination of the transfer action
– recursive_tx
o Boolean flag to state whether or not
to transfer files recursively
– principal_identifier
o UUID of the user identity or group to
share data with
– principal_type
o Specifies whether to share with an individual user or a group identity
Authoring a Flow – Define a Schema
"properties": {
"source": {
"type": "object",
"title": "Select source collection and path",
"description": "The source collection and path (path MUST end with a slash)",
"format": "globus-collection",
"required": [
"id",
"path"
],
"properties": {
"id": {
"type": "string",
"format": "uuid",
"default": source_collection
},
"path": {
"type": "string"
}
},
"additionalProperties": False
},
• Schema for the "source" object
  – globus-collection is a custom format
    o https://globus-automate-client.readthedocs.io/en/latest/authoring_flows.html#globus-web-app-custom-formats
  – Note the default to source_collection, which we defined at the beginning of this notebook.
Authoring a Flow – Define a Schema
"destination": {
"type": "object",
"title": "Select destination collection and path",
etc…
"recursive_tx": {
"type": "boolean",
"title": "Recursive transfer",
etc…
"principal_type": {
"type": "string",
"title": "Type of principal to share with",
etc…
"principal_identifier": {
"type": "string",
"title": "UUID of user identity or group",
etc…
• Finish Schema for remaining user input parameters
Flow Deployment
# Deploy the flow
flow_title = f"Tutorial-Transfer-Share-{str(uuid.uuid4())}" # generate a unique title
flow = flows_client.deploy_flow(
flow_definition,
title=flow_title,
input_schema=input_schema,
)
flow_id = flow['id']
flow_scope = flow['globus_auth_scope']
print(f"Successfully deployed flow (ID: {flow_id})")
print(f"Flow scope: {flow_scope}\n\n")
print(f"View the flow in the Webapp: https://app.globus.org/flows/{flow_id}")
print("Note: You can start your flow directly from the Webapp!")
• Simple method of the FlowsClient
Go run it!
Flow Updating
flow = flows_client.update_flow(
    flow_id,
    flow_definition,
    # administered_by=[f"urn:globus:auth:identity:{identity_id}"],
    # runnable_by=[f"urn:globus:auth:identity:{identity_id}"],
    visible_to=[f"urn:globus:auth:identity:{identity_id}"])
• If you change the Flow you will need to update it.
• Very similar to the deploy step.
• By default, Flows are visible only to their creator; you can modify that here.
• https://globus-automate-client.readthedocs.io/en/latest/python_sdk_reference.html
Flow Execution – From the API
• Flows may be run via the globus-automate API
– See section C of the Jupyter Notebook
• Authorize the Flow
– Native App Grant process
• Define the Flow Inputs
– Define Flow inputs with a JSON document
• Run the Flow
– Trivial thanks again to the FlowsClient
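• A minimal sketch of that sequence using the FlowsClient from this notebook (the input values echo the schema above):
flow_input = {
    "input": {
        "source": {"id": source_collection, "path": "/share/godata/"},
        "destination": {"id": destination_collection,
                        "path": f"/{identity_id}-shared-files/"},
        "recursive_tx": True,
        "principal_type": "group",
        "principal_identifier": my_collaborators,
    }
}
run = flows_client.run_flow(flow_id, flow_scope, flow_input)
action_id = run['action_id']
print(flows_client.flow_action_status(flow_id, flow_scope, action_id)['status'])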
Flows - Administrivia
• Flows can be created / updated / run from the Globus CLI
• Flows is a subscription service
– Non-subscribers can have a single flow
– You should delete the flow we just created if you want to follow
along with the next example.
• If my institution has a subscription, how do I run more than
one flow?
– Short answer: contact me (greg@globus.org) or the Globus Support Staff (support@globus.org)
– This process will improve
Now we'll add computation to our flow
1. Transfer – transfer raw instrument images
2. Compute – run a compute job to process image files
3. Transfer – move processed images to repository
4. Share – set access controls for sharing data
Globus Compute – Formerly FuncX
Globus Compute
• Globus Compute: managed, federated FaaS
• Compute function: Python code registered with the Globus Compute service → in our case, simple image processing
• Compute endpoint: any system running the Globus Compute agent → in our case, our EC2 instance
• Currently you can only run functions you register
Globus Compute transforms any computing resource into a function-serving endpoint
• pip installable endpoint
  – Globus Auth for registration
• Elastic resource provisioning from local, cluster, or cloud system (via Parsl)
• Parallel execution using local fork or via common schedulers
  – Slurm, PBS, LSF, Cobalt, K8s
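• For context, the endpoint lifecycle from a shell looks roughly like this (the endpoint name is a placeholder):
pip install globus-compute-endpoint
globus-compute-endpoint configure my-endpoint
globus-compute-endpoint start my-endpoint
globus-compute-endpoint list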
Web interface to Compute
• Lists the compute endpoints available to the user
• Shows status and details of each compute endpoint
Compute service will evolve rapidly
• Multi-user compute endpoints
• Native integration with transfer for stage in and stage
out of data for compute tasks
• Expanding compute service interfaces in the webapp
for administrators and users
Our environment
[Diagram: an EC2 instance serves as our computing resource, running a Compute Endpoint with a registered Compute Function and a GCP endpoint; ALCF provides the sharing resource via a GCS endpoint; both connect to the Globus Compute and Transfer services.]
Configure our computing resource
• Register a compute endpoint with Globus Compute
  – Activate venv: source ~/.compute/bin/activate
    o Virtual environment – contains necessary packages
  – Register: globus-compute-endpoint configure EP_NAME
  – Start: globus-compute-endpoint start EP_NAME
  – Save the registered endpoint UUID
  – View endpoint in the web app: app.globus.org/compute
Register and execute a function
• Register a function with Globus Compute
  – Activate venv: source ~/.compute/bin/activate (should already be done)
  – Register: python ~/globus-flows-trigger-examples/compute_function.py
  – Save the registered function UUID
• Open interactive Python shell and run the function
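• A minimal sketch of registering and invoking a function with the Globus Compute SDK (the function body and endpoint UUID are illustrative, not the tutorial's compute_function.py):
from globus_compute_sdk import Client, Executor

def double(x):
    # Trivial stand-in for the image processing function
    return 2 * x

gcc = Client()
function_id = gcc.register_function(double)
print(f"Function ID: {function_id}")

# Invoke the function on a registered endpoint (placeholder UUID)
with Executor(endpoint_id="YOUR-ENDPOINT-UUID") as gce:
    future = gce.submit(double, 21)
    print(future.result())  # prints 42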
Configure our computing resource storage
• We need a way to get the data to that computing resource
• Set up and run Globus Connect Personal
  – Setup: globusconnectpersonal -setup
  – Run: globusconnectpersonal -start &
  – Save the registered collection UUID
  – View collection in the web app
Adding computation to our flow
(The same four-step pipeline as above: 1. Transfer raw instrument images; 2. Compute – process image files; 3. Transfer processed images to the repository; 4. Share – set access controls.)
Our environment
[Diagram: (0) a monitor script on the "instrument" (the same EC2 instance, exposed via a GCP endpoint) triggers the flow run; (1) raw files transfer to the computing resource's GCP endpoint; (2) the flow invokes the image processing function on the Compute Endpoint via the Compute Service; (3) result files transfer to the ALCF sharing resource (GCS endpoint); (4) access permissions are set via the Transfer Service.]
Incorporate compute into a flow (1/3)
• Review the flow definition and schema:
– transfer_compute_share_definition.json
– Actually… we’ll do that after we deploy it
• Deploy the enhanced flow
– Activate venv: source ~/.trigger/bin/activate
– Deploy: deploy_flow --flowdef --schema --title
Incorporate compute into a flow (2/3)
• Update the monitoring script
– Edit trigger_transfer_compute_share.py
• Modify…
– Flow ID
– Source collection ID and path (the “instrument”)
– Destination collection ID and path (the compute endpoint)
– Compute endpoint and function IDs
– Result share collection ID and path (the sharing resource)
Incorporate compute into a flow (3/3)
• Run the monitoring script
./trigger_transfer_compute_share_flow.py \
    --watchdir /home/devN/images \
    --patterns .done
• Activate the trigger
– Copy *.png files to directory being monitored
– “touch” iam.done file to trigger the flow
• Monitor the running flow in the web app
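• The monitoring script is essentially a filesystem watcher. A minimal sketch of the same idea using the watchdog package (not the actual tutorial script; the run_flow call is as shown earlier):
import time
from watchdog.observers import Observer
from watchdog.events import PatternMatchingEventHandler

class DoneFileHandler(PatternMatchingEventHandler):
    def on_created(self, event):
        # A *.done file appeared in the watched directory; start a flow run
        print(f"Trigger file {event.src_path} detected")
        # flows_client.run_flow(flow_id, flow_scope, flow_input)

observer = Observer()
observer.schedule(DoneFileHandler(patterns=["*.done"]), "/home/devN/images")
observer.start()
try:
    while True:
        time.sleep(1)
finally:
    observer.stop()
    observer.join()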
Enjoy our success!
[Diagram: the same environment as above, now with collaborators accessing the result files on the ALCF sharing resource.]
Extending the ecosystem: Action providers
• An Action Provider is a service endpoint
  – Run
  – Status
  – Cancel
  – Release
  – Resume
• Action Provider Toolkit: action-provider-tools.readthedocs.io
• Globus-operated and custom-developed providers include: transfer, ACLs, mkdir, ls, delete, compute, notify, identifier, describe, search ingest, web form, and Xtract.
• docs.globus.org/api/flows/hosted-action-providers
Flows Resources
• Globus Documentation: docs.globus.org
• Flows Specific Doc: https://docs.globus.org/api/flows/
–Globus Flows Overview
o Authoring Flows
o Running a Flow automatically
o Python SDK Reference
–Globus Operated Action Providers
–Globus Action Provider API Specification
–Globus Flows API Specification