Automating Research Data with Globus Flows and Compute
Greg Nawrocki
greg@globus.org
nawrocki@uchicago.edu
nawrocki@anl.gov
Washington University in St. Louis
September 20 & 21, 2022
Case Western Reserve University
October 23 – 24, 2023
Topics
• Globus Flows overview
• Automating data management
–Run an existing Flow
–Build a Flow then run it
• Globus Compute overview
• Automating end-to-end research flows
Globus Platform and Automation Capabilities
Timer Service
The Globus WebApp supports
recurring and scheduled transfers.
(a.k.a. Globus cron)
Command Line Interface
The CLI provides an interface to Globus
services from the shell and is suited to
both interactive and scripting use cases.
Globus API / SDK
Our open REST APIs and Python SDK
empower you to create an integrated
ecosystem of research data services
and applications. Harness the power of
the Globus platform so you can focus
on building your application.
Automation using Globus Flows
Available to all Globus Subscribers
• Managed, secure (Globus Auth), reliable
task orchestration
• Support for heterogeneous resources
• Extensible and authorable event driven
execution model
– Flow Definition (JSON)
– Input Schema (JSON)
– Deployment
• Extensible via custom actions
Managed automation of tasks
• Flows: A platform service for defining, applying, and
sharing distributed research automation flows
• Flows comprise Actions
• Action Providers: Called by Flows to perform tasks
• Triggers*: Start flows based on events (* in development)
[Figure: an example flow as a chain of actions: Transfer (transfer raw files) → Compute (launch analysis job) → Carbon! (correct, classify, …) → Compute (extract metadata) → Search (ingest to index) → Transfer (move final files to repo) → Share (set access controls)]
Globus Flows service implementation
• Built on AWS Step Functions
– Simple state machine language
– Conditions, loops, fault tolerance, etc.
– Propagates state through the flow
• Standardized API for integrating
custom event and action services
– Actions: synchronous or asynchronous
– Custom Web forms prompt for user input
• Actions secured with Globus Auth
AWS Step Functions + Globus Auth
Automation services ecosystem
GET /provider_url/
POST /provider_url/run
GET /provider_url/action_id/status
GET /provider_url/action_id/cancel
GET /provider_url/action_id/status
Create Action
Providers
Define and
deploy flows
{ "StartAt": "ToProject",
"States" : {
"ToProject" : { … },
"SetPermission" : { … },
"ProcessData" : { … } … }}
Run flows
Flow lifecycle: Write once, run many
• Define using JSON/YAML
• Deploy to Flows service
• Set access policy for
visibility and execution
• Run (debug) and monitor
• …and run again!
Let’s take a look…
Globus-provided flows
A simple, rather contrived, use case
[Figure: two Transfer actions: (1) transfer files to intermediate storage, (2) transfer files to final storage]
Ex. 1: Run an existing flow using the web app
• Navigate to app.globus.org/flows
• Find the flow named “Two Stage Globus Transfer” and click ”Start”
• Consent to allow the flow access to your account
• Source
– Collection: Globus Tutorial Endpoint 1
– Path: /share/godata/
• Intermediate
– Collection and path of your choice
– You can even use the collection you created yesterday in the admin tutorial
• Destination
– Collection: Globus Tutorial Endpoint 2
– Path: /~/
• Add appropriate labels and tags
• Start Run!
• Click “View Run Details” and “Event Log” to monitor progress
Let’s get real…
A simple, and very common, use case
[Figure: two actions: (1) Transfer: transfer raw instrument images, (2) Share: set access controls for sharing data]
Let’s build it!
jupyter.demo.globus.org
globus-jupyter-notebooks
Automation_Using_Globus_Flows.ipynb
https://globus-automate-client.readthedocs.io/en/latest/authoring_flows.html
• Uses Globus defined Action Providers
• https://globus-automate-client.readthedocs.io/en/latest/globus_action_providers.html
• transfer
• Uses the Globus Transfer Task API to perform a transfer of data from one Globus
Collection to another.
• set_permission
• Uses the Globus Transfer ACL API to set or manage permissions on a folder or file.
Example Flow
Initial Housekeeping
import sys
import os
import time
import json
import uuid
import pickle
import base64
import globus_sdk
from globus_automate_client import FlowsClient
# ID of this tutorial notebook as registered with Globus Auth
CLIENT_ID = 'f794186b-f330-4595-b6c6-9c9d3e903e47'
• Things we need in place for this Notebook to run and access
the Globus SDK and Globus Flows client.
Initial Housekeeping
# Feel free to replace the collection UUIDs below with those of your own collections
# "Globus Tutorial Endpoint 1"
source_collection = "ddb59aef-6d04-11e5-ba46-22000b92c6ec"
# "Globus Tutorials on ALCF Eagle"
destination_collection = "a6f165fa-aee2-4fe5-95f3-97429c28bf82"
# "Tutorial Users" group
my_collaborators = "50b6a29c-63ac-11e4-8062-22000ab68755"
Authentication and Authorization
• All interactions between users and services on the Globus
automation platform are governed by the Globus Auth service.
• Consent must be given by the user for each interaction taking place on their
behalf.
• When executing a flow.
• When deploying a new flow on the Globus Flow service.
• This Notebook in our JupyterHub.
• Access to the Flow service is already granted to you by virtue of authenticating to the
JupyterHub running this notebook – the tokens are already in place.
• If you're running this notebook in your own environment you will need to manually log
into Globus Auth and get tokens using a native app authorization flow (see the
`Platform_Introduction_Native_App_Auth` notebook for an example of how to initiate
this flow).
The Globus Flows Service in a Jupyter Notebook
[Figure: login returns {"tokens": …}; the notebook then calls the Flow Service REST APIs with a bearer token (Bearer a45cd…)]
# Get Globus Auth token data from the JupyterHub environment
tokens = pickle.loads(base64.b64decode(os.getenv('GLOBUS_DATA')))['tokens']
# Introspect tokens
print(json.dumps(tokens, indent=2))
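The two lines above imply that the JupyterHub hands token data to the notebook as a base64-encoded pickle in the `GLOBUS_DATA` environment variable. A minimal sketch of that round trip, assuming this encoding; the helper names, the `GLOBUS_DATA_DEMO` variable, and the token value are made up for illustration and are not part of any Globus library:

```python
import base64
import os
import pickle


def encode_globus_data(data):
    """Pickle a dict and base64-encode it (the assumed GLOBUS_DATA format)."""
    return base64.b64encode(pickle.dumps(data)).decode("ascii")


def decode_globus_data(raw):
    """Reverse of encode_globus_data. Only unpickle data from a trusted source."""
    return pickle.loads(base64.b64decode(raw))


# Made-up token data keyed by resource server, shaped like the notebook's tokens.
fake_data = {"tokens": {"flows.globus.org": {"access_token": "not-a-real-token"}}}

# Use a demo variable name so a real GLOBUS_DATA value is never overwritten.
os.environ["GLOBUS_DATA_DEMO"] = encode_globus_data(fake_data)
tokens = decode_globus_data(os.environ["GLOBUS_DATA_DEMO"])["tokens"]
print(sorted(tokens))
```

Because unpickling can execute arbitrary code, this pattern is only reasonable when the environment variable is set by a service you trust, as is the case for the tutorial JupyterHub.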
Authentication and Authorization
# Create a variable for storing flow scope tokens. Each newly deployed flow
# scope needs to be authorized separately and will have its own set of tokens.
# Save each of these tokens by scope.
saved_flow_scopes = {}
# Add a callback to the flows client for fetching scopes.
# It will draw scopes from `saved_flow_scopes`
def get_flow_authorizer(flow_url, flow_scope, client_id):
    return globus_sdk.AccessTokenAuthorizer(
        access_token=saved_flow_scopes[flow_scope]['access_token'])
# Set up the Flows client, using tokens from our JupyterHub login to access the
# Globus Flows service, and set the `get_flow_authorizer` callback for any new
# flows we authorize.
flows_authorizer = globus_sdk.AccessTokenAuthorizer(
    access_token=tokens['flows.globus.org']['access_token'])
flows_client = FlowsClient.new_client(
    CLIENT_ID, get_flow_authorizer, flows_authorizer)
• Once you’ve got the tokens the authentication magic happens.
Fetch User Identity
# Create an Auth client so we can look up identities
auth_authorizer = globus_sdk.AccessTokenAuthorizer(
    access_token=tokens['auth.globus.org']['access_token'])
ac = globus_sdk.AuthClient(authorizer=auth_authorizer)
# Get the user's primary identity
primary_identity = ac.oauth2_userinfo()
identity_id = primary_identity['sub']
print(f"Username: {primary_identity['preferred_username']} (ID: {identity_id})")
print(f"Notifications will be sent to: {primary_identity['email']}")
• When transferring files to the destination collection we will put them in a
uniquely named directory:
• <identity_id>-shared-files
• Fetch our user id for this purpose.
• Define a Flow
• Flows are composed of State Types
• The Action Type is what we will highlight in this example
• Define a Schema
• The user inputs needed for this Flow
• Deploy the Flow
– The FlowsClient makes that easy!
Authoring a Flow
# Define flow
flow_definition = {
"Comment": "Transfer files to a guest collection and set access permissions",
"StartAt": "TransferFiles",
"States": {
• Top Level Fields
• From the Amazon States Language playbook
• Can Include
• Comment
• StartAt
• First State in the Machine
• States
• State definitions
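The relationship between the top-level fields can be checked locally before deploying. The sketch below is a hypothetical helper (`check_top_level` is not part of the Flows client) and covers only the rule described above, that `StartAt` must name one of the entries in `States`; the Flows service performs its own, more thorough validation.

```python
def check_top_level(flow_definition):
    """Return a list of problems with the top-level fields of a flow definition."""
    problems = []
    states = flow_definition.get("States")
    if not isinstance(states, dict) or not states:
        problems.append("States must be a non-empty object")
        states = {}
    start = flow_definition.get("StartAt")
    if start is None:
        problems.append("StartAt is missing")
    elif start not in states:
        problems.append(f"StartAt names unknown state {start!r}")
    return problems


# A one-state flow built from the Pass state type, which performs no work.
minimal_flow = {
    "Comment": "A one-state flow that does nothing",
    "StartAt": "DoNothing",
    "States": {"DoNothing": {"Type": "Pass", "End": True}},
}
print(check_top_level(minimal_flow))
```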
Authoring a Flow – Define a Flow
• Supported States from the Amazon States Language playbook
• Pass
• Passes input to output – performs no work
• Choice
• Adds branching logic to a state machine.
• Wait
• Delays the machine from continuing for a specified time.
• Fail
• Terminates the machine as a failed run.
• Globus Defined States
• Action
• References the Action Providers – The heart of our example.
• ExpressionEval
• Method of evaluating an expression to create parameter values for passing to an Action.
• Combines the Action and Pass State Types providing the ability to compute results for
Parameters (Action) and the simple storage of the new values (Pass).
• Useful for determining a value to be tested in a Choice State or to compute a “final” value
seen in the output of the Flow upon completion.
State Types
The Action State Type – by way of example
"TransferFiles": {
"Comment": "Transfer to a guest collection",
"Type": "Action",
"ActionUrl": "https://actions.automate.globus.org/transfer/transfer",
• Name the State – “TransferFiles”
• Comment
• Self explanatory
• Type : Action (required)
• ActionUrl (required)
• The base URL of the Action (Service Endpoint). As defined by the Action Interface.
The Action State Type – by way of example
"Parameters": {
"source_endpoint_id.$": "$.input.source.id",
"destination_endpoint_id.$": "$.input.destination.id",
"transfer_items": [
{
"source_path.$": "$.input.source.path",
"destination_path.$": "$.input.destination.path",
"recursive.$": "$.input.recursive_tx"
}
]
},
• Each Action Provider (optionally) defines its own set of properties/inputs.
• Input to the Action can either be referenced by “InputPath” or
“Parameters”.
• In this example the parameters are referenced from the input schema
(we’ll see that soon).
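In the Parameters block above, a key ending in `.$` takes its value from the flow's input rather than from a literal. A minimal sketch of that substitution, assuming only simple dotted paths such as `$.input.source.id` (the service supports more of the path syntax than this); `resolve_path` and `resolve_parameters` are illustrative names, and the IDs and paths are placeholders:

```python
def resolve_path(document, path):
    """Look up a simple dotted reference path such as '$.input.source.id'."""
    if not path.startswith("$"):
        raise ValueError(f"not a reference path: {path!r}")
    value = document
    for key in path[1:].strip(".").split("."):
        if key:  # a bare '$' refers to the whole document
            value = value[key]
    return value


def resolve_parameters(params, document):
    """Replace every 'name.$' entry with the value its path refers to."""
    if isinstance(params, dict):
        resolved = {}
        for key, value in params.items():
            if key.endswith(".$"):
                resolved[key[:-2]] = resolve_path(document, value)
            else:
                resolved[key] = resolve_parameters(value, document)
        return resolved
    if isinstance(params, list):
        return [resolve_parameters(item, document) for item in params]
    return params


# Placeholder run input; real collection IDs are UUIDs.
run_input = {
    "input": {
        "source": {"id": "SRC", "path": "/raw/"},
        "destination": {"id": "DST", "path": "/shared/"},
        "recursive_tx": True,
    }
}
parameters = {
    "source_endpoint_id.$": "$.input.source.id",
    "destination_endpoint_id.$": "$.input.destination.id",
    "transfer_items": [
        {
            "source_path.$": "$.input.source.path",
            "destination_path.$": "$.input.destination.path",
            "recursive.$": "$.input.recursive_tx",
        }
    ],
}
resolved = resolve_parameters(parameters, run_input)
print(resolved["source_endpoint_id"], resolved["transfer_items"][0]["recursive"])
```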
The Action State Type – by way of example
"ResultPath": "$.TransferFiles",
"WaitTime": 60,
"Next": "SetPermission",
},
• “ResultPath”: Is a Reference Path indicating where the output of the Action will
be placed in the state of the Flow run-time.
• “WaitTime” (optional, default value 300 – five minutes): The maximum amount of
time, in seconds, to wait for the Action to complete (or abort).
• “Next or End” (mutually exclusive, one required): These indicate how the Flow
should proceed after the Action state.
– “Next” indicates the name of the following state of the flow.
– “End” with a value of “True” indicates that the Flow is complete after this state completes.
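The three fields above can be illustrated with a small local model. This is a sketch of the described behavior, not how the Flows service is implemented: `transition` and `store_result` are hypothetical helpers, and `store_result` handles only single-level paths of the form `$.Name`.

```python
DEFAULT_WAIT_TIME = 300  # seconds; the default stated above


def transition(state):
    """Return the next state's name, or None if the flow ends after this state."""
    has_next = "Next" in state
    has_end = state.get("End") is True
    if has_next == has_end:
        raise ValueError("exactly one of 'Next' or 'End': True is required")
    return state["Next"] if has_next else None


def store_result(run_state, result_path, result):
    """Place an Action's output at a '$.Name' ResultPath in a copy of the run state."""
    name = result_path[2:]
    if not result_path.startswith("$.") or not name or "." in name:
        raise ValueError("this sketch only handles paths of the form '$.Name'")
    updated = dict(run_state)
    updated[name] = result
    return updated


transfer_state = {
    "Type": "Action",
    "ResultPath": "$.TransferFiles",
    "WaitTime": 60,
    "Next": "SetPermission",
}
wait = transfer_state.get("WaitTime", DEFAULT_WAIT_TIME)
run_state = store_result({"input": {}}, transfer_state["ResultPath"], {"status": "SUCCEEDED"})
print(transition(transfer_state), wait, sorted(run_state))
```

Because each Action writes to its own key, later states can still read both the original `input` and the results of earlier Actions.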
The Action State Type – another example
"SetPermission": {
"Comment": "Grant read permission on the data to a Globus user or group",
"Type": "Action",
"ActionUrl": "https://actions.automate.globus.org/transfer/set_permission",
"Parameters": {
"endpoint_id.$": "$.input.destination.id",
"path.$": "$.input.destination.path",
"operation": "CREATE",
"permissions": "r", # read-only access
"principal_type.$": "$.input.principal_type", # 'group' or 'identity'
"principal.$": "$.input.principal_identifier"
},
"ResultPath": "$.SetPermission",
"End": True
}
}
}
The Action State Type – wrap up
• The examples above are not exhaustive. For more information on the Action State Type:
• https://globus-automate-client.readthedocs.io/en/latest/authoring_flows.html#action-state-type
• Globus Action Providers
• https://globus-automate-client.readthedocs.io/en/latest/globus_action_providers.html
• Roll your own Action Providers
• https://action-provider-tools.readthedocs.io/en/latest/
• All Flows require schemas to validate user input.
• Yea! More JSON!
Authoring a Flow – Define a Schema
# Define input schema
input_schema = {
"required": [
"input"
],
"properties": {
"input": {
"type": "object",
"required": [
"source",
"destination",
"recursive_tx",
"principal_identifier",
"principal_type"
],
• User input we need for this Flow
– source
o Globus Collection containing the
source data
– destination
o Globus Guest collection that will be
the destination of the transfer action
– recursive_tx
o Boolean flag to state whether or not
to transfer files recursively
– principal_identifier
o UUID of the user identity or group to
share data with
– principal_type
o Specifies whether to share with an
individual user or group identity
Authoring a Flow – Define a Schema
"properties": {
"source": {
"type": "object",
"title": "Select source collection and path",
"description": "The source collection and path (path MUST end with a slash)",
"format": "globus-collection",
"required": [
"id",
"path"
],
"properties": {
"id": {
"type": "string",
"format": "uuid",
"default": source_collection
},
"path": {
"type": "string"
}
},
"additionalProperties": False
},
• Schema for the “source” object
– globus-collection is a custom
format
o https://globus-automate-client.readthedocs.io/en/latest/authoring_flows.html#globus-web-app-custom-formats
– Note the default to
source_collection which we
defined at the beginning of this
notebook.
Authoring a Flow – Define a Schema
"destination": {
"type": "object",
"title": "Select destination collection and path",
etc…
"recursive_tx": {
"type": "boolean",
"title": "Recursive transfer",
etc…
"principal_type": {
"type": "string",
"title": "Type of principal to share with",
etc…
"principal_identifier": {
"type": "string",
"title": "UUID of user identity or group",
etc…
• Finish Schema for remaining user input parameters
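An input document that satisfies this schema might look like the sketch below. The collection and group UUIDs and the source path are the tutorial values shown earlier; the identity ID is a placeholder, and the destination path layout is an assumption based on the `<identity_id>-shared-files` naming described above. `check_flow_input` is a hypothetical local check that covers only the required keys and the rules stated on the slides; the service validates against the full JSON schema.

```python
REQUIRED_KEYS = [
    "source",
    "destination",
    "recursive_tx",
    "principal_identifier",
    "principal_type",
]


def check_flow_input(flow_input):
    """Return a list of problems found in a flow input document."""
    body = flow_input.get("input")
    if not isinstance(body, dict):
        return ["a top-level 'input' object is required"]
    problems = [f"missing required key: {key}" for key in REQUIRED_KEYS if key not in body]
    source = body.get("source")
    if isinstance(source, dict) and not str(source.get("path", "")).endswith("/"):
        problems.append("source path must end with a slash")
    if "principal_type" in body and body["principal_type"] not in ("identity", "group"):
        problems.append("principal_type must be 'identity' or 'group'")
    return problems


identity_id = "00000000-0000-0000-0000-000000000000"  # placeholder, not a real identity
flow_input = {
    "input": {
        "source": {"id": "ddb59aef-6d04-11e5-ba46-22000b92c6ec", "path": "/share/godata/"},
        "destination": {
            "id": "a6f165fa-aee2-4fe5-95f3-97429c28bf82",
            "path": f"/{identity_id}-shared-files/",  # assumed layout
        },
        "recursive_tx": True,
        "principal_identifier": "50b6a29c-63ac-11e4-8062-22000ab68755",
        "principal_type": "group",
    }
}
print(check_flow_input(flow_input))
```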
Flow Deployment
# Deploy the flow
flow_title = f"Tutorial-Transfer-Share-{str(uuid.uuid4())}" # generate a unique title
flow = flows_client.deploy_flow(
    flow_definition,
    title=flow_title,
    input_schema=input_schema,
)
flow_id = flow['id']
flow_scope = flow['globus_auth_scope']
print(f"Successfully deployed flow (ID: {flow_id})")
print(f"Flow scope: {flow_scope}\n\n")
print(f"View the flow in the Webapp: https://app.globus.org/flows/{flow_id}")
print("Note: You can start your flow directly from the Webapp!")
• Simple method of the FlowsClient
Go run it!
Flow Updating
flow = flows_client.update_flow(
    flow_id,
    flow_definition,
    # administered_by=[f"urn:globus:auth:identity:{identity_id}"],
    # runnable_by=[f"urn:globus:auth:identity:{identity_id}"],
    visible_to=[f"urn:globus:auth:identity:{identity_id}"],
)
• If you change the Flow you will need to update it.
• Very similar to the deploy step.
• By default, Flows are visible only to their creator; you can change that here.
• https://globus-automate-client.readthedocs.io/en/latest/python_sdk_reference.html
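The access lists in the update call take principal URNs rather than bare UUIDs. A small sketch of helpers for building them: the identity form is the one used in the code above, while the group form (`urn:globus:groups:id:…`) is my understanding of the Globus convention and should be checked against the current documentation. Both helper names are illustrative.

```python
import uuid


def identity_urn(identity_id):
    """Build a principal URN for a Globus Auth identity."""
    return f"urn:globus:auth:identity:{uuid.UUID(str(identity_id))}"


def group_urn(group_id):
    """Build a principal URN for a Globus group (assumed URN form)."""
    return f"urn:globus:groups:id:{uuid.UUID(str(group_id))}"


# The "Tutorial Users" group UUID from the housekeeping step.
my_collaborators = "50b6a29c-63ac-11e4-8062-22000ab68755"
print(group_urn(my_collaborators))
```

Passing the values through `uuid.UUID` rejects malformed IDs early instead of letting the service reject the whole update.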
Flow Execution – From the API
• Flows may be run via the globus-automate API
– See section C of the Jupyter Notebook
• Authorize the Flow
– Native App Grant process
• Define the Flow Inputs
– Define Flow inputs with a JSON document
• Run the Flow
– Trivial thanks again to the FlowsClient
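After a run is started from the API, a script usually polls until the run finishes. The sketch below shows only that polling loop. It takes a `get_status` callable so it stays independent of any particular client version; in practice that callable would wrap a status request to the Flows service. The status names (`ACTIVE`, `SUCCEEDED`, `FAILED`) are assumed here, and `wait_for_run` is a hypothetical helper.

```python
import time

TERMINAL_STATUSES = {"SUCCEEDED", "FAILED"}  # assumed terminal run statuses


def wait_for_run(get_status, poll_interval=5.0, timeout=600.0, sleep=time.sleep):
    """Call get_status() until it returns a terminal status or the timeout passes."""
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout:
            raise TimeoutError(f"run still {status} after {timeout} seconds")
        sleep(poll_interval)
        waited += poll_interval


# Stand-in for a real status lookup: two ACTIVE polls, then success.
_fake_statuses = iter(["ACTIVE", "ACTIVE", "SUCCEEDED"])
final_status = wait_for_run(lambda: next(_fake_statuses), poll_interval=0.0)
print(final_status)
```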
Flows - Administrivia
• Flows can be created / updated / run from the Globus CLI
• Flows is a subscription service
– Non-subscribers can have a single flow
– You should delete the flow we just created if you want to follow
along with the next example.
• If my institution has a subscription, how do I run more than
one flow?
– Short answer, contact me (greg@globus.org) or the Globus
Support Staff (support@globus.org)
– This process will improve
Now we’ll add computation to our flow
[Figure: four actions: (1) Transfer: transfer raw instrument images, (2) Compute: run a compute job to process image files, (3) Transfer: move processed images to repository, (4) Share: set access controls for sharing data]
Globus Compute – Formerly FuncX
Globus Compute
• Globus Compute: managed, federated FaaS
• Compute function: Python code registered with the
Globus Compute service → simple image processing
• Compute endpoint: any system running the Globus
Compute agent → our EC2 instance
• Currently you can only run functions you register
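A compute function is ordinary Python. The sketch below uses a hypothetical stand-in for the tutorial's image-processing function (`summarize_file` is not from the tutorial repository) and runs it locally; registering it with the service and submitting it to an endpoint require a live login, so those steps are not shown. Imports are placed inside the function on the assumption that the function is serialized and executed on the endpoint, where module-level imports from the submitting script are not available.

```python
def summarize_file(path):
    """Stand-in for image processing: report a file's name, size, and SHA-256 digest."""
    # Imports live inside the function so they are available wherever it runs.
    import hashlib
    import os

    with open(path, "rb") as handle:
        digest = hashlib.sha256(handle.read()).hexdigest()
    return {"name": os.path.basename(path), "bytes": os.path.getsize(path), "sha256": digest}


# Local dry run against a throwaway file.
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    sample = os.path.join(workdir, "sample.png")
    with open(sample, "wb") as handle:
        handle.write(b"not really a PNG")
    summary = summarize_file(sample)
print(summary["name"], summary["bytes"])
```

Testing the function locally like this before registering it keeps debugging separate from endpoint and authentication issues.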
Globus Compute transforms any computing
resource into a function serving endpoint
• pip installable endpoint
– Globus Auth for registration
• Elastic resource provisioning
from local, cluster, or cloud
system (via Parsl)
• Parallel execution using local
fork or via common schedulers
– Slurm, PBS, LSF, Cobalt, K8s
Compute
Service
Web interface to Compute
List of compute endpoints available to user
Status and details of compute endpoint
Compute service will evolve rapidly
• Multi-user compute endpoints
• Native integration with transfer for stage in and stage
out of data for compute tasks
• Expanding compute service interfaces in the webapp
for administrators and users
Our environment
[Figure: an EC2 instance (computing resource) with a compute endpoint, a registered compute function, and a GCP endpoint; ALCF (sharing resource) with a GCS endpoint; both connected to the Compute service and the Transfer service]
Configure our computing resource
• Register a compute endpoint with Globus Compute
– Activate venv: ~/.compute/bin/activate
o Virtual environment – contains necessary packages
– Register: globus-compute-endpoint configure EP_NAME
– Start: globus-compute-endpoint start EP_NAME
– Save the registered endpoint UUID
– View endpoint in the web app: app.globus.org/compute
Register and execute a function
• Register a function with Globus Compute
– Activate venv: ~/.compute/bin/activate (should already be done)
– Register: python ~/globus-flows-trigger-examples/compute_function.py
– Save the registered function UUID
• Open interactive Python shell and run the function
Configure our computing resource storage
• We need a way to get the data to that computing
resource
• Setup and run Globus Connect Personal
– Setup: globusconnectpersonal
– Run: globusconnectpersonal -start &
– Save the registered collection UUID
– View collection in the web app
Adding computation to our flow
[Figure: four actions: (1) Transfer: transfer raw instrument images, (2) Compute: run a compute job to process image files, (3) Transfer: move processed images to repository, (4) Share: set access controls for sharing data]
Our environment
[Figure: the instrument (same EC2 instance, with a GCP endpoint and a monitor script), the EC2 computing resource (compute endpoint, registered compute function, GCP endpoint), and the ALCF sharing resource (GCS endpoint), coordinated by the Compute service and Transfer service. Steps: (0) monitor script triggers flow run, (1) transfer raw files, (2) invoke image processing function, (3) transfer result files, (4) set permissions]
Incorporate compute into a flow (1/3)
• Review the flow definition and schema:
– transfer_compute_share_definition.json
– Actually… we’ll do that after we deploy it
• Deploy the enhanced flow
– Activate venv: ~/.trigger/bin/activate
– Deploy: deploy_flow --flowdef --schema --title
Incorporate compute into a flow (2/3)
• Update the monitoring script
– Edit trigger_transfer_compute_share.py
• Modify…
– Flow ID
– Source collection ID and path (the “instrument”)
– Destination collection ID and path (the compute endpoint)
– Compute endpoint and function IDs
– Result share collection ID and path (the sharing resource)
Incorporate compute into a flow (3/3)
• Run the monitoring script
./trigger_transfer_compute_share_flow.py \
--watchdir /home/devN/images \
--patterns .done
• Activate the trigger
– Copy *.png files to directory being monitored
– “touch” iam.done file to trigger the flow
• Monitor the running flow in the web app
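The trigger convention above (copy the images, then create a `.done` file) can be sketched with a simple polling check. This is a standard-library stand-in, not the tutorial's monitoring script, which may rely on filesystem events instead; `poll_once` and the `start_flow` callback are hypothetical names.

```python
import os
import tempfile


def list_matching(watchdir, suffix):
    """Return the sorted names in watchdir that end with suffix."""
    return sorted(name for name in os.listdir(watchdir) if name.endswith(suffix))


def poll_once(watchdir, start_flow, pattern=".done", extension=".png"):
    """If a trigger file is present, call start_flow once with the image list."""
    triggers = list_matching(watchdir, pattern)
    if not triggers:
        return False
    start_flow(list_matching(watchdir, extension))
    for name in triggers:
        os.remove(os.path.join(watchdir, name))  # avoid re-triggering on the next poll
    return True


started = []  # records what a real script would submit as flow input
with tempfile.TemporaryDirectory() as watchdir:
    for name in ("a.png", "b.png"):
        open(os.path.join(watchdir, name), "wb").close()
    first = poll_once(watchdir, started.append)   # images present, no trigger yet
    open(os.path.join(watchdir, "iam.done"), "w").close()
    second = poll_once(watchdir, started.append)  # trigger present
    third = poll_once(watchdir, started.append)   # trigger already consumed
print(first, second, third, started)
```

Waiting for a separate trigger file, rather than reacting to each image, ensures the flow starts only after the whole batch has been written.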
Enjoy our success!
[Figure: the completed pipeline: (0) monitor script on the instrument triggers flow run, (1) transfer raw files, (2) invoke image processing function on the EC2 compute endpoint, (3) transfer result files, (4) set permissions; collaborators then access the result files on the ALCF sharing resource]
Extending the ecosystem: Action providers
• Action Provider is a
service endpoint
– Run
– Status
– Cancel
– Release
– Resume
• Action Provider Toolkit
action-provider-tools.readthedocs.io
[Figure: action providers in the ecosystem: compute, ACLs, delete, identifier, transfer, notify, ingest, mkdir, search, ls, Xtract, describe, web form, and custom developed]
docs.globus.org/api/flows/hosted-action-providers
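The operations an Action Provider exposes can be modeled in memory to show the lifecycle of a single action. This toy class is purely illustrative: it is not built on the Action Provider Toolkit, it omits Resume, and the status values (`ACTIVE`, `SUCCEEDED`, `FAILED`) and the rule that an active action cannot be released are assumptions. The `complete` method stands in for real work finishing.

```python
import uuid


class ToyActionProvider:
    """In-memory model of the Run / Status / Cancel / Release operations."""

    def __init__(self):
        self._actions = {}

    def run(self, body):
        """Start an action and return its initial status document."""
        action_id = str(uuid.uuid4())
        self._actions[action_id] = {"action_id": action_id, "status": "ACTIVE", "details": body}
        return dict(self._actions[action_id])

    def status(self, action_id):
        """Return the current status document for an action."""
        return dict(self._actions[action_id])

    def complete(self, action_id):
        """Stand-in for the underlying work finishing successfully."""
        self._actions[action_id]["status"] = "SUCCEEDED"

    def cancel(self, action_id):
        """Stop an action that is still running; finished actions are unchanged."""
        action = self._actions[action_id]
        if action["status"] == "ACTIVE":
            action["status"] = "FAILED"
        return dict(action)

    def release(self, action_id):
        """Forget a finished action and return its final status document."""
        if self._actions[action_id]["status"] == "ACTIVE":
            raise RuntimeError("cannot release an action that is still running")
        return dict(self._actions.pop(action_id))


provider = ToyActionProvider()
action = provider.run({"message": "hello"})
provider.complete(action["action_id"])
final = provider.release(action["action_id"])
print(action["status"], final["status"])
```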
Flows Resources
• Globus Documentation: docs.globus.org
• Flows Specific Doc: https://docs.globus.org/api/flows/
–Globus Flows Overview
o Authoring Flows
o Running a Flow automatically
o Python SDK Reference
–Globus Operated Action Providers
–Globus Action Provider API Specification
–Globus Flows API Specification

More Related Content

Similar to Automating Research Data with Globus Flows and Compute

Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...
Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...
Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...confluent
 
BigQuery case study in Groovenauts & Dive into the DataflowJavaSDK
BigQuery case study in Groovenauts & Dive into the DataflowJavaSDKBigQuery case study in Groovenauts & Dive into the DataflowJavaSDK
BigQuery case study in Groovenauts & Dive into the DataflowJavaSDKnagachika t
 
Getting started with Apollo Client and GraphQL
Getting started with Apollo Client and GraphQLGetting started with Apollo Client and GraphQL
Getting started with Apollo Client and GraphQLMorgan Dedmon
 
Introduction to Vert.x
Introduction to Vert.xIntroduction to Vert.x
Introduction to Vert.xYiguang Hu
 
UEMB200: Next Generation of Endpoint Management Architecture and Discovery Se...
UEMB200: Next Generation of Endpoint Management Architecture and Discovery Se...UEMB200: Next Generation of Endpoint Management Architecture and Discovery Se...
UEMB200: Next Generation of Endpoint Management Architecture and Discovery Se...Ivanti
 
Introduction to the Command Line Interface (CLI)
Introduction to the Command Line Interface (CLI)Introduction to the Command Line Interface (CLI)
Introduction to the Command Line Interface (CLI)Globus
 
Introduction to kubernetes
Introduction to kubernetesIntroduction to kubernetes
Introduction to kubernetesRishabh Indoria
 
Automating Research Data Flows with Globus (CHPC 2019 - South Africa)
Automating Research Data Flows with Globus (CHPC 2019 - South Africa)Automating Research Data Flows with Globus (CHPC 2019 - South Africa)
Automating Research Data Flows with Globus (CHPC 2019 - South Africa)Globus
 
DevOps, Microservices and Serverless Architecture
DevOps, Microservices and Serverless ArchitectureDevOps, Microservices and Serverless Architecture
DevOps, Microservices and Serverless ArchitectureMikhail Prudnikov
 
Using Globus to Streamline Research at Scale
Using Globus to Streamline Research at ScaleUsing Globus to Streamline Research at Scale
Using Globus to Streamline Research at ScaleGlobus
 
DevOps with Elastic Beanstalk - TCCC-2014
DevOps with Elastic Beanstalk - TCCC-2014DevOps with Elastic Beanstalk - TCCC-2014
DevOps with Elastic Beanstalk - TCCC-2014scolestock
 
Altitude SF 2017: Fastly GSLB: Scaling your microservice and multi-cloud envi...
Altitude SF 2017: Fastly GSLB: Scaling your microservice and multi-cloud envi...Altitude SF 2017: Fastly GSLB: Scaling your microservice and multi-cloud envi...
Altitude SF 2017: Fastly GSLB: Scaling your microservice and multi-cloud envi...Fastly
 
IVS CTO Night And Day 2018 Winter - [re:Cap] Serverless & Mobile
IVS CTO Night And Day 2018 Winter - [re:Cap] Serverless & MobileIVS CTO Night And Day 2018 Winter - [re:Cap] Serverless & Mobile
IVS CTO Night And Day 2018 Winter - [re:Cap] Serverless & MobileAmazon Web Services Japan
 
Airflow presentation
Airflow presentationAirflow presentation
Airflow presentationIlias Okacha
 
OSGi Remote Services - Alexander Broekhuis, Bram de Kruijff
OSGi Remote Services - Alexander Broekhuis, Bram de Kruijff OSGi Remote Services - Alexander Broekhuis, Bram de Kruijff
OSGi Remote Services - Alexander Broekhuis, Bram de Kruijff mfrancis
 
Docker Logging and analysing with Elastic Stack - Jakub Hajek
Docker Logging and analysing with Elastic Stack - Jakub Hajek Docker Logging and analysing with Elastic Stack - Jakub Hajek
Docker Logging and analysing with Elastic Stack - Jakub Hajek PROIDEA
 
Docker Logging and analysing with Elastic Stack
Docker Logging and analysing with Elastic StackDocker Logging and analysing with Elastic Stack
Docker Logging and analysing with Elastic StackJakub Hajek
 
Introduction to WSO2 Data Analytics Platform
Introduction to  WSO2 Data Analytics PlatformIntroduction to  WSO2 Data Analytics Platform
Introduction to WSO2 Data Analytics PlatformSrinath Perera
 
Automating Research Data Workflows (GlobusWorld Tour - STFC)
Automating Research Data Workflows (GlobusWorld Tour - STFC)Automating Research Data Workflows (GlobusWorld Tour - STFC)
Automating Research Data Workflows (GlobusWorld Tour - STFC)Globus
 
Running Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on HadoopRunning Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on Hadoopclairvoyantllc
 

Similar to Automating Research Data with Globus Flows and Compute (20)

Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...
Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...
Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...
 
BigQuery case study in Groovenauts & Dive into the DataflowJavaSDK
BigQuery case study in Groovenauts & Dive into the DataflowJavaSDKBigQuery case study in Groovenauts & Dive into the DataflowJavaSDK
BigQuery case study in Groovenauts & Dive into the DataflowJavaSDK
 
Getting started with Apollo Client and GraphQL
Getting started with Apollo Client and GraphQLGetting started with Apollo Client and GraphQL
Getting started with Apollo Client and GraphQL
 
Introduction to Vert.x
Introduction to Vert.xIntroduction to Vert.x
Introduction to Vert.x
 
UEMB200: Next Generation of Endpoint Management Architecture and Discovery Se...
UEMB200: Next Generation of Endpoint Management Architecture and Discovery Se...UEMB200: Next Generation of Endpoint Management Architecture and Discovery Se...
UEMB200: Next Generation of Endpoint Management Architecture and Discovery Se...
 
Introduction to the Command Line Interface (CLI)
Introduction to the Command Line Interface (CLI)Introduction to the Command Line Interface (CLI)
Introduction to the Command Line Interface (CLI)
 
Introduction to kubernetes
Introduction to kubernetesIntroduction to kubernetes
Introduction to kubernetes
 
Automating Research Data Flows with Globus (CHPC 2019 - South Africa)
Automating Research Data Flows with Globus (CHPC 2019 - South Africa)Automating Research Data Flows with Globus (CHPC 2019 - South Africa)
Automating Research Data Flows with Globus (CHPC 2019 - South Africa)
 
DevOps, Microservices and Serverless Architecture
DevOps, Microservices and Serverless ArchitectureDevOps, Microservices and Serverless Architecture
DevOps, Microservices and Serverless Architecture
 
Using Globus to Streamline Research at Scale
Using Globus to Streamline Research at ScaleUsing Globus to Streamline Research at Scale
Using Globus to Streamline Research at Scale
 
DevOps with Elastic Beanstalk - TCCC-2014
DevOps with Elastic Beanstalk - TCCC-2014DevOps with Elastic Beanstalk - TCCC-2014
DevOps with Elastic Beanstalk - TCCC-2014
 
Altitude SF 2017: Fastly GSLB: Scaling your microservice and multi-cloud envi...
Altitude SF 2017: Fastly GSLB: Scaling your microservice and multi-cloud envi...Altitude SF 2017: Fastly GSLB: Scaling your microservice and multi-cloud envi...
Altitude SF 2017: Fastly GSLB: Scaling your microservice and multi-cloud envi...
 
IVS CTO Night And Day 2018 Winter - [re:Cap] Serverless & Mobile
IVS CTO Night And Day 2018 Winter - [re:Cap] Serverless & MobileIVS CTO Night And Day 2018 Winter - [re:Cap] Serverless & Mobile
IVS CTO Night And Day 2018 Winter - [re:Cap] Serverless & Mobile
 
Airflow presentation
Airflow presentationAirflow presentation
Airflow presentation
 
OSGi Remote Services - Alexander Broekhuis, Bram de Kruijff
OSGi Remote Services - Alexander Broekhuis, Bram de Kruijff OSGi Remote Services - Alexander Broekhuis, Bram de Kruijff
OSGi Remote Services - Alexander Broekhuis, Bram de Kruijff
 
Docker Logging and analysing with Elastic Stack - Jakub Hajek
Docker Logging and analysing with Elastic Stack - Jakub Hajek Docker Logging and analysing with Elastic Stack - Jakub Hajek
Docker Logging and analysing with Elastic Stack - Jakub Hajek
 
Docker Logging and analysing with Elastic Stack
Docker Logging and analysing with Elastic StackDocker Logging and analysing with Elastic Stack
Docker Logging and analysing with Elastic Stack
 
Introduction to WSO2 Data Analytics Platform
Introduction to  WSO2 Data Analytics PlatformIntroduction to  WSO2 Data Analytics Platform
Introduction to WSO2 Data Analytics Platform
 
Automating Research Data Workflows (GlobusWorld Tour - STFC)
Automating Research Data Workflows (GlobusWorld Tour - STFC)Automating Research Data Workflows (GlobusWorld Tour - STFC)
Automating Research Data Workflows (GlobusWorld Tour - STFC)
 
Running Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on HadoopRunning Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on Hadoop
 

More from Globus

Advanced Globus System Administration Topics
Advanced Globus System Administration TopicsAdvanced Globus System Administration Topics
Advanced Globus System Administration TopicsGlobus
 
Instrument Data Automation: The Life of a Flow
Instrument Data Automation: The Life of a FlowInstrument Data Automation: The Life of a Flow
Instrument Data Automation: The Life of a FlowGlobus
 
Automating Research Data with Globus Flows and Compute

  • 1. Automating Research Data with Globus Flows and Compute Greg Nawrocki greg@globus.org nawrocki@uchicago.edu nawrocki@anl.gov Washington University in St. Louis September 20 & 21, 2022 Case Western Reserve University October 23 – 24, 2023
  • 2. Topics • Globus Flows overview • Automating data management – Run an existing Flow – Build a Flow then run it • Globus Compute overview • Automating end-to-end research flows
  • 3. Globus Platform and Automation Capabilities Timer Service The Globus WebApp supports recurring and scheduled transfers. (a.k.a. Globus cron) Command Line Interface The CLI provides an interface to Globus services from the shell and is suited to both interactive and scripting use cases. Globus API / SDK Our open REST APIs and Python SDK empower you to create an integrated ecosystem of research data services and applications. Harness the power of the Globus platform so you can focus on building your application.
  • 4. Automation using Globus Flows Available to all Globus Subscribers • Managed, secure (Globus Auth), reliable task orchestration • Support for heterogeneous resources • Extensible and authorable event-driven execution model – Flow Definition (JSON) – Input Schema (JSON) – Deployment • Extensible via custom actions
  • 5. Managed automation of tasks • Flows: A platform service for defining, applying, and sharing distributed research automation flows • Flows comprise Actions • Action Providers: Called by Flows to perform tasks • Triggers*: Start flows based on events (* In development) • [Diagram of an example flow: Transfer (transfer raw files); Compute (launch analysis job); Carbon! (correct, classify, …); Compute (extract metadata); Search (ingest to index); Transfer (move final files to repo); Share (set access controls)]
  • 6. Globus Flows service implementation • Built on AWS Step Functions – Simple state machine language – Conditions, loops, fault tolerance, etc. – Propagates state through the flow • Standardized API for integrating custom event and action services – Actions: synchronous or asynchronous – Custom Web forms prompt for user input • Actions secured with Globus Auth AWS Step Functions Globus Auth +
  • 7. Automation services ecosystem • Create Action Providers: GET /provider_url/; POST /provider_url/run; GET /provider_url/action_id/status; GET /provider_url/action_id/cancel; GET /provider_url/action_id/status • Define and deploy flows: { "StartAt": "ToProject", "States": { "ToProject": { … }, "SetPermission": { … }, "ProcessData": { … } … }} • Run flows
  • 9. Flow lifecycle • Define using JSON/YAML • Deploy to Flows service
  • 10. Flow lifecycle • Define using JSON/YAML • Deploy to Flows service • Set access policy for visibility and execution
  • 11. Flow lifecycle • Define using JSON/YAML • Deploy to Flows service • Set access policy for visibility and execution • Run (debug) and monitor
  • 12. Flow lifecycle: Write once, run many • Define using JSON/YAML • Deploy to Flows service • Set access policy for visibility and execution • Run (debug) and monitor • …and run again!
  • 13. Let’s take a look…
  • 15. A simple, rather contrived, use case • Actions: 1. Transfer: transfer files to intermediate storage; 2. Transfer: transfer files to final storage
  • 16. Ex. 1: Run an existing flow using the web app • Navigate to app.globus.org/flows • Find the flow named “Two Stage Globus Transfer” and click “Start” • Consent to allow the flow access to your account • Source – Collection: Globus Tutorial Endpoint 1 – Path: /share/godata/ • Intermediate – Collection and path of your choice – You can even use the collection you created yesterday in the admin tutorial • Destination – Collection: Globus Tutorial Endpoint 2 – Path: /~/ • Add appropriate labels and tags • Start Run! • Click “View Run Details” and “Event Log” to monitor progress
  • 18. A simple, and very common, use case • Actions: 1. Transfer: transfer raw instrument images; 2. Share: set access controls for sharing data
  • 20. Example Flow • Uses Globus-defined Action Providers • https://globus-automate-client.readthedocs.io/en/latest/globus_action_providers.html • transfer: Uses the Globus Transfer Task API to transfer data from one Globus Collection to another. • set_permission: Uses the Globus Transfer ACL API to set or manage permissions on a folder or file.
  • 21. Initial Housekeeping
      • Things we need in place for this Notebook to run and access the Globus SDK and Globus Flows client.

        import sys
        import os
        import time
        import json
        import uuid
        import pickle
        import base64
        import globus_sdk
        from globus_automate_client import FlowsClient

        # ID of this tutorial notebook as registered with Globus Auth
        CLIENT_ID = 'f794186b-f330-4595-b6c6-9c9d3e903e47'
  • 22. Initial Housekeeping

        # Feel free to replace the collection UUIDs below with those of your own collections
        # "Globus Tutorial Endpoint 1"
        source_collection = "ddb59aef-6d04-11e5-ba46-22000b92c6ec"
        # "Globus Tutorials on ALCF Eagle"
        destination_collection = "a6f165fa-aee2-4fe5-95f3-97429c28bf82"
        # "Tutorial Users" group
        my_collaborators = "50b6a29c-63ac-11e4-8062-22000ab68755"
  • 23. Authentication and Authorization • All interactions between users and services on the Globus automation platform are governed by the Globus Auth service. • Consent must be given by the user for each interaction taking place on their behalf. • When executing a flow. • When deploying a new flow on the Globus Flow service. • This Notebook in our JupyterHub. • Access to the Flow service is already granted to you by virtue of authenticating to the JupyterHub running this notebook – the tokens are already in place. • If you're running this notebook in your own environment you will need to manually log into Globus Auth and get tokens using a native app authorization flow (see the `Platform_Introduction_Native_App_Auth` notebook for an example of how to initiate this flow).
  • 24. The Globus Flows Service in a Jupyter Notebook
      • [Diagram: login returns tokens; REST API calls to the Flow Service carry a bearer token (Bearer a45cd…)]

        # Get Globus Auth token data from the JupyterHub environment
        tokens = pickle.loads(base64.b64decode(os.getenv('GLOBUS_DATA')))['tokens']
        # Introspect tokens
        print(json.dumps(tokens, indent=2))
  • 25. Authentication and Authorization
      • Once you’ve got the tokens, the authentication magic happens.

        # Create a variable for storing flow scope tokens. Each newly deployed flow
        # scope needs to be authorized separately,
        # and will have its own set of tokens. Save each of these tokens by scope.
        saved_flow_scopes = {}

        # Add a callback to the flows client for fetching scopes.
        # It will draw scopes from `saved_flow_scopes`
        def get_flow_authorizer(flow_url, flow_scope, client_id):
            return globus_sdk.AccessTokenAuthorizer(
                access_token=saved_flow_scopes[flow_scope]['access_token'])

        # Set up the Flow client, using tokens from our JupyterHub login to access the Globus Flows service, and
        # set the `get_flow_authorizer` callback for any new flows we authorize.
        flows_authorizer = globus_sdk.AccessTokenAuthorizer(
            access_token=tokens['flows.globus.org']['access_token'])
        flows_client = FlowsClient.new_client(
            CLIENT_ID, get_flow_authorizer, flows_authorizer)
  • 26. Fetch User Identity
      • When transferring files to the destination collection we will put them in a uniquely named directory: <identity_id>-shared-files
      • Fetch our user ID for this purpose.

        # Create an Auth client so we can look up identities
        auth_authorizer = globus_sdk.AccessTokenAuthorizer(
            access_token=tokens['auth.globus.org']['access_token'])
        ac = globus_sdk.AuthClient(authorizer=auth_authorizer)

        # Get the user's primary identity
        primary_identity = ac.oauth2_userinfo()
        identity_id = primary_identity['sub']
        print(f"Username: {primary_identity['preferred_username']} (ID: {identity_id})")
        print(f"Notifications will be sent to: {primary_identity['email']}")
  • 27. Authoring a Flow • Define a Flow – Flows are composed of State Types – The Action Type is what we will highlight in this example • Define a Schema – The user inputs needed for this Flow • Deploy the Flow – The FlowsClient makes that easy!
  • 28. Authoring a Flow – Define a Flow
      • Top Level Fields, from the Amazon States Language playbook. Can include:
          – Comment
          – StartAt: First State in the Machine
          – States: State definitions

        # Define flow
        flow_definition = {
            "Comment": "Transfer files to a guest collection and set access permissions",
            "StartAt": "TransferFiles",
            "States": {
  • 29. State Types
      • Supported States from the Amazon States Language playbook
          – Pass: Passes input to output; performs no work.
          – Choice: Adds branching logic to a state machine.
          – Wait: Delays the machine from continuing for a specified time.
          – Fail: Terminates the machine as a failed run.
      • Globus Defined States
          – Action: References the Action Providers. The heart of our example.
          – ExpressionEval: Method of evaluating an expression to create parameter values for passing to an Action. Combines the Action and Pass State Types, providing the ability to compute results for Parameters (Action) and the simple storage of the new values (Pass). Useful for determining a value to be tested in a Choice State or to compute a “final” value seen in the output of the Flow upon completion.
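To make these state types concrete, here is a minimal sketch of how a definition containing Pass, Choice, Action, and Fail states could be walked. It is a toy interpreter written for illustration and is not how the Flows service is implemented: `run_flow`, `lookup`, the `handlers` mapping, the `local://transfer` URL, and the simplified Choice rule (`Variable` plus `BooleanEquals`) are all assumptions, and Action states call local Python functions instead of remote Action Providers.

```python
def lookup(doc, path):
    """Resolve a simple dotted reference such as '$.input.recursive'."""
    value = doc
    for key in path.split(".")[1:]:  # skip the leading '$'
        value = value[key]
    return value


def run_flow(definition, state_doc, handlers):
    """Walk a toy flow definition and return the final state document."""
    name = definition["StartAt"]
    while True:
        state = definition["States"][name]
        kind = state["Type"]
        if kind == "Fail":
            raise RuntimeError(state.get("Cause", "flow failed"))
        if kind == "Choice":
            # Simplified rule: the first matching BooleanEquals wins, else Default.
            name = state["Default"]
            for rule in state["Choices"]:
                if lookup(state_doc, rule["Variable"]) == rule["BooleanEquals"]:
                    name = rule["Next"]
                    break
            continue
        if kind == "Action":
            # Hypothetical local stand-in for calling an Action Provider.
            result = handlers[state["ActionUrl"]](state_doc)
            # Only top-level result paths such as '$.TransferFiles' are handled.
            state_doc[state["ResultPath"].split(".")[1]] = result
        # Pass states perform no work and fall through to here.
        if state.get("End"):
            return state_doc
        name = state["Next"]


definition = {
    "StartAt": "CheckRecursive",
    "States": {
        "CheckRecursive": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.input.recursive", "BooleanEquals": True, "Next": "Transfer"}
            ],
            "Default": "Done",
        },
        "Transfer": {
            "Type": "Action",
            "ActionUrl": "local://transfer",
            "ResultPath": "$.TransferFiles",
            "End": True,
        },
        "Done": {"Type": "Pass", "End": True},
    },
}
handlers = {"local://transfer": lambda doc: {"status": "SUCCEEDED"}}

ran = run_flow(definition, {"input": {"recursive": True}}, handlers)
skipped = run_flow(definition, {"input": {"recursive": False}}, handlers)
print(ran["TransferFiles"])
print("TransferFiles" in skipped)
```

The Choice state only decides which state comes next; the Action state is the only one that produces new data in the state document.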
  • 30. The Action State Type – by way of example
      • Name the State: “TransferFiles”
      • Comment: Self-explanatory
      • Type: Action (required)
      • ActionUrl (required): The base URL of the Action (service endpoint), as defined by the Action Interface.

        "TransferFiles": {
            "Comment": "Transfer to a guest collection",
            "Type": "Action",
            "ActionUrl": "https://actions.automate.globus.org/transfer/transfer",
  • 31. The Action State Type – by way of example
      • Each Action Provider (optionally) defines its own set of properties/inputs.
      • Input to the Action can be referenced by either “InputPath” or “Parameters”.
      • In this example the parameters are referenced from the input schema (we’ll see that soon).

            "Parameters": {
                "source_endpoint_id.$": "$.input.source.id",
                "destination_endpoint_id.$": "$.input.destination.id",
                "transfer_items": [
                    {
                        "source_path.$": "$.input.source.path",
                        "destination_path.$": "$.input.destination.path",
                        "recursive.$": "$.input.recursive_tx"
                    }
                ]
            },
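The keys ending in `.$` above are references into the run's input rather than literal values. The sketch below shows one way such references could be resolved; it is an illustration only, supports nothing beyond simple dotted paths, and `resolve_path` and `resolve_parameters` are hypothetical helper names, as is the destination path in the sample input.

```python
def resolve_path(doc, path):
    """Follow a dotted reference such as '$.input.source.id' through nested dicts."""
    value = doc
    for key in path.split(".")[1:]:  # skip the leading '$'
        value = value[key]
    return value


def resolve_parameters(params, doc):
    """Return a copy of params with every 'name.$' key replaced by its looked-up value."""
    if isinstance(params, list):
        return [resolve_parameters(item, doc) for item in params]
    if not isinstance(params, dict):
        return params  # literal value, passed through unchanged
    resolved = {}
    for key, value in params.items():
        if key.endswith(".$"):
            resolved[key[:-2]] = resolve_path(doc, value)
        else:
            resolved[key] = resolve_parameters(value, doc)
    return resolved


parameters = {
    "source_endpoint_id.$": "$.input.source.id",
    "destination_endpoint_id.$": "$.input.destination.id",
    "transfer_items": [
        {
            "source_path.$": "$.input.source.path",
            "destination_path.$": "$.input.destination.path",
            "recursive.$": "$.input.recursive_tx",
        }
    ],
}
run_input = {
    "input": {
        "source": {"id": "ddb59aef-6d04-11e5-ba46-22000b92c6ec", "path": "/share/godata/"},
        # Hypothetical destination path, for illustration only.
        "destination": {"id": "a6f165fa-aee2-4fe5-95f3-97429c28bf82", "path": "/example-shared-files/"},
        "recursive_tx": True,
    }
}
resolved = resolve_parameters(parameters, run_input)
print(resolved["source_endpoint_id"])
print(resolved["transfer_items"][0])
```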
  • 32. The Action State Type – by way of example
      • “ResultPath”: A Reference Path indicating where the output of the Action will be placed in the state of the Flow at run time.
      • “WaitTime” (optional, default value 300, i.e. five minutes): The maximum amount of time, in seconds, to wait for the Action to complete (or abort).
      • “Next” or “End” (mutually exclusive, one required): These indicate how the Flow should proceed after the Action state.
          – “Next” indicates the name of the following state of the flow.
          – “End” with a value of “True” indicates that the Flow is complete after this state completes.

            "ResultPath": "$.TransferFiles",
            "WaitTime": 60,
            "Next": "SetPermission",
        },
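As a rough illustration of what “ResultPath” means, the sketch below places an action's output into a copy of the run state at the given path, so that later states can refer to it. This is a simplified model and not the service's code: `apply_result_path` is a hypothetical helper, only dotted paths are handled, and the shape of the sample action output is invented for the example.

```python
import copy


def apply_result_path(state_doc, result_path, result):
    """Return a copy of state_doc with result stored at result_path (e.g. '$.TransferFiles')."""
    keys = result_path.split(".")[1:]  # skip the leading '$'
    if not keys:
        # A bare '$' replaces the whole state document with the result.
        return copy.deepcopy(result)
    new_doc = copy.deepcopy(state_doc)
    target = new_doc
    for key in keys[:-1]:
        target = target.setdefault(key, {})
    target[keys[-1]] = result
    return new_doc


before = {"input": {"recursive_tx": True}}
# Invented output shape, for illustration only.
action_output = {"status": "SUCCEEDED", "details": {"files_transferred": 3}}
after = apply_result_path(before, "$.TransferFiles", action_output)

print(sorted(after))
print(after["TransferFiles"]["status"])
```

Because the original input is preserved alongside the new key, the next state ("SetPermission" in this flow) can still read `$.input.…` values.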
  • 33. The Action State Type – another example

        "SetPermission": {
            "Comment": "Grant read permission on the data to a Globus user or group",
            "Type": "Action",
            "ActionUrl": "https://actions.automate.globus.org/transfer/set_permission",
            "Parameters": {
                "endpoint_id.$": "$.input.destination.id",
                "path.$": "$.input.destination.path",
                "operation": "CREATE",
                "permissions": "r",  # read-only access
                "principal_type.$": "$.input.principal_type",  # 'group' or 'identity'
                "principal.$": "$.input.principal_identifier"
            },
            "ResultPath": "$.SetPermission",
            "End": True
        }
    }
}
  • 34. The Action State Type – wrap up • The examples above are not exhaustive. For more information on the Action State Type: https://globus-automate-client.readthedocs.io/en/latest/authoring_flows.html#action-state-type • Globus Action Providers: https://globus-automate-client.readthedocs.io/en/latest/globus_action_providers.html • Roll your own Action Providers: https://action-provider-tools.readthedocs.io/en/latest/
  • 35. Authoring a Flow – Define a Schema
      • All Flows require schemas to validate user input.
      • Yea! More JSON!
      • User input we need for this Flow
          – source: Globus Collection containing the source data
          – destination: Globus Guest collection that will be the destination of the transfer action
          – recursive_tx: Boolean flag to state whether or not to transfer files recursively
          – principal_identifier: UUID of the user identity or group to share data with
          – principal_type: Specifies whether to share with an individual user or group identity

        # Define input schema
        input_schema = {
            "required": [
                "input"
            ],
            "properties": {
                "input": {
                    "type": "object",
                    "required": [
                        "source",
                        "destination",
                        "recursive_tx",
                        "principal_identifier",
                        "principal_type"
                    ],
  • 36. Authoring a Flow – Define a Schema
      • Schema for the “source” object
          – globus-collection is a custom format: https://globus-automate-client.readthedocs.io/en/latest/authoring_flows.html#globus-web-app-custom-formats
          – Note the default to source_collection, which we defined at the beginning of this notebook.

                    "properties": {
                        "source": {
                            "type": "object",
                            "title": "Select source collection and path",
                            "description": "The source collection and path (path MUST end with a slash)",
                            "format": "globus-collection",
                            "required": [
                                "id",
                                "path"
                            ],
                            "properties": {
                                "id": {
                                    "type": "string",
                                    "format": "uuid",
                                    "default": source_collection
                                },
                                "path": {
                                    "type": "string"
                                }
                            },
                            "additionalProperties": False
                        },
  • 37. Authoring a Flow – Define a Schema
      • Finish the schema for the remaining user input parameters.

                        "destination": {
                            "type": "object",
                            "title": "Select destination collection and path",
                            etc…
                        "recursive_tx": {
                            "type": "boolean",
                            "title": "Recursive transfer",
                            etc…
                        "principal_type": {
                            "type": "string",
                            "title": "Type of principal to share with",
                            etc…
                        "principal_identifier": {
                            "type": "string",
                            "title": "UUID of user identity or group",
                            etc…
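For reference, an input document that satisfies the required keys of this schema might look like the sketch below. The collection and group UUIDs are the tutorial values from the housekeeping slide; the destination path is a made-up placeholder, and `missing_inputs` is a hypothetical helper that checks only for required keys. Full JSON Schema validation is left to the Flows service and is not reproduced here.

```python
import json

# Required keys copied from the input schema on the previous slides.
REQUIRED_INPUT_KEYS = [
    "source",
    "destination",
    "recursive_tx",
    "principal_identifier",
    "principal_type",
]


def missing_inputs(flow_input):
    """Return the required keys that are absent from flow_input['input']."""
    body = flow_input.get("input", {})
    return [key for key in REQUIRED_INPUT_KEYS if key not in body]


flow_input = {
    "input": {
        "source": {
            "id": "ddb59aef-6d04-11e5-ba46-22000b92c6ec",  # "Globus Tutorial Endpoint 1"
            "path": "/share/godata/",
        },
        "destination": {
            "id": "a6f165fa-aee2-4fe5-95f3-97429c28bf82",  # "Globus Tutorials on ALCF Eagle"
            "path": "/example-user-shared-files/",  # placeholder path, an assumption
        },
        "recursive_tx": True,
        "principal_type": "group",
        "principal_identifier": "50b6a29c-63ac-11e4-8062-22000ab68755",  # "Tutorial Users" group
    }
}

incomplete = {"input": {"source": flow_input["input"]["source"]}}

print(missing_inputs(flow_input))
print(missing_inputs(incomplete))
print(json.dumps(flow_input, indent=2))
```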
  • 38. Flow Deployment
      • Simple method of the FlowsClient

        # Deploy the flow
        flow_title = f"Tutorial-Transfer-Share-{str(uuid.uuid4())}"  # generate a unique title
        flow = flows_client.deploy_flow(
            flow_definition,
            title=flow_title,
            input_schema=input_schema,
        )
        flow_id = flow['id']
        flow_scope = flow['globus_auth_scope']
        print(f"Successfully deployed flow (ID: {flow_id})")
        print(f"Flow scope: {flow_scope}\n\n")
        print(f"View the flow in the Webapp: https://app.globus.org/flows/{flow_id}")
        print("Note: You can start your flow directly from the Webapp!")
  • 40. Flow Updating
      • If you change the Flow you will need to update it.
      • Very similar to the deploy step.
      • By default Flows are only visible to their creator; you can modify that here.
      • https://globus-automate-client.readthedocs.io/en/latest/python_sdk_reference.html

        flow = flows_client.update_flow(
            flow_id,
            flow_definition,
            # administered_by=[f"urn:globus:auth:identity:{identity_id}"],
            # runnable_by=[f"urn:globus:auth:identity:{identity_id}"],
            visible_to=[f"urn:globus:auth:identity:{identity_id}"])
  • 41. Flow Execution – From the API • Flows may be run via the globus-automate API – See section C of the Jupyter Notebook • Authorize the Flow – Native App Grant process • Define the Flow Inputs – Define Flow inputs with a JSON document • Run the Flow – Trivial thanks again to the FlowsClient
  • 42. Flows - Administrivia • Flows can be created / updated / run from the Globus CLI • Flows is a subscription service – Non-subscribers can have a single flow – You should delete the flow we just created if you want to follow along with the next example. • If my institution has a subscription, how do I run more than one flow? – Short answer, contact me (greg@globus.org) or the Globus Support Staff (support@globus.org) – This process will improve
  • 43. Now we’ll add computation to our flow • 1. Transfer: transfer raw instrument images • 2. Compute: run a compute job to process image files • 3. Transfer: move processed images to repository • 4. Share: set access controls for sharing data
  • 44. Globus Compute – Formerly FuncX
  • 45. Globus Compute • Globus Compute: managed, federated FaaS • Compute function: Python code registered with the Globus Compute service → simple image processing • Compute endpoint: any system running the Globus Compute agent → our EC2 instance • Currently you can only run functions you register
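A Compute function is an ordinary Python function. The sketch below uses a stand-in that counts PNG files instead of the tutorial's image processing code, which is not shown in these slides; `count_png_files` is a hypothetical name. The commented-out lines show how registration and submission might look with `globus_compute_sdk`, assuming the SDK is installed, you are logged in, and you have an endpoint UUID; they are left as comments so the sketch runs locally without credentials.

```python
import os
import tempfile


def count_png_files(directory):
    """Return the number of .png files directly inside directory."""
    # The import sits inside the body so the function is self-contained
    # when it is serialized and executed on a remote endpoint.
    import os

    return sum(1 for name in os.listdir(directory) if name.lower().endswith(".png"))


# Local dry run on a temporary directory before registering anything.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.png", "b.PNG", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    local_count = count_png_files(tmp)
print(local_count)

# Assumed usage with the Globus Compute SDK (not executed here):
# from globus_compute_sdk import Client, Executor
# function_id = Client().register_function(count_png_files)
# with Executor(endpoint_id="<your endpoint UUID>") as executor:
#     future = executor.submit(count_png_files, "/home/devN/images")
#     print(future.result())
```

Testing the function locally first is a cheap way to catch errors before they surface as failed remote tasks.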
  • 46. Globus Compute transforms any computing resource into a function-serving endpoint • pip installable endpoint – Globus Auth for registration • Elastic resource provisioning from local, cluster, or cloud system (via Parsl) • Parallel execution using local fork or via common schedulers – Slurm, PBS, LSF, Cobalt, K8s
  • 47. Web interface to Compute • List of compute endpoints available to user • Status and details of compute endpoint
  • 48. Compute service will evolve rapidly • Multi-user compute endpoints • Native integration with transfer for stage in and stage out of data for compute tasks • Expanding compute service interfaces in the webapp for administrators and users
  • 50. Configure our computing resource • Register a compute endpoint with Globus Compute – Activate venv: ~/.compute/bin/activate (virtual environment containing the necessary packages) – Register: globus-compute-endpoint configure EP_NAME – Start: globus-compute-endpoint start EP_NAME – Save the registered endpoint UUID – View endpoint in the web app: app.globus.org/compute
  • 51. Register and execute a function • Register a function with Globus Compute – Activate venv: ~/.compute/bin/activate (should already be done) – Register: python ~/globus-flows-trigger-examples/compute_function.py – Save the registered function UUID • Open interactive Python shell and run the function
  • 52. Configure our computing resource storage • We need a way to get the data to that computing resource • Set up and run Globus Connect Personal – Setup: globusconnectpersonal – Run: globusconnectpersonal -start & – Save the registered collection UUID – View collection in the web app
  • 53. Adding computation to our flow • 1. Transfer: transfer raw instrument images • 2. Compute: run a compute job to process image files • 3. Transfer: move processed images to repository • 4. Share: set access controls for sharing data
  • 54. Our environment (diagram) • Components: Instrument (same EC2 instance) with monitor script; EC2 instance computing resource with compute endpoint and registered compute function; GCP endpoints; GCS endpoint; ALCF sharing resource; Compute service; Transfer service • Steps: 0 trigger flow run; 1 transfer raw files; 2 invoke image processing function; 3 transfer result files; 4 set permissions • Arrow types: transfer, control
  • 55. Incorporate compute into a flow (1/3) • Review the flow definition and schema: – transfer_compute_share_definition.json – Actually… we’ll do that after we deploy it • Deploy the enhanced flow – Activate venv: ~/.trigger/bin/activate – Deploy: deploy_flow --flowdef --schema --title
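A flow definition is a JSON state machine, so the four-step flow can be sketched as a Python dictionary and serialized. This is an illustrative sketch, not the contents of `transfer_compute_share_definition.json`: the state names, the `$.input...` paths, and the parameter names inside each `Parameters` block are assumptions and should be checked against the action provider documentation.

```python
import json

# Globus-hosted action provider URLs.
TRANSFER_URL = "https://actions.globus.org/transfer/transfer"
COMPUTE_URL = "https://compute.actions.globus.org"
PERMISSION_URL = "https://actions.globus.org/transfer/set_permission"


def action(url, parameters, result_path, next_state=None):
    """Build one Action state; the final state gets End instead of Next."""
    state = {
        "Type": "Action",
        "ActionUrl": url,
        "Parameters": parameters,
        "ResultPath": result_path,
    }
    if next_state is None:
        state["End"] = True
    else:
        state["Next"] = next_state
    return state


def build_flow_definition():
    """Return a transfer -> compute -> transfer -> share flow (assumed names)."""
    return {
        "Comment": "Transfer raw files, process them, move results, share",
        "StartAt": "TransferRaw",
        "States": {
            # Keys ending in ".$" are JSONPath references into the run input.
            "TransferRaw": action(
                TRANSFER_URL,
                {"source_endpoint.$": "$.input.source.id",
                 "destination_endpoint.$": "$.input.compute.collection_id",
                 "DATA": [{"source_path.$": "$.input.source.path",
                           "destination_path.$": "$.input.compute.path",
                           "recursive": True}]},
                "$.TransferRaw", "ProcessImages"),
            "ProcessImages": action(
                COMPUTE_URL,
                {"endpoint.$": "$.input.compute.endpoint_id",
                 "function.$": "$.input.compute.function_id",
                 "kwargs.$": "$.input.compute.kwargs"},
                "$.ProcessImages", "TransferResults"),
            "TransferResults": action(
                TRANSFER_URL,
                {"source_endpoint.$": "$.input.compute.collection_id",
                 "destination_endpoint.$": "$.input.share.id",
                 "DATA": [{"source_path.$": "$.input.compute.result_path",
                           "destination_path.$": "$.input.share.path",
                           "recursive": True}]},
                "$.TransferResults", "SetPermissions"),
            "SetPermissions": action(
                PERMISSION_URL,
                {"endpoint_id.$": "$.input.share.id",
                 "path.$": "$.input.share.path",
                 "permissions": "r",
                 "principal.$": "$.input.share.principal",
                 "principal_type.$": "$.input.share.principal_type",
                 "operation": "CREATE"},
                "$.SetPermissions"),
        },
    }


def state_order(flow):
    """Follow Next links from StartAt to the End state; reject cycles."""
    order, name = [], flow["StartAt"]
    while True:
        if name in order:
            raise ValueError("cycle at state " + name)
        order.append(name)
        state = flow["States"][name]
        if state.get("End"):
            return order
        name = state["Next"]
```

Calling `json.dumps(build_flow_definition(), indent=2)` produces a document of the kind passed to the deploy step via `--flowdef`; `state_order` is a quick local check that the states chain as intended before deploying.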
  • 56. Incorporate compute into a flow (2/3) • Update the monitoring script – Edit trigger_transfer_compute_share.py • Modify… – Flow ID – Source collection ID and path (the “instrument”) – Destination collection ID and path (the compute endpoint) – Compute endpoint and function IDs – Result share collection ID and path (the sharing resource)
  • 57. Incorporate compute into a flow (3/3) • Run the monitoring script ./trigger_transfer_compute_share_flow.py --watchdir /home/devN/images --patterns .done • Activate the trigger – Copy *.png files to directory being monitored – “touch” iam.done file to trigger the flow • Monitor the running flow in the web app
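The trigger pattern above (watch a directory, start a flow run when a file matching the pattern appears) can be sketched with a simple polling loop. This is an assumption-laden illustration, not the workshop's monitoring script: the names `find_triggers`, `watch`, and `start_run` are hypothetical, and `start_run` assumes `flows_client` is an already-authenticated `globus_sdk.SpecificFlowClient` for the deployed flow.

```python
import os
import time


def find_triggers(watch_dir, patterns):
    """Return sorted paths in watch_dir whose names end with any pattern."""
    return sorted(
        os.path.join(watch_dir, name)
        for name in os.listdir(watch_dir)
        if any(name.endswith(pattern) for pattern in patterns)
    )


def watch(watch_dir, patterns, on_trigger, poll_seconds=2.0, max_polls=None):
    """Poll watch_dir and call on_trigger(path) once per new trigger file.

    max_polls=None polls forever; a number bounds the loop (useful in tests).
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in find_triggers(watch_dir, patterns):
            if path not in seen:
                seen.add(path)
                on_trigger(path)
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(poll_seconds)


def start_run(flows_client, flow_input, label):
    """Start a flow run using an authenticated SpecificFlowClient (assumed)."""
    return flows_client.run_flow(body=flow_input, label=label)
```

With a real client, `on_trigger` would build the flow input (source, compute, and share IDs and paths) and call `start_run`. Polling is used here only to keep the sketch dependency-free; a file-system event library is an alternative design.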
  • 58. Enjoy our success! (diagram) • Components: Instrument (same EC2 instance) with monitor script; EC2 instance computing resource with compute endpoint and registered compute function; GCP endpoints; GCS endpoint; ALCF sharing resource; Compute service; Transfer service • Steps: 0 trigger flow run; 1 transfer raw files; 2 invoke image processing function; 3 transfer result files; 4 set permissions; then access result files • Arrow types: transfer, control
  • 59. Extending the ecosystem: Action providers • Action Provider is a service endpoint – Run – Status – Cancel – Release – Resume • Action Provider Toolkit: action-provider-tools.readthedocs.io • Action providers shown: compute, ACLs, delete, identifier, transfer, notify, ingest, mkdir, search, ls, Xtract, describe, web form, custom developed • docs.globus.org/api/flows/hosted-action-providers
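The Run, Status, Cancel, Release, and Resume operations listed above can be illustrated with a toy in-memory class. This is a conceptual sketch only and is not the Action Provider Toolkit API: the class name, the `complete` and `pause` helpers, and the exact state transitions are assumptions made for illustration.

```python
import uuid


class ToyActionProvider:
    """In-memory illustration of an action lifecycle (hypothetical)."""

    def __init__(self):
        self._actions = {}

    def run(self, body):
        """Start an action and return its initial status record."""
        action_id = str(uuid.uuid4())
        self._actions[action_id] = {
            "action_id": action_id,
            "status": "ACTIVE",
            "details": {"request": body},
        }
        return dict(self._actions[action_id])

    def status(self, action_id):
        """Return a copy of the current status record."""
        return dict(self._actions[action_id])

    def complete(self, action_id, result):
        """Hypothetical helper standing in for the real work finishing."""
        action = self._actions[action_id]
        action["status"] = "SUCCEEDED"
        action["details"] = {"result": result}
        return dict(action)

    def pause(self, action_id):
        """Hypothetical helper: mark the action as waiting on something."""
        self._actions[action_id]["status"] = "INACTIVE"
        return self.status(action_id)

    def resume(self, action_id):
        """Move an INACTIVE action back to ACTIVE."""
        action = self._actions[action_id]
        if action["status"] == "INACTIVE":
            action["status"] = "ACTIVE"
        return dict(action)

    def cancel(self, action_id):
        """Stop an unfinished action; finished actions are left unchanged."""
        action = self._actions[action_id]
        if action["status"] in ("ACTIVE", "INACTIVE"):
            action["status"] = "FAILED"
        return dict(action)

    def release(self, action_id):
        """Forget a finished action; unfinished actions cannot be released."""
        if self._actions[action_id]["status"] in ("ACTIVE", "INACTIVE"):
            raise ValueError("action is not in a final state")
        return self._actions.pop(action_id)
```

A flow engine would call `run` once, poll `status` until the action reaches a final state, and then `release` it so the provider can discard the record.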
  • 60. Flows Resources • Globus documentation: docs.globus.org • Flows-specific docs: https://docs.globus.org/api/flows/ – Globus Flows Overview o Authoring Flows o Running a Flow automatically o Python SDK Reference – Globus Operated Action Providers – Globus Action Provider API Specification – Globus Flows API Specification