Instrument data orchestration with Globus Search and Flows
Vas Vasiliadis
vas@uchicago.edu
October 13, 2021
Presented at the APS Workshop, a virtual event on October 13, 2021

Why we’re all here this week…

Instrument data orchestration: A common design pattern
(Diagram spanning the Instrument Facility, Advanced Computing Facility, Data Portal, and Distribution Store)
1. Imaging
2. Acquisition
3. Image Analysis
4. Description/Identification
5. Search/Discovery
6. Science!

Three Degrees of Automation
• Timer Service: scheduled and recurring transfers (a.k.a. Globus cron)
• Command Line Interface: ad hoc scripting and integration
• Platform Services (Search, Flows, Transfer & Sharing): comprehensive data (and compute) orchestration, with a human in the loop

Globus Command Line Interface (CLI)
…you’re all experts on this already!

Globus Timer Service

The Globus Timer service
• Scheduled/recurring file transfers
• Well suited to backup/sync tasks
• Service with a command line interface
  – Simple installation: pypi.org/project/globus-timer-cli
  – One-time authentication with a user identity
• Example: NIH
  – hpc.nih.gov/storage/globus_cron.html

Use case: Data replication
• For backup, initiated by the user or by the system
• Automated transfer of data from a science instrument
(Diagram: recurring transfers with the sync option copy data to /ingest daily @ 3:30am)

Using the Globus Timer service
$ globus-timer session {login, logout, whoami}
$ globus-timer job transfer \
    --name example-job \
    --label "Timer Transfer Job" \
    --interval 28800 \
    --start '2020-01-01T12:34:56' \
    --source-endpoint ddb59aef-6d04-11e5-ba46-22000b92c6ec \
    --dest-endpoint ddb59af0-6d04-11e5-ba46-22000b92c6ec \
    --item ~/file1.txt ~/new_file1.txt false \
    --item ~/file2.txt ~/new_file2.txt false

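As a sketch of how the daily 3:30am replication use case could map onto these flags, the helpers below compute the next 03:30 start time and assemble the `globus-timer job transfer` argument list without running it. The flag names are copied from the slide; `next_daily_start` and `build_timer_args` are hypothetical helpers, the paths are made up, and time zone handling is not addressed (the slide's example timestamp carries no offset). Check `globus-timer job transfer --help` for the flags your installed version accepts.

```python
from datetime import datetime, timedelta

DAY_SECONDS = 24 * 60 * 60  # --interval is in seconds (the slide's 28800 s = 8 hours)


def next_daily_start(now: datetime, hour: int, minute: int) -> datetime:
    """Hypothetical helper: next occurrence of hour:minute at or after `now`."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate < now:
        candidate += timedelta(days=1)  # today's slot already passed; use tomorrow's
    return candidate


def build_timer_args(name, label, interval, start, source_ep, dest_ep, items):
    """Hypothetical helper: argv for `globus-timer job transfer` using the slide's flags.

    `items` is a list of (source_path, dest_path, recursive) tuples; each becomes
    one `--item SRC DST true|false` group, as in the slide's example.
    """
    args = [
        "globus-timer", "job", "transfer",
        "--name", name,
        "--label", label,
        "--interval", str(interval),
        "--start", start.isoformat(timespec="seconds"),
        "--source-endpoint", source_ep,
        "--dest-endpoint", dest_ep,
    ]
    for src, dst, recursive in items:
        args += ["--item", src, dst, str(recursive).lower()]
    return args


if __name__ == "__main__":
    start = next_daily_start(datetime(2021, 10, 13, 9, 0), hour=3, minute=30)
    argv = build_timer_args(
        "daily-ingest", "Daily instrument copy", DAY_SECONDS, start,
        "ddb59aef-6d04-11e5-ba46-22000b92c6ec",
        "ddb59af0-6d04-11e5-ba46-22000b92c6ec",
        [("/data/run1/", "/ingest/run1/", True)],
    )
    print(" ".join(argv))
```

Building an argument list rather than a shell string keeps paths with spaces intact if the list is later passed to `subprocess.run`.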
Globus Timer service options
• --items-file {file_name}
• --stop-after-runs
• --stop-after-date
• Transfer behavior (equivalent to options in the web app)
  – --sync-level (how the timer behaves if files already exist)
  – --verify-checksum
  – --encrypt-data
  – --preserve-timestamp

Timer options in the web app
Coming soon…

Platform Services

Relevant Globus platform capabilities
• Data transfer and sharing
• Data description and discovery
• Data (and compute) orchestration
• Authentication and authorization
(Services: Auth, Search, Transfer, Groups, Flows)

Globus Auth: Foundational IAM service
• Brokers authentication and authorization among…
  – End-users
  – Identity providers: enterprise, external (federated identities)
  – Services: resource servers with REST APIs
  – Apps: web, mobile, desktop, command line clients
  – Services acting as clients to other services
• OAuth 2.0 Authorization Framework (a.k.a. OAuth2)
• OpenID Connect Core 1.0 (a.k.a. OIDC)

Several authentication models supported
• Application acting as a user, with consent
  – Authorization code grant
• Application authenticating as itself
  – Client credentials grant
• Application able to manage tokens for offline or long-running tasks
  – Refresh tokens

Authorization Code Grant
Participants: Browser (User), Client (Web Portal, Application), Globus Auth (Authorization Server), Identity Provider, Globus service (Resource Server)
1. User accesses the portal
2. Portal redirects the user to Globus Auth
3. User authenticates and consents
4. Globus Auth returns an authorization code
5. Client authenticates using its client ID and secret, and sends the authorization code
6. Globus Auth returns access token(s)
7. Client authenticates with the access token(s), giving it authority to invoke the requested service

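Steps 2 and 5 of this flow can be sketched as plain OAuth 2.0 requests using only the Python standard library. This is a minimal illustration, assuming the Globus Auth endpoints `https://auth.globus.org/v2/oauth2/authorize` and `https://auth.globus.org/v2/oauth2/token`; the client ID, secret, redirect URI, and scopes are placeholders, and the token request is built but not sent. In practice the Globus Python SDK wraps these steps for you.

```python
import base64
import secrets
from urllib.parse import urlencode
from urllib.request import Request

AUTHORIZE_URL = "https://auth.globus.org/v2/oauth2/authorize"  # assumed Globus Auth endpoint
TOKEN_URL = "https://auth.globus.org/v2/oauth2/token"          # assumed Globus Auth endpoint


def authorize_url(client_id, redirect_uri, scopes, state):
    """Step 2: URL the portal redirects the user's browser to."""
    query = urlencode({
        "client_id": client_id,
        "redirect_uri": redirect_uri,   # must match the callback URL set at registration
        "scope": " ".join(scopes),
        "response_type": "code",
        "state": state,                 # random value checked on the callback (CSRF protection)
    })
    return f"{AUTHORIZE_URL}?{query}"


def code_exchange_request(client_id, client_secret, code, redirect_uri):
    """Step 5: client authenticates with its ID and secret and sends the code."""
    body = urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
    }).encode()
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return Request(
        TOKEN_URL,
        data=body,
        headers={"Authorization": f"Basic {basic}",
                 "Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


if __name__ == "__main__":
    state = secrets.token_urlsafe(16)
    print(authorize_url(
        "MY-CLIENT-ID", "https://portal.example.org/callback",
        ["openid", "email", "urn:globus:auth:scope:transfer.api.globus.org:all"],
        state,
    ))
```

The portal would store `state` in the user's session and reject any callback whose `state` does not match before performing step 5.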
Client credentials grant
Participants: Application, Science Gateway, Data Portal (Client); Globus Auth (Authorization Server); Globus Transfer (Resource Server)
1. Authenticate with the app's client ID and secret
2. Globus Auth returns access tokens
3. Authenticate as the app with the access tokens to invoke the service (on behalf of an authorized user, within a given scope)

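A minimal standard-library sketch of the three steps above, assuming the Globus Auth token endpoint `https://auth.globus.org/v2/oauth2/token` and the Transfer scope string shown below; the client ID and secret are placeholders, and the request is built but not sent. The Globus Python SDK offers the same thing through `ConfidentialAppAuthClient.oauth2_client_credentials_tokens`.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

TOKEN_URL = "https://auth.globus.org/v2/oauth2/token"  # assumed Globus Auth token endpoint
TRANSFER_SCOPE = "urn:globus:auth:scope:transfer.api.globus.org:all"


def client_credentials_request(client_id, client_secret, scopes):
    """Steps 1-2: the app authenticates as itself and asks for access tokens."""
    body = urlencode({
        "grant_type": "client_credentials",
        "scope": " ".join(scopes),
    }).encode()
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return Request(
        TOKEN_URL,
        data=body,
        method="POST",
        headers={"Authorization": f"Basic {basic}",
                 "Content-Type": "application/x-www-form-urlencoded"},
    )


def bearer_headers(access_token):
    """Step 3: header the app sends to the resource server (e.g., Globus Transfer)."""
    return {"Authorization": f"Bearer {access_token}"}


# Sending would look like (not executed here):
#   with urllib.request.urlopen(client_credentials_request(ID, SECRET, [TRANSFER_SCOPE])) as r:
#       tokens = json.load(r)
```

Because no user is involved, the app acts under its own identity, which is why later slides grant roles and access rules to that app identity.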
Step 0: Application registration
• Set desired scopes
• Set callback URL
• Get client ID and secret
• Consents implement the principle of least privilege
(Register at developers.globus.org)

Data transfer and sharing
…you already know how to do this ;-)
• Move data to a collection → Submit a Transfer task (Transfer service: POST /transfer)
• Make data accessible → Set a guest collection access rule (Transfer service: POST /endpoint/{endpoint_id}/access)
• Grant user/app access → Add/confirm Group membership (Groups service: GET /groups/my_groups)

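The Transfer calls on this slide take JSON documents. The sketch below builds a transfer submission and a group access rule as plain dictionaries, assuming the Transfer API v0.10 base URL and field names as I understand them (`DATA_TYPE`, `submission_id`, `principal_type`, and so on); endpoint, group, and submission IDs are placeholders, and nothing is sent. Requests would carry the bearer token obtained earlier.

```python
import json

TRANSFER_API = "https://transfer.api.globus.org/v0.10"          # assumed base URL
MY_GROUPS_URL = "https://groups.api.globus.org/v2/groups/my_groups"  # assumed Groups endpoint


def access_url(endpoint_id):
    """URL for POST /endpoint/{endpoint_id}/access."""
    return f"{TRANSFER_API}/endpoint/{endpoint_id}/access"


def transfer_document(submission_id, source_id, dest_id, items, label=None):
    """Body for POST /transfer; `items` is a list of (source_path, dest_path, recursive)."""
    doc = {
        "DATA_TYPE": "transfer",
        "submission_id": submission_id,  # obtained beforehand from GET /submission_id
        "source_endpoint": source_id,
        "destination_endpoint": dest_id,
        "DATA": [
            {"DATA_TYPE": "transfer_item", "source_path": s,
             "destination_path": d, "recursive": r}
            for s, d, r in items
        ],
    }
    if label:
        doc["label"] = label
    return doc


def group_access_rule(group_id, path, permissions="r"):
    """Body for POST /endpoint/{endpoint_id}/access granting a Globus Group access to a path."""
    if permissions not in ("r", "rw"):
        raise ValueError("permissions must be 'r' or 'rw'")
    return {"DATA_TYPE": "access", "principal_type": "group",
            "principal": group_id, "path": path, "permissions": permissions}


if __name__ == "__main__":
    doc = transfer_document("SUBMISSION-ID", "SRC-ENDPOINT-ID", "GUEST-COLLECTION-ID",
                            [("/instrument/run1/", "/share/run1/", True)], label="run1")
    print(json.dumps(doc, indent=2))
    print(access_url("GUEST-COLLECTION-ID"), group_access_rule("GROUP-ID", "/share/run1/"))
```

The group ID would typically come from the Groups service (`MY_GROUPS_URL`), so adding a user to the group grants access without touching the access rule again.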
Using guest collections in your apps
• Create a guest collection; requires authentication
  – Cannot be completely automated: you must "log in"
  – Create once and automate the rest of the steps
• Grant the application the Access Manager role
  – Allows the application to manage permissions on the collection
  – Set for the application identity: appclientid@clients.auth.globus.org
• Grant roles for management of the endpoint and tasks

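Once the guest collection exists, granting the app the Access Manager role can be scripted. This sketch builds the role document, assuming a Transfer API v0.10 `POST /endpoint/{endpoint_id}/role` call, the role names listed in the code, and that an app's Globus Auth identity ID equals its client ID; all three are assumptions to verify against the Transfer and Auth documentation. The username helper mirrors the `appclientid@clients.auth.globus.org` form on the slide.

```python
TRANSFER_API = "https://transfer.api.globus.org/v0.10"  # assumed base URL
APP_USERNAME_DOMAIN = "clients.auth.globus.org"
# Assumed Transfer role names.
KNOWN_ROLES = {"administrator", "access_manager", "activity_manager", "activity_monitor"}


def app_identity_username(client_id: str) -> str:
    """Username form shown on the slide: appclientid@clients.auth.globus.org."""
    return f"{client_id}@{APP_USERNAME_DOMAIN}"


def role_url(endpoint_id: str) -> str:
    """Assumed URL for POST /endpoint/{endpoint_id}/role."""
    return f"{TRANSFER_API}/endpoint/{endpoint_id}/role"


def role_document(identity_id: str, role: str = "access_manager") -> dict:
    """Body granting `role` on a collection to an identity (assumed API shape).

    For an app, `identity_id` is assumed here to be the app's client ID.
    """
    if role not in KNOWN_ROLES:
        raise ValueError(f"unknown role: {role}")
    return {"DATA_TYPE": "role", "principal_type": "identity",
            "principal": identity_id, "role": role}


if __name__ == "__main__":
    client_id = "MY-CLIENT-ID"  # placeholder
    print(app_identity_username(client_id))
    print(role_url("GUEST-COLLECTION-ID"), role_document(client_id))
```

This split matches the slide: a person logs in once to create the collection and grant the role, and the app then manages access rules on its own.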
Globus Search Service

Data description and discovery
• Metadata store with fine-grained visibility controls
• Schema agnostic → dynamic schemas
• Simple search using URL query parameters
• Complex search using a search request document
(API reference: docs.globus.org/api/search)

Cancer Registry Records for Research (CR3)
• Create a network of federated cancer registries
  – Deploy similar infrastructure at other cancer registries
  – Enable queries across multiple registries
• Federation via Globus: network scale ↔ local control
  – Data owners input/export data sets, apply QC, and set access policies
  – Registry data remain at the institution where they were generated
  – Identities are provided/authenticated by the institution, not Globus
  – System scale depends on data owners providing storage resources

CR3 Architecture
• Researcher logs in to the CR3 Discovery Portal with UPMC/Pitt credentials; authentication is initiated to Globus Auth (GA), backed by the UPMC/Pitt identity providers
• Cohort search is initiated to Globus Search (GS), which holds the Cancer Registry De-identified Data Index (minimal criteria data, e.g., staging) in Elasticsearch
• Cohort aggregate counts are returned to the researcher
• Registry staff manage authorization and handle requests through the Request Service
• Data transfer from registrar to researcher is mediated by Globus Transfer (GT)

CR3 requirements
• Search index
  – Only de-identified data in the search index
  – No record-level access for researchers
• Portal
  – Fine-grained access control
  – Researchers must use a specific identity
  – Access must be logged
  – Render graphs based on search results
  – Faceted search in real time

CR3 Portal (simulated data)
• Federated logon using Globus Auth, with Pitt/UPMC as identity providers
• Charts update dynamically as facets change
• Facets vary based on the source registry index
• Google-like text search with facets for filtering
• Developed using a framework based on the Globus Modern Research Data Portal* design pattern (docs.globus.org/mrdp)
* PeerJ Computer Science article cs-144: https://peerj.com/articles/cs-144/

Distinct access policies may be applied to data and metadata.

Data ingest with Globus Search
POST /index/{index_id}/ingest
{
  "ingest_type": "GMetaList",
  "ingest_data": {
    "gmeta": [
      {
        "id": "filetype",
        "subject": "https://search.api.globus.org/abc.txt",
        "visible_to": ["public"],
        "content": {
          "metadata-schema/file#type": "file"
        }
      },
      ...
    ]
  }
}

Data ingest with Globus Search
POST /index/{index_id}/ingest
{
  "ingest_type": "GMetaList",
  "ingest_data": {
    "gmeta": [
      {
        "id": "weight",
        "subject": "https://search.api.globus.org/abc.txt",
        "visible_to": ["urn:globus:auth:identity:46bd0f56-e24f-11e5-a510-131bef46955c"],
        "content": {
          "metadata-schema/file#size": "37.6",
          "metadata-schema/file#size_human": "<50lb"
        }
      },
      ...
    ]
  }
}
Visibility can be limited to a Globus Auth identity:
• Single user
• Globus Group
• Registered client application

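The two ingest documents above attach differently visible entries to the same subject. A small sketch that builds such a GMetaList body in Python, reusing the field names, subject, and identity URN from the slides; `gmeta_entry`, `gmeta_list`, and `identity_urn` are hypothetical helpers, and the request itself (a `POST` to the index's ingest URL with a bearer token) is not shown.

```python
import json

PUBLIC = "public"


def identity_urn(identity_id: str) -> str:
    """visible_to principal for a single Globus Auth identity, in the form used on the slide."""
    return f"urn:globus:auth:identity:{identity_id}"


def gmeta_entry(subject, entry_id, content, visible_to):
    """One GMeta entry: a subject, an entry id, a visibility list, and content."""
    return {"subject": subject, "id": entry_id,
            "visible_to": list(visible_to), "content": content}


def gmeta_list(entries):
    """Body for POST /index/{index_id}/ingest with ingest_type GMetaList."""
    return {"ingest_type": "GMetaList", "ingest_data": {"gmeta": list(entries)}}


if __name__ == "__main__":
    subject = "https://search.api.globus.org/abc.txt"
    doc = gmeta_list([
        # Public entry: anyone searching the index can see the file type.
        gmeta_entry(subject, "filetype", {"metadata-schema/file#type": "file"}, [PUBLIC]),
        # Restricted entry: only this identity sees the size fields.
        gmeta_entry(subject, "weight",
                    {"metadata-schema/file#size": "37.6",
                     "metadata-schema/file#size_human": "<50lb"},
                    [identity_urn("46bd0f56-e24f-11e5-a510-131bef46955c")]),
    ])
    print(json.dumps(doc, indent=2))
```

Splitting metadata into separate entries per audience is what lets one record expose public descriptors while keeping sensitive fields private, the pattern CR3 relies on.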
Data discovery with Globus Search
Simple query:
GET /index/{index_id}/search?q=type%3Ahdf5
{
  "@datatype": "GSearchResult",
  "@version": "2017-09-01",
  "count": 1,
  "gmeta": [
    {
      "@datatype": "GMetaResult",
      "@version": "2019-08-27",
      "entries": [ { ... } ],
      "subject": "https://..."
    }
  ],
  "offset": 0,
  "total": 1
}

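A sketch of the simple query on this slide: build the GET URL and pull the subjects out of a `GSearchResult`. It assumes the Search API base URL `https://search.api.globus.org/v1`; the index ID is a placeholder and no request is made. URL-encoding turns the `:` in `type:hdf5` into `%3A`, matching the slide.

```python
from urllib.parse import urlencode

SEARCH_API = "https://search.api.globus.org/v1"  # assumed base URL


def simple_search_url(index_id: str, query: str) -> str:
    """GET /index/{index_id}/search?q=... with the query string URL-encoded."""
    return f"{SEARCH_API}/index/{index_id}/search?" + urlencode({"q": query})


def subjects(result: dict) -> list:
    """Subject of each GMetaResult in a GSearchResult document."""
    return [r["subject"] for r in result.get("gmeta", [])]


if __name__ == "__main__":
    print(simple_search_url("MY-INDEX-ID", "type:hdf5"))
```

Each GMetaResult groups all entries the caller is allowed to see for one subject, so a restricted entry simply does not appear in another user's results.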
Data discovery with Globus Search
Complex query:
POST /index/{index_id}/search
{
  "filters": [
    {
      "type": "range",
      "field_name": "pubdate",
      "values": [
        { "from": "*", "to": "2020-12-31" }
      ]
    }
  ],
  "facets": [
    {
      "name": "Publication Date",
      "field_name": "pubdate",
      ...
    }
  ]
}

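The complex query above can be assembled programmatically. This sketch reproduces the slide's range filter and facet; `range_filter` and `search_request` are hypothetical helpers, and the `q`, `limit`, and `offset` keys are my assumption about the search request document. The slide elides further facet settings, so the facet here carries only `name` and `field_name`.

```python
import json


def range_filter(field_name, start="*", end="*"):
    """Range filter as on the slide; "*" leaves that end of the range open."""
    return {"type": "range", "field_name": field_name,
            "values": [{"from": start, "to": end}]}


def search_request(q, filters=(), facets=(), limit=10, offset=0):
    """Body for POST /index/{index_id}/search (q/limit/offset keys assumed)."""
    return {"q": q, "filters": list(filters), "facets": list(facets),
            "limit": limit, "offset": offset}


if __name__ == "__main__":
    body = search_request(
        "*",
        filters=[range_filter("pubdate", end="2020-12-31")],
        facets=[{"name": "Publication Date", "field_name": "pubdate"}],
    )
    print(json.dumps(body, indent=2))
```

A portal like CR3 would rebuild this body each time a user toggles a facet and redraw its charts from the returned facet counts.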
Working with Globus Search
Notebook: Metadata, Search and Discovery (jupyter.demo.globus.org)

Globus Flows Service

Data (and compute) automation
• Flows: a platform service for defining, applying, and sharing distributed research automation flows
• Flows comprise Actions
• Action Providers: called by Flows to perform tasks
• Triggers*: start flows based on events
* In development

Automation with Globus Flows
• Built on AWS Step Functions
  – Simple JSON-based state machine language
  – Conditions, loops, fault tolerance, etc.
  – Propagates state through the flow
• Standardized API for integrating custom event and action services
  – Actions: synchronous or asynchronous
  – Custom web forms prompt for user input
• Actions secured with Globus Auth

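To make the "JSON-based state machine" concrete, here is a sketch of a two-step flow definition (transfer, then ingest) with a small structural checker. The `StartAt`/`States`/`Type`/`Next`/`End`/`ResultPath` keys and the `.$` suffix for values read from flow state follow the Step Functions-style language the slide mentions; the `ActionUrl` values and `Parameters` are hypothetical placeholders, not the real Globus-provided action providers' URLs or schemas. `validate_flow` is a hypothetical helper, not the service's own validator.

```python
import json

# Hypothetical flow: move instrument data, then make it searchable.
FLOW_DEFINITION = {
    "Comment": "Move instrument data, then ingest its metadata",
    "StartAt": "TransferData",
    "States": {
        "TransferData": {
            "Type": "Action",
            "ActionUrl": "https://transfer-ap.example.org/",  # placeholder URL
            "Parameters": {
                "source.$": "$.input.source",            # ".$" = read value from flow state
                "destination.$": "$.input.destination",
            },
            "ResultPath": "$.TransferResult",            # where this step's output is stored
            "Next": "IngestMetadata",
        },
        "IngestMetadata": {
            "Type": "Action",
            "ActionUrl": "https://ingest-ap.example.org/",  # placeholder URL
            "Parameters": {"index.$": "$.input.index_id", "subject.$": "$.input.subject"},
            "ResultPath": "$.IngestResult",
            "End": True,
        },
    },
}


def validate_flow(defn):
    """Return a list of structural problems (empty if none were found)."""
    problems = []
    states = defn.get("States", {})
    if defn.get("StartAt") not in states:
        problems.append(f"StartAt {defn.get('StartAt')!r} is not a defined state")
    for name, state in states.items():
        if state.get("Type") in ("Choice", "Fail"):
            continue  # these branch or stop without a plain Next/End; not checked here
        nxt = state.get("Next")
        if state.get("End") and nxt:
            problems.append(f"{name}: has both End and Next")
        elif nxt is None and not state.get("End"):
            problems.append(f"{name}: needs Next or End")
        elif nxt is not None and nxt not in states:
            problems.append(f"{name}: Next {nxt!r} is not a defined state")
    return problems


if __name__ == "__main__":
    print(json.dumps(FLOW_DEFINITION, indent=2))
    print(validate_flow(FLOW_DEFINITION))
```

Each step's `ResultPath` adds its output to the flow state, which is how later steps (or a final notification) can read earlier results.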
Extending the ecosystem: Action providers
• An Action Provider is a service endpoint supporting:
  – Run
  – Status
  – Cancel
  – Release
  – Resume
• Action Provider Toolkit: action-provider-tools.readthedocs.io/en/latest
• Action providers shown (a mix of Globus-provided and custom-built): Search, Ingest, Transfer, Delete, ACLs, Notification, Identifier, User Form, Web Form, Describe, Xtract, funcX

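The operations listed above describe an asynchronous lifecycle: run starts an action, status is polled, and finished actions are released. The toy in-memory provider below sketches that lifecycle only; it is not the Action Provider Toolkit and exposes no HTTP or Globus Auth. The status names are assumed from the action provider interface, the decision that cancel marks an action `FAILED` is this sketch's own, `complete` is a hypothetical hook standing in for real background work, and Resume (for paused actions) is not modeled.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict

# Status values assumed from the action provider interface (INACTIVE not modeled).
ACTIVE, SUCCEEDED, FAILED = "ACTIVE", "SUCCEEDED", "FAILED"


@dataclass
class Action:
    action_id: str
    status: str
    body: dict
    details: dict = field(default_factory=dict)


class InMemoryActionProvider:
    """Toy provider with run/status/cancel/release semantics."""

    def __init__(self):
        self._actions: Dict[str, Action] = {}

    def run(self, body: dict) -> Action:
        """Start an action and return immediately with status ACTIVE."""
        action = Action(action_id=str(uuid.uuid4()), status=ACTIVE, body=body)
        self._actions[action.action_id] = action
        return action

    def status(self, action_id: str) -> Action:
        return self._actions[action_id]

    def complete(self, action_id: str, details: dict) -> Action:
        """Hypothetical hook simulating the asynchronous work finishing."""
        action = self._actions[action_id]
        action.status, action.details = SUCCEEDED, details
        return action

    def cancel(self, action_id: str) -> Action:
        """Stop an active action; this sketch records it as FAILED."""
        action = self._actions[action_id]
        if action.status == ACTIVE:
            action.status = FAILED
            action.details = {"reason": "cancelled"}
        return action

    def release(self, action_id: str) -> Action:
        """Forget a finished action; releasing an active one is refused."""
        action = self._actions[action_id]
        if action.status == ACTIVE:
            raise ValueError("cannot release an active action")
        return self._actions.pop(action_id)


if __name__ == "__main__":
    ap = InMemoryActionProvider()
    a = ap.run({"message": "hello"})
    print(ap.status(a.action_id).status)
    ap.complete(a.action_id, {"echo": "hello"})
    print(ap.release(a.action_id).details)
```

A flow engine interacts with any provider through this same run/poll/release pattern, which is why custom providers built with the toolkit slot into flows alongside the Globus-provided ones.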
Working with Globus Flows
Notebook: Automation Using Globus Flows (jupyter.demo.globus.org)

Coming soon: Globus Trigger service
• Trigger-action platform
• Predefined triggers and actions to create rules
• Globus processes triggers and reliably executes actions

globus.org
docs.globus.org
outreach@globus.org
support@globus.org
