Instrument Data Orchestration with Globus Search and Flows


  1. Instrument data orchestration with Globus Search and Flows. Vas Vasiliadis, October 13, 2021
  2. Why we're all here this week…
  3. Instrument data orchestration: A common design pattern
     [Diagram] Components: Instrument Facility, Advanced Computing Facility, Distribution Store, Data Portal.
     Steps: (1) Imaging, (2) Acquisition, (3) Image Analysis, (4) Description/Identification, (5) Search/Discovery, (6) Science!
  4. Three Degrees of Automation
     • Timer Service: scheduled and recurring transfers (a.k.a. Globus cron)
     • Command Line Interface: ad hoc scripting and integration
     • Platform Services (Search, Flows, Transfer & Sharing): comprehensive data (and compute) orchestration, with a human in the loop
  5. Globus Command Line Interface (CLI): …you're all experts on this already!
  6. Globus Timer Service
  7. The Globus Timer service
     • Scheduled/recurring file transfers
     • Well suited to backup/sync tasks
     • Service with a command line interface
       – Simple installation
       – One-time authentication with a user identity
     • Example: NIH
  8. Use case: Data replication
     • For backup: initiated by a user or by a system backup
     • Automated transfer of data from a science instrument
     [Diagram] Recurring transfers with sync option: copy /ingest daily @ 3:30am
  9. Using the Globus Timer service
       $ globus-timer session {login, logout, whoami}
       $ globus-timer job transfer \
           --name example-job \
           --label "Timer Transfer Job" \
           --interval 28800 \
           --start '2020-01-01T12:34:56' \
           --source-endpoint ddb59aef-6d04-11e5-ba46-22000b92c6ec \
           --dest-endpoint ddb59af0-6d04-11e5-ba46-22000b92c6ec \
           --item ~/file1.txt ~/new_file1.txt false \
           --item ~/file2.txt ~/new_file2.txt false
  10. Globus Timer service options
      • --items-file {file_name}
      • --stop-after-runs
      • --stop-after-date
      • Transfer behavior (equivalent to options in the web app)
        – --sync-level (how the timer behaves if files already exist)
        – --verify-checksum
        – --encrypt-data
        – --preserve-timestamp
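The command on slide 9 can also be assembled programmatically, which helps when the interval or the item list comes from configuration. The following is a minimal sketch that only builds the argument list. It assumes the flag names shown on the slides; `timer_transfer_command` is a hypothetical helper, not part of any Globus package, and the CLI may have changed since this talk.

```python
import shlex
from datetime import timedelta


def timer_transfer_command(name, interval, start, source, dest, items):
    """Build the argv list for a `globus-timer job transfer` call.

    Flag names are copied from the slides; treat them as assumptions.
    """
    argv = [
        "globus-timer", "job", "transfer",
        "--name", name,
        # The slide passes the interval in seconds (28800 s = 8 hours).
        "--interval", str(int(interval.total_seconds())),
        "--start", start,
        "--source-endpoint", source,
        "--dest-endpoint", dest,
    ]
    for src_path, dst_path, recursive in items:
        argv += ["--item", src_path, dst_path, "true" if recursive else "false"]
    return argv


argv = timer_transfer_command(
    name="example-job",
    interval=timedelta(hours=8),
    start="2020-01-01T12:34:56",
    source="ddb59aef-6d04-11e5-ba46-22000b92c6ec",
    dest="ddb59af0-6d04-11e5-ba46-22000b92c6ec",
    items=[("~/file1.txt", "~/new_file1.txt", False)],
)
# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(argv))
```

Expressing the interval as a `timedelta` keeps the unit explicit; the conversion to seconds happens in one place.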
  11. Timer options in the webapp: coming soon…
  12. Platform Services
  13. Relevant Globus platform capabilities
      • Data transfer and sharing
      • Data description and discovery
      • Data (and compute) orchestration
      • Authentication and authorization
      Services: Auth, Search, Transfer, Groups, Flows
  14. Globus Auth: Foundational IAM service
      • Brokers authentication and authorization among…
        – End-users
        – Identity providers: enterprise, external (federated identities)
        – Services: resource servers with REST APIs
        – Apps: web, mobile, desktop, command line clients
        – Services acting as clients to other services
      • OAuth 2.0 Authorization Framework (a.k.a. OAuth2)
      • OpenID Connect Core 1.0 (a.k.a. OIDC)
  15. Several authentication models supported
      • Application acting as user with consent
        – Authorization code grant
      • Application authenticating as itself
        – Client credentials grant
      • Application able to manage tokens for offline or long-running tasks
        – Refresh tokens
  16. Authorization Code Grant
      Actors: Browser (User); Client (Web Portal, Application); Globus Auth (Authorization Server); Identity Provider; Globus service (Resource Server)
      1. Access portal
      2. Redirect user
      3. User authenticates and consents
      4. Authorization code
      5. Authenticate using client id and secret, send authorization code
      6. Access token(s)
      7. Authenticate with access token(s), giving client authority to invoke the requested service
  17. Client credentials grant
      Actors: Application, Science Gateway, Data Portal (Client); Globus Auth (Authorization Server); Globus Transfer (Resource Server)
      1. Authenticate with app client id and secret
      2. Access tokens
      3. Authenticate as app with access tokens to invoke service (on behalf of authorized user, within a given scope)
  18. Step 0: Application registration
      • Set desired scopes
      • Set callback URL
      • Get client ID and secret
      • Consents implement the principle of least privilege
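Step 2 of the authorization code grant (redirect the user) comes down to building a URL from the values obtained at registration. The sketch below shows one way to do that with the standard library. The client ID and callback URL are hypothetical placeholders, the `access_type=offline` parameter (used to request refresh tokens) is included as an assumption about Globus Auth behavior, and a production portal would more likely rely on an SDK than hand-build this URL.

```python
import secrets
from urllib.parse import urlencode

# Globus Auth authorization endpoint.
AUTHORIZE_URL = "https://auth.globus.org/v2/oauth2/authorize"


def build_authorize_url(client_id, redirect_uri, scopes, state):
    """Return the URL the portal redirects the browser to (step 2)."""
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,    # must match the registered callback URL
        "scope": " ".join(scopes),       # space-separated, per OAuth 2.0
        "response_type": "code",         # authorization code grant
        "state": state,                  # echoed back; protects against CSRF
        "access_type": "offline",        # assumption: asks for refresh tokens
    }
    return f"{AUTHORIZE_URL}?{urlencode(params)}"


state = secrets.token_urlsafe(16)
scopes = ["openid", "urn:globus:auth:scope:transfer.api.globus.org:all"]
url = build_authorize_url(
    client_id="00000000-0000-0000-0000-000000000000",   # hypothetical
    redirect_uri="https://portal.example.org/callback",  # hypothetical
    scopes=scopes,
    state=state,
)
print(url)
```

Requesting only the scopes the portal needs is what makes the consent screen reflect least privilege.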
  19. Data transfer and sharing
      …you already know how to do this ;-)
      • Move data to collection → Submit Transfer task (Transfer service: POST /transfer)
      • Make data accessible → Set guest collection access rule (Transfer service: POST /endpoint/{endpoint_id}/access)
      • Grant user/app access → Add/confirm Group membership (Groups service: GET /groups/my_groups)
  20. Using guest collections in your apps
      • Create a guest collection; requires authentication
        – Cannot be completely automated: you must "log in"
        – Create once and automate the rest of the steps
      • Grant the application the Access Manager role
        – Allows the application to manage permissions on the collection
        – Set for application identity:
      • Grant roles for management of endpoint and tasks
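Once the application holds the Access Manager role, setting a guest collection access rule is a single POST to /endpoint/{endpoint_id}/access. The sketch below only builds the request body; it assumes the Transfer API access document fields (`DATA_TYPE`, `principal_type`, `principal`, `path`, `permissions`) and the rule that paths start and end with a slash. `access_rule` is a hypothetical helper, and the identity UUID is the sample one from the ingest slide.

```python
import json


def access_rule(identity_id, path, permissions="r"):
    """Build an access rule document granting one identity access to a folder."""
    if permissions not in ("r", "rw"):
        raise ValueError("permissions must be 'r' or 'rw'")
    if not (path.startswith("/") and path.endswith("/")):
        raise ValueError("path must be a directory that starts and ends with '/'")
    return {
        "DATA_TYPE": "access",
        "principal_type": "identity",
        "principal": identity_id,
        "path": path,
        "permissions": permissions,
    }


rule = access_rule("46bd0f56-e24f-11e5-a510-131bef46955c", "/ingest/run-001/")
# This body would be sent with POST /endpoint/{endpoint_id}/access.
print(json.dumps(rule, indent=2))
```

Validating the path locally gives a clearer error than a rejected request would.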
  21. Globus Search Service
  22. Data description and discovery
      • Metadata store with fine-grained visibility controls
      • Schema agnostic → dynamic schemas
      • Simple search using URL query parameters
      • Complex search using a search request document
  23. Cancer Registry Records for Research (CR3)
      • Create network of federated cancer registries
        – Deploy similar infrastructure at other cancer registries
        – Enable queries across multiple registries
      • Federation via Globus: network scale ↔ local control
        – Data owners input/export data sets, apply QC, set access policies
        – Registry data remain at the institution where they were generated
        – Identities are provided/authenticated by the institution, not Globus
        – System scale depends on data owners providing storage resources
  24. CR3 Architecture
      [Diagram] Components: Researcher, CR3 Discovery Portal, Globus Auth (GA), Globus Search (GS), Globus Transfer (GT), UPMC/Pitt Identity Providers, Registry Staff, Cancer Registry, Request Service, Elasticsearch, De-identified Data Index (minimal criteria data: e.g., staging)
      • Researcher logs in with UPMC/Pitt credentials; authentication is initiated to GA
      • Cohort search is initiated to GS; cohort aggregate counts are returned
      • Registry staff manage authorization
      • Data transfer from registrar to researcher is mediated by GT
  25. CR3 requirements
      • Search index
        – Only de-identified data in search index
        – No record-level access for researchers
      • Portal
        – Fine-grained access control
        – Researchers must use a specific identity
        – Access must be logged
        – Render graphs based on search results
        – Faceted search in real time
  26. CR3 Portal (simulated data)
      • Federated logon using Globus Auth with Pitt/UPMC as identity providers
      • Dynamically updating charts as facets change
      • Variable facets based on source registry index
      • Google-like text search with facets for filtering
      • Developed using a framework based on the Globus Modern Research Data Portal* design pattern
      * PeerJ Articles: cs-144
  27. Distinct access policies may be applied to Data and Metadata
  28. Data ingest with Globus Search
      POST /index/{index_id}/ingest

        {
          "ingest_type": "GMetaList",
          "ingest_data": {
            "gmeta": [
              {
                "id": "filetype",
                "subject": "",
                "visible_to": ["public"],
                "content": {
                  "metadata-schema/file#type": "file"
                }
              },
              ...
            ]
          }
        }
  29. Data ingest with Globus Search
      POST /index/{index_id}/ingest

        {
          "ingest_type": "GMetaList",
          "ingest_data": {
            "gmeta": [
              {
                "id": "weight",
                "subject": "",
                "visible_to": ["urn:globus:auth:identity:46bd0f56-e24f-11e5-a510-131bef46955c"],
                "content": {
                  "metadata-schema/file#size": "37.6",
                  "metadata-schema/file#size_human": "<50lb"
                }
              },
              ...
            ]
          }
        }

      Visibility limited to a Globus Auth identity:
      • Single user
      • Globus Group
      • Registered client application
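The two ingest documents above share one shape, so a small builder keeps the nesting and the `visible_to` list consistent. This is a sketch under stated assumptions: `gmeta_entry` and `ingest_document` are hypothetical helpers, the subject URL is made up for illustration, and only the field names shown on the slides are used.

```python
import json


def gmeta_entry(subject, content, visible_to=("public",), entry_id=None):
    """Build one metadata entry; visible_to controls who can see it."""
    entry = {
        "subject": subject,
        "visible_to": list(visible_to),
        "content": content,
    }
    if entry_id is not None:
        entry["id"] = entry_id
    return entry


def ingest_document(entries):
    """Wrap entries in the GMetaList envelope shown on the slides."""
    return {"ingest_type": "GMetaList", "ingest_data": {"gmeta": list(entries)}}


subject = "https://example.org/data/scan-001.h5"  # hypothetical subject
doc = ingest_document([
    # Visible to everyone.
    gmeta_entry(subject, {"metadata-schema/file#type": "file"}, entry_id="filetype"),
    # Visible only to one Globus Auth identity.
    gmeta_entry(
        subject,
        {"metadata-schema/file#size": "37.6"},
        visible_to=["urn:globus:auth:identity:46bd0f56-e24f-11e5-a510-131bef46955c"],
        entry_id="weight",
    ),
])
# This body would be sent with POST /index/{index_id}/ingest.
print(json.dumps(doc, indent=2))
```

Keeping public and restricted entries under the same subject is how distinct policies for data and metadata can coexist in one index.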
  30. Data discovery with Globus Search
      Simple query: GET /index/{index_id}/search?q=type%3Ahdf5

        {
          "@datatype": "GSearchResult",
          "@version": "2017-09-01",
          "count": 1,
          "gmeta": [
            {
              "@datatype": "GMetaResult",
              "@version": "2019-08-27",
              "entries": [ { ... } ],
              "subject": "https://..."
            }
          ],
          "offset": 0,
          "total": 1
        }
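The simple query is an ordinary GET with a URL-encoded `q` parameter, and the result is a list of subjects under `gmeta`. The sketch below builds the URL and reads subjects out of a response shaped like the one on the slide. The index ID is a hypothetical placeholder, the `limit` and `offset` parameters are assumptions about the query interface, and no network request is made.

```python
from urllib.parse import urlencode

SEARCH_BASE = "https://search.api.globus.org/v1"


def simple_search_url(index_id, query, limit=10, offset=0):
    """Build the GET URL for a simple query; urlencode escapes ':' as %3A."""
    params = urlencode({"q": query, "limit": limit, "offset": offset})
    return f"{SEARCH_BASE}/index/{index_id}/search?{params}"


def subjects(search_result):
    """Return the subject of every hit in a GSearchResult document."""
    return [hit["subject"] for hit in search_result.get("gmeta", [])]


search_url = simple_search_url("my-index-id", "type:hdf5")  # hypothetical index ID
print(search_url)

# A response shaped like the slide's example (contents elided there too).
sample_result = {
    "@datatype": "GSearchResult",
    "count": 1,
    "gmeta": [{"@datatype": "GMetaResult", "entries": [{}], "subject": "https://..."}],
    "offset": 0,
    "total": 1,
}
print(subjects(sample_result))
```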
  31. Data discovery with Globus Search
      Complex query: POST /index/{index_id}/search

        {
          "filters": [
            {
              "type": "range",
              "field_name": "pubdate",
              "values": [
                { "from": "*", "to": "2020-12-31" }
              ]
            }
          ],
          "facets": [
            {
              "name": "Publication Date",
              "field_name": "pubdate",
              ...
            }
          ]
        }
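A complex query is a JSON document, so filters can be composed in code before posting. The sketch below builds the range filter from the slide with a hypothetical `range_filter` helper. The `"q": "*"` match-all term is an assumption, and the facet definition is left out because the slide elides part of it.

```python
import json


def range_filter(field_name, start="*", end="*"):
    """Build a range filter; '*' leaves that side of the range open."""
    return {
        "type": "range",
        "field_name": field_name,
        "values": [{"from": start, "to": end}],
    }


query = {
    "q": "*",  # assumption: match everything, then narrow with filters
    "filters": [range_filter("pubdate", end="2020-12-31")],
}
# This body would be sent with POST /index/{index_id}/search.
print(json.dumps(query, indent=2))
```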
  32. Working with Globus Search: Metadata, Search and Discovery
  33. Globus Flows Service
  34. Data (and compute) automation
      • Flows: a platform service for defining, applying, and sharing distributed research automation flows
      • Flows comprise Actions
      • Action Providers: called by Flows to perform tasks
      • Triggers*: start flows based on events
      * In development
  35. Automation with Globus Flows
      • Built on AWS Step Functions
        – Simple JSON-based state machine language
        – Conditions, loops, fault tolerance, etc.
        – Propagates state through the flow
      • Standardized API for integrating custom event and action services
        – Actions: synchronous or asynchronous
        – Custom Web forms prompt for user input
      • Actions secured with Globus Auth
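Because a flow is a JSON state machine, its definition can be written as a plain dictionary and sanity-checked before it is deployed. The sketch below is illustrative only: `StartAt`, `States`, `Next`, and `End` follow the Amazon States Language, while the `Action` state type, the `ActionUrl` value, and the parameter names are assumptions about the Flows service, and `check_flow` is a hypothetical helper that handles only linear flows (no Choice states).

```python
import json

flow_definition = {
    "Comment": "Move instrument data, then finish",
    "StartAt": "TransferData",
    "States": {
        "TransferData": {
            "Type": "Action",  # assumption: Flows state type that calls an action provider
            "ActionUrl": "https://actions.globus.org/transfer/transfer",  # assumption
            "Parameters": {
                # Keys ending in ".$" take their value from the flow input.
                "source_endpoint_id.$": "$.source_endpoint_id",
                "destination_endpoint_id.$": "$.destination_endpoint_id",
            },
            "ResultPath": "$.TransferResult",
            "Next": "Done",
        },
        "Done": {"Type": "Pass", "End": True},
    },
}


def check_flow(definition):
    """Check that every transition targets a real state; return terminal states."""
    states = definition["States"]
    if definition["StartAt"] not in states:
        raise ValueError("StartAt names an unknown state")
    terminal = []
    for name, state in states.items():
        target = state.get("Next")
        if target is not None and target not in states:
            raise ValueError(f"{name} transitions to unknown state {target}")
        if target is None:
            if not state.get("End"):
                raise ValueError(f"{name} has neither Next nor End")
            terminal.append(name)
    return terminal


print(check_flow(flow_definition))
print(json.dumps(flow_definition, indent=2))
```

The `ResultPath` entry is what propagates state: the action's result is stored in the flow's state so that later states can read it.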
  36. Extending the ecosystem: Action providers
      • An Action Provider is a service endpoint with the operations:
        – Run
        – Status
        – Cancel
        – Release
        – Resume
      • Action Provider Toolkit: action-provider-
      [Diagram] Action providers shown (legend: Globus provided, custom built): Search, Transfer, Notification, ACLs, Identifier, Delete, Ingest, User Form, Describe, Xtract, funcX, Web Form
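The five operations above describe an action's lifecycle: start it, poll it, optionally cancel or resume it, and finally release its record. The toy class below illustrates that lifecycle in memory. It is not the Action Provider Toolkit and exposes no HTTP endpoints; the class, the `complete` helper, and the status strings are all assumptions made for illustration.

```python
import uuid


class InMemoryActionProvider:
    """Toy model of the run/status/cancel/release/resume lifecycle."""

    def __init__(self):
        self._actions = {}

    def run(self, body):
        """Start an action and return its record (asynchronous style)."""
        action_id = uuid.uuid4().hex
        record = {"action_id": action_id, "status": "ACTIVE", "details": {"request": body}}
        self._actions[action_id] = record
        return record

    def status(self, action_id):
        """Return the current record so a flow can poll for completion."""
        return self._actions[action_id]

    def resume(self, action_id):
        """Nothing pauses in this toy, so resume just reports the current state."""
        return self._actions[action_id]

    def complete(self, action_id):
        """Hypothetical helper that simulates the work finishing."""
        self._actions[action_id]["status"] = "SUCCEEDED"

    def cancel(self, action_id):
        """Stop an action that is still running."""
        record = self._actions[action_id]
        if record["status"] == "ACTIVE":
            record["status"] = "FAILED"
        return record

    def release(self, action_id):
        """Forget a finished action; refuse while it is still running."""
        if self._actions[action_id]["status"] == "ACTIVE":
            raise RuntimeError("cannot release an action that is still running")
        return self._actions.pop(action_id)


provider = InMemoryActionProvider()
action = provider.run({"message": "hello"})
provider.complete(action["action_id"])
print(provider.status(action["action_id"])["status"])
provider.release(action["action_id"])
```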
  37. Working with Globus Flows: Automation Using Globus Flows
  38. Coming soon: Globus Trigger service
      • Trigger-Action platform
      • Predefined triggers and actions to create rules
      • Globus processes triggers and reliably executes actions