Data Orchestration at Scale (GlobusWorld Tour West)

Presented at the GlobusWorld Tour West, a virtual workshop on September 15, 2021.

  1. 1. Data Orchestration at Scale Using the Globus Platform Vas Vasiliadis vas@uchicago.edu September 15, 2021
  2. 2. The instruments are coming!
  3. 3. Instrument data orchestration: A common design pattern • Steps: 1. Imaging, 2. Acquisition, 3. Image Analysis, 4. Description/Identification, 5. Search/Discovery, 6. Science! • Components: Instrument Facility, Advanced Computing Facility, Distribution Store, Data Portal
  4. 4. A data portal example: acdc.alcf.anl.gov
  5. 5. Instrument data orchestration: Relevant Globus platform capabilities • Authentication and Authorization • Data transfer and sharing • Data description and discovery • Data (and compute) orchestration • Services: Auth, Search, Transfer, Groups, Flows
  6. 6. Globus Auth: Foundational IAM service • Brokers authentication and authorization among: – End-users – Identity providers: enterprise, external (federated identities) – Services: resource servers with REST APIs – Apps: web, mobile, desktop, command line clients – Services acting as clients to other services • OAuth 2.0 Authorization Framework (a.k.a. OAuth2) • OpenID Connect Core 1.0 (a.k.a. OIDC)
  7. 7. Step 0: Application registration (developers.globus.org) • Set desired scopes • Set callback URL • Get client ID and secret • Consents implement the principle of least privilege
  8. 8. Authorization Code Grant • Actors: Browser (User); Client (Web Portal, Application); Globus Auth (Authorization Server); Identity Provider; Globus service (Resource Server) • 1. Access portal • 2. Redirect user • 3. User authenticates and consents • 4. Authorization code • 5. Authenticate using client id and secret, send authorization code • 6. Access token(s) • 7. Authenticate with access token(s), giving client authority to invoke the requested service
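Steps 2 and 5 of the authorization code grant on slide 8 can be sketched with the Python standard library alone. This is a minimal illustration, not the Globus SDK: the endpoint URLs are assumed from the standard Globus Auth OAuth2 layout and should be checked against the Globus Auth documentation, and the client ID, redirect URI, and scopes are hypothetical placeholders.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

# Assumed Globus Auth OAuth2 endpoints; verify against the official docs.
AUTHORIZE_URL = "https://auth.globus.org/v2/oauth2/authorize"
TOKEN_URL = "https://auth.globus.org/v2/oauth2/token"


def build_authorize_url(client_id, redirect_uri, scopes, state):
    """Step 2: the URL the portal redirects the user's browser to."""
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),
        "state": state,  # opaque value the portal checks on the callback
        "response_type": "code",
    }
    return f"{AUTHORIZE_URL}?{urlencode(params)}"


def build_token_request(code, redirect_uri):
    """Step 5: form fields for exchanging the authorization code.

    The client id and secret are sent alongside this body,
    typically as HTTP Basic credentials.
    """
    return {
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
    }


# Hypothetical registration values, for illustration only.
state = secrets.token_urlsafe(16)
url = build_authorize_url(
    client_id="my-client-id",
    redirect_uri="https://portal.example.org/callback",
    scopes=["openid", "profile"],
    state=state,
)
print(url)
```

The `state` value is generated per login attempt so the portal can reject callbacks it did not initiate.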
  9. 9. Client credential grant • Actors: Application, Science Gateway, Data Portal (Client); Globus Auth (Authorization Server); Globus Transfer (Resource Server) • 1. Authenticate with app client id and secret • 2. Access Tokens • 3. Authenticate as app with access tokens to invoke service (on behalf of authorized user, within a given scope)
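The client credential grant on slide 9 reduces to one token request (step 1) and a bearer header on later calls (step 3). The sketch below only constructs the request objects and sends nothing. The token URL and the Transfer scope string are assumptions to be confirmed in the Globus documentation, and the client ID and secret are placeholders.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

# Assumed Globus Auth token endpoint; verify against the official docs.
TOKEN_URL = "https://auth.globus.org/v2/oauth2/token"


def build_client_credentials_request(client_id, client_secret, scopes):
    """Step 1: authenticate as the app with its client id and secret."""
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urlencode(
        {"grant_type": "client_credentials", "scope": " ".join(scopes)}
    ).encode()
    return Request(
        TOKEN_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def bearer_header(access_token):
    """Step 3: header used when invoking the resource server."""
    return {"Authorization": f"Bearer {access_token}"}


# Placeholder credentials and an assumed scope string for the Transfer service.
req = build_client_credentials_request(
    "my-client-id",
    "my-client-secret",
    ["urn:globus:auth:scope:transfer.api.globus.org:all"],
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would perform the actual exchange; that call is left out so the sketch runs without network access or real credentials.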
  10. 10. Data transfer and sharing …you already know how to do this ;-) • Move data to collection → Submit Transfer task (Transfer service: POST /transfer) • Make data accessible → Set guest collection access rule (Transfer service: POST /endpoint/{endpoint_id}/access) • Grant user(s) access → Add/confirm Group membership (Groups service: GET /groups/my_groups)
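The two Transfer calls on slide 10 each take a JSON document. A rough sketch of building those documents follows; the field names reflect one reading of the Globus Transfer API and should be treated as assumptions, and every ID and path is a made-up placeholder.

```python
import json


def transfer_document(submission_id, source_endpoint, destination_endpoint, items):
    """Body for POST /transfer: move files between two collections.

    `items` is a list of (source_path, destination_path) pairs.
    Field names are assumed from the Transfer API docs.
    """
    return {
        "DATA_TYPE": "transfer",
        "submission_id": submission_id,
        "source_endpoint": source_endpoint,
        "destination_endpoint": destination_endpoint,
        "DATA": [
            {
                "DATA_TYPE": "transfer_item",
                "source_path": src,
                "destination_path": dst,
            }
            for src, dst in items
        ],
    }


def access_rule_document(group_id, path, permissions="r"):
    """Body for POST /endpoint/{endpoint_id}/access: share a path with a group."""
    return {
        "DATA_TYPE": "access",
        "principal_type": "group",
        "principal": group_id,
        "path": path,
        "permissions": permissions,
    }


# Placeholder identifiers, for illustration only.
task = transfer_document(
    "SUBMISSION_ID",
    "SOURCE_COLLECTION_ID",
    "DEST_COLLECTION_ID",
    [("/instrument/run42/scan.h5", "/shared/run42/scan.h5")],
)
rule = access_rule_document("GROUP_ID", "/shared/run42/")
print(json.dumps(task, indent=2))
print(json.dumps(rule, indent=2))
```

Granting access to a group rather than to individual identities means later membership changes in the Groups service take effect without touching the access rule.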
  11. 11. Data description and discovery • Metadata store with fine-grained visibility controls • Schema agnostic → dynamic schemas • Simple search using URL query parameters • Complex search using search request document • See docs.globus.org/api/search
  12. 12. Cancer Registry Records for Research (CR3) • Create network of federated cancer registries – Deploy similar infrastructure at other cancer registries – Enable queries across multiple registries • Federation via Globus: network scale ↔ local control – Data owners input/export data sets, apply QC, set access policies – Registry data remain at the institution where they were generated – Identities are provided/authenticated by the institution, not Globus – System scale depends on data owners providing storage resources
  13. 13. CR3 Architecture • Components: CR3 Discovery Portal; Globus Auth (GA); Globus Search (GS); Globus Transfer (GT); UPMC/Pitt Identity Providers; Request Service; Elasticsearch; Cancer Registry; De-identified Data Index (minimal criteria data: e.g., staging) • Actors: Researcher; Registry Staff • Interactions: Login with UPMC/Pitt credentials; Auth initiated to GA; Cohort search initiated to GS; Cohort aggregate counts returned; Registry Staff manage authorization; Data transfer from registrar to researcher mediated by GT
  14. 14. Safe Harbor NAACCR data in Globus Search • Diseases: Breast, Colon, Lung, Melanoma, Head & Neck, Ovary, Prostate • Years: 2004 – 2017 • Patients: 65,000 • Example record: "laterality": "Left - origin of primary", "site": "Lung, lower lobe", "cs_lymph_node": "0", "recurrence": "Yes", "cancer_status": "Evidence of this tumor", "clinical_m": "M0", "site_code": "C343", "cs_stage": "1B", "scope_reg_ln_summ": "4 or more reg LN removed", "histology_code": "80703", "mets_at_dx": "No distant metastasis", "histology": "Squamous cell carcinoma, NOS", "grade": "Grade II: Mod diff, mod well diff,", "year_last_contact": "2005", "spanish_origin": "Non-Spanish; non-Hispanic", "cause_of_death": "Cancer related", "dx_year": "2004", "dx_age_range": "65-74", "dx_age": "71", "race": "White", "disease_category": "Lung", "clinical_stage": "1B", "clinical_t": "T2", "gender": "Female", "path_n": "N0", "facility": "Shadyside", "path_t": "T2", "clinical_n": "N0", "path_stage": "1B", "path_m": "MX"
  15. 15. Distinct access policies may be applied to Data and Metadata
  16. 16. Data ingest with Globus Search • POST /index/{index_id}/ingest • { "ingest_type": "GMetaList", "ingest_data": { "gmeta": [ { "id": "filetype", "subject": "https://search.api.globus.org/abc.txt", "visible_to": ["public"], "content": { "metadata-schema/file#type": "file" } }, ... ] } }
  17. 17. Data ingest with Globus Search • POST /index/{index_id}/ingest • { "ingest_type": "GMetaList", "ingest_data": { "gmeta": [ { "id": "weight", "subject": "https://search.api.globus.org/abc.txt", "visible_to": ["urn:globus:auth:identity:46bd0f56-e24f-11e5-a510-131bef46955c"], "content": { "metadata-schema/file#size": "37.6", "metadata-schema/file#size_human": "<50lb" } }, ... ] } } • Visibility limited to Globus Auth identity: – Single user – Globus Group – Registered client application
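The two ingest documents on slides 16 and 17 differ mainly in `visible_to`. A small sketch of assembling such a GMetaList in Python, reusing the values shown on the slides; the helper names `gmeta_entry` and `gmeta_list` are hypothetical and not part of any Globus library.

```python
import json


def gmeta_entry(subject, content, visible_to=("public",), entry_id=None):
    """One entry: metadata about `subject`, readable by `visible_to`."""
    entry = {
        "subject": subject,
        "visible_to": list(visible_to),
        "content": content,
    }
    if entry_id is not None:
        entry["id"] = entry_id
    return entry


def gmeta_list(entries):
    """Wrap entries in the document sent to POST /index/{index_id}/ingest."""
    return {"ingest_type": "GMetaList", "ingest_data": {"gmeta": list(entries)}}


SUBJECT = "https://search.api.globus.org/abc.txt"

# Publicly visible entry, as on slide 16.
public_entry = gmeta_entry(
    SUBJECT,
    {"metadata-schema/file#type": "file"},
    entry_id="filetype",
)

# Entry visible to a single Globus Auth identity, as on slide 17.
restricted_entry = gmeta_entry(
    SUBJECT,
    {
        "metadata-schema/file#size": "37.6",
        "metadata-schema/file#size_human": "<50lb",
    },
    visible_to=["urn:globus:auth:identity:46bd0f56-e24f-11e5-a510-131bef46955c"],
    entry_id="weight",
)

document = gmeta_list([public_entry, restricted_entry])
print(json.dumps(document, indent=2))
```

Both entries describe the same subject, so one record can carry public and restricted metadata side by side.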
  18. 18. Data discovery with Globus Search: Simple query • GET /index/{index_id}/search?q=type%3Ahdf5 • { "@datatype": "GSearchResult", "@version": "2017-09-01", "count": 1, "gmeta": [ { "@datatype": "GMetaResult", "@version": "2019-08-27", "entries": [ { ... } ], "subject": "https://..." } ], "offset": 0, "total": 1 }
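The simple query on slide 18 is just a URL-encoded `q` parameter, and the result is a list of `gmeta` items keyed by subject. A sketch of both halves follows, assuming `https://search.api.globus.org/v1` as the base URL (to be confirmed in the Search API docs) and using a hypothetical subject in the sample result.

```python
from urllib.parse import urlencode

# Assumed base URL for the Globus Search API; verify against the docs.
SEARCH_BASE = "https://search.api.globus.org/v1"


def simple_search_url(index_id, q):
    """Build the GET URL for a simple query such as 'type:hdf5'."""
    return f"{SEARCH_BASE}/index/{index_id}/search?{urlencode({'q': q})}"


def subjects(result):
    """Pull the subject of every match out of a GSearchResult document."""
    return [item["subject"] for item in result.get("gmeta", [])]


url = simple_search_url("INDEX_ID", "type:hdf5")
print(url)

# Sample response shaped like the one on slide 18; the subject is made up.
sample_result = {
    "@datatype": "GSearchResult",
    "count": 1,
    "gmeta": [
        {
            "@datatype": "GMetaResult",
            "entries": [{}],
            "subject": "https://data.example.org/scan_001.h5",
        }
    ],
    "offset": 0,
    "total": 1,
}
print(subjects(sample_result))
```

`urlencode` turns the colon in `type:hdf5` into `%3A`, which matches the encoded query string shown on the slide.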
  19. 19. Data discovery with Globus Search: Complex query • POST /index/{index_id}/search • { "filters": [ { "type": "range", "field_name": "pubdate", "values": [ { "from": "*", "to": "2020-12-31" } ] } ], "facets": [ { "name": "Publication Date", "field_name": "pubdate", ... } ] }
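The range filter in the complex query on slide 19 is easy to generate programmatically. The sketch below builds only the parts the slide shows in full; the facet definition is elided on the slide, so facets are passed through untouched rather than constructed. The helper names are hypothetical.

```python
import json


def range_filter(field_name, start="*", end="*"):
    """A range filter; '*' leaves that side of the range open."""
    return {
        "type": "range",
        "field_name": field_name,
        "values": [{"from": start, "to": end}],
    }


def search_request(filters=(), facets=(), q=None):
    """Assemble the document sent to POST /index/{index_id}/search."""
    doc = {}
    if q is not None:
        doc["q"] = q
    if filters:
        doc["filters"] = list(filters)
    if facets:
        doc["facets"] = list(facets)  # caller supplies complete facet objects
    return doc


# Everything published on or before 2020-12-31, as on the slide.
request_doc = search_request(filters=[range_filter("pubdate", end="2020-12-31")])
print(json.dumps(request_doc, indent=2))
```

Keeping filter construction in a helper makes it straightforward to derive several filters from portal form fields without hand-writing nested JSON.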
  20. 20. Let’s take a look…
  21. 21. Data (and compute) automation • Flows: A platform service for defining, applying, and sharing distributed research automation flows • Flows comprise Actions • Action Providers: Called by Flows to perform tasks • Triggers*: Start flows based on events * In development
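The vocabulary on slide 21 (a flow made of actions, each performed by an action provider) can be illustrated with a toy, in-process model. This is not the Globus Flows API or its flow definition format; the `Action` class, `run_flow`, and both stub providers are hypothetical stand-ins that only show how the pieces relate.

```python
from dataclasses import dataclass
from typing import Callable, List

# An action provider takes the flow's current state and returns a result.
ActionProvider = Callable[[dict], dict]


@dataclass
class Action:
    name: str
    provider: ActionProvider


def run_flow(actions: List[Action], flow_input: dict) -> dict:
    """Run actions in order, storing each result under the action's name."""
    state = dict(flow_input)
    for action in actions:
        state[action.name] = action.provider(state)
    return state


def fake_transfer(state: dict) -> dict:
    """Stub provider standing in for a data transfer."""
    return {"status": "SUCCEEDED", "moved": state["path"]}


def fake_ingest(state: dict) -> dict:
    """Stub provider standing in for a search ingest of the moved file."""
    return {"status": "SUCCEEDED", "indexed": state["Transfer"]["moved"]}


flow = [Action("Transfer", fake_transfer), Action("Ingest", fake_ingest)]
result = run_flow(flow, {"path": "/data/scan_001.h5"})
print(result)
```

Each action sees the results of earlier ones through the shared state, which is how the ingest step learns what the transfer step moved.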
  22. 22. globus.org docs.globus.org outreach@globus.org support@globus.org
