3. Instrument data orchestration: A common design pattern
[Diagram: Instrument Facility, Advanced Computing Facility, Distribution Store, Data Portal]
Steps shown: 1 Imaging, 2 Acquisition, 3 Image Analysis, 4 Description/Identification, 5 Search/Discovery, 6 Science!
5. Instrument data orchestration:
Relevant Globus platform capabilities
• Authentication and Authorization
• Data transfer and sharing
• Data description and discovery
• Data (and compute) orchestration
Services: Auth, Search, Transfer, Groups, Flows
6. Globus Auth: Foundational IAM service
Brokers authentication and authorization among…
– End-users
– Identity providers: enterprise, external (federated identities)
– Services: resource servers with REST APIs
– Apps: web, mobile, desktop, command line clients
– Services acting as clients to other services
• OAuth 2.0 Authorization Framework (a.k.a. OAuth2)
• OpenID Connect Core 1.0 (a.k.a. OIDC)
7. Step 0: Application registration
• Set desired scopes
• Set callback URL
• Get client ID and secret
• Consents implement the principle of least privilege
developers.globus.org
8. Authorization Code Grant
Actors: Browser (User); Client (Web Portal, Application); Globus Auth (Authorization Server); Identity Provider; Globus service (Resource Server)
  1. Access portal (Browser to Client)
  2. Redirect user (Client to Globus Auth)
  3. User authenticates and consents (via the Identity Provider)
  4. Authorization code (Globus Auth to Client)
  5. Authenticate using client id and secret, send authorization code (Client to Globus Auth)
  6. Access token(s) (Globus Auth to Client)
  7. Authenticate with access token(s), giving client authority to invoke the requested service (Client to Globus service)
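Steps 2 and 5 of the grant can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions, not a definitive client: the client ID, redirect URI, and authorization code are hypothetical placeholders, the endpoint URLs are the Globus Auth OAuth2 endpoints as best understood here, and nothing is sent over the network.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

# Globus Auth OAuth2 endpoints (verify against the current documentation).
AUTHORIZE_URL = "https://auth.globus.org/v2/oauth2/authorize"
TOKEN_URL = "https://auth.globus.org/v2/oauth2/token"

# Hypothetical values; real ones come from application registration (Step 0).
CLIENT_ID = "00000000-0000-0000-0000-000000000000"
REDIRECT_URI = "https://portal.example.org/callback"
SCOPES = ["openid", "profile", "urn:globus:auth:scope:transfer.api.globus.org:all"]


def build_authorize_url(state: str) -> str:
    """Step 2: the URL the portal redirects the user's browser to."""
    query = {
        "client_id": CLIENT_ID,
        "redirect_uri": REDIRECT_URI,
        "scope": " ".join(SCOPES),
        "response_type": "code",
        "state": state,  # echoed back on the callback to guard against CSRF
    }
    return f"{AUTHORIZE_URL}?{urlencode(query)}"


def build_token_request(code: str) -> dict:
    """Step 5: form fields the portal POSTs to TOKEN_URL.

    The portal authenticates this request with its client id and secret
    (for example via HTTP Basic auth), which are not shown here.
    """
    return {
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": REDIRECT_URI,
    }


state = secrets.token_urlsafe(16)
authorize_url = build_authorize_url(state)
token_form = build_token_request("hypothetical-code-from-callback")
```

The access tokens returned in step 6 are then sent as `Authorization: Bearer ...` headers in step 7.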
9. Client Credentials Grant
Actors: Application, Science Gateway, Data Portal (Client); Globus Auth (Authorization Server); Globus Transfer (Resource Server)
  1. Authenticate with app client id and secret (Client to Globus Auth)
  2. Access tokens (Globus Auth to Client)
  3. Authenticate as app with access tokens to invoke service (on behalf of authorized user, within a given scope) (Client to Globus Transfer)
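The token request in step 1 can be sketched as follows. This is a minimal sketch, assuming HTTP Basic client authentication against the Globus Auth token endpoint; the client ID and secret are hypothetical placeholders, and the request object is only constructed, never sent.

```python
import base64
import urllib.request
from urllib.parse import urlencode

TOKEN_URL = "https://auth.globus.org/v2/oauth2/token"
TRANSFER_SCOPE = "urn:globus:auth:scope:transfer.api.globus.org:all"


def build_client_credentials_request(
    client_id: str, client_secret: str, scope: str
) -> urllib.request.Request:
    """Step 1: the app authenticates as itself and asks for tokens in a scope."""
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urlencode({"grant_type": "client_credentials", "scope": scope}).encode()
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def bearer_header(access_token: str) -> dict:
    """Step 3: header the app attaches when calling the resource server."""
    return {"Authorization": f"Bearer {access_token}"}


# Hypothetical credentials, for illustration only.
request = build_client_credentials_request(
    "hypothetical-client-id", "hypothetical-secret", TRANSFER_SCOPE
)
```

Passing `request` to `urllib.request.urlopen` would perform step 1 against the live service; the JSON response would carry the access token used in `bearer_header`.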
10. Data transfer and sharing
…you already know how to do this ;-)
• Move data to collection → Submit Transfer task (Transfer service: POST /transfer)
• Make data accessible → Set guest collection access rule (Transfer service: POST /endpoint/{endpoint_id}/access)
• Grant user(s) access → Add/confirm Group membership (Groups service: GET /groups/my_groups)
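The three calls on this slide can be sketched as request documents. This is a minimal sketch rather than a definitive client: the base URLs and field names follow the public Transfer and Groups APIs as best understood here, every ID and path is a hypothetical placeholder, and nothing is sent over the network.

```python
TRANSFER_BASE = "https://transfer.api.globus.org/v0.10"
GROUPS_BASE = "https://groups.api.globus.org/v2"


def transfer_document(submission_id, source, destination, source_path, destination_path):
    """Body for POST /transfer: copy a directory between two collections.

    submission_id is obtained beforehand from GET /submission_id.
    """
    return {
        "DATA_TYPE": "transfer",
        "submission_id": submission_id,
        "source_endpoint": source,
        "destination_endpoint": destination,
        "DATA": [
            {
                "DATA_TYPE": "transfer_item",
                "source_path": source_path,
                "destination_path": destination_path,
                "recursive": True,
            }
        ],
    }


def access_rule_document(group_id, path):
    """Body for POST /endpoint/{endpoint_id}/access: read access for a group."""
    return {
        "DATA_TYPE": "access",
        "principal_type": "group",
        "principal": group_id,
        "path": path,
        "permissions": "r",
    }


def planned_requests(submission_id, source, guest_collection, group_id):
    """Return (method, url, body) tuples in the order the slide lists them."""
    return [
        ("POST", f"{TRANSFER_BASE}/transfer",
         transfer_document(submission_id, source, guest_collection, "/raw/run1/", "/shared/run1/")),
        ("POST", f"{TRANSFER_BASE}/endpoint/{guest_collection}/access",
         access_rule_document(group_id, "/shared/")),
        ("GET", f"{GROUPS_BASE}/groups/my_groups", None),
    ]


# Hypothetical identifiers, for illustration only.
requests_plan = planned_requests("sub-id", "src-collection", "guest-collection", "group-id")
```

Each tuple would be sent with an `Authorization: Bearer ...` header carrying a token for the matching service.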
11. Data description and discovery
• Metadata store with fine-grained visibility controls
• Schema agnostic → dynamic schemas
• Simple search using URL query parameters
• Complex search using search request document
docs.globus.org/api/search
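The two search styles can be contrasted in a short sketch. It assumes the Search API base URL and the `q`, `limit`, and `filters` fields as described at docs.globus.org/api/search; the index ID, field name, and values are hypothetical placeholders, and no request is sent.

```python
from urllib.parse import urlencode

SEARCH_BASE = "https://search.api.globus.org/v1"
INDEX_ID = "00000000-0000-0000-0000-000000000000"  # hypothetical index


def simple_search_url(q, limit=10):
    """Simple search: everything fits in URL query parameters (GET)."""
    return f"{SEARCH_BASE}/index/{INDEX_ID}/search?{urlencode({'q': q, 'limit': limit})}"


def complex_search_document(q, field_name, values, limit=10):
    """Complex search: a request document sent as the JSON body of a POST."""
    return {
        "q": q,
        "limit": limit,
        "filters": [
            # match_any keeps entries whose field equals any listed value
            {"type": "match_any", "field_name": field_name, "values": values}
        ],
    }


# Hypothetical query terms and field name, for illustration only.
url = simple_search_url("electron microscopy")
document = complex_search_document("electron microscopy", "instrument", ["microscope-1"])
```

The document form is the one to reach for once a query needs filters or facets, since those do not fit in simple query parameters.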
12. Cancer Registry Records for Research (CR3)
• Create network of federated cancer registries
– Deploy similar infrastructure at other cancer registries
– Enable queries across multiple registries
• Federation via Globus: network scale ↔ local control
– Data owners input/export data sets, apply QC, set access policies
– Registry data remain at the institution where they were generated
– Identities are provided/authenticated by the institution, not Globus
– System scale depends on data owners providing storage resources
13. CR3 Architecture
[Diagram: Researcher, Discovery Portal, Globus Auth (GA), UPMC/Pitt Identity Providers, Globus Search (GS), Globus Transfer (GT), Registry Staff, Request Service, Elasticsearch]
• Researcher logs in to the Discovery Portal with UPMC/Pitt credentials
• Authentication initiated to GA, backed by the UPMC/Pitt Identity Providers
• Cohort search initiated to GS; cohort aggregate counts returned
• Cancer Registry De-identified Data Index (minimal criteria data: e.g., staging)
• Registry Staff manage authorization
• Data transfer from registrar to researcher mediated by GT
14. Safe Harbor NAACCR data in Globus Search
Example record:
{
"laterality": "Left - origin of primary",
"site": "Lung, lower lobe",
"cs_lymph_node": "0",
"recurrence": "Yes",
"cancer_status": "Evidence of this tumor",
"clinical_m": "M0",
"site_code": "C343",
"cs_stage": "1B",
"scope_reg_ln_summ": "4 or more reg LN removed",
"histology_code": "80703",
"mets_at_dx": "No distant metastasis",
"histology": "Squamous cell carcinoma, NOS",
"grade": "Grade II: Mod diff, mod well diff,",
"year_last_contact": "2005",
"spanish_origin": "Non-Spanish; non-Hispanic",
"cause_of_death": "Cancer related",
"dx_year": "2004",
"dx_age_range": "65-74",
"dx_age": "71",
"race": "White",
"disease_category": "Lung",
"clinical_stage": "1B",
"clinical_t": "T2",
"gender": "Female",
"path_n": "N0",
"facility": "Shadyside",
"path_t": "T2",
"clinical_n": "N0",
"path_stage": "1B",
"path_m": "MX"
}
Diseases
• Breast
• Colon
• Lung
• Melanoma
• Head & Neck
• Ovary
• Prostate
Years: 2004 – 2017
Patients: 65,000
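A cohort count query over records like the one above might be expressed as a search request document with facets. This is a sketch under assumptions, not the CR3 portal's actual query: it assumes the record fields are indexed under the names shown on the slide, that `"*"` matches all entries, and that `terms` facets return per-value counts as described in the Search API documentation.

```python
def cohort_count_request(disease, facet_fields, bucket_limit=20):
    """Build a request document asking for per-value counts within one disease.

    The filter narrows the cohort; each facet asks for counts grouped by
    one field, so only aggregate numbers need to be shown to the researcher.
    """
    return {
        "q": "*",
        "filters": [
            {"type": "match_all", "field_name": "disease_category", "values": [disease]}
        ],
        "facets": [
            {"name": field, "type": "terms", "field_name": field, "size": bucket_limit}
            for field in facet_fields
        ],
    }


# Field names taken from the example record on this slide.
cohort_request = cohort_count_request("Lung", ["clinical_stage", "dx_age_range"])
```

The document would be POSTed to the index's search endpoint; the returned facet buckets correspond to the cohort aggregate counts in the CR3 architecture.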
21. Data (and compute) automation
• Flows: A platform service for defining, applying, and sharing distributed research automation flows
• Flows comprise Actions
• Action Providers: Called by Flows to perform tasks
• Triggers*: Start flows based on events
* In development
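The relationship between Flows, Actions, and Action Providers can be sketched as a small flow definition. This is an illustrative sketch, assuming the state-machine style of definition used by Globus Flows (`StartAt`, `States`, `Type: Action`, `ActionUrl`); the Transfer parameter names and input paths are assumptions to check against the current documentation, and the second Action Provider URL is hypothetical.

```python
flow_definition = {
    "Comment": "Move instrument data, then run a follow-up action",
    "StartAt": "TransferData",
    "States": {
        "TransferData": {
            "Type": "Action",
            # Action Provider for Globus Transfer (verify URL and parameters)
            "ActionUrl": "https://actions.globus.org/transfer/transfer",
            "Parameters": {
                "source_endpoint_id.$": "$.input.source_collection",
                "destination_endpoint_id.$": "$.input.destination_collection",
                "transfer_items": [
                    {
                        "source_path.$": "$.input.source_path",
                        "destination_path.$": "$.input.destination_path",
                        "recursive": True,
                    }
                ],
            },
            "ResultPath": "$.TransferResult",
            "Next": "AnalyzeData",
        },
        "AnalyzeData": {
            "Type": "Action",
            # Hypothetical Action Provider, for illustration only
            "ActionUrl": "https://actions.example.org/analyze",
            "Parameters": {"path.$": "$.input.destination_path"},
            "ResultPath": "$.AnalysisResult",
            "End": True,
        },
    },
}


def state_order(definition):
    """Walk a linear flow from StartAt to the End state and list the states."""
    order = []
    name = definition["StartAt"]
    while name is not None:
        if name in order:
            raise ValueError(f"cycle detected at state {name!r}")
        order.append(name)
        state = definition["States"][name]
        name = None if state.get("End") else state["Next"]
    return order


order = state_order(flow_definition)
```

Here `state_order` is only a local sanity check on the definition; running the flow itself is the job of the Flows service, which calls each Action Provider in turn.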