Simplifying Science Gateway Data Management with Globus
3: Data Distribution
Vas Vasiliadis
vas@uchicago.edu
Gateways 2020, October 15, 2020
Data repositories and distribution
• Science gateway/portal enables faceted search
• Enforces fine-grained authorization
• HTTPS download for “small” data
• Asynchronous transfer for larger data sets
[Diagram: (1) Search, request data of interest; (2) Bulk data transfer; (2) Browser based download; globally accessible multi-tenant service]
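The split between HTTPS download for “small” data and asynchronous transfer for larger data sets can be sketched as a simple routing decision in the gateway. This is a minimal illustration only: the slide does not define “small”, so the 100 MiB cutoff and the names `SMALL_DATA_LIMIT_BYTES` and `choose_delivery` are hypothetical.

```python
# Hypothetical cutoff: the deck does not say what counts as "small" data.
SMALL_DATA_LIMIT_BYTES = 100 * 1024 * 1024  # 100 MiB, an assumption


def choose_delivery(total_bytes, limit=SMALL_DATA_LIMIT_BYTES):
    """Return "https" for a browser download or "transfer" for an
    asynchronous bulk transfer, based only on the request size."""
    if total_bytes < 0:
        raise ValueError("total_bytes must be non-negative")
    return "https" if total_bytes <= limit else "transfer"


if __name__ == "__main__":
    for size in (5 * 1024, 2 * 1024**3):
        print(size, "->", choose_delivery(size))
```

A real gateway would likely also consider the number of files and whether the user has a destination collection available.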
RDA as a Data Provider
• Provide curated datasets
– Search and Discovery
– Robust metadata archives
• Data Manipulation
– Subset capabilities
– Format conversion
• Enable reproducible research
– Structured policies for use of DOIs
– User history of data access maintained to facilitate dynamic citation generation
• Support NCAR PIs with data management needs
– Large, complex datasets
Source: “The NCAR RDA – Globus Integration: Experiences developing a modern research data portal”, Riley Conroy, GlobusWorld 2019
Let’s take a look…
Here’s what we need…
• Shared collection
• Science Gateway (Auth) client as Access Manager
• Group(s) for more easily managing access (optional)
• Code to transfer data
• Code to grant access
– Approach A: Add user’s identity to a Globus Group
– Approach B: Grant access directly to user’s identity
• Code to clean up after successful transfer (or time)
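The “grant access” and “clean up” items above can be sketched as follows. This is a hedged outline, not the tutorial's notebook code: `transfer_client` is assumed to behave like a `globus_sdk.TransferClient` authorized as the gateway's Access Manager client, and the helper names, IDs, and paths are placeholders. The same rule document serves both approaches: `principal_type` is `"group"` for Approach A and `"identity"` for Approach B.

```python
# Sketch only. `transfer_client` is assumed to expose add_endpoint_acl_rule
# and delete_endpoint_acl_rule, as globus_sdk.TransferClient does.
# Helper names, IDs, and paths are placeholders for illustration.

def build_acl_rule(principal_type, principal_id, path, permissions="r"):
    """Build a Globus Transfer access rule document.

    principal_type: "group" (Approach A) or "identity" (Approach B).
    """
    if principal_type not in ("identity", "group"):
        raise ValueError("principal_type must be 'identity' or 'group'")
    if permissions not in ("r", "rw"):
        raise ValueError("permissions must be 'r' or 'rw'")
    if not path.startswith("/"):
        raise ValueError("path must be absolute within the collection")
    if not path.endswith("/"):
        path += "/"  # access rules are set on directories
    return {
        "DATA_TYPE": "access",
        "principal_type": principal_type,
        "principal": principal_id,
        "path": path,
        "permissions": permissions,
    }


def grant_access(transfer_client, collection_id, rule):
    """Add the rule to the shared collection; return the new rule's ID."""
    response = transfer_client.add_endpoint_acl_rule(collection_id, rule)
    return response["access_id"]


def revoke_access(transfer_client, collection_id, rule_id):
    """Clean up: delete the rule after a successful transfer (or a timeout)."""
    transfer_client.delete_endpoint_acl_rule(collection_id, rule_id)
```

With Approach A the group rule is created once and each user is then added to that Globus Group through the Groups service (not shown); with Approach B a rule is created and later deleted per user.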
Conceptual architecture: Sharing
[Diagram labels: Managed Endpoint (Subscriber Control Domain); Globus (Globus Control Domain); External User (External User Control Domain); Shared Endpoint; DATA Channel; CONTROL Channel]
• Sharing ACL: Globus managed “overlay” permissions; no changes to underlying filesystem permissions!
• Subscriber managed filesystem permissions
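The CONTROL and DATA channels in this architecture correspond to how a transfer is requested: the gateway asks the Globus service to run a task, and the data itself moves between the endpoints. A minimal sketch of submitting a transfer and waiting for it is shown here, assuming `transfer_client` behaves like a `globus_sdk.TransferClient`; the helper names, label, and polling values are illustrative, and code using the SDK would more commonly build the request with `globus_sdk.TransferData`.

```python
import time

# Sketch only. `transfer_client` is assumed to expose get_submission_id,
# submit_transfer, and get_task, as globus_sdk.TransferClient does.

def submit_transfer(transfer_client, source_id, dest_id,
                    source_path, dest_path,
                    label="Gateway data distribution"):
    """Submit a recursive directory transfer; return the Globus task ID."""
    document = {
        "DATA_TYPE": "transfer",
        "submission_id": transfer_client.get_submission_id()["value"],
        "source_endpoint": source_id,
        "destination_endpoint": dest_id,
        "label": label,
        "DATA": [{
            "DATA_TYPE": "transfer_item",
            "source_path": source_path,
            "destination_path": dest_path,
            "recursive": True,
        }],
    }
    return transfer_client.submit_transfer(document)["task_id"]


def wait_for_task(transfer_client, task_id, poll_seconds=15,
                  max_polls=40, sleep=time.sleep):
    """Poll until the task reaches SUCCEEDED or FAILED.

    Returns the last status seen, or None if max_polls is 0.
    """
    status = None
    for _ in range(max_polls):
        status = transfer_client.get_task(task_id)["status"]
        if status in ("SUCCEEDED", "FAILED"):
            break
        sleep(poll_seconds)
    return status
```

Because the transfer is asynchronous, the gateway can return to the user immediately after `submit_transfer` and run the wait and any ACL cleanup in a background job.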
Back to our JupyterHub…
Support resources
• Globus documentation: docs.globus.org
• Sample code: github.com/globus
• Helpdesk and issue escalation: support@globus.org
• Customer engagement team
• Globus professional services team
– Assist with portal/gateway/app architecture and design
– Develop custom applications that leverage the Globus platform
– Advise on customized deployment and integration scenarios
Join the Globus community
• Access the service: globus.org/login
• Create a personal endpoint: globus.org/app/endpoints/create-gcp
• Documentation: docs.globus.org
• Engage: globus.org/mailing-lists
• Subscribe: globus.org/subscriptions
• Need help? support@globus.org
• Follow us: @globusonline

Gateways 2020 Tutorial - Instrument Data Distribution with Globus


Editor's Notes

  • #3 Describe the general need… Examples: reference data sets, e.g. climate research or telescope observation data; instrument data, e.g. sequence data for genomics cores. You saw the HTTPS access demonstrated in the previous session… but more often you need to pull large data sets from some facility. Here’s an example: NCAR RDA.
  • #4 The RDA distinguishes itself from other entities as a data portal. Provenance. We have a robust system of handling data. After discovery/selection
  • #8 Walk through the Data Distribution notebook. Highlight identity ID access. Spend lots of time on ACLs and how you can control access via either direct ACL rule or Group membership.
  • #9 This is another example of what the end product looks like. In this instance we have multiple projects accessible via this one science gateway …consistent access control …with either uniform or customized data discovery – you, the project owner, decide! DID NOT COVER petreldata.net since Lee was covering it as part of session 4.
  • #10 Demonstrate