We present various use cases for applying the Globus data sharing capability, and discuss use of service accounts for automating data access and transfer tasks.
This material was presented at the Research Computing and Data Management Workshop, hosted by Rensselaer Polytechnic Institute on February 27-28, 2024.
2. Secure data sharing …from any storage
Step 1: Select files to share, select user or group, and set access permissions
Step 2: Collaborator logs into Globus and accesses shared files; no local account required; download via Globus
Diagram labels: on-prem or public cloud storage; laptop, server, compute facility; globally accessible multi-tenant service; Globus controls access to shared files on existing storage
• Fine-grained access control “overlay” on storage system
• Share with any identity, email, group
• No need to stage data just for sharing
3. Guest collections
• Directly addressable entities
• Bulk data access (via Globus transfer service)
• HTTP/S access directly from collection
• Created by authorized users to share data they have access to
• Permissions at folder level, for user/group/service credential to access data
• Roles for granting management rights
4. Let’s try it…
• Create guest collection
• Set permissions
• Set Roles
Tutorial cheatsheet: bit.ly/gw-tut-rpi
5. Considerations for using data sharing
• Guest collection creation cannot be automated
• Permission management can be fully automated
• Typical pattern:
– One guest collection
– Many permissions per folder
• Clean up permissions and guest collections when not in use
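The typical pattern above (one guest collection, many per-folder permissions) can be sketched as a helper that builds Globus Transfer access rule documents. This is a minimal sketch, not a definitive implementation: the folder names and identity IDs are hypothetical placeholders, and each resulting document would still need to be submitted, for example with `TransferClient.add_endpoint_acl_rule(collection_id, rule)` from the `globus-sdk` package.

```python
from typing import Dict, List

VALID_PERMISSIONS = {"r", "rw"}


def build_access_rule(identity_id: str, folder: str, permissions: str = "r") -> Dict[str, str]:
    """Return an access rule document granting one identity access to one folder."""
    if permissions not in VALID_PERMISSIONS:
        raise ValueError(f"permissions must be 'r' or 'rw', got {permissions!r}")
    # Normalise the folder so the rule path starts and ends with '/'.
    trimmed = folder.strip("/")
    path = f"/{trimmed}/" if trimmed else "/"
    return {
        "DATA_TYPE": "access",
        "principal_type": "identity",
        "principal": identity_id,
        "path": path,
        "permissions": permissions,
    }


def build_rules_for_folders(
    grants: Dict[str, List[str]], permissions: str = "r"
) -> List[Dict[str, str]]:
    """Build one rule per (folder, identity) pair, all for the same guest collection."""
    return [
        build_access_rule(identity_id, folder, permissions)
        for folder, identities in sorted(grants.items())
        for identity_id in identities
    ]


if __name__ == "__main__":
    # Hypothetical folders and identity IDs, for illustration only.
    example = {"projects/alpha": ["identity-1", "identity-2"], "projects/beta": ["identity-3"]}
    for rule in build_rules_for_folders(example):
        print(rule["path"], rule["principal"], rule["permissions"])
```

Keeping rule construction separate from submission makes it easy to review what will be granted before anything changes on the collection.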
6. Administrator controls
• Enable use of sharing
• What parts of the file system
• Which users
• What level of sharing (read-only)
• Share with users in specific domains
• Monitor and manage permissions on guest collection
8. Data from instruments
• Provide near-real-time access to data
• Automated permissions based on site policy
• Self managed by the PI
• Federated login to access data
Diagram labels: raw data store with folders --/cohort045, --/cohort096, --/cohort127; local policy store; personal computer; remote visualization/analysis
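Automating permissions from site policy can be treated as a reconciliation step: compare what the policy says with the rules currently on the guest collection, then add or remove the difference. The sketch below is pure Python and rests on assumptions: the policy table mapping cohort folders to PI identities is hypothetical, and existing rules are assumed to be dictionaries with `id`, `principal`, `path`, and `permissions` keys, as in documents returned when listing access rules.

```python
from typing import Dict, Iterable, List, Set, Tuple

Grant = Tuple[str, str, str]  # (path, principal, permissions)


def desired_grants(policy: Dict[str, List[str]], permissions: str = "r") -> Set[Grant]:
    """Expand a {cohort_folder: [identity, ...]} policy table into a set of grants."""
    return {
        (f"/{cohort}/", identity, permissions)
        for cohort, identities in policy.items()
        for identity in identities
    }


def reconcile(
    policy: Dict[str, List[str]], existing_rules: Iterable[Dict[str, str]]
) -> Tuple[List[Grant], List[str]]:
    """Return (grants to add, rule IDs to remove) so the collection matches policy."""
    wanted = desired_grants(policy)
    current = {
        (rule["path"], rule["principal"], rule["permissions"]): rule["id"]
        for rule in existing_rules
    }
    to_add = sorted(wanted - set(current))
    to_remove = sorted(rule_id for grant, rule_id in current.items() if grant not in wanted)
    return to_add, to_remove


if __name__ == "__main__":
    # Hypothetical policy and rule data, for illustration only.
    policy = {"cohort045": ["pi-a"], "cohort096": ["pi-b"]}
    existing = [
        {"id": "r1", "path": "/cohort045/", "principal": "pi-a", "permissions": "r"},
        {"id": "r2", "path": "/cohort127/", "principal": "pi-c", "permissions": "r"},
    ]
    print(reconcile(policy, existing))
```

Running such a step on a schedule would also cover cleanup, since rules no longer backed by policy show up in the removal list.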
9. Distribution from data archive/repository
• Portal/science gateway to distribute data
• Interface to search and gather data of interest
• Asynchronous transfer to user’s system or via HTTPS to “staged” data
• Fine-grained authorization enforced
Diagram labels: search and request data of interest; transfer data to destination
10. Example: Instrument data delivery at scale
Use Globus to deliver 100s of TB of genomic data to researchers
Credits: Joe George, University of Michigan
11. Core center data processing
• Allow user to securely upload data for analysis
• Make analysis results available to user
• Automate setup and tear down of folders and permissions
Diagram labels: analysis system; --/123/input (rw); --/123/output (r)
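The setup and tear-down described above might look like the following sketch. It assumes `tc` behaves like a `globus_sdk.TransferClient` authorized as an identity that can write to the guest collection and manage its permissions; the job ID and user identity are hypothetical, and the folder layout (input read/write, output read-only) follows the diagram.

```python
from typing import Any, List

# Folder name -> permissions granted to the user, as in the diagram.
JOB_LAYOUT = {"input": "rw", "output": "r"}


def setup_job(tc: Any, collection_id: str, job_id: str, user_identity: str) -> List[str]:
    """Create the job folders and grant the user access; return the new rule IDs."""
    tc.operation_mkdir(collection_id, f"/{job_id}/")
    rule_ids = []
    for folder, permissions in JOB_LAYOUT.items():
        path = f"/{job_id}/{folder}/"
        tc.operation_mkdir(collection_id, path)
        result = tc.add_endpoint_acl_rule(
            collection_id,
            {
                "DATA_TYPE": "access",
                "principal_type": "identity",
                "principal": user_identity,
                "path": path,
                "permissions": permissions,
            },
        )
        # The create response is assumed to carry the new rule's ID as "access_id".
        rule_ids.append(result["access_id"])
    return rule_ids


def teardown_job(tc: Any, collection_id: str, rule_ids: List[str]) -> None:
    """Remove the permissions granted for a job (folder deletion is not covered here)."""
    for rule_id in rule_ids:
        tc.delete_endpoint_acl_rule(collection_id, rule_id)
```

Storing the returned rule IDs alongside the job record keeps tear-down simple, because no lookup by path or principal is needed later.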
13. What do you need to automate?
• Service accounts or application credentials
– Client ID and secret
– Identity of the application: client_id@clients.auth.globus.org
• Guest collection
• Permission for service account to manage the guest collection
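As a rough sketch of how these pieces fit together in Python, the helper below derives the service account's identity username from its client ID and builds a Transfer client from the client ID and secret using the `globus-sdk` package. The function names are illustrative, and how the secret is stored and loaded is left to the reader.

```python
CLIENT_IDENTITY_DOMAIN = "clients.auth.globus.org"
TRANSFER_SCOPE = "urn:globus:auth:scope:transfer.api.globus.org:all"


def service_account_username(client_id: str) -> str:
    """Return the identity username under which the application appears to Globus."""
    return f"{client_id}@{CLIENT_IDENTITY_DOMAIN}"


def make_transfer_client(client_id: str, client_secret: str):
    """Build a TransferClient that authenticates as the service account itself."""
    # Imported lazily so the helper above works without the SDK installed.
    import globus_sdk  # third-party: pip install globus-sdk

    app = globus_sdk.ConfidentialAppAuthClient(client_id, client_secret)
    # Client credentials grant: tokens are issued to the application, with no user login.
    authorizer = globus_sdk.ClientCredentialsAuthorizer(app, TRANSFER_SCOPE)
    return globus_sdk.TransferClient(authorizer=authorizer)
```

The username returned by `service_account_username` is what you would search for in the web app when granting the service account a permission or role on the guest collection.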
15. Registering a service account
• Webapp - Settings
– app.globus.org/settings/developers
16. Accessing data using service accounts
• Service accounts have a client ID and secret
• There is no user involved, so there is no consent
• Transfer service sees the request identity as client_id@clients.auth.globus.org
• The identity must have permissions for operations
– e.g., for transfer: read at source and write at destination
– e.g., for permission management: must have the access manager role
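A transfer submitted by a service account could be sketched as below. It assumes `tc` behaves like a `globus_sdk.TransferClient` authorized with the service account's credentials, and that the service account identity already has read permission on the source paths and read/write permission on the destination paths. The collection IDs and paths are hypothetical; the SDK's `TransferData` helper is another way to build an equivalent document.

```python
from typing import Any, Dict, Iterable, Tuple

Item = Tuple[str, str, bool]  # (source_path, destination_path, recursive)


def build_transfer_document(
    submission_id: str, source_collection: str, destination_collection: str, items: Iterable[Item]
) -> Dict[str, Any]:
    """Build a transfer submission document for the Globus Transfer service."""
    return {
        "DATA_TYPE": "transfer",
        "submission_id": submission_id,
        "source_endpoint": source_collection,
        "destination_endpoint": destination_collection,
        "DATA": [
            {
                "DATA_TYPE": "transfer_item",
                "source_path": source_path,
                "destination_path": destination_path,
                "recursive": recursive,
            }
            for source_path, destination_path, recursive in items
        ],
    }


def submit_as_service_account(
    tc: Any, source_collection: str, destination_collection: str, items: Iterable[Item]
) -> str:
    """Submit the transfer and return its task ID."""
    # A submission ID makes the request safe to retry without creating duplicate tasks.
    submission_id = tc.get_submission_id()["value"]
    document = build_transfer_document(
        submission_id, source_collection, destination_collection, items
    )
    return tc.submit_transfer(document)["task_id"]
```

If the identity lacks permission on either side, the submission or the resulting task is expected to fail, so checking the rules on both collections first is a reasonable precaution.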
17. Let’s try allowing an app to manage permissions
• Create guest collection or use one you already created
• Set permission for service account via webapp
• Try accessing data on the guest collection as the service account
18. Grant a role for the service account
• Set access manager for service account to manage permissions
19. Let’s try allowing an app to manage permissions
• Set Access Manager role for the service account
• Try listing/setting permissions on the guest collection
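Listing permissions as the service account might look like this sketch. It assumes `tc` behaves like a `globus_sdk.TransferClient` authorized as a service account that holds the Access Manager role on the guest collection, and that each listed rule exposes `principal`, `path`, and `permissions` fields; the helper names are illustrative.

```python
from typing import Any, Dict, List, Tuple


def list_permissions(tc: Any, collection_id: str) -> List[Dict[str, Any]]:
    """Return the access rules currently set on the guest collection."""
    return [dict(rule) for rule in tc.endpoint_acl_list(collection_id)]


def permissions_for(tc: Any, collection_id: str, principal: str) -> List[Tuple[str, str]]:
    """Return sorted (path, permissions) pairs granted to one principal."""
    return sorted(
        (rule["path"], rule["permissions"])
        for rule in list_permissions(tc, collection_id)
        if rule["principal"] == principal
    )
```

If the service account has not been granted the role yet, the listing call is expected to be rejected by the Transfer service, which is a quick way to confirm the role assignment took effect.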