Ad hoc data sharing
• Individual users share
data with collaborators
• Using a known email or
identity for user/group
• Make data “publicly”
available
1
Compute Facility
Globus controls
access to shared
files on existing
storage; no need
to move files to
cloud storage!
Researcher selects
files to share, selects
user or group, and
sets access
permissions
3
Collaborator logs in to
Globus and accesses
shared files; no local
account required;
download via Globus
Personal Computer
Share
2
Data from instrument facility
• Provide near-real time
access to data
• Automated permissions
based on site policy
• Self managed by the PI
• Federated login to
access data
Raw data store
Personal Computer
Remote
visualization/analysis
Local
policy
store
--/cohort045
--/cohort096
--/cohort127
Data from provider/archive
• Portal/science gateway
to distribute data
• Interface to search and
gather data of interest
• Asynchronous transfer
to user’s system or via
HTTPS to staged data
• Fine-grained
authorization enforced
Search and request
data of interest
Transfer
data to
destination
Core center data processing
• Allow user to securely
upload data for analysis
• Make analysis results
available to user
• Automate setup and
tear down of folders
and permissions
--/123/input rw
Analysis System
--/123/output r
Common solution components
• Shared endpoint for “staging” data
• Application that manages permissions
• Data transfer, to/from shared endpoint
Data sharing features
• Shared endpoint creation requires authentication
– Cannot be completely automated
– Must be a managed endpoint
• Roles for management of endpoint and tasks
– Assign role to other users, groups and applications
• Access manager role grants others the rights to
manage permissions
– Others = users, groups, and applications
Data sharing permissions management
• Permissions are set per folder, on a shared endpoint
• Permissions management can be automated
• For a user
– Identity: user must log in with this
– Email: user gets a code via email; link to their Globus account
• For a group
– Group UUID: search for group to get UUID
– Access governed by membership in the group
• For an application
– Application identity: appclientid@clients.auth.globus.org (apps are people too!)
Application concepts
• Custom app (automatically) manages permissions
– Can use Globus CLI (as demonstrated)
• Confidential app: use client id and secret
– Ensure application is on a secure device
– Set up policy for rotation of secret
– Identity: appclientid@clients.auth.globus.org
Client credential grant
12
1. Authenticate with app
client id and secret
2. Access Tokens
Application,
Science Gateway,
Data Portal
(Client)
3. Authenticate as app
with access tokens to invoke
service (on behalf of authorized
user, within a given scope)
Globus Transfer
(Resource Server)
Globus Auth
(Authorization Server)
Data transfer scenarios
• Application moving data of its own accord
– App has access to source data and can write to destination
– Requires shared endpoints on both sides (why?)
– Client credential grant
• Application moving data as user
– Only user has access to data on source/destination
– Authorization code grant
– Similar to the data portal example presented earlier
Walkthrough: Data Distribution Application
What: Make select data available to authorized user(s)
How:
See example code at:
github.com/globus/automation-examples/blob/master/share_data.py
1. Creates folder on shared endpoint
2. Moves data to folder
3. Sets permissions on folder for user/group
On your EC2 instance in ~/automation-examples
Application registration
• Register the application at developers.globus.org
– Redirects: https://auth.globus.org/v2/web/auth-code
– Scopes: globus:auth:scope:transfer.api.globus.org:all
• Get client id and secret
• Add client id to the app
Shared endpoint configuration
• Create at top level folder
• Set access manager role for app to manage access
permissions
• Optionally…
– Set endpoint administrator role (can change endpoint definition)
– Set endpoint manager role (can monitor and manage tasks)
– Set endpoint monitor role (can monitor tasks)
Application access
• Use client credential grant to authenticate as app
– Client id and secret used for obtaining tokens
– Identity username is appclientid@clients.auth.globus.org
• Create a folder for user (or project)
• Set permissions on folder (user/group)
• Create transfer task to move data to folder
• (Optionally) notify user(s) that data is available
But the data are big, distributed…
…and the science is collaborative
petrel.alcf.anl.gov
materialsdatafacility.org
2PB, 80Gbps store
3.2M materials data
Cooley: 290 TFLOPS
Query1 Share4
Transfer2
Learn3
Need multi-credential, multi-service authentication and data management
Hub
Configurable HTTP proxy
Authenticator
User DB
Spawner
Notebook
/api/auth
Browser
/hub/
/user/[name]/
• Multi-user hub
• Manages multiple instances
of Jupyter notebook server
• Configurable HTTP proxy
JupyterHub
Goal: Liberate the notebook!
• Tokens for remote services
• APIs for remote actions, e.g. data
management via Globus service
petrel.alcf.anl.gov
Securing JupyterHub with Globus Auth plugin
• Existing OAuth
framework
• Can restrict IdP
• Custom scopes
• Tokens passed into
notebook environment
github.com/jupyterhub/oauthenticator
REST APIs
REST APIs
REST APIs
Bearer a45cd...
Hub
Configurable HTTP proxy
Authenticator
User DB
Spawner
Notebook
/api/auth
/hub/
/user/[name]/
login
Browser
{"tokens":...
{"tokens":...
Tokens in Jupyter notebooks
The world is your
oyster API…
• Globus Transfer
• Globus Search
• Your app
• Data portal
• Analysis engine
• …
Automated data analysis/results distribution
Notebook
Data
Repository
Bearer a45cd…
Dataset
Shared
endpoint
POST '/endpoint/a3c345f... /mkdir’
200 OK
...
X-Transfer-API-Version: 0.10
Content-Type: application/json
...
Analyze
Experiment with the demo notebook
• Login into our JupyterHub*: jupyter.demo.globus.org
• Launch (spawn) a notebook server; get tokens
• Access Globus APIs; download some data
• “Analyze” data (generate plot)
• PUT results (graph) on an HTTPS endpoint
*zero-to-jupyterhub.readthedocs.io
Our (simplistic) data flow thus far…
• Adequate for ad hoc sharing (implicit knowledge)
• Broader access, reuse requires “formalization”
• Leverage Globus data publication services
Notebook
Data
Repository
Bearer a45cd…
Dataset
Shared
endpoint
POST '/endpoint/a3c345f... /mkdir’
200 OK
...
X-Transfer-API-Version: 0.10
Content-Type: application/json
...
Analyze
Globus Data Publication V1
SaaS publication
BYO Storage & in-place
publication
User-managed collections
Arbitrary metadata (with
pre-defined schema)
Handle, DOI PIDs
Adoption since 2015:
>1800 users, >600 datasets
Publication V2: A platform for automation
• Decompose data publication v1 into platform services
• Facilitate flexible re-composition, adaptation by
customers
• Enable extension and enhancement
• Initial services
– Search, identifiers (and data management)
• Future services
– Description (metadata), flows
Globus Search
• Scalable service à billions of entries
• Schema agnostic: use standard (e.g. DataCite) or custom
metadata
• Fine grained access control: only returns results that are
visible to user
• Plain text search: ranked results
• Faceted search: facilitates data discovery
• Rich query language: ranges, expressions, regex, etc.
32
docs.globus.org/api/search
Globus Identifiers
• Service for issuing persistent identifiers
– DOI, ARK, Handle, Globus
– e.g. https://identifiers.globus.org/doi:10.1145/2076450.2076468
• Within a namespace, e.g. your DataCite namespace
– Control which identities/groups can create identifiers
• Each identifier has…
– Link to data: one or more https URLs, to file, folder or manifest
– Landing page: provided by service, or by user
– Visibility: identities, groups that can see identifier
– Checksum: of the file or manifest
– Metadata: as required by identifier (e.g., DataCite), extensible
– Replaces/replaced-by: for versioning
33
Support resources
• Globus documentation: docs.globus.org
• Sample code: github.com/globus
• Helpdesk and issue escalation: support@globus.org
• Customer engagement team
• Globus professional services team
– Assist with portal/gateway/app architecture and design
– Develop custom applications that leverage the Globus platform
– Advise on customized deployment and integration scenarios
Join the Globus community
• Access the service: globus.org/login
• Create a personal endpoint: globus.org/app/endpoints/create-gcp
• Documentation: docs.globus.org
• Engage: globus.org/mailing-lists
• Subscribe: globus.org/subscriptions
• Need help? support@globus.org
• Follow us: @globusonline