We provide a brief introduction to the Globus platform-as-a-service for developers, with emphasis on building simple web applications for data distribution and discovery. We describe how to register an application with Globus and access platform APIs using the Globus Python SDK and a Jupyter Notebook. We also introduce the Globus Search service and demonstrate how it is used by an open source web portal framework that can jumpstart research application development.
This material was presented at the Research Computing and Data Management Workshop, hosted by Rensselaer Polytechnic Institute on February 27-28, 2024.
2. Globus PaaS accelerates development
• Auth
• Groups
• Transfer
• Compute
• Search
• Timer
• Flows
• GCS Manager
• Globus web app consumes the
same public APIs
• Resources named by URL
(standard REST approach)
• Request/response body is JSON
• Python SDK available; JS coming
docs.globus.org/api
3.
4. Globus Auth: Foundational IAM service
• Makes app/portal widely and easily accessible
• Brokers authentication and authorization among…
– End-users
– Identity providers: enterprise, external (federated identities)
– Services: resource servers with REST APIs
– Apps: web, mobile, desktop, command line clients
– Services acting as clients to other services
• OAuth 2.0 Authorization Framework (a.k.a. OAuth2)
• OpenID Connect Core 1.0 (a.k.a. OIDC)
5
5. Fundamental Concepts
• Scopes
– APIs that client is requesting access to
– Service and resources within that service
• Consents
– Authorizes a client to access a service, within limited scope, on
the resource owner's behalf
• Multiple methods for user to grant consent depending
on the type of application
6
6. Several authentication modes supported
• A portal/science/gateway or other application you host
– Authorization code grant: authenticate as user identity
– Browser redirect; auth code returned automatically; tokens stored securely
• A thick client/script installed and run on the user’s device
– Native app grant: authenticate as user identity
– Auth code returned automatically; tokens stored per installation
• Service account or application credentials for automation
– Client credentials grant: authenticate as application identity
– Client ID and Secret stored securely
• Application able to manage tokens for offline/long lived tasks
– Request refresh tokens in addition to access tokens
9. Managing service accounts/app credentials
• Application Identity:
appclientid@clients.auth.globus.org
• These are confidential apps with client id and secret
• Ensure application is on a secure device
• Set up policy for rotation of secret
• Assign project admins to manage the registration
10. Globus Groups for authorization
• Grant group manager role to your application (client identity)
– App can add/remove members to grant/remove access
• Use groups to manage your application’s permissions
– Use group membership instead of managing ACLs directly
– Check user’s group membership(s) to determine permissions
globus-sdk-python.readthedocs.io/en/stable/services/groups.html
11. Guest collections simplify data access
• Users don't need local accounts to access data
• Portal can act as itself when accessing collections
• Grant the application Access Manager role
– Allows the application to manage permissions on the collection
• Grant roles for management of endpoint and tasks
12. Globus Search: Data description and discovery
• Metadata store with fine-
grained visibility controls
• Schema agnostic dynamic
schemas
• Simple search using URL
query parameters
• Complex search using
search request document
13
docs.globus.org/api/search
Search
Index
13. Data ingest with Globus Search
14
Search
Index
POST /index/{index_id}/ingest'
{
"ingest_type": "GMetaList",
"ingest_data": {
"gmeta": [
{
"id": "filetype",
"subject”: "https://search.api.globus.org/abc.txt",
"visible_to": ["public"],
"content": {
"metadata-schema/file#type": "file”
}
},
...
]
}
- Bulk create and update
- Task model for ingest at scale
14. Data ingest with Globus Search
15
Search
Index
POST /index/{index_id}/ingest'
{
"ingest_type": "GMetaList",
"ingest_data": {
"gmeta": [
{
"id": ”weight",
"subject": "https://search.api.globus.org/abc.txt",
"visible_to": ["urn:globus:auth:identity:46bd0f56-
e24f-11e5-a510-131bef46955c"],
"content": {
"metadata-schema/file#size": ”37.6",
"metadata-schema/file#size_human": ”<50lb”
}
},
...
]
}
Visibility limited to Globus Auth identity
- Single user
- Globus Group
- Registered client application
20. Legacy Architecture (don’t do this)
10GE
Border Router
WAN
Firewall
Enterprise
perfSONAR
perfSONAR
Filesystem
(data store)
10GE
Portal
Server
Browsing path
Query path
Data path
Portal server applications:
· web server
· search
· database
· authentication
· data service
Science gateway server applications
Science
gateway
server
Gateway logic (“small” data) and
research data traverse the enterprise
firewall è massive bottleneck
21. Best practice: ScienceDMZ
10GE
10GE
10GE
10GE
Border Router
WAN
Science DMZ
Switch/Router
Firewall
Enterprise
perfSONAR
perfSONAR
10GE
10GE
10GE
10GE
DTN
DTN
API DTNs
(data access governed
by portal)
DTN
DTN
perfSONAR
Filesystem
(data store)
10GE
Portal
Server
Browsing path
Query path
Portal server applications:
· web server
· search
· database
· authentication
Data Path
Data Transfer Path
Portal Query/Browse Path
Science
gateway
server
Science gateway server applications
Only gateway logic (“small” data)
traverses the enterprise firewall
è fast, clean path for research data
22. MRDP: Key elements
Science DMZ
Fast, clean data path
Data Transfer Nodes
Purpose-built data movers
Globus Platform
Secure, reliable data
orchestration
Globus Connect
Storage system enabler
23
Globus Portal
Framework
Data discovery and access
23. Django Globus Portal key features
• Federated login via Globus Auth
• Data export via Globus Transfer
• Browse datasets via Globus Search
• Template-driven search results and landing pages
• Django-based framework with extensible templates
• Bootstrap your project using Cookiecutter
25
Source: github.com/globus/django-globus-portal-framework
Docs: django-globus-portal-framework.readthedocs.io
24. Portal framework + Compute = Science Gateway
• Discover and select data of interest
• Run analyses on selected data via Globus Compute
– Run on any resource: laptop à supercomputer
• Move (and, optionally, share) analysis results
• Do all this reliably, at scale with Globus Flows
26