We cover topics of interest to system administrators, such as managing multi-DTN endpoints, mapping user identities, and using custom domains for data access.
This material was presented at the Research Computing and Data Management Workshop, hosted by Rensselaer Polytechnic Institute on February 27-28, 2024.
4. Multi-node DTN behavior
• Transfer tasks sent to nodes in round-robin fashion
• Active nodes can receive transfer tasks
• Tasks on an inactive node pause until the node is active again
• GCS manager assistant service
– Synchronizes configuration among nodes in the endpoint
– Stores encrypted configuration values in Globus service
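The round-robin behavior described above can be pictured with a small model. This is only a conceptual sketch, not the Globus scheduler: the function name `assign_tasks` and the node and task names are hypothetical, and it does not model tasks pausing when a node goes inactive mid-transfer.

```python
from itertools import cycle


def assign_tasks(tasks, nodes, active):
    """Hand each task to the next active node, in round-robin order."""
    # Only active nodes are eligible to receive transfer tasks.
    eligible = [node for node in nodes if node in active]
    if not eligible:
        raise ValueError("no active nodes available")
    rotation = cycle(eligible)
    return {task: next(rotation) for task in tasks}


# dtn2 is inactive, so tasks alternate between dtn1 and dtn3.
assignments = assign_tasks(
    ["t1", "t2", "t3", "t4"],
    ["dtn1", "dtn2", "dtn3"],
    active={"dtn1", "dtn3"},
)
print(assignments)
```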
6. Adding a node requires just two commands
$ sudo globus-connect-server node setup --deployment-key THE_KEY
$ sudo systemctl restart apache2
Copy the deployment key from the first node (DTN) to every other node
Node setup pulls configuration from Globus service
Check your DTN cluster status:
globus-connect-server node list
8. Migrating an endpoint to a new host (DTN)
• An endpoint is a logical construct → replace the host
system without disrupting the endpoint
– Avoid replicating configuration data (esp. for guest collections!)
– Maintain continuity for custom apps, automation scripts, etc., that
use the endpoint UUID
• 1. Add new node to endpoint → 2. Remove original node
• Again, deployment key is required
– Export node configuration with node setup --export-node
– Import on new DTN using node setup --import-node
10. Before asking for help…
• Running self-diagnostic can identify many issues
– Are services running? GCS manager/assistant, GridFTP server
• Connectivity is a common cause
– Can Globus connect to the GCS Manager service?
– Is the DTN control channel reachable?
– Can the DTN establish data channel connection?
docs.globus.org/globus-connect-server/v5.4/troubleshooting-guide
…and we’re always here for you: support@globus.org
13. Multiple (sub)domains exist in a GCS deployment
• Endpoint
– e4faec.75bc.data.globus.org
• Mapped (m-...) and guest (g-...) collections
– m-8dd2b7.e4faec.75bc.data.globus.org
– g-e7b189.e4faec.75bc.data.globus.org
• Subdomains are distinct Apache vhosts
/var/lib/globus-connect-server/gcs-manager/etc/httpd/conf.d
• Management API for the GCS Manager service is at:
https://e4faec.75bc.data.globus.org/api
14. Customizing GCS domains
• Set up DNS record
– Avoid using FQDN for the DTN
– activedata.uchicago.edu and *.activedata.uchicago.edu (see below)
• Put SSL certificate/key on DTN
• As endpoint owner or admin run: endpoint domain update
--domain activedata.uchicago.edu
--certificate-path ...
--private-key-path ...
--wildcard ← important; otherwise collections use data.globus.org
--managed ← really important for certs/keys to be sync’d across DTNs
• Assuming --wildcard, domains for collections will look like…
– m-8dd2b7.activedata.uchicago.edu
– g-e7b189.activedata.uchicago.edu
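The naming rule on this slide can be written out as a short sketch. It only mirrors what the bullets state (collections move to the custom domain when --wildcard is set, otherwise they stay under data.globus.org); the function `collection_domain` and its parameters are hypothetical and are not part of any Globus API.

```python
def collection_domain(prefix, default_domain, custom_domain=None, wildcard=False):
    """Return the hostname a collection would use under the slide's rule."""
    # Without --wildcard, collections keep using the default data.globus.org domain.
    if custom_domain and wildcard:
        base = custom_domain
    else:
        base = default_domain
    return f"{prefix}.{base}"


default = "e4faec.75bc.data.globus.org"
custom = "activedata.uchicago.edu"

print(collection_domain("m-8dd2b7", default, custom, wildcard=True))
print(collection_domain("g-e7b189", default, custom, wildcard=False))
```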
16. Subscriptions and endpoint roles
• Subscription(s) configured for an institution
• Multiple Subscription Managers per subscription
• Subscription Manager ties endpoint to subscription
– Results in a “subscribed” endpoint
• Assign additional roles for endpoint management
– Administrator, Manager, Monitor
17. Be identity-, role-, and permission-aware
• Default: Only endpoint owner can configure an endpoint
• Delegate administrator role to other sysadmins
– Best practice: Delegate to a Globus group, not individuals
• Check identity using the session command
• Check resource permissions on storage gateways and
collections with --include-private-policies option
docs.globus.org/globus-connect-server/v5.4/reference/role
18. Collection roles
• Mapped collections have same roles as endpoints
• Guest collections add “Access Manager” role
– Critical for automation
• Any Auth client can assume endpoint/collection role
– Particularly useful for scripts that manage large deployments
– e.g., script to list guest collection information:
gist.github.com/vasv/cdb8607e2bfab08634b5aa99389e87c7
• Roles may be granted to Auth (app) clients for…
– …group management
– …flow execution/monitoring
– …any other Globus resource that has role-based access control
20. Mapping identities to local accounts
• Default: Strip identity domain (everything after “@”)
– e.g., userX@globusdemo.org maps to local account userX
– Best for campus identities w/synchronized local accounts
• Use --identity-mapping option on storage gateway
– Specify expression in a JSON document
– Execute a custom script
docs.globus.org/globus-connect-server/v5.4/identity-mapping-guide
21. Simple custom mapping example
Note: Requires the storage gateway to accept identities from two domains
{
  "DATA_TYPE": "expression_identity_mapping#1.0.0",
  "mappings": [
    {
      "source": "{username}",
      "match": "42032579@wossamottau.edu",
      "output": "vas",
      "ignore_case": false,
      "literal": false
    },
    {
      "source": "{username}",
      "match": "(.*)@uchicago.edu",
      "output": "{0}",
      "ignore_case": false,
      "literal": false
    }
  ]
}
First mapping: map 42032579@wossamottau.edu to local user vas
Second mapping: otherwise, default behavior (the local user is the username part of the uchicago.edu identity)
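To see how the two mappings behave, here is a rough Python approximation. It is a sketch under stated assumptions, not the GCS implementation: it assumes mappings are tried in order, that a non-literal "match" is a regular expression that must match the whole source string, and that "{0}" in "output" refers to the first capture group. The function name `map_identity` is hypothetical.

```python
import re

MAPPINGS = [
    {"source": "{username}", "match": "42032579@wossamottau.edu",
     "output": "vas", "ignore_case": False, "literal": False},
    {"source": "{username}", "match": "(.*)@uchicago.edu",
     "output": "{0}", "ignore_case": False, "literal": False},
]


def map_identity(identity, mappings=MAPPINGS):
    """Return the local account for an identity dict, or None if nothing matches."""
    for mapping in mappings:
        # Fill in the source template, e.g. "{username}" -> "userX@uchicago.edu".
        source = mapping["source"].format(**identity)
        flags = re.IGNORECASE if mapping["ignore_case"] else 0
        # Assumption: a literal match is compared as plain text, not as a regex.
        pattern = re.escape(mapping["match"]) if mapping["literal"] else mapping["match"]
        found = re.fullmatch(pattern, source, flags)
        if found:
            # "{0}" is replaced by the first capture group, if any.
            return mapping["output"].format(*found.groups())
    return None


print(map_identity({"username": "42032579@wossamottau.edu"}))  # vas
print(map_identity({"username": "userX@uchicago.edu"}))        # userX
print(map_identity({"username": "someone@example.org"}))       # None
```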
22. “Hijacking” ID mapping for autoprovisioning
• Useful in large user communities; esp. where other
automated account management processes exist
1: Get username from input identity
2: If no local user with username, create user
3: Add local user to map file
• Use sample script in docs as a starting point
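The three steps above might look like the following. This is an illustrative sketch only: it does not follow the input/output protocol that GCS expects from a mapping script (see the sample script in the docs for that), the JSON map file format is invented for the example, and `user_exists` and `create_user` are hypothetical callables standing in for site-specific account checks and creation.

```python
import json
from pathlib import Path


def provision(identity_username, map_path, user_exists, create_user):
    """Map an identity to a local account, creating the account if needed."""
    # 1: Get username from input identity (strip everything after "@").
    local_user = identity_username.split("@", 1)[0]

    # 2: If no local user with username, create user.
    if not user_exists(local_user):
        create_user(local_user)

    # 3: Add local user to map file (hypothetical JSON format).
    path = Path(map_path)
    mapping = json.loads(path.read_text()) if path.exists() else {}
    mapping[identity_username] = local_user
    path.write_text(json.dumps(mapping, indent=2))
    return local_user
```

In a real deployment, `user_exists` could wrap a lookup in the local account database and `create_user` could call the site's existing account management process.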
24. Sharing restrictions
• Guest collections may be created in any directory
accessible by the collection, by any authorized local
account
• You can restrict who can share…
o --sharing-user-allow
o --sharing-user-deny
o --posix-sharing-group-allow
o --posix-sharing-group-deny
• …and what they can share…
o --sharing-restrict-paths (specify JSON PathRestrictions)
25. Restrictive/specific sharing policies
• Setting policies for specific user/path combinations
$ globus-connect-server sharing-policy create \
    --user myuser --user youruser \
    --read /reference --read-write /cui/mysecrets
• Sharing policies cannot override restrictions on the
underlying storage gateway
{
  "DATA_TYPE": "path_restrictions#1.0.0",
  "read_write": ["/home/"],
  "none": ["/cui"]
}
#FAIL: the sharing policy for /cui/mysecrets is rejected due to the storage gateway --restrict-paths policy
26. Limit whom data owners can share with
• Authentication policies limit guest collection access
to identities from specific domain(s)
• Attach auth policy to mapped collection
• Explicitly include/exclude identity domains
• Domains used to filter permissions when authorizing
access to a guest collection
29. High Assurance attributes of interest
• Typical settings:
– Access limited to single authentication domain
– Auth timeout typically reflects other institutional policies
• Detailed audit logs: /var/log/gridftp-audit.log
– Recall: regular logs in /var/log/gridftp.log
• MFA requirements
– Requires that IdP provides the acr and/or amr claims
30. Configuring a “private” data channel
• Default: data interface is set to the DTN’s public IP
address (see data_interface in
/etc/gridftp.d/globus-connect-server)
• Create /etc/gridftp.d/STORAGE_GATEWAY_ID
• Set data_interface PRIVATE_INTERFACE_IP_ADDRESS
• Replicate on every DTN (files in /etc/gridftp.d/ are
not sync'd between nodes by Globus)
31. Supporting non-POSIX systems
• Update your GCS packages
• Add the appropriate storage gateway
– Non-POSIX systems require add-on connector subscription(s)
• Gateway configuration options vary by connector
– e.g., specify bucket name(s) for AWS S3
• Collection authentication options vary by connector
– e.g., provide user access key and secret key for AWS S3
– Credentials must grant appropriate permissions
– Mapped collection may not actually “map” to local user account
34. Globus is fast (and secure, and reliable), but…
72.8Gbps
35. Your observed performance will depend on…
• Data Transfer Node (CPU, RAM, bus, NIC, …)
• Network (devices, path quality, latency, …)
• Storage (hardware, attach mode, …)
• Dataset make-up (file#, size, tree depth, …)
– Remember: LoSF (lots of small files) == Great sadness
• Strange things people do (one transfer/file …1M files)
• …?
36. Interpreting reported performance
[Screenshot of reported transfer speeds, with two callouts:]
• A more accurate speed measurement (expect wide variance)
• “Effective” includes service overhead (primarily to guarantee data integrity!)
37. You should have Great Expectations
Throughput required for a given dataset size and transfer time:
Dataset size   1 min           5 min         20 min        1 hr
10 PB          1,333.33 Tbps   266.67 Tbps   66.67 Tbps    22.22 Tbps
1 PB           133.33 Tbps     26.67 Tbps    6.67 Tbps     2.22 Tbps
100 TB         13.33 Tbps      2.67 Tbps     666.67 Gbps   222.22 Gbps
10 TB          1.33 Tbps       266.67 Gbps   66.67 Gbps    22.22 Gbps
1 TB           133.33 Gbps     26.67 Gbps    6.67 Gbps     2.22 Gbps
100 GB         13.33 Gbps      2.67 Gbps     666.67 Mbps   222.22 Mbps
10 GB          1.33 Gbps       266.67 Mbps   66.67 Mbps    22.22 Mbps
1 GB           133.33 Mbps     26.67 Mbps    6.67 Mbps     2.22 Mbps
100 MB         13.33 Mbps      2.67 Mbps     0.67 Mbps     0.22 Mbps
ESnet EPOC target for all DOE labs (requires at least a 10G connection)
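The table values follow from simple arithmetic: required rate = dataset size in bits divided by the transfer time in seconds. A minimal sketch, assuming decimal units (1 TB = 10^12 bytes), which is what reproduces the numbers in the table; the function names are hypothetical.

```python
def required_rate_bps(size_bytes, seconds):
    """Average throughput (bits per second) needed to move size_bytes in seconds."""
    return size_bytes * 8 / seconds


def format_rate(bps):
    """Render a rate using the largest unit (Tbps, Gbps, Mbps) that fits."""
    for unit, scale in (("Tbps", 1e12), ("Gbps", 1e9), ("Mbps", 1e6)):
        if bps >= scale:
            return f"{bps / scale:,.2f} {unit}"
    return f"{bps / 1e6:,.2f} Mbps"


TB = 10**12  # decimal terabyte, in bytes

print(format_rate(required_rate_bps(1 * TB, 3600)))      # 2.22 Gbps
print(format_rate(required_rate_bps(100 * TB, 20 * 60)))  # 666.67 Gbps
print(format_rate(required_rate_bps(10_000 * TB, 60)))    # 1,333.33 Tbps
```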
38. Science DMZ: Network configuration best practice
[Diagram: a user organization plus source and destination sites. Each site has a border router, security filters, a router, and a Science DMZ containing a Data Transfer Node (DTN). The diagram shows the physical and logical control paths and the physical and logical data paths.]
CONTROL: Port 443 (configurable)
DATA: Ports 50000-51000* (default)
* Not actively listening; only used when transfer is in progress; may be restricted to private network
Please see TCP ports reference: https://docs.globus.org/resource-provider-guide/#open-tcp-ports_section
40. Globus transfer performance is a team sport
• Network use parameters: concurrency, parallelism
• Maximum, Preferred values for each
• Transfer considers source and destination endpoint settings
min(
max(preferred src, preferred dest),
max src,
max dest
)
• Also be aware of pipelining effects
• Service limits, e.g. concurrent requests
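The min/max expression above can be checked with a few numbers. This sketch just evaluates the formula as written on the slide for one parameter (concurrency or parallelism); the function name `effective_value` and the sample values are hypothetical.

```python
def effective_value(preferred_src, preferred_dst, max_src, max_dst):
    """Apply min(max(preferred src, preferred dest), max src, max dest)."""
    return min(max(preferred_src, preferred_dst), max_src, max_dst)


# The higher preferred value (4) wins, and neither maximum is below it.
print(effective_value(preferred_src=2, preferred_dst=4, max_src=4, max_dst=8))  # 4

# The source prefers 8, but the destination maximum caps the result at 4.
print(effective_value(preferred_src=8, preferred_dst=2, max_src=8, max_dst=4))  # 4
```

The second case shows why this is a "team sport": raising the settings on one endpoint has no effect if the other endpoint's maximum is lower.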
41. Globus network use parameters
• May only be changed on managed endpoints
• Modify via the web app: Console → Endpoints tab
• Modify via Globus Connect Server CLI
– Run globus-connect-server endpoint modify
• Strong recommendation: Do not change network use
parameters before establishing baseline performance