Vas Vasiliadis
vas@uchicago.edu
February 28, 2024
Advanced Administration Topics
Agenda
• Managing nodes in a production deployment
• GCS troubleshooting
• Customizing domains
• Managing roles on endpoints and collections
• Customizing identity mapping
• Implementing sharing policies
• Additional storage gateway options
• Performance tuning
Adding DTNs to your endpoint
Multi-node DTN behavior
• Transfer tasks sent to nodes in round-robin fashion
• Active nodes can receive transfer tasks
• Tasks on inactive node will pause until active again
• GCS manager assistant service
– Synchronizes configuration among nodes in the endpoint
– Stores encrypted configuration values in Globus service
GCSv5 deployment key
Adding a node requires just two commands
$ sudo globus-connect-server node setup --deployment-key THE_KEY
$ sudo systemctl restart apache2
• Copy the deployment key from the first node (DTN) to every other node
• Node setup pulls configuration from Globus service
• Check your DTN cluster status: globus-connect-server node list
Migrating/refreshing DTNs
Migrating an endpoint to a new host (DTN)
• An endpoint is a logical construct → replace host
system without disrupting the endpoint
– Avoid replicating configuration data (esp. for guest collections!)
– Maintain continuity for custom apps, automation scripts, etc., that
use the endpoint UUID
• 1. Add new node to endpoint → 2. Remove original node
• Again, deployment key is required
– Export node configuration with node setup --export-node
– Import on new DTN using node setup --import-node
Troubleshooting Globus Connect Server
Before asking for help…
• The self-diagnostic command can identify many issues
– Are services running? GCS manager/assistant, GridFTP server
• Connectivity is a common cause
– Can Globus connect to the GCS Manager service?
– Is the DTN control channel reachable?
– Can the DTN establish data channel connection?
docs.globus.org/globus-connect-server/v5.4/troubleshooting-guide
…and we’re always here for you: support@globus.org
Customizing GCS domains
GCS domain configuration (default)
Domains on DTN
• vhost: Management API
– abc.abc.data.globus.org
– /var/lib/globus-connect-server/gcs-manager/etc/httpd/conf.d/EP_UUID
• vhost: Mapped Collection
– m-abc.abc.data.globus.org
– /var/lib/globus-connect-server/gcs-manager/etc/httpd/conf.d/M_COLL_UUID
• vhost: Guest Collection
– g-abc.abc.data.globus.org
– /var/lib/globus-connect-server/gcs-manager/etc/httpd/conf.d/G_COLL_UUID
Multiple (sub)domains exist in a GCS deployment
• Endpoint
– e4faec.75bc.data.globus.org
• Mapped (m-...) and guest (g-...) collections
– m-8dd2b7.e4faec.75bc.data.globus.org
– g-e7b189.e4faec.75bc.data.globus.org
• Subdomains are distinct Apache vhosts
/var/lib/globus-connect-server/gcs-manager/etc/httpd/conf.d
• Management API for the GCS Manager service is at:
https://e4faec.75bc.data.globus.org/api
Customizing GCS domains
• Set up DNS record
– Avoid using FQDN for the DTN
– activedata.uchicago.edu and *.activedata.uchicago.edu (see below)
• Put SSL certificate/key on DTN
• As endpoint owner or admin, run: endpoint domain update
    --domain activedata.uchicago.edu
    --certificate-path ...
    --private-key-path ...
    --wildcard ← important; otherwise collections use data.globus.org
    --managed ← really important for certs/keys to be sync’d across DTNs
• Assuming --wildcard, domains for collections will look like…
– m-8dd2b7.activedata.uchicago.edu
– g-e7b189.activedata.uchicago.edu
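The naming pattern above can be sketched in a few lines. This is only an illustration of how the example hostnames on these slides are composed; the function name `collection_domain` and the `kind` labels are hypothetical and are not part of any Globus tool.

```python
def collection_domain(kind: str, short_id: str, base_domain: str) -> str:
    """Compose a collection hostname from its kind, short ID, and base domain."""
    prefixes = {"mapped": "m", "guest": "g"}
    if kind not in prefixes:
        raise ValueError(f"unknown collection kind: {kind!r}")
    return f"{prefixes[kind]}-{short_id}.{base_domain}"


# Default: collections live under the endpoint's data.globus.org domain.
default_base = "e4faec.75bc.data.globus.org"
# With a custom domain and --wildcard, the same collections move under it.
custom_base = "activedata.uchicago.edu"

for base in (default_base, custom_base):
    print(collection_domain("mapped", "8dd2b7", base))
    print(collection_domain("guest", "e7b189", base))
```

The wildcard DNS record (*.activedata.uchicago.edu) is what lets every m-... and g-... name resolve without a separate record per collection.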
Managing roles
Subscriptions and endpoint roles
• Subscription(s) configured for an institution
• Multiple Subscription Managers per subscription
• Subscription Manager ties endpoint to subscription
– Results in a “subscribed” endpoint
• Assign additional roles for endpoint management
– Administrator, Manager, Monitor
Be identity-, role-, and permission-aware
• Default: Only endpoint owner can configure an endpoint
• Delegate administrator role to other sysadmins
– Best practice: Delegate to a Globus group, not individuals
• Check identity using the session command
• Check resource permissions on storage gateways and
collections with --include-private-policies option
docs.globus.org/globus-connect-server/v5.4/reference/role
Collection roles
• Mapped collections have same roles as endpoints
• Guest collections add “Access Manager” role
– Critical for automation
• Any Auth client can assume endpoint/collection role
– Particularly useful for scripts that manage large deployments
– e.g., script to list guest collection information:
gist.github.com/vasv/cdb8607e2bfab08634b5aa99389e87c7
• Roles may be granted to Auth (app) clients for…
– …group management
– …flow execution/monitoring
– …any other Globus resource that has role-based access control
Customizing/extending identity mapping
Mapping identities to local accounts
• Default: Strip identity domain (everything after “@”)
– e.g., userX@globusdemo.org maps to local account userX
– Best for campus identities w/synchronized local accounts
• Use --identity-mapping option on storage gateway
– Specify expression in a JSON document
– Execute a custom script
docs.globus.org/globus-connect-server/v5.4/identity-mapping-guide
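The default rule (strip everything after “@”) can be illustrated as follows. This is a minimal sketch of the behavior described on the slide, not GCS code; the function name `default_local_account` is hypothetical.

```python
def default_local_account(identity_username: str) -> str:
    """Return the local account name by dropping the identity domain."""
    local, sep, domain = identity_username.partition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not a user@domain identity: {identity_username!r}")
    return local


print(default_local_account("userX@globusdemo.org"))
```

This only works when the part before the “@” is also a valid local account name, which is why the default suits campus identities with synchronized local accounts.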
Simple custom mapping example
Note: Requires the storage gateway to accept identities from two domains
{
  "DATA_TYPE": "expression_identity_mapping#1.0.0",
  "mappings": [
    {
      "source": "{username}",
      "match": "42032579@wossamottau.edu",
      "output": "vas",
      "ignore_case": false,
      "literal": false
    },
    {
      "source": "{username}",
      "match": "(.*)@uchicago.edu",
      "output": "{0}",
      "ignore_case": false,
      "literal": false
    }
  ]
}
• Map 42032579@wossamottau.edu to local user vas
• Otherwise, default behavior: local user → domain username
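To see how the two rules interact, the mapping can be emulated with Python's re module. This is a rough sketch under stated assumptions, not the GCS implementation: it assumes rules are tried in order, that "match" must match the whole source string, and that {0} in "output" refers to the first capture group. Only the {username} source is handled, and the function name `map_identity` is hypothetical.

```python
import re
from typing import Optional

MAPPING_DOC = {
    "DATA_TYPE": "expression_identity_mapping#1.0.0",
    "mappings": [
        {"source": "{username}", "match": "42032579@wossamottau.edu",
         "output": "vas", "ignore_case": False, "literal": False},
        {"source": "{username}", "match": "(.*)@uchicago.edu",
         "output": "{0}", "ignore_case": False, "literal": False},
    ],
}


def map_identity(username: str, doc: dict) -> Optional[str]:
    """Return the local account for username, or None if no rule matches."""
    for rule in doc["mappings"]:
        # Assumption: only the {username} placeholder appears in "source".
        source = rule["source"].format(username=username)
        flags = re.IGNORECASE if rule.get("ignore_case") else 0
        # A literal rule is compared as plain text, so escape regex characters.
        pattern = re.escape(rule["match"]) if rule.get("literal") else rule["match"]
        found = re.fullmatch(pattern, source, flags)
        if found:
            # Assumption: {0} is the first capture group, {1} the second, etc.
            return rule["output"].format(*found.groups())
    return None


print(map_identity("42032579@wossamottau.edu", MAPPING_DOC))
print(map_identity("jdoe@uchicago.edu", MAPPING_DOC))
print(map_identity("someone@example.org", MAPPING_DOC))
```

Because "literal" is false, the dots in each "match" value are regex wildcards; escaping them (for example uchicago\.edu) makes a pattern stricter.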
“Hijacking” ID mapping for autoprovisioning
• Useful in large user communities; esp. where other
automated account management processes exist
1: Get username from input identity
2: If no local user with username, create user
3: Add local user to map file
• Use sample script in docs as a starting point
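The three steps above can be sketched as plain logic. This is a hedged illustration only: it does not follow the interface GCS uses to call a custom mapping program (see the sample script in the docs for that), and `autoprovision`, `local_users`, `map_file`, and `create_user` are hypothetical names. Account creation is passed in as a callable so that the sketch never touches the real system.

```python
from typing import Callable, Dict, Set


def autoprovision(
    identity_username: str,
    local_users: Set[str],
    map_file: Dict[str, str],
    create_user: Callable[[str], None],
) -> str:
    """Map an identity to a local account, creating the account if needed."""
    # 1: Get username from input identity.
    username = identity_username.split("@", 1)[0]
    # 2: If no local user with username, create user.
    if username not in local_users:
        create_user(username)
        local_users.add(username)
    # 3: Add local user to map file (modeled here as a dict).
    map_file[identity_username] = username
    return username


created = []            # stands in for a real account-creation command
users = {"alice"}       # accounts that already exist
mapping = {}            # stands in for the on-disk map file

autoprovision("bob@uchicago.edu", users, mapping, created.append)
autoprovision("alice@uchicago.edu", users, mapping, created.append)
print(created)
print(mapping)
```

In a real deployment the create_user step would call whatever account management process the site already runs.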
Implementing sharing policies
Sharing restrictions
• Guest collections may be created in any directory
accessible by the collection, by any authorized local
account
• You can restrict who can share…
o --sharing-user-allow
o --sharing-user-deny
o --posix-sharing-group-allow
o --posix-sharing-group-deny
• …and what they can share…
o --sharing-restrict-paths (specify JSON PathRestrictions)
Restrictive/specific sharing policies
• Setting policies for specific user/path combinations
$ globus-connect-server sharing-policy create \
    --user myuser --user youruser \
    --read /reference --read-write /cui/mysecrets
• Sharing policies cannot override restrictions on the
underlying storage gateway
{
  "DATA_TYPE": "path_restrictions#1.0.0",
  "read_write": ["/home/"],
  "none": ["/cui"]
}
#FAIL: due to storage gateway --restrict-paths policy
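Why the --read-write /cui/mysecrets request fails can be sketched with a small path check. This is an illustration under assumptions, not the GCS evaluation logic: it assumes the longest matching path prefix decides the access level, that ties go to the more restrictive level, and that unlisted paths get no access. The function name `allowed_access` is hypothetical.

```python
from pathlib import PurePosixPath

GATEWAY_RESTRICTIONS = {
    "DATA_TYPE": "path_restrictions#1.0.0",
    "read_write": ["/home/"],
    "none": ["/cui"],
}


def allowed_access(path: str, restrictions: dict) -> str:
    """Return 'none', 'read', or 'read_write' for path (longest prefix wins)."""
    target = PurePosixPath(path)
    best_len, best_level = -1, "none"  # assumption: unlisted paths get no access
    # Levels are checked from most to least restrictive; a later level only
    # wins when its prefix is strictly longer.
    for level in ("none", "read", "read_write"):
        for prefix in restrictions.get(level, []):
            candidate = PurePosixPath(prefix)
            if target == candidate or candidate in target.parents:
                if len(candidate.parts) > best_len:
                    best_len, best_level = len(candidate.parts), level
    return best_level


print(allowed_access("/cui/mysecrets", GATEWAY_RESTRICTIONS))
print(allowed_access("/home/vas/data", GATEWAY_RESTRICTIONS))
```

Under these assumptions /cui/mysecrets resolves to "none" at the gateway, so a sharing policy cannot grant read-write access to it.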
Limit whom data owners can share with
• Authentication policies limit guest collection access
to identities from specific domain(s)
• Attach auth policy to mapped collection
• Explicitly include/exclude identity domains
• Domains used to filter permissions when authorizing
access to a guest collection
Create auth policy and attach to collection
$ globus-connect-server auth-policy create \
> --include '*.edu' --include globus.org \
> "Allow sharing internally" \
> "R&E Sharing Policy"
Authentication Policy ID: 45ff23ed-43a8-438c-aaa8-e8e36708756e
$ globus-connect-server collection update \
> --guest-auth-policy-id 45ff23ed-43a8-438c-aaa8-e8e36708756e \
> 56c3dff0-d827-4f11-91f3-b0704c53aa4c
Allowed sharee domains
Apply policy to this collection
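The effect of the include list can be sketched as a domain filter. This is only an approximation for illustration: it assumes shell-style wildcard matching on the identity's domain and that an exclude match overrides an include match, which may differ from how Globus Auth evaluates policies. The function name `domain_allowed` is hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Sequence


def domain_allowed(
    identity: str,
    include: Sequence[str],
    exclude: Sequence[str] = (),
) -> bool:
    """Return True if the identity's domain passes the include/exclude lists."""
    domain = identity.rpartition("@")[2].lower()
    # Assumption: an explicit exclude always wins over an include.
    if any(fnmatchcase(domain, pattern.lower()) for pattern in exclude):
        return False
    return any(fnmatchcase(domain, pattern.lower()) for pattern in include)


policy_include = ["*.edu", "globus.org"]
for sharee in ("vas@uchicago.edu", "support@globus.org", "someone@gmail.com"):
    print(sharee, domain_allowed(sharee, policy_include))
```

With the policy above, sharees from any .edu domain or from globus.org pass the filter, and other identities are filtered out when access to the guest collection is authorized.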
Additional storage gateway options
High Assurance attributes of interest
• Typical settings:
– Access limited to single authentication domain
– Auth timeout typically reflects other institutional policies
• Detailed audit logs: /var/log/gridftp-audit.log
– Recall: regular logs in /var/log/gridftp.log
• MFA requirements
– Requires that IdP provides the acr and/or amr claims
Configuring a “private” data channel
• Default: data interface is set to the DTN’s public IP
address (see data_interface in
/etc/gridftp.d/globus-connect-server)
• Create /etc/gridftp.d/STORAGE_GATEWAY_ID
• Set data_interface PRIVATE_INTERFACE_IP_ADDRESS
• Replicate on every DTN (files in /etc/gridftp.d/ are
not sync'd between nodes by Globus)
Supporting non-POSIX systems
• Update your GCS packages
• Add the appropriate storage gateway
– Non-POSIX systems require add-on connector subscription(s)
• Gateway configuration options vary by connector
– e.g., specify bucket name(s) for AWS S3
• Collection authentication options vary by connector
– e.g., provide user access key and secret key for AWS S3
– Credentials must grant appropriate permissions
– Mapped collection may not actually “map” to local user account
Accessing non-POSIX systems: AWS S3 (and S3-compatible systems)
On performance…
Globus is fast (and secure, and reliable), but…
72.8Gbps
Your observed performance will depend on…
• Data Transfer Node (CPU, RAM, bus, NIC, …)
• Network (devices, path quality, latency, …)
• Storage (hardware, attach mode, …)
• Dataset make-up (file#, size, tree depth, …)
– Remember: LoSF == Great sadness
• Strange things people do (one transfer/file …1M files)
• …?
Interpreting reported performance
• A more accurate speed measurement (expect wide variance)
• “Effective” includes service overhead (primarily to guarantee data integrity!)
You should have Great Expectations
Required rate by dataset size and transfer time:

Dataset size   1 min           5 min         20 min        1 hr
10 PB          1,333.33 Tbps   266.67 Tbps   66.67 Tbps    22.22 Tbps
1 PB           133.33 Tbps     26.67 Tbps    6.67 Tbps     2.22 Tbps
100 TB         13.33 Tbps      2.67 Tbps     666.67 Gbps   222.22 Gbps
10 TB          1.33 Tbps       266.67 Gbps   66.67 Gbps    22.22 Gbps
1 TB           133.33 Gbps     26.67 Gbps    6.67 Gbps     2.22 Gbps
100 GB         13.33 Gbps      2.67 Gbps     666.67 Mbps   222.22 Mbps
10 GB          1.33 Gbps       266.67 Mbps   66.67 Mbps    22.22 Mbps
1 GB           133.33 Mbps     26.67 Mbps    6.67 Mbps     2.22 Mbps
100 MB         13.33 Mbps      2.67 Mbps     0.67 Mbps     0.22 Mbps

ESnet EPOC target for all DOE labs (requires at least a 10G connection)
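The table values follow from simple arithmetic: size in bits divided by time in seconds, using decimal units (1 TB = 10^12 bytes). A short sketch for checking other sizes and time budgets; the helper names `required_rate_bps` and `human` are hypothetical.

```python
def required_rate_bps(size_bytes: float, seconds: float) -> float:
    """Sustained rate in bits per second needed to move size_bytes in seconds."""
    return size_bytes * 8 / seconds


def human(bps: float) -> str:
    """Format a bit rate using the same decimal units as the table."""
    for unit, scale in (("Tbps", 1e12), ("Gbps", 1e9)):
        if bps >= scale:
            return f"{bps / scale:,.2f} {unit}"
    return f"{bps / 1e6:,.2f} Mbps"


TB = 1e12  # decimal terabyte, in bytes

print(human(required_rate_bps(1 * TB, 3600)))       # 1 TB in 1 hr
print(human(required_rate_bps(10_000 * TB, 60)))    # 10 PB in 1 min
print(human(required_rate_bps(1e8, 20 * 60)))       # 100 MB in 20 min
```

These are sustained end-to-end rates, so the observed network peak has to be higher to absorb service overhead and variance.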
Science DMZ: Network configuration best practice
[Diagram: source and destination sites, each with a router, border router, security filters, and a Science DMZ hosting a Data Transfer Node (DTN); the user's organization is also shown. Physical and logical paths are drawn for both control and data traffic.]
• Control: Port 443 (configurable)
• Data: Ports 50000-51000* (default)
* Not actively listening; only used when transfer is in progress; may be restricted to private network
Please see TCP ports reference: https://docs.globus.org/resource-provider-guide/#open-tcp-ports_section
ESnet makes magic happen
Globus transfer performance is a team sport
• Network use parameters: concurrency, parallelism
• Maximum, Preferred values for each
• Transfer considers source and destination endpoint settings
min(
max(preferred src, preferred dest),
max src,
max dest
)
• Also be aware of pipelining effects
• Service limits, e.g. concurrent requests
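The min/max rule above can be written out directly. A minimal sketch of the formula on this slide, applied to one parameter at a time (concurrency or parallelism); the function name `effective_setting` and the sample values are hypothetical.

```python
def effective_setting(pref_src: int, pref_dst: int, max_src: int, max_dst: int) -> int:
    """Apply min(max(preferred src, preferred dest), max src, max dest)."""
    return min(max(pref_src, pref_dst), max_src, max_dst)


# The higher preferred value (4) is capped by the destination maximum (3).
print(effective_setting(pref_src=2, pref_dst=4, max_src=8, max_dst=3))
# Neither maximum is reached, so the higher preferred value is used.
print(effective_setting(pref_src=2, pref_dst=4, max_src=8, max_dst=8))
```

Raising a preferred value on one endpoint therefore has no effect once the other endpoint's maximum is the limiting term.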
Globus network use parameters
• May only be changed on managed endpoints
• Modify via the web app: Console → Endpoints tab
• Modify via Globus Connect Server CLI
– Run globus-connect-server endpoint modify
• Strong recommendation: Do not change network use
parameters before establishing baseline performance
Advanced Globus System Administration Topics

  • 1. Vas Vasiliadis vas@uchicago.edu February 28, 2024 Advanced Administration Topics
  • 2. Agenda • Managing nodes in a production deployment • GCS troubleshooting • Customizing domains • Managing roles on endpoints and collections • Customizing identity mapping • Implementing sharing policies • Additional storage gateway options • Performance tuning 2
  • 3. Adding DTNs to your endpoint 3
  • 4. Multi-node DTN behavior • Transfer tasks sent to nodes in round-robin fashion • Active nodes can receive transfer tasks • Tasks on inactive node will pause until active again • GCS manager assistant service – Synchronizes configuration among nodes in the endpoint – Stores encrypted configuration values in Globus service 4
  • 6. Adding a node requires just two commands $ sudo globus-connect-server node setup --deployment-key THE_KEY $ sudo systemctl restart apache2 Copy the deployment key from the first node (DTN) to every other node Node setup pulls configuration from Globus service Check your DTN cluster status: globus-connect-server node list
  • 8. Migrating an endpoint to a new host (DTN) • An endpoints is a logical construct è replace host system without disrupting the endpoint – Avoid replicating configuration data (esp. for guest collections!) – Maintain continuity for custom apps, automation scripts, etc., that use the endpoint UUID • 1. Add new node to endpoint à 2. remove original node • Again, deployment key is required – Export node configuration with node setup --export-node – Import on new DTN using node setup --import-node
  • 10. Before asking for help… • self-diagnostic can identify many issues – Are services running? GCS manager/assistant, GridFTP server • Connectivity is a common cause – Can Globus connect to the GCS Manager service? – Is the DTN control channel reachable? – Can the DTN establish data channel connection? docs.globus.org/globus-connect-server/v5.4/troubleshooting-guide …and we’re always here for you: support@globus.org 10
  • 12. GCS domain configuration (default) 12 /var/lib/globus-connect-server/gcs- manager/etc/httpd/conf.d/G_COLL_UUID Domains on DTN vhost: Management API abc.abc.data.globus.org vhost: Mapped Collection m-abc.abc.data.globus.org vhost: Guest Collection g-abc.abc.data.globus.org /var/lib/globus-connect-server/gcs- manager/etc/httpd/conf.d/M_COLL_UUID /var/lib/globus-connect-server/gcs- manager/etc/httpd/conf.d/EP_UUID
  • 13. Multiple (sub)domains exist in a GCS deployment • Endpoint – e4faec.75bc.data.globus.org • Mapped (m-...) and guest (g-...) collections – m-8dd2b7.e4faec.75bc.data.globus.org – g-e7b189.e4faec.75bc.data.globus.org • Subdomains are distinct Apache vhosts /var/lib/globus-connect-server/gcs-manager/etc/httpd/conf.d • Management API for the GCS Manager service is at: https://e4faec.75bc.data.globus.org/api
  • 14. Customizing GCS domains • Set up DNS record – Avoid using FQDN for the DTN – activedata.uchicago.edu and *.activedata.uchicago.edu (see below) • Put SSL certificate/key on DTN • As endpoint owner or admin run: endpoint domain update --domain activedata.uchicago.edu --certificate-path ... --private-key-path ... --wildcard ß important; otherwise collections use data.globus.org --managed ß really important for certs/keys to be sync’d across DTNs • Assuming --wildcard, domains for collections will look like… – m-8dd2b7.activedata.uchicago.edu – g-e7b189.activedata.uchicago.edu
  • 16. Subscriptions and endpoint roles • Subscription(s) configured for an institution • Multiple Subscription Managers per subscription • Subscription Manager ties endpoint to subscription – Results in a “subscribed” endpoint • Assign additional roles for endpoint management – Administrator, Manager, Monitor
  • 17. Be identity-, role-, and permission-aware • Default: Only endpoint owner can configure an endpoint • Delegate administrator role to other sysadmins – Best practice: Delegate to a Globus group, not individuals • Check identity using the session command • Check resource permissions on storage gateways and collections with --include-private-policies option docs.globus.org/globus-connect-server/v5.4/reference/role
  • 18. Collection roles • Mapped collections have same roles as endpoints • Guest collections add “Access Manager” role – Critical for automation • Any Auth client can assume endpoint/collection role – Particularly useful for scripts that manage large deployments – e.g., script to list guest collection information: gist.github.com/vasv/cdb8607e2bfab08634b5aa99389e87c7 • Roles may be granted to Auth (app ) clients for… – …group management – …flow execution/monitoring – …any other Globus resource that has role-based access control
  • 20. Mapping identities to local accounts • Default: Strip identity domain (everything after “@”) – e.g., userX@globusdemo.org maps to local account userX – Best for campus identities w/synchronized local accounts • Use --identity-mapping option on storage gateway – Specify expression in a JSON document – Execute a custom script docs.globus.org/globus-connect-server/v5.4/identity-mapping-guide
  • 21. Simple custom mapping example Note: Requires the storage gateway to accept identities from two domains { "DATA_TYPE": "expression_identity_mapping#1.0.0", "mappings": [ { "source": "{username}", "match": "42032579@wossamottau.edu", "output": "vas", "ignore_case": false, "literal": false }, { "source": "{username}", "match": "(.*)@uchicago.edu", "output": "{0}", "ignore_case": false, "literal": false } ] } • Map 42032579@wossamottau.edu to local user vas • Otherwise, default behavior: local user → domain username
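To make the rule order concrete, here is a rough Python simulation of how the two rules in this example might be evaluated. It is a sketch, not the GCS implementation: the function `map_identity` is hypothetical, and it assumes rules are tried in order, that `match` must match the whole source string, and that `{0}` in `output` refers to the first capture group.

```python
import re

# The two rules from the slide, written as Python dicts.
MAPPINGS = [
    {"source": "{username}", "match": "42032579@wossamottau.edu",
     "output": "vas", "ignore_case": False, "literal": False},
    {"source": "{username}", "match": "(.*)@uchicago.edu",
     "output": "{0}", "ignore_case": False, "literal": False},
]


def map_identity(identity: dict, mappings=MAPPINGS):
    """Return the local account for an identity, or None if no rule matches."""
    for rule in mappings:
        source = rule["source"].format(**identity)
        # Assumption: "literal" means the match string is not treated as a regex.
        pattern = re.escape(rule["match"]) if rule["literal"] else rule["match"]
        flags = re.IGNORECASE if rule["ignore_case"] else 0
        found = re.fullmatch(pattern, source, flags)
        if found:
            return rule["output"].format(*found.groups())
    return None


print(map_identity({"username": "42032579@wossamottau.edu"}))  # prints vas
print(map_identity({"username": "vas@uchicago.edu"}))          # prints vas
print(map_identity({"username": "someone@example.org"}))       # prints None
```

Both of the first two identities land on the same local account, which is the point of the example: one person with identities in two domains.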
  • 22. “Hijacking” ID mapping for autoprovisioning • Useful in large user communities; esp. where other automated account management processes exist 1: Get username from input identity 2: If no local user with username, create user 3: Add local user to map file • Use sample script in docs as a starting point
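The three steps above can be sketched in Python as follows. This is not the sample script from the docs: the function `autoprovision`, the in-memory `local_users` set, the `map_file` dict, and the injected `create_user` callable are all hypothetical stand-ins so the logic can run without touching real accounts.

```python
from typing import Callable, Dict, Set


def autoprovision(identity_username: str,
                  local_users: Set[str],
                  map_file: Dict[str, str],
                  create_user: Callable[[str], None]) -> str:
    """Map an identity to a local account, creating the account if needed."""
    # Step 1: get the username from the input identity.
    username = identity_username.split("@", 1)[0]
    # Step 2: if there is no local user with that username, create one.
    if username not in local_users:
        create_user(username)
        local_users.add(username)
    # Step 3: record the identity-to-local-user mapping.
    map_file[identity_username] = username
    return username


created = []                      # stands in for a real account-creation call
users = {"alice"}
mapping = {}
autoprovision("alice@example.edu", users, mapping, created.append)
autoprovision("bob@example.edu", users, mapping, created.append)
print(created)   # only bob needed a new account
print(mapping)
```

Passing `create_user` in as a parameter keeps the account-creation mechanism (useradd, an IdM API, a ticketing hook) out of the mapping logic, which is where an existing account management process would plug in.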
  • 24. Sharing restrictions • Guest collections may be created in any directory accessible by the collection, by any authorized local account • You can restrict who can share… o --sharing-user-allow --sharing-user-deny o --posix-sharing-group-allow o --posix-sharing-group-deny • …and what they can share… o --sharing-restrict-paths (specify JSON PathRestrictions)
  • 25. Restrictive/specific sharing policies • Setting policies for specific user/path combinations $ globus-connect-server sharing-policy create --user myuser --user youruser --read /reference --read-write /cui/mysecrets • Sharing policies cannot override restrictions on the underlying storage gateway { "DATA_TYPE": "path_restrictions#1.0.0", "read_write": ["/home/"], "none": ["/cui"] } #FAIL due to storage gateway --restrict-paths policy
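The interaction between a sharing policy and the storage gateway restrictions can be sketched in Python. This is an illustration only, not how GCS evaluates PathRestrictions: it assumes the most specific matching prefix wins, that ties go to the more restrictive level, and that paths matching no prefix are denied. The helper names are hypothetical.

```python
from pathlib import PurePosixPath

# Storage gateway restrictions from the slide.
GATEWAY_RESTRICTIONS = {
    "DATA_TYPE": "path_restrictions#1.0.0",
    "read_write": ["/home/"],
    "none": ["/cui"],
}

LEVELS = {"none": 0, "read": 1, "read_write": 2}


def gateway_access(path: str, restrictions: dict, default: str = "none") -> str:
    """Return the access level the gateway allows for path (assumed semantics)."""
    target = PurePosixPath(path)
    best_depth, best_level = -1, default
    for level in LEVELS:  # iterates none, read, read_write
        for prefix in restrictions.get(level, []):
            candidate = PurePosixPath(prefix)
            if target == candidate or candidate in target.parents:
                depth = len(candidate.parts)
                if depth > best_depth:  # strict: on ties the earlier, stricter level stays
                    best_depth, best_level = depth, level
    return best_level


def sharing_allowed(requested: str, path: str, restrictions: dict) -> bool:
    """A sharing policy can grant at most what the gateway itself allows."""
    return LEVELS[requested] <= LEVELS[gateway_access(path, restrictions)]


print(sharing_allowed("read_write", "/cui/mysecrets", GATEWAY_RESTRICTIONS))  # prints False
print(sharing_allowed("read", "/home/myuser/data", GATEWAY_RESTRICTIONS))     # prints True
```

Under these assumptions the `--read-write /cui/mysecrets` grant fails because the gateway marks `/cui` as `none`, which matches the #FAIL note on the slide.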
  • 26. Limit whom data owners can share with • Authentication policies limit guest collection access to identities from specific domain(s) • Attach auth policy to mapped collection • Explicitly include/exclude identity domains • Domains used to filter permissions when authorizing access to a guest collection
  • 27. Create auth policy and attach to collection $ globus-connect-server auth-policy create --include *.edu --include globus.org "Allow sharing internally" "R&E Sharing Policy" (--include values: allowed sharee domains) Authentication Policy ID: 45ff23ed-43a8-438c-aaa8-e8e36708756e $ globus-connect-server collection update --guest-auth-policy-id 45ff23ed-43a8-438c-aaa8-e8e36708756e 56c3dff0-d827-4f11-91f3-b0704c53aa4c (final argument: apply policy to this collection)
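The effect of the include list can be approximated with a short Python sketch. It is not the Globus Auth implementation: it assumes patterns such as `*.edu` behave like shell-style wildcards, that exclusions take precedence over inclusions, and that matching is case-insensitive. The function `domain_allowed` is hypothetical.

```python
from fnmatch import fnmatchcase


def domain_allowed(identity_username: str, include, exclude=()) -> bool:
    """Check an identity's domain against include/exclude patterns (assumed semantics)."""
    domain = identity_username.rpartition("@")[2].lower()
    if any(fnmatchcase(domain, pattern.lower()) for pattern in exclude):
        return False
    return any(fnmatchcase(domain, pattern.lower()) for pattern in include)


INCLUDE = ["*.edu", "globus.org"]  # the values passed to --include on the slide

print(domain_allowed("vas@uchicago.edu", INCLUDE))      # prints True
print(domain_allowed("someone@globus.org", INCLUDE))    # prints True
print(domain_allowed("partner@example.com", INCLUDE))   # prints False
```

With this policy attached, a data owner could still create a permission for the third identity, but under the slide's description it would be filtered out when access to the guest collection is authorized.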
  • 29. High Assurance attributes of interest • Typical settings: – Access limited to single authentication domain – Auth timeout typically reflects other institutional policies • Detailed audit logs: /var/log/gridftp-audit.log – Recall: regular logs in /var/log/gridftp.log • MFA requirements – Requires that IdP provides the acr and/or amr claims
  • 30. Configuring a “private” data channel • Default: data interface is set to the DTN’s public IP address (see data_interface in /etc/gridftp.d/globus-connect-server) • Create /etc/gridftp.d/STORAGE_GATEWAY_ID • Set data_interface PRIVATE_INTERFACE_IP_ADDRESS • Replicate on every DTN (files in /etc/gridftp.d/ are not sync'd between nodes by Globus)
  • 31. Supporting non-POSIX systems • Update your GCS packages • Add the appropriate storage gateway – Non-POSIX systems require add-on connector subscription(s) • Gateway configuration options vary by connector – e.g., specify bucket name(s) for AWS S3 • Collection authentication options vary by connector – e.g., provide user access key and secret key for AWS S3 – Credentials must grant appropriate permissions – Mapped collection may not actually “map” to local user account
  • 32. Accessing non-POSIX systems: AWS S3 (and S3-compatible systems)
  • 34. Globus is fast (and secure, and reliable), but… 72.8Gbps
  • 35. Your observed performance will depend on… • Data Transfer Node (CPU, RAM, bus, NIC, …) • Network (devices, path quality, latency, …) • Storage (hardware, attach mode, …) • Dataset make-up (file count, size, tree depth, …) – Remember: LoSF (lots of small files) == Great sadness • Strange things people do (one transfer per file … for 1M files) • …?
  • 36. Interpreting reported performance • Figure callouts: – A more accurate speed measurement (expect wide variance) – “Effective” includes service overhead (primarily to guarantee data integrity!)
  • 37. You should have Great Expectations
    Required average throughput by dataset size and transfer time:
    Dataset size | 1 min         | 5 min       | 20 min      | 1 hr
    10 PB        | 1,333.33 Tbps | 266.67 Tbps | 66.67 Tbps  | 22.22 Tbps
    1 PB         | 133.33 Tbps   | 26.67 Tbps  | 6.67 Tbps   | 2.22 Tbps
    100 TB       | 13.33 Tbps    | 2.67 Tbps   | 666.67 Gbps | 222.22 Gbps
    10 TB        | 1.33 Tbps     | 266.67 Gbps | 66.67 Gbps  | 22.22 Gbps
    1 TB         | 133.33 Gbps   | 26.67 Gbps  | 6.67 Gbps   | 2.22 Gbps
    100 GB       | 13.33 Gbps    | 2.67 Gbps   | 666.67 Mbps | 222.22 Mbps
    10 GB        | 1.33 Gbps     | 266.67 Mbps | 66.67 Mbps  | 22.22 Mbps
    1 GB         | 133.33 Mbps   | 26.67 Mbps  | 6.67 Mbps   | 2.22 Mbps
    100 MB       | 13.33 Mbps    | 2.67 Mbps   | 0.67 Mbps   | 0.22 Mbps
    ESnet EPOC target for all DOE labs (requires at least a 10G connection)
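The figures in this table follow from simple arithmetic: dataset size in bits divided by transfer time in seconds. A small Python sketch, assuming decimal units (1 TB = 10^12 bytes), reproduces them; the helper names `required_bps` and `human` are illustrative.

```python
# Decimal (SI) byte units, which is what the table values imply.
UNITS = {"MB": 1e6, "GB": 1e9, "TB": 1e12, "PB": 1e15}


def required_bps(size: float, unit: str, seconds: float) -> float:
    """Average bits per second needed to move size units in the given time."""
    return size * UNITS[unit] * 8 / seconds


def human(bps: float) -> str:
    """Format a bit rate the way the table does (Tbps, Gbps, or Mbps)."""
    for suffix, scale in (("Tbps", 1e12), ("Gbps", 1e9)):
        if bps >= scale:
            return f"{bps / scale:,.2f} {suffix}"
    return f"{bps / 1e6:,.2f} Mbps"


print(human(required_bps(1, "TB", 3600)))     # prints 2.22 Gbps
print(human(required_bps(10, "PB", 60)))      # prints 1,333.33 Tbps
print(human(required_bps(100, "MB", 1200)))   # prints 0.67 Mbps
```

These are averages over the whole transfer, so the sustained rate a DTN, network path, and storage system must deliver is at least this high.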
  • 38. Science DMZ: Network configuration best practice • [Diagram: source and destination sites within the user organization, each with a border router, security filters, a router, and a Science DMZ hosting a Data Transfer Node (DTN); physical and logical paths are shown for both control and data] • Control channel: port 443 (configurable) • Data channel: ports 50000-51000* (default) • * Not actively listening; only used when transfer is in progress; may be restricted to private network • Please see TCP ports reference: https://docs.globus.org/resource-provider-guide/#open-tcp-ports_section
  • 40. Globus transfer performance is a team sport • Network use parameters: concurrency, parallelism • Maximum, Preferred values for each • Transfer considers source and destination endpoint settings: min( max(preferred src, preferred dest), max src, max dest ) • Also be aware of pipelining effects • Service limits, e.g. concurrent requests
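The formula on this slide is easy to evaluate by hand, but a short Python sketch shows how one endpoint's maximum can cap the other's preference. The function name `effective_value` and the sample settings are hypothetical; only the formula itself comes from the slide.

```python
def effective_value(preferred_src: int, preferred_dst: int,
                    max_src: int, max_dst: int) -> int:
    """Apply min(max(preferred src, preferred dest), max src, max dest)."""
    return min(max(preferred_src, preferred_dst), max_src, max_dst)


# Hypothetical concurrency settings: the source prefers 8 (max 16),
# the destination prefers 2 (max 4).
print(effective_value(preferred_src=8, preferred_dst=2, max_src=16, max_dst=4))  # prints 4

# When neither maximum is limiting, the larger preference is used.
print(effective_value(preferred_src=2, preferred_dst=4, max_src=8, max_dst=8))   # prints 4
```

In the first case raising the source's preferred value changes nothing until the destination's maximum is also raised, which is why tuning is a joint effort between both endpoint administrators.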
  • 41. Globus network use parameters • May only be changed on managed endpoints • Modify via the web app: Console → Endpoints tab • Modify via Globus Connect Server CLI – Run globus-connect-server endpoint modify • Strong recommendation: Do not change network use parameters before establishing baseline performance