Making Storage Systems Accessible via Globus (GlobusWorld Tour West) - Globus
1. Globus Connect Server makes storage systems accessible via Globus: software installed on the storage system connects it to the Globus network.
2. To set up access, you first register a Globus Connect Server, install the software, set up an endpoint, and create a storage gateway and mapped collection.
3. You can then associate the endpoint with a Globus subscription to manage access and share data by creating guest collections.
Globus Endpoint Setup and Configuration - XSEDE14 Tutorial - Globus
This document provides an overview of how to create and manage Globus endpoints. It discusses installing and configuring Globus Connect Server to set up an endpoint on an Amazon EC2 server. The key steps are to install Globus Connect Server, run the setup process, and configure options in the configuration file like making the endpoint public or enabling sharing. Advanced configuration topics covered include using host certificates, single sign-on with CILogon, restricting file paths for transfers and sharing, and setting up multiple Globus Connect Server instances for load balancing.
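The configuration file mentioned above is INI-style (for GCS version 4, typically /etc/globus-connect-server.conf). A minimal sketch of the options discussed (public visibility, path restrictions, sharing), with illustrative values; the exact option names should be confirmed against the v4 documentation:

```ini
; /etc/globus-connect-server.conf (GCS v4) -- values are illustrative
[Endpoint]
Name = my-endpoint
Public = True                 ; list the endpoint for all Globus users

[GridFTP]
; limit transfers: read/write under home, read-only under /project
RestrictPaths = RW~,RO/project
Sharing = True                ; allow users to create shared endpoints
SharingRestrictPaths = RW~/shared
```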
Globus Command Line Interface (APS Workshop) - Globus
The document provides information about using the Globus Command Line Interface (CLI) to automate data transfers and sharing. It discusses installing the CLI and some basic commands like searching for endpoints, listing files, and doing transfers. It also covers more advanced topics like managing permissions, batch transfers, notifications, and examples of automation scripts that use the CLI to move data between endpoints and share it with other users based on permissions. The final section walks through an example of using a shell script to automate the process of moving data from an instrument to a shared guest collection and setting permissions for another user to access it.
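The batch transfers mentioned above feed the CLI a list of source/destination path pairs, one pair per line, via `globus transfer --batch`. A minimal Python sketch of generating such batch input (the directory names and file names are placeholders, not taken from the workshop material):

```python
# Build input for `globus transfer --batch`: one "source_path destination_path"
# pair per line; directories carry a --recursive flag on their line.
def build_batch_lines(files, src_dir, dst_dir, recursive_dirs=()):
    """Return batch lines mapping each file under src_dir to dst_dir."""
    lines = []
    for name in files:
        lines.append(f"{src_dir}/{name} {dst_dir}/{name}")
    for d in recursive_dirs:
        lines.append(f"{src_dir}/{d} {dst_dir}/{d} --recursive")
    return lines

if __name__ == "__main__":
    batch = build_batch_lines(
        ["scan001.dat", "scan002.dat"],
        "/instrument/raw", "/project/ingest",
        recursive_dirs=["calibration"],
    )
    print("\n".join(batch))
```

The generated text would then be supplied to something like `globus transfer SRC_EP DST_EP --batch batch.txt` (endpoint IDs omitted here).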
Globus is a non-profit data management service that allows users to transfer, share, and access data across different storage systems and platforms through software-as-a-service. It has transferred over 1.34 exabytes of data and aims to unify access to research data across different tiers of storage through connectors, APIs, and user interfaces. Globus ensures secure data transfers and sharing by using user identities, access controls, encryption, and audit logging without storing user credentials or data.
GlobusWorld 2021 Tutorial: The Globus CLI, Platform and SDK - Globus
An introduction to the Globus command line interface and the SDK for accessing Globus platform services. This tutorial was presented at the GlobusWorld 2021 conference in Chicago, IL by Greg Nawrocki.
Simple Data Automation with Globus (GlobusWorld Tour West) - Globus
Greg Nawrocki's document discusses how Globus provides tools for data automation programming including a command line interface, timer service, REST APIs, and Python SDK. These tools allow users to create integrated ecosystems of research data services and applications while managing security and authentication through Globus Auth. Specific examples are given for using the command line interface, timer service, REST APIs, and Python SDK to automate tasks like file transfers, scheduled jobs, and accessing endpoints. Resources for learning more and code examples are also provided.
Introduction to Globus (GlobusWorld Tour West) - Globus
This document introduces Globus, which provides fast and reliable data transfer, sharing, and platform services across different storage systems and resources. It does this through software-as-a-service that uses existing user identities, with the goal of unifying access to data across different tiers like HPC, storage, cloud, and personal resources. Key features include secure data transfers without moving files, access control and sharing capabilities, and tools for building automations and integrating with science gateways. It also discusses options for handling protected data like health information with additional security controls and business agreements.
This document provides information about installing and configuring Globus Connect Server (GCS) version 4. It discusses how GCS makes local storage accessible via Globus, installing GCS on an Amazon EC2 instance, common configuration options like restricting file paths and enabling sharing, and using subscriptions to create managed endpoints.
GlobusWorld 2021 Tutorial: Introduction to Globus - Globus
An introduction to the core features of the Globus data management service. This tutorial was presented at the GlobusWorld 2021 conference in Chicago, IL by Greg Nawrocki.
Introduction to Globus for System Administrators (GlobusWorld Tour - UMich) - Globus
This document provides an overview of Globus Connect Server (GCS) for system administrators. It discusses GCS versioning and which version to use based on subscription and feature needs. It then walks through installing GCS on an Amazon EC2 server instance and configuring an endpoint. The document covers common configuration options like restricting access paths, enabling sharing, and identity providers. It also discusses managed endpoints, subscriptions, and the management console. Finally, it presents some deployment scenarios and options like distributing GCS components, encryption, and the Globus Network Manager.
Globus for System Administrators (GlobusWorld Tour - Columbia University) - Globus
This document provides information about installing and configuring Globus Connect Server. It begins with an overview of Globus Connect Server and what it allows administrators to do. It then walks through steps to install Globus Connect Server on an Amazon EC2 instance, create an endpoint, and perform a test file transfer. The document also covers additional configuration options for Globus Connect Server like restricting access paths, enabling sharing, and using different authorization methods. It demonstrates how to view and manage endpoints and activity using the Globus management console. Finally, it discusses various deployment scenarios and best practices for Globus Connect Server.
This presentation was given at the GlobusWorld 2020 Virtual Conference, by Ian Foster, Rachana Ananthakrishnan, and Vas Vasiliadis from the University of Chicago.
This tutorial from the Gateways 2018 conference in Austin, TX explored the capabilities provided by Globus for assembling, describing, publishing, identifying, searching, and discovering datasets.
Introduction to the Globus Platform (GlobusWorld Tour - UMich) - Globus
1) The Globus platform provides services for fast and reliable data transfer, sharing, and file management directly from storage systems via software-as-a-service using existing identities.
2) Globus can be used as a platform for building science gateways, portals and other web applications in support of research through APIs for user authentication, file transfer, and sharing capabilities.
3) The document provides an introduction to the Globus platform and its capabilities including code samples and walks through using the APIs via a Jupyter notebook to search for endpoints, manage files and tasks, and integrate Globus into other applications.
Leveraging the Globus Platform in Web Applications (CHPC 2019 - South Africa) - Globus
This document discusses how to leverage the Globus platform in web applications. It describes the Globus platform as providing secure and reliable data orchestration and transfer capabilities. It outlines several key Globus services like Auth, Transfer, and Helper Pages and provides examples of how they can be used to build applications. It also summarizes the Globus APIs and provides code examples of accessing endpoints, submitting tasks, and more.
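The task-submission documents accepted by the Transfer API are JSON. A minimal Python sketch of assembling one, following the shape of the public Transfer API documentation (the endpoint UUIDs and paths are placeholders, and a real submission_id would first be obtained from the API):

```python
def make_transfer_document(submission_id, source_ep, dest_ep, items):
    """Assemble a Transfer API 'transfer' document (shape per the public API docs)."""
    return {
        "DATA_TYPE": "transfer",
        "submission_id": submission_id,
        "source_endpoint": source_ep,
        "destination_endpoint": dest_ep,
        "DATA": [
            {
                "DATA_TYPE": "transfer_item",
                "source_path": src,
                "destination_path": dst,
            }
            for src, dst in items
        ],
    }

# Placeholder IDs; a real document is POSTed to the Transfer API's
# task submission endpoint with an Auth-issued access token.
doc = make_transfer_document(
    "0123-placeholder", "SRC-ENDPOINT-UUID", "DST-ENDPOINT-UUID",
    [("/data/run42/out.h5", "/ingest/run42/out.h5")],
)
```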
Jupyter + Globus: The Foundation for Interactive Data Science - Globus
This tutorial from the Gateways 2018 conference in Austin, TX showed participants how Globus may be used in conjunction with the Jupyter platform to open up new avenues, and new data sources, for interactive data science.
Introduction to Globus: Research Data Management Software at the ALCF - Globus
This document provides an introduction and overview of Globus, a research data management platform. It discusses how Globus can be used to move, share, discover, and reproduce data across different storage tiers and resources. Globus delivers fast and reliable big data transfer, sharing, and platform services directly from existing storage systems via software-as-a-service using existing identities, with the goal of unifying access to data across different locations and resources. The document demonstrates how Globus can be used via its web interface, command line interface, REST API, and as a platform for building other research applications and workflows.
HBaseConEast2016: Practical Kerberos with Apache HBase - Michael Stack
- The document is a slide presentation on practical Kerberos with Apache HBase given by Josh Elser of Hortonworks.
- It provides an introduction to Kerberos, how it is used for authentication in HBase and Hadoop, and best practices for configuration and troubleshooting common issues.
- Key aspects covered include how Kerberos tickets and keytabs are used, the SASL and GSSAPI protocols that enable authenticated RPC, and approaches like delegation tokens and proxy users that handle special cases like long-running jobs.
Web pages are served through HTTP and viewed in browsers. Crawlers fetch pages to build indexes. They start from seed URLs and recursively fetch new URLs found on pages. Crawlers face challenges like overlapping delays, avoiding duplicates, handling redirects, and preventing crawler traps. Large-scale crawlers employ techniques like multi-threading, non-blocking sockets, distributed storage, and estimating change rates to efficiently crawl billions of pages across the web.
Tutorial: Automating Research Data Workflows - Globus
This tutorial was given at the 2019 GlobusWorld Conference in Chicago, IL by Globus Head of Products Rachana Ananthakrishnan and Director of Customer Engagement Greg Nawrocki.
IPFS is a distribution protocol that enables the creation of completely distributed applications through content addressing. A very ambitious open source project in Go, IPFS adopts a peer-to-peer hypermedia protocol to protect against a single point of failure. This presentation aims to highlight the design and ideas of IPFS and also touches upon a real-world use case.
Globus for System Administrators (CHPC 2019 - South Africa) - Globus
This document provides an overview of Globus Connect Server (GCS) for system administrators. It discusses the different versions of GCS and considerations for which version to use. It then covers how to install and configure GCS on a server, including creating an endpoint, restricting access paths, enabling sharing, and using single sign-on. Monitoring and managing endpoints through the management console is also addressed. Finally, various deployment scenarios and best practices are reviewed, such as Science DMZ networking, distributing GCS components, and advanced configurations.
Globus Endpoint Administration (GlobusWorld Tour - STFC) - Globus
This document provides instructions for installing and configuring Globus Connect Server on an Amazon EC2 instance to create a Globus endpoint. It discusses logging into the server, installing Globus Connect Server using apt-get, and running the setup process. The document also covers accessing the newly created endpoint on Globus and transferring a test file. It provides guidance on configuring the endpoint, including making it publicly visible, restricting file paths, and enabling sharing.
Advanced Globus System Administration Topics - Globus
We cover topics of interest to system administrators, such as managing multi-DTN endpoints, mapping user identities, and using custom domains for data access.
This material was presented at the Research Computing and Data Management Workshop, hosted by Rensselaer Polytechnic Institute on February 27-28, 2024.
GlobusWorld 2021 Tutorial: Globus for System Administrators - Globus
An overview of installing and configuring a Globus endpoint on your storage system. This tutorial was presented at the GlobusWorld 2021 conference in Chicago, IL by Vas Vasiliadis.
Introduction to Globus for System Administrators - Globus
We review the Globus Connect Server v5 (GCSv5) architecture and deployment model, and describe the process for creating and configuring a Globus endpoint on your HPC cluster, lab server, or other multi-user storage system.
We review the Globus Connect Server architecture and deployment model, and describe the process for creating a Globus endpoint on your HPC cluster, lab server, or other multi-user storage system. You will learn how to install Globus Connect Server, and configure a number of common options on the endpoint.
This material was presented at the Research Computing and Data Management Workshop, hosted by Rensselaer Polytechnic Institute on February 27-28, 2024.
Introduction to Globus for System Administrators - Globus
We provide a detailed walkthrough of installing and configuring a Globus endpoint and creating storage gateways and collections to enable data access. We also review other features directed at system administrators, such as the console, usage reporting, performance tuning, and role management/permissions delegation.
Presented at a workshop at KU Leuven on July 8, 2022.
We discuss and demonstrate various advanced features such as managing multi-DTN endpoints, customized mapping of user identities, public/private deployment configurations, tweaking file transfer performance, and differences/migrating between Globus Connect Server versions 4 and 5.
Introduction to Globus for System Administrators - Globus
We review the Globus Connect Server v5 (GCSv5) architecture and deployment model, and describe the process for creating a Globus endpoint on your HPC cluster, lab server, or other multi-user storage system. You will experiment with installing Globus Connect Server, and configuring basic options on the endpoint.
We discuss and demonstrate various advanced features such as managing multi-DTN endpoints, customized mapping of user identities, public/private deployment configurations, tweaking file transfer performance, and differences/migrating between Globus Connect Server versions 4 and 5.
We review the Globus Connect Server v5 (GCSv5) architecture and deployment model, and describe the process for creating a Globus endpoint on your HPC cluster, lab server, or other multi-user storage system. Includes exercises for installing Globus Connect Server, and configuring a number of common options on the endpoint. We also demonstrate how to monitor and manage user activity.
Presented at a workshop at Oak Ridge National Laboratory on June 22, 2022.
Automating Research Data Flows and Introduction to the Globus Platform - Globus
This document introduces the Globus platform for automating research data flows. It describes Globus capabilities for scheduled transfers, command line scripting, and comprehensive task orchestration. It also covers Globus Auth for identity and access management, securing apps with Globus Auth, and the Globus Timer Service. The Globus command line interface and Python SDK allow for programmatic access and automation of data transfers and other tasks.
Connecting Your System to Globus (APS Workshop) - Globus
Connecting a storage system to Globus involves:
1. Registering a Globus Connect Server to get credentials.
2. Installing Globus Connect Server packages and setting up an endpoint and data transfer node.
3. Creating a POSIX storage gateway to make the storage accessible and define authentication policies.
4. Adding a mapped collection to provide access to files in the storage system via Globus.
5. Associating the endpoint with a Globus subscription to enable additional features.
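Steps 2 through 4 above map onto a sequence of `globus-connect-server` commands (GCS v5); steps 1 and 5 are handled through registration and subscription management rather than this command sequence. A Python sketch that simply assembles the commands in order, to make the sequence explicit; the display names, domain, and the `<GATEWAY_ID>` placeholder are illustrative, not taken from the workshop:

```python
# Ordered GCS v5 setup commands corresponding to steps 2-4 above.
# Display names, the domain, and <GATEWAY_ID> are placeholders.
def gcs_setup_commands(endpoint_name, gateway_name, collection_name):
    return [
        # step 2: create the endpoint, then register this host as a data transfer node
        f"globus-connect-server endpoint setup '{endpoint_name}' --organization 'Example Lab'",
        "globus-connect-server node setup",
        # step 3: POSIX storage gateway defining who may authenticate and how
        f"globus-connect-server storage-gateway create posix '{gateway_name}' --domain example.edu",
        # step 4: mapped collection exposing the gateway's files
        f"globus-connect-server collection create <GATEWAY_ID> / '{collection_name}'",
    ]

for cmd in gcs_setup_commands("My Endpoint", "My Gateway", "My Collection"):
    print(cmd)
```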
"What's New With Globus" Webinar: Spring 2018 - Globus
In this presentation from June 26, 2018, Globus co-founder Steve Tuecke discussed Globus Connect Server 5.1 with HTTPS file access; plans for new premium storage connectors; upcoming publication services including the new Globus Search and Identifiers services; the new Globus Web App, SSH with Globus Auth, and more.
Globus Endpoint Migration and Advanced Administration Topics - Globus
We discuss the differences between Globus Connect Server versions 4 and 5, migrating to version 5, managing multi-DTN endpoints, options for tweaking file transfer performance and other advanced topics.
Presented at a workshop at Oak Ridge National Laboratory on June 23, 2022.
Automating Research Data Flows and an Introduction to the Globus Platform - Globus
We introduce the various Globus approaches available for automating data flows, including the command line interface (CLI), the Globus Timer service and the Globus Flows service. We use a Jupyter notebook to demonstrate automation of file transfers and permissions management on shared datasets. We also provide a brief introduction to the Globus platform-as-a-service for developers, with emphasis on understanding the security model, and demonstrate how to access Globus services via APIs for integration with custom research applications.
Presented at a workshop at Oak Ridge National Laboratory on June 23, 2022.
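A flow in the Globus Flows service mentioned above is defined as a JSON state machine. A minimal single-step sketch that invokes the hosted transfer action provider; the ActionUrl and parameter names follow published examples and should be treated as assumptions:

```json
{
  "StartAt": "TransferFiles",
  "States": {
    "TransferFiles": {
      "Type": "Action",
      "ActionUrl": "https://actions.globus.org/transfer/transfer",
      "Parameters": {
        "source_endpoint_id.$": "$.input.source_endpoint_id",
        "destination_endpoint_id.$": "$.input.destination_endpoint_id",
        "transfer_items": [
          {
            "source_path.$": "$.input.source_path",
            "destination_path.$": "$.input.destination_path"
          }
        ]
      },
      "ResultPath": "$.TransferResult",
      "End": true
    }
  }
}
```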
Building on basic Globus administration skills, we introduce topics such as using multiple data transfer nodes for your endpoint, customizing identity mapping, and using Globus connectors to access non-POSIX filesystems such as iRODS and Amazon S3.
Presented at a workshop at KU Leuven on July 8, 2022.
Globus Compute with IRI Workflows - GlobusWorld 2024 - Globus
As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. The team is investigating ways to speed up the time to solution for many different parts of the DIII-D workflow, including how jobs are run on HPC systems. One of these routes is using Globus Compute to replace the current method for managing tasks; we describe a brief proof of concept showing how Globus Compute could help schedule jobs and serve as a tool to connect compute at different facilities.
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart... - Globus
The Earth System Grid Federation (ESGF) is a global network of data servers that archives and distributes the planet’s largest collection of Earth system model output for thousands of climate and environmental scientists worldwide. Many of these petabyte-scale data archives are located in proximity to large high-performance computing (HPC) or cloud computing resources, but the primary workflow for data users consists of transferring data and applying computations on a different system. As a part of the ESGF 2.0 US project (funded by the United States Department of Energy Office of Science), we developed pre-defined, on-demand data workflows capable of applying many data reduction and data analysis operations to the large ESGF data archives, transferring only the resultant analysis products (e.g., visualizations, smaller data files). In this talk, we will showcase a few of these workflows, highlighting how Globus Flows can be used for petabyte-scale climate analysis.
We describe the deployment and use of Globus Compute for remote computation. This content is aimed at researchers who wish to compute on remote resources using a unified programming interface, as well as system administrators who will deploy and operate Globus Compute services on their research computing infrastructure.
Globus Connect Server Deep Dive - GlobusWorld 2024 - Globus
We explore the Globus Connect Server (GCS) architecture and experiment with advanced configuration options and use cases. This content is targeted at system administrators who are familiar with GCS and currently operate—or are planning to operate—broader deployments at their institution.
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus... (Globus)
Large Language Models (LLMs) are currently the center of attention in the tech world, particularly for their potential to advance research. In this presentation, we'll explore a straightforward and effective method for quickly initiating inference runs on supercomputers using the vLLM tool with Globus Compute, specifically on the Polaris system at ALCF. We'll begin by briefly discussing the popularity and applications of LLMs in various fields. Following this, we will introduce the vLLM tool, and explain how it integrates with Globus Compute to efficiently manage LLM operations on Polaris. Attendees will learn the practical aspects of setting up and remotely triggering LLMs from local machines, focusing on ease of use and efficiency. This talk is ideal for researchers and practitioners looking to leverage the power of LLMs in their work, offering a clear guide to harnessing supercomputing resources for quick and effective LLM inference.
Providing Globus Services to Users of JASMIN for Environmental Data Analysis (Globus)
JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.
First Steps with Globus Compute Multi-User Endpoints (Globus)
In this presentation we will share our experiences getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we previously wrote an application using Globus Compute that offloads computationally expensive steps in the researchers' workflows, which they manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Among the challenges we encountered were that each researcher had to set up and manage their own single-user Globus Compute endpoint, and that the workloads had varying resource requirements (CPUs, memory, and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges, and we share an update on our progress here.
Enhancing Research Orchestration Capabilities at ORNL.pdf (Globus)
Cross-facility research orchestration comes with ever-changing constraints regarding the availability and suitability of various compute and data resources. In short, a flexible data and processing fabric is needed to enable the dynamic redirection of data and compute tasks throughout the lifecycle of an experiment. In this talk, we illustrate how we easily leveraged Globus services to instrument the ACE research testbed at the Oak Ridge Leadership Computing Facility with flexible data and task orchestration capabilities.
Understanding Globus Data Transfers with NetSage (Globus)
NetSage is an open, privacy-aware network measurement, analysis, and visualization service designed to help end users visualize and reason about large data transfers. NetSage traditionally has used a combination of passive measurements, including SNMP and flow data, as well as active measurements, mainly perfSONAR, to provide longitudinal network performance data visualization. It has been deployed by dozens of networks worldwide, and is supported domestically by the Engagement and Performance Operations Center (EPOC), NSF #2328479. We have recently expanded the NetSage data sources to include logs for Globus data transfers, following the same privacy-preserving approach as for flow data. Using the logs for the Texas Advanced Computing Center (TACC) as an example, this talk will walk through several example use cases that NetSage can answer, including: Who is using Globus to share data with my institution, and what kind of performance are they able to achieve? How many transfers has Globus supported for us? Which sites are we sharing the most data with, and how is that changing over time? How is my site using Globus to move data internally, and what kind of performance do we see for those transfers? What percentage of data transfers at my institution used Globus, and how did the overall data transfer performance compare to the Globus users?
How to Position Your Globus Data Portal for Success: Ten Good Practices (Globus)
Science gateways allow science and engineering communities to access shared data, software, computing services, and instruments. Science gateways have gained a lot of traction in the last twenty years, as evidenced by projects such as the Science Gateways Community Institute (SGCI) and the Center of Excellence on Science Gateways (SGX3) in the US, The Australian Research Data Commons (ARDC) and its platforms in Australia, and the projects around Virtual Research Environments in Europe. A few mature frameworks have evolved with their different strengths and foci and have been taken up by a larger community such as the Globus Data Portal, Hubzero, Tapis, and Galaxy. However, even when gateways are built on successful frameworks, they continue to face the challenges of ongoing maintenance costs and how to meet the ever-expanding needs of the community they serve with enhanced features. It is not uncommon that gateways with compelling use cases are nonetheless unable to get past the prototype phase and become a full production service, or if they do, they don't survive more than a couple of years. While there is no guaranteed pathway to success, it seems likely that for any gateway there is a need for a strong community and/or solid funding streams to create and sustain its success. With over twenty years of examples to draw from, this presentation goes into detail for ten factors common to successful and enduring gateways that effectively serve as best practices for any new or developing gateway.
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G... (Globus)
The U.S. Geological Survey (USGS) has made substantial investments in meeting evolving scientific, technical, and policy-driven demands on storing, managing, and delivering data. As these demands continue to grow in complexity and scale, the USGS must continue to explore innovative solutions to improve its management, curation, sharing, delivery, and preservation approaches for large-scale research data. Supporting these needs, the USGS has partnered with the University of Chicago-Globus to research and develop advanced repository components and workflows leveraging its current investment in Globus. The primary outcome of this partnership is the development of a prototype enterprise repository, driven by USGS Data Release requirements, through exploration and implementation of the entire suite of Globus platform offerings, including Globus Flows, Globus Auth, Globus Transfer, and Globus Search. This presentation will provide insights into this research partnership, introduce the unique requirements and challenges being addressed, and provide relevant project progress.
Developing Distributed High-performance Computing Capabilities of an Open Sci... (Globus)
COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and its broad response from the scientific community has forged new relationships among public health practitioners, mathematical modelers, and scientific computing specialists, while revealing critical gaps in exploiting advanced computing systems to support urgent decision making. Informed by our team’s work in applying high-performance computing in support of public health decision makers during the COVID-19 pandemic, we present how Globus technologies are enabling the development of an open science platform for robust epidemic analysis, with the goal of collaborative, secure, distributed, on-demand, and fast time-to-solution analyses to support public health.
The Department of Energy's Integrated Research Infrastructure (IRI) (Globus)
We will provide an overview of DOE’s IRI initiative as it moves into early implementation, what drives the IRI vision, and the role of DOE in the larger national research ecosystem.
Listen to the keynote address and hear about the latest developments from Rachana Ananthakrishnan and Ian Foster, who review the updates to the Globus platform and service, and the relevance of Globus to the scientific community as an automation platform to accelerate scientific discovery.
Enhancing Performance with Globus and the Science DMZ (Globus)
ESnet has led the way in helping national facilities—and many other institutions in the research community—configure Science DMZs and troubleshoot network issues to maximize data transfer performance. In this talk we will present a summary of approaches and tips for getting the most out of your network infrastructure using Globus Connect Server.
Extending Globus into a Site-wide Automated Data Infrastructure.pdf (Globus)
The Rosalind Franklin Institute hosts a variety of scientific instruments, which allow us to capture a multifaceted and multilevel view of biological systems, generating around 70 terabytes of data a month. Distributed solutions, such as Globus and Ceph, facilitate storage, access, and transfer of large amounts of data. However, we still must deal with the heterogeneity of the file formats and directory structure at acquisition, which is optimised for fast recording rather than for efficient storage and processing. Our data infrastructure includes local storage at the instruments and workstations, distributed object stores with POSIX and S3 access, remote storage on HPCs, and tape backup. This can pose a challenge in ensuring fast, secure, and efficient data transfer. Globus allows us to handle this heterogeneity, while its Python SDK allows us to automate our data infrastructure using Globus microservices integrated with our data access models. Our data management workflows are becoming increasingly complex and heterogeneous, including desktop PCs, virtual machines, and offsite HPCs, as well as several open-source software tools with different computing and data structure requirements. This complexity demands that data be annotated with enough detail about the experiments and the analysis to ensure efficient and reproducible workflows. This talk explores how we extend Globus into different parts of our data lifecycle to create a secure, scalable, and high-performing automated data infrastructure that can provide FAIR[1,2] data for all our science.
1. https://doi.org/10.1038/sdata.2016.18
2. https://www.go-fair.org/fair-principles
Reactive Documents and Computational Pipelines - Bridging the Gap (Globus)
As scientific discovery and experimentation become increasingly reliant on computational methods, the static nature of traditional publications renders them progressively fragmented and unreproducible. How can workflow automation tools, such as Globus, be leveraged to address these issues and potentially create a new, higher-value form of publication? LivePublication leverages Globus’s custom Action Provider integrations and Compute nodes to capture semantic and provenance information during distributed flow executions. This information is then embedded within an RO-crate and interfaced with a programmatic document, creating a seamless pipeline from instruments, to computation, to publication.
06-18-2024 Princeton Meetup - Introduction to Milvus (Timothy Spann)
tim.spann@zilliz.com
https://www.linkedin.com/in/timothyspann/
https://x.com/paasdev
https://github.com/tspannhw
https://github.com/milvus-io/milvus
Get Milvused!
https://milvus.io/
Read my Newsletter every week!
https://github.com/tspannhw/FLiPStackWeekly/blob/main/142-17June2024.md
For more cool Unstructured Data, AI and Vector Database videos check out the Milvus vector database videos here
https://www.youtube.com/@MilvusVectorDatabase/videos
Unstructured Data Meetups -
https://www.meetup.com/unstructured-data-meetup-new-york/
https://lu.ma/calendar/manage/cal-VNT79trvj0jS8S7
https://www.meetup.com/pro/unstructureddata/
https://zilliz.com/community/unstructured-data-meetup
https://zilliz.com/event
Twitter/X: https://x.com/milvusio https://x.com/paasdev
LinkedIn: https://www.linkedin.com/company/zilliz/ https://www.linkedin.com/in/timothyspann/
GitHub: https://github.com/milvus-io/milvus https://github.com/tspannhw
Invitation to join Discord: https://discord.com/invite/FjCMmaJng6
Blogs: https://milvusio.medium.com/ https://www.opensourcevectordb.cloud/ https://medium.com/@tspann
Expand LLMs' knowledge by incorporating external data sources into your AI applications.
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr... (Marlon Dumas)
This webinar discusses the limitations of traditional approaches to business process simulation based on hand-crafted models with restrictive assumptions. It shows how process mining techniques can be assembled to discover high-fidelity digital twins of end-to-end processes from event data.
06-20-2024 AI Camp Meetup - Unstructured Data and Vector Databases (Timothy Spann)
Tech Talk: Unstructured Data and Vector Databases
Speaker: Tim Spann (Zilliz)
Abstract: In this session, I will discuss unstructured data and the world of vector databases, and we will see how they differ from traditional databases: in which cases you need one, and in which you probably don't. I will also go over similarity search, where vectors come from, and an example of a vector database architecture, wrapping up with an overview of Milvus.
Introduction
Unstructured data, vector databases, traditional databases, similarity search
Vectors
Where, What, How, Why Vectors? We’ll cover a Vector Database Architecture
Introducing Milvus
What drives Milvus' Emergence as the most widely adopted vector database
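Similarity search, the core idea in this talk, ranks stored vectors by their closeness to a query vector. A minimal sketch in plain Python with toy 2-D vectors (a vector database like Milvus does the same thing at scale, using approximate-nearest-neighbor indexes rather than this brute-force scan):

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product of the vectors divided by their norms.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query, vectors, k=2):
    """Return the names of the k stored vectors most similar to the query."""
    ranked = sorted(vectors.items(), key=lambda kv: cosine(query, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]

# Toy "embeddings"; real ones come from an embedding model.
docs = {"cat": [1.0, 0.1], "dog": [0.9, 0.2], "car": [0.1, 1.0]}
print(top_k([1.0, 0.0], docs, k=2))  # ['cat', 'dog']
```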
Hi Unstructured Data Friends!
I hope this video had all the unstructured data processing, AI and Vector Database demo you needed for now. If not, there’s a ton more linked below.
My source code is available here
https://github.com/tspannhw/
Let me know in the comments if you liked what you saw, how I can improve, and what I should show next. Thanks, and hope to see you soon at a Meetup in Princeton, Philadelphia, New York City, or here in the YouTube Matrix.
https://www.meetup.com/unstructured-data-meetup-new-york/events/301383476/?slug=unstructured-data-meetup-new-york&eventId=301383476
https://www.aicamp.ai/event/eventdetails/W2024062014
Build applications with generative AI on Google Cloud (Márton Kodok)
We will explore Vertex AI Model Garden powered experiences and learn more about the integration of these generative AI APIs. We will see in action what the Gemini family of generative models offers developers for building and deploying AI-driven applications. Vertex AI includes a suite of foundation models, referred to as the PaLM and Gemini families of generative AI models, which come in different versions. We will cover how to use the API to: execute prompts in text and chat; cover multimodal use cases with image prompts; fine-tune and distill models to improve knowledge domains; and run function calls with foundation models to optimize them for specific tasks. At the end of the session, developers will understand how to innovate with generative AI and develop apps following generative AI industry trends.
Discover the cutting-edge telemetry solution implemented for Alan Wake 2 by Remedy Entertainment in collaboration with AWS. This comprehensive presentation dives into our objectives, detailing how we utilized advanced analytics to drive gameplay improvements and player engagement.
Key highlights include:
Primary Goals: Implementing gameplay and technical telemetry to capture detailed player behavior and game performance data, fostering data-driven decision-making.
Tech Stack: Leveraging AWS services such as EKS for hosting, WAF for security, Karpenter for instance optimization, S3 for data storage, and OpenTelemetry Collector for data collection. EventBridge and Lambda were used for data compression, while Glue ETL and Athena facilitated data transformation and preparation.
Data Utilization: Transforming raw data into actionable insights with technologies like Glue ETL (PySpark scripts), Glue Crawler, and Athena, culminating in detailed visualizations with Tableau.
Achievements: Successfully managing 700 million to 1 billion events per month at a cost-effective rate, with significant savings compared to commercial solutions. This approach has enabled simplified scaling and substantial improvements in game design, reducing player churn through targeted adjustments.
Community Engagement: Enhanced ability to engage with player communities by leveraging precise data insights, despite having a small community management team.
This presentation is an invaluable resource for professionals in game development, data analytics, and cloud computing, offering insights into how telemetry and analytics can revolutionize player experience and game performance optimization.
Globus for System Administrators (GlobusWorld Tour - UCSD)
1. Globus for System Administrators
Vas Vasiliadis
vas@uchicago.edu
UCSD – May 8, 2019
2. Globus Connect Server
• Makes your storage accessible via Globus
• Multi-user server, installed and managed by sysadmin
• Default access for all local accounts
• Native packaging: Linux DEB, RPM
docs.globus.org/globus-connect-server-installation-guide/
[Diagram: Globus Connect Server (MyProxy CA, GridFTP server, OAuth server) running on a DTN in front of a local storage system (HPC cluster, NAS, ...), serving local system users]
3. Globus Connect Server
[Diagram: the same architecture, showing the server accessing local storage through a POSIX-compliant connector or non-POSIX connectors]
5. Which version of Globus Connect Server do I use? By default, assume you should use GCS v4.
6. Which version of Globus Connect Server do I use?
[Decision flowchart:]
• Are you a Globus subscriber? No → use GCS v4
• Do you have a High Assurance or BAA subscription (and plan to set up a high assurance endpoint)? Yes → use GCS v5.3
• Would you like to support multiple storage system types on one endpoint? Yes → use GCS v5.3
• Do you want to enable HTTPS access to your storage? Yes → use GCS v5.3
• Do you plan to use the S3, Box, Google Drive or Ceph connectors? Yes → use GCS v5.3
• Otherwise → use GCS v4
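The decision flowchart above can be sketched as a small function. The Yes/No routing here is one plausible reading of the slide, and this is 2019-era guidance, so treat the output as historical:

```python
# Sketch of the GCS version decision flowchart (2019-era guidance).
def recommended_gcs_version(subscriber, high_assurance=False,
                            multiple_storage_types=False,
                            https_access=False, cloud_connectors=False):
    """Return 'v4' or 'v5.3' following the flowchart's questions in order."""
    if not subscriber:
        return "v4"                    # non-subscribers: GCS v4
    if (high_assurance or multiple_storage_types
            or https_access or cloud_connectors):
        return "v5.3"                  # features only available in v5.3
    return "v4"                        # default recommendation at the time

print(recommended_gcs_version(subscriber=False))           # v4
print(recommended_gcs_version(True, https_access=True))    # v5.3
```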
7. Creating a Globus endpoint on your server
• In this example, Server = Amazon EC2 instance
• Installation and configuration of Globus Connect Server requires a Globus ID
• Go to globusid.org and click "create a Globus ID"
  – Optional: associate it with your Globus account
8. What we are going to do:
1. Install Globus Connect Server on the server (an AWS EC2 instance, e.g. ec2-22-23-24-25), accessed via ssh as user "campusadmin": update repo, install package, set up Globus Connect Server
2. Log into Globus
3. Access the newly created endpoint (as user "researcher")
4. Transfer a file
9. Access your server
• Get the IP address for your EC2 server (bit.ly/ec2ip)
• Log in as user ‘campusadmin’
ssh campusadmin@<EC2_instance_IP_address>
• Please sudo su before continuing
– User ‘campusadmin’ has passwordless sudo privileges
10. Install Globus Connect Server
$ sudo su
$ curl -LOs http://downloads.globus.org/toolkit/globus-connect-server/globus-connect-server-repo_latest_all.deb
$ dpkg -i globus-connect-server-repo_latest_all.deb
$ apt-get update
$ apt-get -y install globus-connect-server
$ globus-connect-server-setup
(Use your Globus ID username and password when prompted.)
You have a working Globus endpoint!
11. Access the Globus endpoint
• Go to Manage Data → Transfer Files
• Access the endpoint you just created
  – Search for your EC2 host name in the Endpoint field
  – Log in as "researcher"; you will see the user's home directory
• Transfer files between a test endpoint (e.g. ESnet read-only) and your EC2 endpoint
12. Globus accounts and endpoint access
• Globus account: Primary identity (+ Linked Identities)
• Endpoint initially accessible by creator
• Endpoint not visible?
– Primary identity is your institutional ID?
– Link your Globus ID!
14. Endpoint configuration
• On the Globus service: app.globus.org/endpoints
• On your DTN: /etc/globus-connect-server.conf
– Standard .ini format: [Section] Option = Value
– To enable changes run globus-connect-server-setup
– “Rinse and repeat”
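Since the endpoint config is standard .ini, it can be inspected programmatically. A sketch using Python's configparser on a sample string; the section and option names below are illustrative, not a complete real /etc/globus-connect-server.conf:

```python
import configparser

# Sample config text in the same [Section] Option = Value .ini style.
sample = """
[Endpoint]
Name = globus_dtn
Public = True

[GridFTP]
RestrictPaths = RW~,N~/.*
"""

cfg = configparser.ConfigParser()
cfg.read_string(sample)
print(cfg["Endpoint"]["Name"])          # globus_dtn
print(cfg["GridFTP"]["RestrictPaths"])  # RW~,N~/.*
```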
16. Exercise: Make your endpoint visible
• Edit endpoint attributes
  – Change the name to something useful, e.g. <your_name> EC2 Endpoint
  – For the "Visible To" attribute select "Public - Visible to all users"
• Find your neighbor's endpoint
  – Thanks to our superb security... you can access it too :-)
17. Path Restriction
• Default configuration: all paths allowed, access control handled by the OS
• Use RestrictPaths to customize
  – Specifies a comma-separated list of full paths that clients may access
  – Each path may be prefixed by R (read) and/or W (write), or N (none) to explicitly deny access to a path
  – '~' is the authenticated user's home directory, and * may be used for simple wildcard matching
• e.g. full access to home directory, read access to /data: RestrictPaths = RW~,R/data
• e.g. full access to home directory, deny hidden files: RestrictPaths = RW~,N~/.*
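As a rough illustration of the RestrictPaths semantics described above, here is a simplified, hypothetical evaluator. The longest-matching-pattern rule is an assumption made for this sketch; it is not the actual GridFTP implementation:

```python
import fnmatch

def parse_rules(spec, home):
    """Parse a RestrictPaths-style spec, e.g. 'RW~,N~/.*', into (perms, pattern) pairs."""
    rules = []
    for item in spec.split(","):
        i = 0
        while i < len(item) and item[i] in "RWN":
            i += 1
        perms, pattern = item[:i], item[i:]
        rules.append((perms, pattern.replace("~", home, 1)))
    return rules

def allowed(rules, path, mode):
    """mode is 'R' or 'W'; the most specific (longest) matching pattern decides."""
    best = None
    for perms, pattern in rules:
        match = (path == pattern
                 or path.startswith(pattern.rstrip("/") + "/")
                 or fnmatch.fnmatch(path, pattern))
        if match and (best is None or len(pattern) > len(best[1])):
            best = (perms, pattern)
    return best is not None and mode in best[0]  # 'N' grants neither R nor W

rules = parse_rules("RW~,R/data", home="/home/researcher")
print(allowed(rules, "/home/researcher/results.csv", "W"))  # True
print(allowed(rules, "/data/shared.h5", "W"))               # False (read-only)
```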
18. Exercise: Restrict access
• Set RestrictPaths=RW~,N~/archive
• Run globus-connect-server-setup
• Access your endpoint as ‘researcher’
• What’s changed?
19. Enabling sharing on an endpoint
• In config file, set Sharing=True
• Run globus-connect-server-setup
• Flag endpoint as "managed" (in web app or via CLI)
* Note: creation of shared endpoints requires a Globus subscription for the managed endpoint
20. Limit sharing to specific accounts
• SharingUsersAllow =
• SharingGroupsAllow =
• SharingUsersDeny =
• SharingGroupsDeny =
21. Sharing Path Restriction
• Restrict paths where users can create shared endpoints
• Use SharingRestrictPaths to customize
– Same syntax as RestrictPaths
• e.g. Full access to home directory, deny hidden files:
– SharingRestrictPaths = RW~,N~/.*
• e.g. Full access to public folder under home directory:
– SharingRestrictPaths = RW~/public
• e.g. Full access to /proj, read access to /scratch:
– SharingRestrictPaths = RW/proj,R/scratch
26. Single Sign-On with InCommon/CILogon
• Your Shibboleth server must release R&S attributes to CILogon, especially the ePPN attribute
• Local account must match institutional ID (InCommon ID)
  – Test by creating a local user with the same name
• In /etc/globus-connect-server.conf set:
  AuthorizationMethod = CILogon
  CILogonIdentityProvider = <institution_listed_in_CILogon_IdP_list>
28. Subscription configuration
• Subscription manager
– Create/upgrade managed endpoints
– Requires Globus ID linked to Globus account
• Management console permissions
– Independent of subscription manager
– Map managed endpoint to Globus ID
• Globus Plus group
– Subscription Manager is admin
– Can grant admin rights to other members
29. Creating managed endpoints
• Required for sharing, management console, reporting, ...
• Convert existing endpoint to managed via CLI (or web):
  globus endpoint update --managed <endpt_uuid>
• Must be run by subscription manager
• Important: re-run endpoint update after deleting/re-creating an endpoint
35. Balance: performance vs. reliability
• Network use parameters: concurrency, parallelism
• Maximum and Preferred values for each
• A transfer considers both source and destination endpoint settings:
  min( max(preferred src, preferred dest), max src, max dest )
• Service limits, e.g. concurrent requests
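The min/max rule above can be written directly as a function (a sketch of the slide's formula, not the service's actual code):

```python
# Effective value of a network-use parameter (concurrency or parallelism)
# for a transfer, given each endpoint's Preferred and Maximum settings.
def effective_setting(preferred_src, preferred_dst, max_src, max_dst):
    return min(max(preferred_src, preferred_dst), max_src, max_dst)

# e.g. source prefers 4, destination prefers 8, but the source caps at 6:
print(effective_setting(4, 8, 6, 10))  # 6
```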
40. Current best practice
[Diagram: Science DMZ reference architecture. A border router faces the WAN; a Science DMZ switch/router hosts perfSONAR nodes and 10GE-connected DTNs, including API DTNs whose data access is governed by the portal; a firewall separates the enterprise network; a filesystem (data store) backs the DTNs; a portal server runs the web server, search, database, and authentication applications. Distinct paths are shown for data transfer, portal query/browse, and browsing.]
41. Science DMZ configuration
[Diagram: source and destination organizations, each with a border router, a Science DMZ with security filters, and a Data Transfer Node (DTN). Physical and logical control paths run between the user and the DTNs; physical and logical data paths run between the DTNs. Control uses ports 443, 2811, and 7512; data uses ports 50000-51000.]
* Please see TCP ports reference: https://docs.globus.org/resource-provider-guide/#open-tcp-ports_section
46. Network paths
• Separate control and data interfaces
• "DataInterface =" option in /etc/globus-connect-server.conf
• Common scenario: route data flows over the Science DMZ link
47. Dual-homed DTN – high speed data path
[Diagram: two Data Transfer Nodes, each running a GridFTP server with two interfaces (if0, if1). Control channels terminate on one interface; the data channel between DTNs is routed over the Science DMZ / Internet2 path via the other interface.]
48. Dual-homed DTN – internal data path
[Diagram: the same dual-homed DTNs, with the data channel routed over the LAN/intranet path through the firewall instead of the Science DMZ.]
50. Encryption
• Requiring encryption on an endpoint
  – User cannot override
  – Useful for "sensitive" data
• Globus uses the OpenSSL cipher stack as currently configured on your DTN
• FIPS 140-2 compliance: ensure use of FIPS-capable OpenSSL libraries on the DTN
  www.openssl.org/docs/fips/UserGuide-2.0.pdf
51. Distributing Globus Connect Server components
• Components: globus-connect-server-io, -id, -web
• Default: -io, -id and -web on a single server
• Common options
  – Multiple -io servers for load balancing, failover, and performance
  – No -id server, e.g. third-party IdP
  – -id on a separate server, e.g. non-DTN nodes
  – -web on either the -id server or a separate server for the OAuth interface
52. Distributing Globus Connect Server components
[Diagram: a Data Transfer Node (GridFTP server) on an ext*/XFS/ZFS filesystem in the Science DMZ, with the OAuth server and MyProxy CA behind an ACL-limited firewall; port 2811 accepts inbound connections from Globus]
53. Setting up multiple -io servers
• Guidelines
  – Use the same .conf file on all servers
  – First install on the server running the -id component, then all others
1. Install Globus Connect Server on all servers
2. Edit the .conf file on one of the servers and set [MyProxy] Server to the hostname of the server you want the -id component installed on
3. Copy the Globus Connect Server configuration file to all servers
4. Run globus-connect-server-setup on the server running the -id component
5. Run globus-connect-server-setup on all other servers
• Repeat steps 2-5 as necessary to update configurations
54. Example: Two-node DTN
On "primary" DTN node (34.20.29.57), running -id and -io:
  /etc/globus-connect-server.conf
  [Endpoint] Name = globus_dtn
  [MyProxy] Server = 34.20.29.57
On other DTN nodes, running -io:
  /etc/globus-connect-server.conf
  [Endpoint] Name = globus_dtn
  [MyProxy] Server = 34.20.29.57