This presentation was given at the GlobusWorld 2020 Virtual Conference, by Ian Foster, Rachana Ananthakrishnan, and Vas Vasiliadis from the University of Chicago.
Automating Research Data Management at Scale with Globus
Research computing facilities, such as the national supercomputing centers, and shared instruments, such as cryo electron microscopes and advanced light sources, are generating large volumes of data daily. These growing data volumes make it challenging for researchers to perform what should be mundane tasks: move data reliably, describe data for subsequent discovery, and make data accessible to geographically distributed collaborators. Most researchers rely on ad hoc methods that do not scale, and it is clear that some level of automation is required for these tasks.
Globus is an established service from the University of Chicago that is widely used for managing research data in national laboratories, campus computing centers, and HPC facilities. While its intuitive web app addresses simple file transfer and sharing scenarios, automation at scale requires integrating Globus data management platform services into custom science gateways, data portals, and other web applications in service of research. Such applications should enable automated ingest of data from diverse sources, launching of analysis runs on diverse computing resources, extraction and addition of metadata for creating search indexes, assignment of persistent identifiers, faceted search for rapid data discovery, and point-and-click downloading of datasets by authorized users — all protected by an authentication and authorization substrate that allows the implementation of flexible data access policies for both metadata and data alike.
We describe current and emerging Globus services that facilitate these automated data flows while ensuring a streamlined user experience. We also demonstrate Petreldata.net, a data management portal and gateway to multiple computing resources that supports large-scale research at the Advanced Photon Source.
This document provides information about installing and configuring Globus Connect Server (GCS) version 4. It discusses how GCS makes local storage accessible via Globus, installing GCS on an Amazon EC2 instance, common configuration options like restricting file paths and enabling sharing, and using subscriptions to create managed endpoints.
An overview of developments in the Globus platform during 2020-2021, presented at a webinar hosted by Internet2. Includes an overview of Globus Connect Server v5, cloud storage connectors, and platform services for developers (e.g., Globus Search and Globus Flows).
Introduction to the Globus Platform (APS Workshop)
This document discusses the Globus Platform Services API and SDK. It provides an overview of the Globus Auth API for user authentication and file sharing capabilities. It also summarizes the Globus Transfer API and Python SDK for integrating file transfer and access management into applications. Several methods for tasks like endpoint search, file operations, task submission and management are covered at a high level.
Globus Command Line Interface (APS Workshop)
The document provides information about using the Globus Command Line Interface (CLI) to automate data transfers and sharing. It discusses installing the CLI and some basic commands like searching for endpoints, listing files, and doing transfers. It also covers more advanced topics like managing permissions, batch transfers, notifications, and examples of automation scripts that use the CLI to move data between endpoints and share it with other users based on permissions. The final section walks through an example of using a shell script to automate the process of moving data from an instrument to a shared guest collection and setting permissions for another user to access it.
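The instrument-to-share flow described above can be sketched by composing Globus CLI invocations. The snippet below builds the argv lists for the two key commands (`globus transfer` and `globus endpoint permission create`) without executing them; the endpoint UUIDs, paths, and the collaborator identity are hypothetical placeholders.

```python
# Sketch of the instrument-to-guest-collection flow, expressed as Globus CLI
# invocations built with the standard library only. Endpoint UUIDs, paths, and
# identities are hypothetical; swap in your own before running via subprocess.

def transfer_cmd(src_ep, src_path, dst_ep, dst_path, label):
    """argv for a recursive `globus transfer` between two collections."""
    return ["globus", "transfer",
            f"{src_ep}:{src_path}", f"{dst_ep}:{dst_path}",
            "--recursive", "--label", label]

def grant_read_cmd(endpoint, path, identity):
    """argv for `globus endpoint permission create` granting read access."""
    return ["globus", "endpoint", "permission", "create",
            f"{endpoint}:{path}",
            "--identity", identity, "--permissions", "r"]

INSTRUMENT = "INSTRUMENT-ENDPOINT-UUID"   # hypothetical
SHARE = "GUEST-COLLECTION-UUID"           # hypothetical

move = transfer_cmd(INSTRUMENT, "/raw/scan42/", SHARE, "/scan42/", "instrument sync")
grant = grant_read_cmd(SHARE, "/scan42/", "collaborator@example.edu")
# To actually run: subprocess.run(move, check=True), then subprocess.run(grant, ...)
```

In a cron-driven shell script, the same two commands would run in sequence, optionally with `globus task wait` between them so permissions are only set once the data has landed.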
GlobusWorld 2021 Tutorial: Globus for System Administrators
An overview of installing and configuring a Globus endpoint on your storage system. This tutorial was presented at the GlobusWorld 2021 conference in Chicago, IL by Vas Vasiliadis.
Connecting Your System to Globus (APS Workshop)
Connecting a storage system to Globus involves:
1. Registering a Globus Connect Server to get credentials.
2. Installing Globus Connect Server packages and setting up an endpoint and data transfer node.
3. Creating a POSIX storage gateway to make the storage accessible and define authentication policies.
4. Adding a mapped collection to provide access to files in the storage system via Globus.
5. Associating the endpoint with a Globus subscription to enable additional features.
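The middle steps of this sequence correspond to `globus-connect-server` commands run on the storage system (steps 1 and 5 happen in the Globus web portal). The sketch below lists those commands as argv arrays in the GCSv5 style; display names, the organization, domain, and gateway UUID are hypothetical placeholders, and consulting the Globus Connect Server documentation for the full set of required flags is advised.

```python
# Steps 2-4 above, sketched as the commands an administrator would run in
# order on a GCSv5 installation. Names and UUIDs are hypothetical.

setup_steps = [
    # Step 2: create the endpoint and register it with Globus
    ["globus-connect-server", "endpoint", "setup", "My Lab Endpoint",
     "--organization", "Example University"],
    # Step 2 (cont.): configure this machine as a data transfer node
    ["globus-connect-server", "node", "setup"],
    # Step 3: POSIX storage gateway defining who may authenticate
    ["globus-connect-server", "storage-gateway", "create", "posix",
     "Lab Gateway", "--domain", "example.edu"],
    # Step 4: mapped collection exposing the filesystem via Globus
    ["globus-connect-server", "collection", "create",
     "GATEWAY-UUID", "/", "Lab Data Collection"],
]

for step in setup_steps:
    print(" ".join(step))
```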
GlobusWorld 2021 Tutorial: The Globus CLI, Platform and SDK
An introduction to the Globus command line interface and the SDK for accessing Globus platform services. This tutorial was presented at the GlobusWorld 2021 conference in Chicago, IL by Greg Nawrocki.
Globus is a non-profit data management service that allows users to transfer, share, and access data across different storage systems and platforms through software-as-a-service. It has transferred over 1.34 exabytes of data and aims to unify access to research data across different tiers of storage through connectors, APIs, and user interfaces. Globus ensures secure data transfers and sharing by using user identities, access controls, encryption, and audit logging without storing user credentials or data.
Automating Research Data Flows with Globus (CHPC 2019 - South Africa)
The document discusses automating research data workflows using the Globus Command Line Interface (CLI). Key points include:
1) The CLI can be used to automate recurring transfers, stage data in/out of compute jobs, and allow applications to submit transfers on a user's behalf by using access tokens.
2) Refresh tokens allow applications to get new access tokens without a user logged in, by storing and exchanging refresh tokens.
3) Examples of automation include syncing directories with scripts, staging data in shared directories, and removing directories after transfer with Python scripts.
4) Resources for support include Globus documentation, sample code, and professional services for custom application development and integration.
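The refresh-token exchange in point 2 follows the standard OAuth2 refresh grant against the Globus Auth token endpoint. The snippet below builds the form-encoded request body with the standard library only and sends nothing; the client ID and stored token are placeholders. In practice the Globus Python SDK wraps this exchange for you.

```python
# Sketch of the OAuth2 refresh-token exchange from point 2 above.
# CLIENT_ID and the stored refresh token are hypothetical placeholders;
# nothing is transmitted here.
from urllib.parse import urlencode

TOKEN_URL = "https://auth.globus.org/v2/oauth2/token"
CLIENT_ID = "YOUR-NATIVE-APP-CLIENT-ID"   # hypothetical

def refresh_request_body(refresh_token, client_id=CLIENT_ID):
    """Form-encoded body trading a stored refresh token for a new access token."""
    return urlencode({
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": client_id,
    })

body = refresh_request_body("stored-refresh-token")
# POST body to TOKEN_URL with Content-Type: application/x-www-form-urlencoded;
# the JSON response carries a fresh access_token for use with the Transfer API.
```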
Introduction to Globus (GlobusWorld Tour West)
This document introduces Globus, which provides fast and reliable data transfer, sharing, and platform services across different storage systems and resources. It does this through software-as-a-service that uses existing user identities, with the goal of unifying access to data across different tiers like HPC, storage, cloud, and personal resources. Key features include secure data transfers without moving files, access control and sharing capabilities, and tools for building automations and integrating with science gateways. It also discusses options for handling protected data like health information with additional security controls and business agreements.
This document provides an overview of Globus Data Portals and the Django Globus Portal Framework. Globus Data Portals allow users to securely discover, access, and act on research data through a customizable web interface. The Django Globus Portal Framework enables researchers to quickly build portals by leveraging Globus services like Search, Transfer, and Auth for features like authenticated access, data discovery, and integration with custom applications. It discusses key portal capabilities, architecture, and provides a link to demo building a new portal using the framework and example repo.
"What's New With Globus" Webinar: Spring 2018
In this presentation from June 26, 2018, Globus co-founder Steve Tuecke discussed Globus Connect Server 5.1 with HTTPS file access; plans for new premium storage connectors; upcoming publication services including the new Globus Search and Identifiers services; the new Globus Web App, SSH with Globus Auth, and more.
Making Storage Systems Accessible via Globus (GlobusWorld Tour West)
1. Globus Connect Server software makes storage systems accessible via Globus by installing software on the storage system that connects it to the Globus network.
2. To set up access, you first register a Globus Connect Server, install the software, set up an endpoint, and create a storage gateway and mapped collection.
3. You can then associate the endpoint with a Globus subscription to manage access and share data by creating guest collections.
Introduction to the Globus Platform (GlobusWorld Tour - UMich)
1) The Globus platform provides services for fast and reliable data transfer, sharing, and file management directly from storage systems via software-as-a-service using existing identities.
2) Globus can be used as a platform for building science gateways, portals and other web applications in support of research through APIs for user authentication, file transfer, and sharing capabilities.
3) The document provides an introduction to the Globus platform and its capabilities including code samples and walks through using the APIs via a Jupyter notebook to search for endpoints, manage files and tasks, and integrate Globus into other applications.
Globus: Research Data Management as Service and Platform (PEARC17)
Scientists have embraced the use of specialized cloud-hosted services to perform data management operations. Globus offers a suite of data and user management capabilities to the community, encompassing data transfer and sharing, user identity and authorization, and data publication. Globus capabilities are accessible via both a web browser and REST APIs. Web access allows Globus to address the needs of research labs through a software-as-a-service model; the newer REST APIs address the needs of developers of research services, who can now use Globus as a platform, outsourcing complex user and data management tasks to Globus cloud-hosted services. Here we review Globus capabilities and outline how it is being applied as a platform for scientific services. Presentation by Steve Tuecke from The University of Chicago. Steve is Globus Founder and Project Lead.
Gateways 2020 Tutorial - Large Scale Data Transfer with Globus
We describe the large-scale data transfer scenario, referencing current and past research teams and their challenges. We demonstrate a web application that uses Globus to perform large-scale data transfers, and walk through a code repository with the web application’s code.
Gateways 2020 Tutorial - Automated Data Ingest and Search with Globus
We describe the automated data ingest scenario, referencing current and past research teams and their challenges. We demonstrate a web application that uses Globus to perform automated data ingest and present a faceted search interface that can be used by science gateways to simplify data discovery. We also walk through the application's GitHub repository and highlight relevant components.
This tutorial from the Gateways 2018 conference in Austin, TX explored the capabilities provided by Globus for assembling, describing, publishing, identifying, searching, and discovering datasets.
Enabling Secure Data Discoverability (SC21 Tutorial)
Major research instruments are generating orders of magnitude more data in relatively short timeframes. As a result, the research enterprise is increasingly challenged by what should be mundane tasks: describing data for discovery and making data securely accessible to the broader research community. The ad hoc methods currently employed place undue burden on scientists and system administrators alike, and it is clear that a more robust, scalable approach is required.
Bespoke data portals (and science gateways/data commons) are becoming more prominent as a means of enabling access to large datasets. In this tutorial we demonstrate how services for authentication, authorization, metadata management, and search may be integrated with popular web frameworks, and used in combination with fast, well-architected networks to make data discoverable and accessible. Outcome: attendees build a simple but functional data portal that facilitates flexible data description, faceted data search, and secure data access.
Gateways 2020 Tutorial - Instrument Data Distribution with Globus
We describe the requirements for, and challenges of, distributing datasets at scale, e.g. from instruments such as CryoEM and advanced light sources. We demonstrate a web application that uses Globus to perform large-scale data distribution. We introduce and walk through a Jupyter notebook highlighting the relevant code to incorporate into a science gateway.
This tutorial was given at the 2019 GlobusWorld Conference in Chicago, IL by Globus Head of Products Rachana Ananthakrishnan and Director of Customer Engagement Greg Nawrocki.
GlobusWorld 2021 Tutorial: Introduction to Globus
An introduction to the core features of the Globus data management service. This tutorial was presented at the GlobusWorld 2021 conference in Chicago, IL by Greg Nawrocki.
Simplified Research Data Management with the Globus Platform
Overview of the Globus research data management platform, as presented at the Fall 2018 Membership Meeting of the Coalition for Networked Information (CNI), held in Washington, D.C., December 10-11, 2018
We provide a summary review of Globus features targeted at those new to Globus. We demonstrate how to transfer and share data, and install a Globus Connect Personal endpoint on your laptop.
Globus is a non-profit service that aims to increase research efficiency by unifying access to disparate storage systems and simplifying secure data sharing. It allows users to easily, securely, and reliably transfer data between different resources like HPC systems, cloud storage, instruments, and personal computers. Globus also provides APIs and SDKs to help researchers build data-centric applications and automate workflows. Funding comes partly from government grants, with subscriptions enabling additional features and supporting ongoing operations.
Introduction to Globus for New Users (GlobusWorld Tour - Columbia University)
This document provides an introduction to Globus for new users. It discusses how Globus can be used to move, share, describe, discover, and reproduce research data across different storage locations and computing resources. It highlights key Globus features like transferring files securely between endpoints, sharing data with collaborators, and building applications and services using the Globus APIs. The document also covers sustainability topics like Globus usage metrics, subscription plans, and support resources.
Globus: A Data Management Platform for Collaborative Research (CHPC 2019 - South Africa)
Globus is a non-profit data management platform developed by the University of Chicago to increase the efficiency of data-driven research. It allows researchers to easily transfer large datasets, securely share data with collaborators across different storage systems, and automate the movement of data from instruments and compute facilities. Globus sees growing adoption with over 120 subscribers and has transferred over 768 petabytes of data.
We demonstrate Globus capabilities from the perspectives of an end-user researcher. This is an introduction to the many features of the Globus web application and command line interface. Beyond simple file transfers, you will learn how to securely share data with collaborators, set up personal endpoints and manage linked identities for unified access to storage resources.
Presented at a workshop at KU Leuven on July 8, 2022.
Introduction to Globus: Research Data Management Software at the ALCF
This document provides an introduction and overview of Globus, a research data management platform. It discusses how Globus can be used to move, share, discover, and reproduce data across different storage tiers and resources. Globus delivers fast and reliable big data transfer, sharing, and platform services directly from existing storage systems via software-as-a-service using existing identities, with the goal of unifying access to data across different locations and resources. The document demonstrates how Globus can be used via its web interface, command line interface, REST API, and as a platform for building other research applications and workflows.
We provide a summary review of Globus features targeted at those new to Globus. We demonstrate how to transfer and share data, and install a Globus Connect Personal endpoint on your laptop.
Presented at a workshop at Oak Ridge National Laboratory on June 22, 2022.
Introduction to Globus (GlobusWorld Tour - UMich)
This document provides an agenda for a GlobusWorld Tour event taking place on Monday and Tuesday. On Monday, there will be sessions introducing Globus for new and administrative users. On Tuesday, sessions will focus on developing with Globus, including building research data portals, automating workflows, and working with instrument data. The document also provides background information on Globus and how it aims to make research data movement, sharing, and synchronization easy, reliable, and secure for researchers.
Introduces the Globus software-as-a-service for file transfer and data sharing. Includes step-by-step instructions for creating a Globus account, transferring a file, and setting up a Globus endpoint on your laptop.
Introduction to the Globus SaaS (GlobusWorld Tour - STFC)
This document summarizes a presentation about the Globus data management platform. It includes an agenda covering an introduction to the Globus Software as a Service and Platform as a Service, automating research data workflows, facilitating collaboration, and building services. There are demonstrations of file transfers, data sharing, publication, and high assurance endpoints. The sustainability model is discussed, with standard and high assurance subscriptions, branded websites, premium storage connectors, and identity providers. Support resources like documentation, email lists, and professional services are also mentioned.
Facilitating Collaboration with Globus (GlobusWorld Tour - STFC)
This document discusses how Globus services can facilitate collaboration and data sharing through automated workflows. It describes how Globus Auth enables authentication for shared access to endpoints. APIs and command line tools allow applications to programmatically manage permissions and transfer data. JupyterHub can be configured with Globus Auth to provide tokens for accessing remote Globus services within notebooks. This enables collaborative and distributed data analysis. The document also outlines how Globus services can support automated publication of datasets through search, identifiers, and metadata.
Introduction to Data Transfer and Sharing for Researchers
We will provide a summary review of Globus features targeted at those new to Globus. We will present various use cases that illustrate the power of Globus data sharing capabilities.
Managing Protected and Controlled Data with Globus Globus
This document discusses how Globus can be used to manage protected and controlled data with high assurance. It describes features for restricting data handling according to standards like NIST 800-53 and 800-171. Compliance focuses on access control, configuration management, maintenance, and accountability. Restricted data passed to Globus does not include file contents. The initial release includes a new web app, Globus Connect Server v5.2, and Connect Personal. High assurance capabilities include additional authentication, application instance isolation, encryption, and detailed auditing. Subscription levels like High Assurance and BAA provide these features.
We provide an overview of the Globus platform features, and demonstrate several data management features. Serving as an introductory session suitable for new users, we use the Globus web app to show data transfer and sharing, use of Globus Connect Personal for laptop/desktop access and introduce the Globus Command Line Interface for interactive and scripting.
This material was presented at the Research Computing and Data Management Workshop, hosted by Rensselaer Polytechnic Institute on February 27-28, 2024.
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus
As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. As part of the work the team is investigating ways to speedup the time to solution for many different parts of the DIII-D workflow including how they run jobs on HPC systems. One of these routes is looking at Globus Compute as a way to replace the current method for managing tasks and we describe a brief proof of concept showing how Globus Compute could help to schedule jobs and be a tool to connect compute at different facilities.
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Globus
The Earth System Grid Federation (ESGF) is a global network of data servers that archives and distributes the planet’s largest collection of Earth system model output for thousands of climate and environmental scientists worldwide. Many of these petabyte-scale data archives are located in proximity to large high-performance computing (HPC) or cloud computing resources, but the primary workflow for data users consists of transferring data, and applying computations on a different system. As a part of the ESGF 2.0 US project (funded by the United States Department of Energy Office of Science), we developed pre-defined data workflows, which can be run on-demand, capable of applying many data reduction and data analysis to the large ESGF data archives, transferring only the resultant analysis (ex. visualizations, smaller data files). In this talk, we will showcase a few of these workflows, highlighting how Globus Flows can be used for petabyte-scale climate analysis.
We describe the deployment and use of Globus Compute for remote computation. This content is aimed at researchers who wish to compute on remote resources using a unified programming interface, as well as system administrators who will deploy and operate Globus Compute services on their research computing infrastructure.
Globus Connect Server Deep Dive - GlobusWorld 2024Globus
We explore the Globus Connect Server (GCS) architecture and experiment with advanced configuration options and use cases. This content is targeted at system administrators who are familiar with GCS and currently operate—or are planning to operate—broader deployments at their institution.
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Globus
Large Language Models (LLMs) are currently the center of attention in the tech world, particularly for their potential to advance research. In this presentation, we'll explore a straightforward and effective method for quickly initiating inference runs on supercomputers using the vLLM tool with Globus Compute, specifically on the Polaris system at ALCF. We'll begin by briefly discussing the popularity and applications of LLMs in various fields. Following this, we will introduce the vLLM tool, and explain how it integrates with Globus Compute to efficiently manage LLM operations on Polaris. Attendees will learn the practical aspects of setting up and remotely triggering LLMs from local machines, focusing on ease of use and efficiency. This talk is ideal for researchers and practitioners looking to leverage the power of LLMs in their work, offering a clear guide to harnessing supercomputing resources for quick and effective LLM inference.
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisGlobus
JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.
First Steps with Globus Compute Multi-User EndpointsGlobus
In this presentation we will share our experiences around getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that can offload computationally expensive steps in the researcher's workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Some of the challenges we have encountered were that each researcher had to set up and manage their own single-user globus compute endpoint and that the workloads had varying resource requirements (CPUs, memory and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges and share an update on our progress here.
Enhancing Research Orchestration Capabilities at ORNL.pdfGlobus
Cross-facility research orchestration comes with ever-changing constraints regarding the availability and suitability of various compute and data resources. In short, a flexible data and processing fabric is needed to enable the dynamic redirection of data and compute tasks throughout the lifecycle of an experiment. In this talk, we illustrate how we easily leveraged Globus services to instrument the ACE research testbed at the Oak Ridge Leadership Computing Facility with flexible data and task orchestration capabilities.
Understanding Globus Data Transfers with NetSageGlobus
NetSage is an open privacy-aware network measurement, analysis, and visualization service designed to help end-users visualize and reason about large data transfers. NetSage traditionally has used a combination of passive measurements, including SNMP and flow data, as well as active measurements, mainly perfSONAR, to provide longitudinal network performance data visualization. It has been deployed by dozens of networks world wide, and is supported domestically by the Engagement and Performance Operations Center (EPOC), NSF #2328479. We have recently expanded the NetSage data sources to include logs for Globus data transfers, following the same privacy-preserving approach as for Flow data. Using the logs for the Texas Advanced Computing Center (TACC) as an example, this talk will walk through several different example use cases that NetSage can answer, including: Who is using Globus to share data with my institution, and what kind of performance are they able to achieve? How many transfers has Globus supported for us? Which sites are we sharing the most data with, and how is that changing over time? How is my site using Globus to move data internally, and what kind of performance do we see for those transfers? What percentage of data transfers at my institution used Globus, and how did the overall data transfer performance compare to the Globus users?
How to Position Your Globus Data Portal for Success Ten Good PracticesGlobus
Science gateways allow science and engineering communities to access shared data, software, computing services, and instruments. Science gateways have gained a lot of traction in the last twenty years, as evidenced by projects such as the Science Gateways Community Institute (SGCI) and the Center of Excellence on Science Gateways (SGX3) in the US, The Australian Research Data Commons (ARDC) and its platforms in Australia, and the projects around Virtual Research Environments in Europe. A few mature frameworks have evolved with their different strengths and foci and have been taken up by a larger community such as the Globus Data Portal, Hubzero, Tapis, and Galaxy. However, even when gateways are built on successful frameworks, they continue to face the challenges of ongoing maintenance costs and how to meet the ever-expanding needs of the community they serve with enhanced features. It is not uncommon that gateways with compelling use cases are nonetheless unable to get past the prototype phase and become a full production service, or if they do, they don't survive more than a couple of years. While there is no guaranteed pathway to success, it seems likely that for any gateway there is a need for a strong community and/or solid funding streams to create and sustain its success. With over twenty years of examples to draw from, this presentation goes into detail for ten factors common to successful and enduring gateways that effectively serve as best practices for any new or developing gateway.
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Globus
The U.S. Geological Survey (USGS) has made substantial investments in meeting evolving scientific, technical, and policy driven demands on storing, managing, and delivering data. As these demands continue to grow in complexity and scale, the USGS must continue to explore innovative solutions to improve its management, curation, sharing, delivering, and preservation approaches for large-scale research data. Supporting these needs, the USGS has partnered with the University of Chicago-Globus to research and develop advanced repository components and workflows leveraging its current investment in Globus. The primary outcome of this partnership includes the development of a prototype enterprise repository, driven by USGS Data Release requirements, through exploration and implementation of the entire suite of the Globus platform offerings, including Globus Flow, Globus Auth, Globus Transfer, and Globus Search. This presentation will provide insights into this research partnership, introduce the unique requirements and challenges being addressed and provide relevant project progress.
Developing Distributed High-performance Computing Capabilities of an Open Sci...Globus
COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and its broad response from the scientific community has forged new relationships among public health practitioners, mathematical modelers, and scientific computing specialists, while revealing critical gaps in exploiting advanced computing systems to support urgent decision making. Informed by our team’s work in applying high-performance computing in support of public health decision makers during the COVID-19 pandemic, we present how Globus technologies are enabling the development of an open science platform for robust epidemic analysis, with the goal of collaborative, secure, distributed, on-demand, and fast time-to-solution analyses to support public health.
The Department of Energy's Integrated Research Infrastructure (IRI)Globus
We will provide an overview of DOE’s IRI initiative as it moves into early implementation, what drives the IRI vision, and the role of DOE in the larger national research ecosystem.
Listen to the keynote address and hear about the latest developments from Rachana Ananthakrishnan and Ian Foster who review the updates to the Globus Platform and Service, and the relevance of Globus to the scientific community as an automation platform to accelerate scientific discovery.
Enhancing Performance with Globus and the Science DMZGlobus
ESnet has led the way in helping national facilities—and many other institutions in the research community—configure Science DMZs and troubleshoot network issues to maximize data transfer performance. In this talk we will present a summary of approaches and tips for getting the most out of your network infrastructure using Globus Connect Server.
Extending Globus into a Site-wide Automated Data Infrastructure.pdfGlobus
The Rosalind Franklin Institute hosts a variety of scientific instruments, which allow us to capture a multifaceted and multilevel view of biological systems, generating around 70 terabytes of data a month. Distributed solutions, such as Globus and Ceph, facilitates storage, access, and transfer of large amount of data. However, we still must deal with the heterogeneity of the file formats and directory structure at acquisition, which is optimised for fast recording, rather than for efficient storage and processing. Our data infrastructure includes local storage at the instruments and workstations, distributed object stores with POSIX and S3 access, remote storage on HPCs, and taped backup. This can pose a challenge in ensuring fast, secure, and efficient data transfer. Globus allows us to handle this heterogeneity, while its Python SDK allows us to automate our data infrastructure using Globus microservices integrated with our data access models. Our data management workflows are becoming increasingly complex and heterogenous, including desktop PCs, virtual machines, and offsite HPCs, as well as several open-source software tools with different computing and data structure requirements. This complexity commands that data is annotated with enough details about the experiments and the analysis to ensure efficient and reproducible workflows. This talk explores how we extend Globus into different parts of our data lifecycle to create a secure, scalable, and high performing automated data infrastructure that can provide FAIR[1,2] data for all our science.
1. https://doi.org/10.1038/sdata.2016.18
2. https://www.go-fair.org/fair-principles
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisGlobus
JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.
Globus Compute with Integrated Research Infrastructure (IRI) workflowsGlobus
As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. As part of the work the team is investigating ways to speedup the time to solution for many different parts of the DIII-D workflow including how they run jobs on HPC systems. One of these routes is looking at Globus Compute as a way to replace the current method for managing tasks and I will give a brief proof of concept showing how Globus Compute could help to schedule jobs and be a tool to connect compute at different facilities.
Reactive Documents and Computational Pipelines - Bridging the GapGlobus
As scientific discovery and experimentation become increasingly reliant on computational methods, the static nature of traditional publications renders them progressively fragmented and unreproducible. How can workflow automation tools, such as Globus, be leveraged to address these issues and potentially create a new, higher-value form of publication? LivePublication leverages Globus’s custom Action Provider integrations and Compute nodes to capture semantic and provenance information during distributed flow executions. This information is then embedded within an RO-crate and interfaced with a programmatic document, creating a seamless pipeline from instruments, to computation, to publication.
3. Globus delivers...
• Fast and reliable big data transfer, sharing, and platform services...
• ...directly from your own storage systems...
• ...via software-as-a-service using existing identities, with the overarching goal of...
4. Unifying access to data across tiers
• Research Computing HPC
• Desktop Workstations
• Mass Storage
• Instruments
• Personal Resources
• Public Cloud
• National Resources
9. Globus SaaS / PaaS: Research data lifecycle
1. Researcher initiates a transfer request (e.g., from an instrument to a compute facility); or the request is made automatically by a script or science gateway.
2. Globus transfers files reliably and securely.
3. Researcher selects files to share, selects a user or group, and sets access permissions.
4. Globus controls access to shared files on existing storage; no need to move files to cloud storage!
5. Collaborator logs in to Globus and accesses shared files from a personal computer; no local account required; download via Globus.
6. The Globus Command Line Interface, API sets, and Python SDK provide a platform...
7. ...for building science gateways, portals and publication services.
8. Automating research workflows and ensuring those that need access to the data have it.
Transfer, Share, Build:
• Use a Web browser or platform services
• Access any storage
• Use an existing identity
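The automation idea behind steps 1-5 can be sketched in a few lines. This is a stdlib-only, illustrative sketch: every function name here (`submit_transfer`, `grant_access`) is a hypothetical stand-in, not the Globus API; a real script or gateway would call the Globus Transfer service instead.

```python
# Illustrative sketch of the automated lifecycle above. All functions
# are hypothetical stand-ins, not the real Globus API.

actions = []  # record of requests made to the (simulated) service

def submit_transfer(source, destination, path):
    # Steps 1-2: request a managed, reliable transfer.
    actions.append(("transfer", source, destination, path))
    return {"task_id": "task-001", "status": "SUCCEEDED"}

def grant_access(endpoint, path, principal, level):
    # Steps 3-4: set a sharing permission on existing storage.
    actions.append(("permission", endpoint, path, principal, level))

# Automated flow: move data from the instrument to the compute
# facility (steps 1-2), then share it with a collaborator (steps 3-5).
task = submit_transfer("instrument", "compute-facility", "/raw/run42/")
if task["status"] == "SUCCEEDED":
    grant_access("compute-facility", "/raw/run42/",
                 "collaborator@example.org", "read")
```

The point of the sketch is the shape of the flow: once transfer and permission requests are service calls rather than manual actions, the whole lifecycle can run unattended from a script or gateway.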
10. Conceptual architecture: Hybrid SaaS
• Globus is a single, globally accessible, multi-tenant service (the Globus control domain); users access it via the Globus Web App in a browser.
• Subscriber-owned and administered storage systems run the Globus "client" software, forming the source and destination endpoints (the subscriber control domain).
• The CONTROL channel runs between the Globus service and the endpoints; the DATA channel runs directly between the source and destination endpoints. There is no data relay or staging via Globus.
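The control/data channel split above can be illustrated with a toy model. The classes and method names below are invented for illustration, not the real protocol: the point is that the central service issues instructions and sees metadata only, while file bytes move endpoint to endpoint.

```python
# Toy model of the hybrid SaaS split: the multi-tenant service
# coordinates over the control channel, while file bytes move
# directly between subscriber-owned endpoints over the data channel.

class Endpoint:
    def __init__(self, name):
        self.name = name
        self.files = {}

    def push_to(self, other, path):
        # Data channel: endpoint-to-endpoint copy; the bytes never
        # leave the subscriber control domain.
        other.files[path] = self.files[path]
        return len(self.files[path])

class CoordinationService:
    """Stands in for the central service: issues instructions and
    sees metadata only (no data relay or staging)."""

    def orchestrate(self, src, dst, path):
        size = src.push_to(dst, path)         # control channel instruction
        return {"path": path, "bytes": size}  # metadata, not contents

src = Endpoint("source")
dst = Endpoint("destination")
src.files["/data/scan.raw"] = b"\x00" * 1024
result = CoordinationService().orchestrate(src, dst, "/data/scan.raw")
```

This separation is what lets a single hosted service manage transfers between storage systems it never holds data for.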
13. Endpoints (Collections)
• Storage abstraction
  – All transfers happen between two endpoints
  – Globus Connect instantiates endpoints
• Collection ~= Endpoint
• Test / Demo Endpoints
  – Globus Tutorial Endpoint 1
  – Globus Tutorial Endpoint 2
  – ESnet Test Endpoints
    o Contain file samples of various sizes
• Globus Connect Personal
  – Now your laptop is an endpoint
  – https://www.globus.org/globus-connect-personal
14. Globus Connect Personal
• Installers do not require admin access
• Zero configuration; auto updating
• Handles NATs
• Installs in seconds; easy to delete (I'll prove it!)
15. Almost demo time...
• How do I get a Globus account?
  – A Globus account is:
    o A primary identity
    o Possible linked identities
  – Your existing institutional identity may already work
  – Linking / managing identities
  – Consents
• But I don't have any endpoints (collections)!
  – Globus Connect Personal
  – Globus Tutorial Endpoint 1
  – Globus Tutorial Endpoint 2
  – ESnet Test Endpoints
    o Contain file samples of various sizes
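The account model above (one primary identity plus optional linked identities) can be sketched as follows. This is a toy model, not the Globus Auth data model, and the identity strings are invented examples.

```python
# Toy model of a Globus account: one primary identity plus optional
# linked identities. All identity strings below are invented.

class Account:
    def __init__(self, primary):
        self.primary = primary
        self.linked = set()

    def link(self, identity):
        self.linked.add(identity)

    def owns(self, identity):
        # An access check may match the primary or any linked identity.
        return identity == self.primary or identity in self.linked

acct = Account("alice@university.example")
acct.link("alice@facility.example")
```

Linking is what lets a permission granted to a facility identity reach a researcher who normally logs in with a campus identity.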
16. Demo time!
• Identities and Accounts
• Transfer
• Sharing
• Transfer Details
• Bookmarks
• The Console
• The Hamburger Menu
• The Activity Monitor
• Groups
• Roles
• Responsive Interface
17. Globus Command Line Interface
• Open source, uses the Python SDK
• docs.globus.org/cli
• github.com/globus/globus-cli
18. Globus Platform-as-a-Service
• Globus Auth API (group management, ...): use existing institutional ID systems in external web applications
• Globus Transfer API and Globus Connect: integrate file transfer and sharing capabilities into scientific web apps, portals, gateways, etc.
• Platform capabilities: data discovery, file sharing, file transfer & replication
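The platform pattern above (a portal acting on a user's behalf with a delegated token) can be sketched minimally. The token table, scope name, and functions below are all invented for illustration; a real application would obtain and validate tokens through Globus Auth rather than a local dictionary.

```python
# Sketch of the bearer-token pattern: a portal presents a token, and
# the service checks it before acting. Everything here is invented
# for illustration, not the Globus Auth API.

VALID_TOKENS = {
    "tok-123": {"identity": "alice@university.example", "scope": "transfer"},
}

def authorized(token, required_scope):
    info = VALID_TOKENS.get(token)
    return info is not None and info["scope"] == required_scope

def start_transfer(token, src, dst):
    if not authorized(token, "transfer"):
        raise PermissionError("invalid or insufficient token")
    return {"status": "ACTIVE", "source": src, "destination": dst}

ok = start_transfer("tok-123", "campus-cluster", "archive")
```

The design point is that the portal never handles the user's password: it holds a scoped token that the service can check and revoke independently.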
19. A bit of Globus history
• U.S. Department of Energy
20. Globus sustainability model
• Standard Subscription
  – Shared endpoints
  – Management console
  – Usage reporting
  – Priority support
  – Application integration
  – HTTPS support
  – Branded web site
• Premium storage connectors
• Alternate identity provider (InCommon is standard)
22. Globus by the numbers...
• 112,105 total users
• 115 subscribers
• 745 identity providers
• 80 countries where Globus is used
• 887+ PB moved
• 94 billion files processed
• 2.9 PB largest single transfer to date
• 99.9% availability
• 6764 active shared endpoints
• 23,818 active personal endpoints
• 1866 active server endpoints
• 1960 most shared endpoints at a single institution
23. Manage Protected Data
Higher assurance levels for HIPAA and other regulated data
• Support for protected data such as health-related information
• Share data with collaborators while meeting compliance requirements
• Includes BAA option
24. Globus for high assurance data management
• Restricted data handling
– PII (Personally identifiable information)
– Controlled Unclassified Information
– PHI (Protected Health Information)
• University of Chicago security controls
– NIST 800-53 Low
– Superset of 800-171 Low
• Business Associate Agreements (BAAs) are between the University of Chicago and our subscribers
– The University of Chicago has a BAA with Amazon
25. High Assurance features
• Additional authentication assurance
– Per-storage-gateway policy on how often a user must authenticate with a specific identity to access data (authentication timeout)
– Ensures the user authenticates within the session with the specific identity that grants access (linked identities are decoupled)
• Session/device isolation
– Authentication context is per application, per session (~browser
session)
• Enforces encryption of all user data in transit
• Audit logging
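As an illustration of the per-gateway authentication policy above, an administrator might create a high-assurance storage gateway like this. This is a hypothetical sketch assuming Globus Connect Server v5; the command and flag names are from memory and should be checked against the current GCS reference before use.

```shell
# Hypothetical: create a high-assurance POSIX storage gateway that requires
# users to re-authenticate with a mapped identity every 30 minutes.
globus-connect-server storage-gateway create posix "HA Gateway" \
  --domain example.edu \
  --high-assurance \
  --authentication-timeout-mins 30
```

The timeout is enforced per storage gateway, so a single endpoint can host both high-assurance and ordinary collections with different policies.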
26. Support resources
• Globus documentation: docs.globus.org
• Helpdesk and issue escalation: support@globus.org
• Mailing Lists
– https://www.globus.org/mailing-lists
• Customer engagement team
• Globus professional services team
– Assist with portal/gateway/app architecture and design
– Develop custom applications that leverage the Globus platform
– Advise on customized deployment and integration scenarios
27. Globus on your Campus
• Webinars
• Programs
– Helping you evangelize Globus within your institution
• Professional Services
• Globus World Tour
– Taking the show on the road.
Editor's Notes
We’ve all seen this. Some of us have been the cause of this.
Actually fairly effective. That is until we need to…. <click>
Big is crossed out because:
One man’s ceiling is another man’s floor.
Why just big data? Don’t you need convenience and reliability for ALL data?
Unify – Where / Who / What / How
Where
Who
What
How
Transfer – More data quickly, securely and reliably over optimized networks. Set and forget.
Share – Transfer data across security enclaves between different “accounts”.
No “local account”; use existing institutional credentials.
Point out that there is no third-party intermediate storage (e.g., Dropbox), avoiding the cost and security concerns associated with it.
Pretty powerful stuff using just a web browser.
The Globus service is a controller
No data passes through the Globus Service
Fire and forget control – The Service GUI (web page) can go away
Globus abstracts storage systems into units called “endpoints”
Storage system complexities are masked or abstracted
Transfers between disparate storage systems are natural
This is a simple transfer case – a single user has permissions on both source and destination filesystems.
Storage system complexities such as permissions are abstracted as well.
Endpoint definition
Endpoints you can use right now
GCP – Your very own endpoint, no DTN running Globus Connect Server needed
We will demo this in a minute
Delete old version – don’t forget to blow away the old endpoint
Install new version
Set a directory
Talk about Globus Plus and the shareable check box
Written in Python and open source – go get it and hack it up if you like.
Highly scriptable – integrate it into your bash scripts for easy RDM automation. Rachana will cover.
HOW
The fundamentals of the practical Globus research data management tools you’ve seen in our web app (File transfer, file sharing, data publication) are available to you through our set of APIs.
<click>
In this talk I’m going to concentrate more on the implementation of the Transfer API set, which gives you the ability to integrate file transfer and sharing capabilities into scientific web apps, portals, gateways, etc.
<click>
Rachana and Steve are going to dig into Globus Auth in far more depth than I will, but I’ll touch on Globus Auth a bit in the context of the demos I’m going to do. You can’t just use the Transfer APIs at will, you have to prove who you are first and that you have the authority to do the things that you want to do.
Source:
1960 most shared endpoints at 1 institution NIH, per Vas as of Bio-IT19
686 PB transferred website as of 8-16-19 (actual 686,442,707 per usage report that day)
94 billion files processed Daily Usage Report 8-9-19 (actual 93,914)
1866 active server endpts Globus Metrics Yearly report 8-5-19
115 subscribers per Greg 8-9-19
112,105 users Daily Usage Report 8-9-19
80 countries where we are used Greg data Aug 13 2019 – see my online stats sheet: https://docs.google.com/spreadsheets/d/1G3_ABgpDveicRuLWTmygZsuVcsTuYtRm2AhGdASP51I/edit#gid=1198715507
23,818 active personal endpoints (average in any given year) Globus Metrics Yearly report 8-5-19
745 identity providers per Mattias 8/12/19
2.9 PB largest single transfer Katrin Heitmann June 2019
6764 active shared endpoints Globus Metrics Yearly report 8-5-19
99.9% availability correct per Stu data 8-9-19
Where did our efforts stem from? HIPAA.
But HIPAA is not a “thing” or a set of standards. These are the standards and conformities we set out to adhere to in our design of the product.
So what do you get?
Sounds simple right? – From a subscriber / user perspective that is by design.
“How hard can it be”