Simple Data Automation with Globus (GlobusWorld Tour West) - Globus
Greg Nawrocki's document discusses how Globus provides tools for data automation programming including a command line interface, timer service, REST APIs, and Python SDK. These tools allow users to create integrated ecosystems of research data services and applications while managing security and authentication through Globus Auth. Specific examples are given for using the command line interface, timer service, REST APIs, and Python SDK to automate tasks like file transfers, scheduled jobs, and accessing endpoints. Resources for learning more and code examples are also provided.
GlobusWorld 2021 Tutorial: Globus for System Administrators - Globus
An overview of installing and configuring a Globus endpoint on your storage system. This tutorial was presented at the GlobusWorld 2021 conference in Chicago, IL by Vas Vasiliadis.
GlobusWorld 2021 Tutorial: The Globus CLI, Platform and SDK - Globus
An introduction to the Globus command line interface and the SDK for accessing Globus platform services. This tutorial was presented at the GlobusWorld 2021 conference in Chicago, IL by Greg Nawrocki.
Introduction to the Globus Platform (GlobusWorld Tour - UMich) - Globus
1) The Globus platform provides services for fast and reliable data transfer, sharing, and file management directly from storage systems via software-as-a-service using existing identities.
2) Globus can be used as a platform for building science gateways, portals and other web applications in support of research through APIs for user authentication, file transfer, and sharing capabilities.
3) The document provides an introduction to the Globus platform and its capabilities including code samples and walks through using the APIs via a Jupyter notebook to search for endpoints, manage files and tasks, and integrate Globus into other applications.
This document provides an overview of Globus Data Portals and the Django Globus Portal Framework. Globus Data Portals allow users to securely discover, access, and act on research data through a customizable web interface. The Django Globus Portal Framework enables researchers to quickly build portals by leveraging Globus services like Search, Transfer, and Auth for features like authenticated access, data discovery, and integration with custom applications. It discusses key portal capabilities, architecture, and provides a link to demo building a new portal using the framework and example repo.
Introduction to Globus (GlobusWorld Tour West) - Globus
This document introduces Globus, which provides fast and reliable data transfer, sharing, and platform services across different storage systems and resources. It does this through software-as-a-service that uses existing user identities, with the goal of unifying access to data across different tiers like HPC, storage, cloud, and personal resources. Key features include secure data transfers without moving files, access control and sharing capabilities, and tools for building automations and integrating with science gateways. It also discusses options for handling protected data like health information with additional security controls and business agreements.
Globus Command Line Interface (APS Workshop) - Globus
The document provides information about using the Globus Command Line Interface (CLI) to automate data transfers and sharing. It discusses installing the CLI and some basic commands like searching for endpoints, listing files, and doing transfers. It also covers more advanced topics like managing permissions, batch transfers, notifications, and examples of automation scripts that use the CLI to move data between endpoints and share it with other users based on permissions. The final section walks through an example of using a shell script to automate the process of moving data from an instrument to a shared guest collection and setting permissions for another user to access it.
Globus is a non-profit data management service that allows users to transfer, share, and access data across different storage systems and platforms through software-as-a-service. It has transferred over 1.34 exabytes of data and aims to unify access to research data across different tiers of storage through connectors, APIs, and user interfaces. Globus ensures secure data transfers and sharing by using user identities, access controls, encryption, and audit logging without storing user credentials or data.
Leveraging the Globus Platform in Web Applications (CHPC 2019 - South Africa) - Globus
This document discusses how to leverage the Globus platform in web applications. It describes the Globus platform as providing secure and reliable data orchestration and transfer capabilities. It outlines several key Globus services like Auth, Transfer, and Helper Pages and provides examples of how they can be used to build applications. It also summarizes the Globus APIs and provides code examples of accessing endpoints, submitting tasks, and more.
This document provides information about installing and configuring Globus Connect Server (GCS) version 4. It discusses how GCS makes local storage accessible via Globus, installing GCS on an Amazon EC2 instance, common configuration options like restricting file paths and enabling sharing, and using subscriptions to create managed endpoints.
Introduction to the Globus Platform (APS Workshop) - Globus
This document discusses the Globus Platform Services API and SDK. It provides an overview of the Globus Auth API for user authentication and file sharing capabilities. It also summarizes the Globus Transfer API and Python SDK for integrating file transfer and access management into applications. Several methods for tasks like endpoint search, file operations, task submission and management are covered at a high level.
GlobusWorld 2021 Tutorial: Introduction to Globus - Globus
An introduction to the core features of the Globus data management service. This tutorial was presented at the GlobusWorld 2021 conference in Chicago, IL by Greg Nawrocki.
Connecting Your System to Globus (APS Workshop) - Globus
Connecting a storage system to Globus involves:
1. Registering a Globus Connect Server to get credentials.
2. Installing Globus Connect Server packages and setting up an endpoint and data transfer node.
3. Creating a POSIX storage gateway to make the storage accessible and define authentication policies.
4. Adding a mapped collection to provide access to files in the storage system via Globus.
5. Associating the endpoint with a Globus subscription to enable additional features.
Instrument Data Orchestration with Globus Search and Flows - Globus
This document discusses various Globus services for instrument data orchestration including the Timer service, platform services, authentication, search, transfer, flows, and the upcoming Trigger service. The Timer service allows for scheduled and recurring transfers. Platform services provide comprehensive data and compute orchestration. Authentication is handled by Globus Auth. Search allows for data description and discovery. Transfer shares and moves data. Flows automate distributed research tasks. Triggers will start flows based on events.
Making Storage Systems Accessible via Globus (GlobusWorld Tour West) - Globus
1. Globus Connect Server software makes storage systems accessible via Globus by installing software on the storage system that connects it to the Globus network.
2. To set up access, you first register a Globus Connect Server, install the software, set up an endpoint, and create a storage gateway and mapped collection.
3. You can then associate the endpoint with a Globus subscription to manage access and share data by creating guest collections.
GlobusWorld 2021 Tutorial: Building with the Globus Platform - Globus
Using Globus platform services like Search and Flows to build data portals, science gateways and data commons that facilitate data discovery and collaboration. This tutorial was presented at the GlobusWorld 2021 conference in Chicago, IL by Vas Vasiliadis.
An overview of developments in the Globus platform during 2020-2021, presented at a webinar hosted by Internet2. Includes an overview of Globus Connect Server v5, cloud storage connectors, and platform services for developers (e.g., Globus Search and Globus Flows).
Introduction to the Globus PaaS (GlobusWorld Tour - STFC) - Globus
Globus serves as a platform for building science gateways, web portals, and other applications in support of research and education. It provides identity and access management through Globus Auth as well as APIs for file transfer, search, and sharing. Developers can access these services through the Globus Python SDK or by using helper pages designed for web applications. Example applications include a modern research data portal that leverages Globus for authentication and file operations. Support resources include documentation, a helpdesk, professional services, and sample code.
This presentation was given at the GlobusWorld 2020 Virtual Conference, by Ian Foster, Rachana Ananthakrishnan, and Vas Vasiliadis from the University of Chicago.
Data Orchestration at Scale (GlobusWorld Tour West) - Globus
This document discusses how the Globus platform can be used for instrument data orchestration at scale. It describes how Globus Auth provides foundational identity and access management, Globus Transfer enables data transfer and sharing, Globus Search allows for data description and discovery, and Globus Flows can automate multi-step data and computing workflows. Specific capabilities and use cases are outlined, such as federated cancer registry queries across multiple institutions using Globus services while maintaining local data control.
Jupyter + Globus: The Foundation for Interactive Data Science - Globus
This tutorial from the Gateways 2018 conference in Austin, TX showed participants how Globus may be used in conjunction with the Jupyter platform to open up new avenues and new data sources for interactive data science.
Leveraging the Globus Platform (GlobusWorld Tour - Columbia University) - Globus
The document discusses how the Globus platform can be leveraged to build science gateways, web portals, and other applications. It provides examples of how the Globus Auth, APIs, and Connect services can be used to enable authentication, file transfer, and data sharing. The Globus Python SDK and helper pages are also described as tools for developing applications that integrate Globus functionality.
This tutorial from the Gateways 2018 conference in Austin, TX explored the capabilities provided by Globus for assembling, describing, publishing, identifying, searching, and discovering datasets.
Gateways 2020 Tutorial - Introduction to Globus - Globus
Globus provides a platform and services for simplifying data management and sharing for science gateways and applications. It offers fast and reliable file transfers between any storage systems, secure data sharing without copying data, and APIs and SDKs for building applications. Globus uses OAuth authentication and supports a variety of interfaces like CLI, Python SDK, and Jupyter notebooks to enable access.
Automating Research Data Flows and Introduction to the Globus Platform - Globus
This document introduces the Globus platform for automating research data flows. It describes Globus capabilities for scheduled transfers, command line scripting, and comprehensive task orchestration. It also covers Globus Auth for identity and access management, securing apps with Globus Auth, and the Globus Timer Service. The Globus command line interface and Python SDK allow for programmatic access and automation of data transfers and other tasks.
Introduction to Globus: Research Data Management Software at the ALCF - Globus
This document provides an introduction and overview of Globus, a research data management platform. It discusses how Globus can be used to move, share, discover, and reproduce data across different storage tiers and resources. Globus delivers fast and reliable big data transfer, sharing, and platform services directly from existing storage systems via software-as-a-service using existing identities, with the goal of unifying access to data across different locations and resources. The document demonstrates how Globus can be used via its web interface, command line interface, REST API, and as a platform for building other research applications and workflows.
Automating Research Data Flows and an Introduction to the Globus Platform - Globus
We introduce the various Globus approaches available for automating data flows, including the command line interface (CLI), the Globus Timer service and the Globus Flows service. We use a Jupyter notebook to demonstrate automation of file transfers and permissions management on shared datasets. We also provide a brief introduction to the Globus platform-as-a-service for developers, with emphasis on understanding the security model, and demonstrate how to access Globus services via APIs for integration with custom research applications.
Presented at a workshop at Oak Ridge National Laboratory on June 23, 2022.
Building Data Portals and Science Gateways with Globus - Globus
Presented at GlobusWorld 2022 by the Globus professional services team. Describes the Modern Research Data Portal design pattern and an implementation using the Django framework.
"What's New With Globus" Webinar: Spring 2018Globus
In this presentation from June 26, 2018, Globus co-founder Steve Tuecke discussed Globus Connect Server 5.1 with HTTPS file access; plans for new premium storage connectors; upcoming publication services including the new Globus Search and Identifiers services; the new Globus Web App, SSH with Globus Auth, and more.
Working with Globus Platform Services and Portals - Globus
We describe how developers can use Globus APIs to integrate robust data management capabilities into their research applications. We also demonstrate the new Globus portal framework that can be used in conjunction with the Globus Search service to simplify data search and discovery.
This document provides an overview of Globus platform services, including APIs, SDKs, authentication, and search capabilities. It discusses the Globus Auth service for identity and access management, APIs for transfer, search, and other resources. It also describes using the Python SDK, configuring app access, and examples of ingesting and discovering data using Globus Search. Portal frameworks and exemplars for building applications are also mentioned.
Gateways 2020 Tutorial - Large Scale Data Transfer with Globus - Globus
We describe the large-scale data transfer scenario, referencing current and past research teams and their challenges. We demonstrate a web application that uses Globus to perform large-scale data transfers, and walk through a code repository with the web application’s code.
Globus: Research Data Management as Service and Platform - PEARC17 - Mary Bass
Scientists have embraced the use of specialized cloud-hosted services to perform data management operations. Globus offers a suite of data and user management capabilities to the community, encompassing data transfer and sharing, user identity and authorization, and data publication. Globus capabilities are accessible via both a web browser and REST APIs. Web access allows Globus to address the needs of research labs through a software-as-a-service model; the newer REST APIs address the needs of developers of research services, who can now use Globus as a platform, outsourcing complex user and data management tasks to Globus cloud-hosted services. Here we review Globus capabilities and outline how it is being applied as a platform for scientific services. Presentation by Steve Tuecke from The University of Chicago. Steve is Globus Founder and Project Lead.
Automating Research Data Workflows (GlobusWorld Tour - Columbia University) - Globus
This document discusses various ways to automate research data workflows using Globus. It describes automating regular data transfers through recurring tasks scheduled with sync options. It also discusses staging data automatically as part of compute jobs by adding directives to job scripts. The document outlines how applications can programmatically submit transfers when users complete tasks. It provides an overview of relevant Globus platform capabilities for authentication, authorization, and automation using the Globus CLI and SDK.
Building Research Applications with Globus PaaS - Globus
We provide a brief introduction to the Globus platform-as-a-service for developers, with emphasis on building simple web applications for data distribution and discovery. We describe how to register an application with Globus and access platform APIs using the Globus Python SDK and a Jupyter Notebook. We also introduce the Globus Search service and demonstrate how it is used by an open source web portal framework that can jumpstart research application development.
This material was presented at the Research Computing and Data Management Workshop, hosted by Rensselaer Polytechnic Institute on February 27-28, 2024.
Facilitating Collaboration with Globus (GlobusWorld Tour - STFC) - Globus
This document discusses how Globus services can facilitate collaboration and data sharing through automated workflows. It describes how Globus Auth enables authentication for shared access to endpoints. APIs and command line tools allow applications to programmatically manage permissions and transfer data. JupyterHub can be configured with Globus Auth to provide tokens for accessing remote Globus services within notebooks. This enables collaborative and distributed data analysis. The document also outlines how Globus services can support automated publication of datasets through search, identifiers, and metadata.
This document provides an overview of the Globus platform and APIs for developers. It describes Globus Auth for identity and access management, the Globus Transfer API for file sharing and transfer, and the Globus Python SDK. It covers authentication flows, scopes, and how to programmatically activate endpoints and get user consent for data access using the Transfer API and SDK. The document recommends additional Globus documentation, code samples, and other support resources for developers to integrate Globus capabilities into their applications.
Tutorial: Automating Research Data Workflows - Globus
This tutorial was given at the 2019 GlobusWorld Conference in Chicago, IL by Globus Head of Products Rachana Ananthakrishnan and Director of Customer Engagement Greg Nawrocki.
Automating Research Data Flows with Globus (CHPC 2019 - South Africa) - Globus
The document discusses automating research data workflows using the Globus Command Line Interface (CLI). Key points include:
1) The CLI can be used to automate recurring transfers, stage data in/out of compute jobs, and allow applications to submit transfers on a user's behalf by using access tokens.
2) Refresh tokens allow applications to obtain new access tokens without the user being logged in, by storing and exchanging refresh tokens.
3) Examples of automation include syncing directories with scripts, staging data in shared directories, and removing directories after transfer with Python scripts.
4) Resources for support include Globus documentation, sample code, and professional services for custom application development and integration.
This tutorial was given at the 2019 GlobusWorld Conference in Chicago, IL by Globus Head of Products Rachana Ananthakrishnan and Director of Customer Engagement Greg Nawrocki.
Similar to Tutorial: Leveraging Globus in your Research Applications
Globus Compute with IRI Workflows - GlobusWorld 2024 - Globus
As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. As part of the work the team is investigating ways to speedup the time to solution for many different parts of the DIII-D workflow including how they run jobs on HPC systems. One of these routes is looking at Globus Compute as a way to replace the current method for managing tasks and we describe a brief proof of concept showing how Globus Compute could help to schedule jobs and be a tool to connect compute at different facilities.
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart... - Globus
The Earth System Grid Federation (ESGF) is a global network of data servers that archives and distributes the planet’s largest collection of Earth system model output for thousands of climate and environmental scientists worldwide. Many of these petabyte-scale data archives are located in proximity to large high-performance computing (HPC) or cloud computing resources, but the primary workflow for data users consists of transferring data and applying computations on a different system. As a part of the ESGF 2.0 US project (funded by the United States Department of Energy Office of Science), we developed pre-defined data workflows, which can be run on-demand, capable of applying many data reduction and data analysis operations to the large ESGF data archives, transferring only the resultant analysis (e.g., visualizations, smaller data files). In this talk, we will showcase a few of these workflows, highlighting how Globus Flows can be used for petabyte-scale climate analysis.
We describe the deployment and use of Globus Compute for remote computation. This content is aimed at researchers who wish to compute on remote resources using a unified programming interface, as well as system administrators who will deploy and operate Globus Compute services on their research computing infrastructure.
Globus Connect Server Deep Dive - GlobusWorld 2024 - Globus
We explore the Globus Connect Server (GCS) architecture and experiment with advanced configuration options and use cases. This content is targeted at system administrators who are familiar with GCS and currently operate—or are planning to operate—broader deployments at their institution.
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus... - Globus
Large Language Models (LLMs) are currently the center of attention in the tech world, particularly for their potential to advance research. In this presentation, we'll explore a straightforward and effective method for quickly initiating inference runs on supercomputers using the vLLM tool with Globus Compute, specifically on the Polaris system at ALCF. We'll begin by briefly discussing the popularity and applications of LLMs in various fields. Following this, we will introduce the vLLM tool, and explain how it integrates with Globus Compute to efficiently manage LLM operations on Polaris. Attendees will learn the practical aspects of setting up and remotely triggering LLMs from local machines, focusing on ease of use and efficiency. This talk is ideal for researchers and practitioners looking to leverage the power of LLMs in their work, offering a clear guide to harnessing supercomputing resources for quick and effective LLM inference.
Providing Globus Services to Users of JASMIN for Environmental Data Analysis - Globus
JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.
First Steps with Globus Compute Multi-User Endpoints - Globus
In this presentation we will share our experiences around getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that can offload computationally expensive steps in the researcher's workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Some of the challenges we have encountered were that each researcher had to set up and manage their own single-user globus compute endpoint and that the workloads had varying resource requirements (CPUs, memory and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges and share an update on our progress here.
Enhancing Research Orchestration Capabilities at ORNL - Globus
Cross-facility research orchestration comes with ever-changing constraints regarding the availability and suitability of various compute and data resources. In short, a flexible data and processing fabric is needed to enable the dynamic redirection of data and compute tasks throughout the lifecycle of an experiment. In this talk, we illustrate how we easily leveraged Globus services to instrument the ACE research testbed at the Oak Ridge Leadership Computing Facility with flexible data and task orchestration capabilities.
Understanding Globus Data Transfers with NetSage - Globus
NetSage is an open privacy-aware network measurement, analysis, and visualization service designed to help end-users visualize and reason about large data transfers. NetSage traditionally has used a combination of passive measurements, including SNMP and flow data, as well as active measurements, mainly perfSONAR, to provide longitudinal network performance data visualization. It has been deployed by dozens of networks world wide, and is supported domestically by the Engagement and Performance Operations Center (EPOC), NSF #2328479. We have recently expanded the NetSage data sources to include logs for Globus data transfers, following the same privacy-preserving approach as for Flow data. Using the logs for the Texas Advanced Computing Center (TACC) as an example, this talk will walk through several different example use cases that NetSage can answer, including: Who is using Globus to share data with my institution, and what kind of performance are they able to achieve? How many transfers has Globus supported for us? Which sites are we sharing the most data with, and how is that changing over time? How is my site using Globus to move data internally, and what kind of performance do we see for those transfers? What percentage of data transfers at my institution used Globus, and how did the overall data transfer performance compare to the Globus users?
How to Position Your Globus Data Portal for Success: Ten Good Practices - Globus
Science gateways allow science and engineering communities to access shared data, software, computing services, and instruments. Science gateways have gained a lot of traction in the last twenty years, as evidenced by projects such as the Science Gateways Community Institute (SGCI) and the Center of Excellence on Science Gateways (SGX3) in the US, The Australian Research Data Commons (ARDC) and its platforms in Australia, and the projects around Virtual Research Environments in Europe. A few mature frameworks have evolved with their different strengths and foci and have been taken up by a larger community such as the Globus Data Portal, Hubzero, Tapis, and Galaxy. However, even when gateways are built on successful frameworks, they continue to face the challenges of ongoing maintenance costs and how to meet the ever-expanding needs of the community they serve with enhanced features. It is not uncommon that gateways with compelling use cases are nonetheless unable to get past the prototype phase and become a full production service, or if they do, they don't survive more than a couple of years. While there is no guaranteed pathway to success, it seems likely that for any gateway there is a need for a strong community and/or solid funding streams to create and sustain its success. With over twenty years of examples to draw from, this presentation goes into detail for ten factors common to successful and enduring gateways that effectively serve as best practices for any new or developing gateway.
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G... - Globus
The U.S. Geological Survey (USGS) has made substantial investments in meeting evolving scientific, technical, and policy driven demands on storing, managing, and delivering data. As these demands continue to grow in complexity and scale, the USGS must continue to explore innovative solutions to improve its management, curation, sharing, delivering, and preservation approaches for large-scale research data. Supporting these needs, the USGS has partnered with the University of Chicago-Globus to research and develop advanced repository components and workflows leveraging its current investment in Globus. The primary outcome of this partnership includes the development of a prototype enterprise repository, driven by USGS Data Release requirements, through exploration and implementation of the entire suite of the Globus platform offerings, including Globus Flow, Globus Auth, Globus Transfer, and Globus Search. This presentation will provide insights into this research partnership, introduce the unique requirements and challenges being addressed and provide relevant project progress.
Developing Distributed High-performance Computing Capabilities of an Open Sci... - Globus
COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and its broad response from the scientific community has forged new relationships among public health practitioners, mathematical modelers, and scientific computing specialists, while revealing critical gaps in exploiting advanced computing systems to support urgent decision making. Informed by our team’s work in applying high-performance computing in support of public health decision makers during the COVID-19 pandemic, we present how Globus technologies are enabling the development of an open science platform for robust epidemic analysis, with the goal of collaborative, secure, distributed, on-demand, and fast time-to-solution analyses to support public health.
The Department of Energy's Integrated Research Infrastructure (IRI) - Globus
We will provide an overview of DOE’s IRI initiative as it moves into early implementation, what drives the IRI vision, and the role of DOE in the larger national research ecosystem.
Listen to the keynote address and hear about the latest developments from Rachana Ananthakrishnan and Ian Foster who review the updates to the Globus Platform and Service, and the relevance of Globus to the scientific community as an automation platform to accelerate scientific discovery.
Enhancing Performance with Globus and the Science DMZ - Globus
ESnet has led the way in helping national facilities—and many other institutions in the research community—configure Science DMZs and troubleshoot network issues to maximize data transfer performance. In this talk we will present a summary of approaches and tips for getting the most out of your network infrastructure using Globus Connect Server.
Extending Globus into a Site-wide Automated Data Infrastructure - Globus
The Rosalind Franklin Institute hosts a variety of scientific instruments, which allow us to capture a multifaceted and multilevel view of biological systems, generating around 70 terabytes of data a month. Distributed solutions, such as Globus and Ceph, facilitate storage, access, and transfer of large amounts of data. However, we still must deal with the heterogeneity of the file formats and directory structure at acquisition, which is optimised for fast recording, rather than for efficient storage and processing. Our data infrastructure includes local storage at the instruments and workstations, distributed object stores with POSIX and S3 access, remote storage on HPCs, and tape backup. This can pose a challenge in ensuring fast, secure, and efficient data transfer. Globus allows us to handle this heterogeneity, while its Python SDK allows us to automate our data infrastructure using Globus microservices integrated with our data access models. Our data management workflows are becoming increasingly complex and heterogeneous, including desktop PCs, virtual machines, and offsite HPCs, as well as several open-source software tools with different computing and data structure requirements. This complexity demands that data be annotated with enough details about the experiments and the analysis to ensure efficient and reproducible workflows. This talk explores how we extend Globus into different parts of our data lifecycle to create a secure, scalable, and high-performing automated data infrastructure that can provide FAIR[1,2] data for all our science.
1. https://doi.org/10.1038/sdata.2016.18
2. https://www.go-fair.org/fair-principles
Globus Compute with Integrated Research Infrastructure (IRI) workflows - Globus
As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. As part of the work the team is investigating ways to speedup the time to solution for many different parts of the DIII-D workflow including how they run jobs on HPC systems. One of these routes is looking at Globus Compute as a way to replace the current method for managing tasks and I will give a brief proof of concept showing how Globus Compute could help to schedule jobs and be a tool to connect compute at different facilities.
Reactive Documents and Computational Pipelines - Bridging the Gap - Globus
As scientific discovery and experimentation become increasingly reliant on computational methods, the static nature of traditional publications renders them progressively fragmented and unreproducible. How can workflow automation tools, such as Globus, be leveraged to address these issues and potentially create a new, higher-value form of publication? LivePublication leverages Globus’s custom Action Provider integrations and Compute nodes to capture semantic and provenance information during distributed flow executions. This information is then embedded within an RO-crate and interfaced with a programmatic document, creating a seamless pipeline from instruments, to computation, to publication.
3. Globus delivers… with applications and as a platform…
Fast and reliable data transfer, sharing, and file management…
…directly from your own storage systems…
…via software-as-a-service using existing identities.
4. How can I integrate Globus into my research workflows?
5. Globus serves as…
A platform for building science gateways, portals and other web applications in support of research and education.
6. Globus Platform-as-a-Service
[Diagram: Globus Auth API (Group Management), Globus Transfer API, Globus Connect, File Sharing, File Transfer & Replication]
Use existing institutional ID systems in external web applications.
Integrate file transfer and sharing capabilities into scientific web apps, portals, gateways, etc.
8. PaaS Security Challenges – Globus Auth
• How to provide:
– Login to apps
o Web apps (Jupyter Notebook, Portals), Mobile, Desktop, Command line
– Protect all REST API communications
o App → Globus service (Jupyter Notebook, MRDP)
o App → non-Globus service (MRDP)
o Service → service (MRDP)
• While:
– Not introducing even more identities
o Providing a platform to consolidate those identities
– Providing least privileges security model (consents)
– Being agnostic to programming language and framework
– Being web friendly
– Making it easy for users and developers
9. Authorization Code Grant
[Diagram: Browser (User), Client (Web Portal, Application), Identity Provider, Globus Auth (Authorization Server), Globus Transfer (Resource Server)]
1. User accesses the portal.
2. Portal redirects the user to Globus Auth.
3. User authenticates with an identity provider and consents.
4. Globus Auth returns an authorization code to the client.
5. Client authenticates using its client id and secret and sends the authorization code.
6. Globus Auth issues access token(s) to the client.
7. Client authenticates with the access token(s), giving it the authority to invoke the transfer service.
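To make the flow concrete, here is a minimal sketch using the Globus Python SDK's ConfidentialAppAuthClient; the client ID, secret, and redirect URI are placeholders for values registered at developers.globus.org, and a real portal would run the token exchange inside its OAuth callback handler rather than inline.

import globus_sdk

CLIENT_ID = "YOUR_PORTAL_CLIENT_ID"          # placeholder registration values
CLIENT_SECRET = "YOUR_PORTAL_CLIENT_SECRET"
REDIRECT_URI = "https://example.org/oauth/callback"

ac = globus_sdk.ConfidentialAppAuthClient(CLIENT_ID, CLIENT_SECRET)

# Steps 2-3: redirect the user to Globus Auth to authenticate and consent
ac.oauth2_start_flow(
    redirect_uri=REDIRECT_URI,
    requested_scopes="urn:globus:auth:scope:transfer.api.globus.org:all")
print("Redirect the user to:", ac.oauth2_get_authorize_url())

# Steps 4-6: exchange the authorization code returned to the callback for tokens
auth_code = "CODE_FROM_CALLBACK"             # placeholder
tokens = ac.oauth2_exchange_code_for_tokens(auth_code)
transfer_token = tokens.by_resource_server["transfer.api.globus.org"]["access_token"]

# Step 7: present the access token to invoke the Transfer service
tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(transfer_token))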
10. Globus Transfer API
• Globus Web App consumes public Transfer API
• Resource named by URL (standard REST approach)
– Query params allow refinement (e.g., subset of fields)
• Globus APIs use JSON for documents and resource representations
• Requests authorized via OAuth2 access token
– Authorization: Bearer asdflkqhafsdafeawk
docs.globus.org/api/transfer
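As an illustration of the REST approach, here is a direct call with the Python requests library; this is a sketch assuming the documented v0.10 Transfer API base URL and a valid bearer token obtained via Globus Auth.

import requests

ACCESS_TOKEN = "..."                          # placeholder bearer token

# GET /endpoint_search with query params refining the result set
resp = requests.get(
    "https://transfer.api.globus.org/v0.10/endpoint_search",
    params={"filter_fulltext": "Globus Tutorial", "limit": 5},
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"})
resp.raise_for_status()
for ep in resp.json()["DATA"]:                # responses are JSON documents
    print(ep["id"], ep["display_name"])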
11. Globus Python SDK
• Python client library for the Globus Auth and Transfer REST APIs
• globus_sdk.TransferClient class handles connection management, security, framing, marshaling
from globus_sdk import TransferClient
tc = TransferClient()
globus.github.io/globus-sdk-python
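The two-line construction above reflects an older SDK release; current globus_sdk versions need an authorizer to make authenticated calls. A minimal sketch, assuming a transfer-scoped access token is already in hand (the later sketches reuse this tc instance):

import globus_sdk

transfer_token = "..."                        # placeholder access token for transfer.api.globus.org
tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(transfer_token))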
12. TransferClient low-level calls
• Thin wrapper around REST API
– post(), get(), update(), delete()
get(path, params=None, headers=None, auth=None, response_class=None)
o path – path for the request, with or without leading slash
o params – dict to be encoded as a query string
o headers – dict of HTTP headers to add to the request
o response_class – class for the response object, overrides the client’s default_response_class
o Returns: GlobusHTTPResponse object
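As a sketch, an endpoint search expressed as a low-level call against the REST resource; note that the keyword name varies by SDK version (params in older releases, query_params in globus_sdk v3):

# Low-level GET /endpoint_search using the tc client from the earlier sketch
resp = tc.get("/endpoint_search",
              query_params={"filter_fulltext": "tutorial", "limit": 5})
print(resp["DATA"][0]["display_name"])        # GlobusHTTPResponse supports dict-style access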
13. TransferClient higher-level calls
• One method for each API resource and HTTP verb
• Largely direct mapping to REST API
endpoint_search(filter_fulltext=None, filter_scope=None, num_results=25, **params)
14. Walkthrough API with our Jupyter Hub
• https://jupyter.demo.globus.org
– Sign in with Globus
– Verify the consents
– Start My Server (this will take about a minute)
– Open folder: globus-jupyter-notebooks
– Open folder: GlobusWorldTour
– Run Platform_Introduction_JupyterHub_Auth.ipynb
• If you mess it up and want to “go back to the beginning”
– Back down to the root folder
– Run NotebookPuller.ipynb
• If you want to use the notebook outside of our hub
– https://github.com/globus/globus-jupyter-notebooks
– Authentication is a manual cut and paste of exchanging the authorization code for an access token
15. Endpoint Search
• Plain text search for endpoints
– Searches owner, display name, keywords, description, organization, department
– Full word and prefix match
• Limit search to pre-defined scopes
– all, my-endpoints, recently-used, in-use, shared-by-me, shared-with-me
• Returns: List of endpoint documents
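A short usage sketch, reusing the tc client from above; the search string and scope are illustrative:

# Full-text endpoint search, optionally limited to a pre-defined scope
for ep in tc.endpoint_search("Globus Tutorial", filter_scope="all", num_results=10):
    print(ep["id"], ep["display_name"], ep["owner_string"])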
17. Endpoint Activation
• Activating an endpoint means binding a credential to the endpoint for login
• Globus Connect Server endpoints that have a MyProxy or MyProxy OAuth identity provider require login via the web
• Auto-activate
– Globus Connect Personal and shared endpoints use a Globus-provided credential
– Must auto-activate before any API calls to endpoints
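A minimal auto-activation sketch; the endpoint UUID is a placeholder and tc is the client from the earlier sketches:

endpoint_id = "ddb59aef-6d04-11e5-ba46-22000b92c6ec"    # placeholder endpoint UUID

result = tc.endpoint_autoactivate(endpoint_id)
if result["code"] == "AutoActivationFailed":
    print("Endpoint requires manual activation via the Globus web app")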
18. File operations
• List directory contents (ls)
• Make directory (mkdir)
• Rename
• Note:
– Path encoding & UTF gotchas
– Don’t forget to auto-activate first
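A sketch of the corresponding SDK calls, assuming the endpoint_id and tc from the earlier sketches; the paths are illustrative:

tc.endpoint_autoactivate(endpoint_id)                   # don't forget to auto-activate first

for entry in tc.operation_ls(endpoint_id, path="/~/"):  # list directory contents
    print(entry["type"], entry["name"])

tc.operation_mkdir(endpoint_id, path="/~/analysis-results")        # make directory
tc.operation_rename(endpoint_id,
                    oldpath="/~/analysis-results",
                    newpath="/~/results-archive")                   # rename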
19. Task submission
• Asynchronous operations
– Transfer
o Sync level option
– Delete
• Get submission_id, followed by submit
– Once and only once submission
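A transfer submission sketch: the SDK's TransferData helper obtains a submission_id behind the scenes, which is what gives once-and-only-once submission. The endpoint UUIDs and paths are placeholders:

src = endpoint_id                                        # placeholder source endpoint
dst = "ddb59af0-6d04-11e5-ba46-22000b92c6ec"             # placeholder destination endpoint

tdata = globus_sdk.TransferData(tc, src, dst,
                                label="Tutorial transfer",
                                sync_level="checksum")   # sync level option
tdata.add_item("/~/shared-data/", "/~/incoming-data/", recursive=True)
task = tc.submit_transfer(tdata)
print("Task ID:", task["task_id"])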
20. Task management
• Get task by id
• Get task_list
• Update task by id (label, deadline)
• Cancel task by id
• Get event list for task
• Get task pause info
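A sketch of these task management calls, reusing the task submitted in the previous sketch:

task_id = task["task_id"]

doc = tc.get_task(task_id)                               # get task by id
print(doc["status"], doc["files_transferred"])

for t in tc.task_list():                                 # get task_list
    print(t["task_id"], t["status"], t["label"])

tc.update_task(task_id, {"label": "Renamed transfer"})   # update label
for event in tc.task_event_list(task_id):                # get event list for task
    print(event["time"], event["description"])
tc.cancel_task(task_id)                                  # cancel task by id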
21. Bookmarks
• Get list of bookmarks
• Create bookmark
• Get bookmark by id
• Update bookmark
• Delete bookmark by id
• Cannot perform other operations directly on bookmarks
– Requires client-side resolution
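A bookmark sketch (names and paths are illustrative); transfers and file operations still need the resolved endpoint and path rather than the bookmark itself:

bm = tc.create_bookmark({"name": "tutorial data",        # create bookmark
                         "endpoint_id": endpoint_id,
                         "path": "/~/shared-data/"})
for b in tc.bookmark_list():                             # get list of bookmarks
    print(b["name"], b["endpoint_id"], b["path"])
tc.delete_bookmark(bm["id"])                             # delete bookmark by id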
22. Shared endpoints and access rules (ACLs)
• Shared Endpoint – create / delete / get info / get list
• Administrator role required to delegate access managers
• Access manager role required to manage
permission/ACLs
• Operations:
– Get list of access rules
– Get access rule by id
– Create access rule
– Update access rule
– Delete access rule
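A sketch of granting a collaborator read access on a shared endpoint; the shared endpoint ID and identity UUID are placeholders, and the caller needs the access manager role:

shared_endpoint_id = "..."                               # placeholder shared endpoint ID
rule = {
    "DATA_TYPE": "access",
    "principal_type": "identity",
    "principal": "1234abcd-0000-0000-0000-56789abcdef0", # placeholder Globus identity UUID
    "path": "/shared-data/",
    "permissions": "r",
}
result = tc.add_endpoint_acl_rule(shared_endpoint_id, rule)    # create access rule
print("Created access rule", result["access_id"])

for r in tc.endpoint_acl_list(shared_endpoint_id):             # get list of access rules
    print(r["id"], r["principal"], r["permissions"])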
23. Management API
• Allow endpoint administrators to monitor and manage all tasks involving their endpoint
– Task API is essentially the same as for users
– Information limited to what they could see locally
• Cancel tasks
• Pause rules
24. Walkthrough API with a Jupyter Notebook
• If you want to use the (or a) notebook outside of our hub
– https://github.com/globus/globus-jupyter-notebooks
– Authentication is a manual cut and paste of the authorization code
– Native App grant
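Outside the hub, the notebooks use the Native App grant; a minimal sketch with a placeholder client ID, where the authorization code is copied and pasted manually:

import globus_sdk

NATIVE_CLIENT_ID = "YOUR_NATIVE_APP_CLIENT_ID"           # placeholder registration

client = globus_sdk.NativeAppAuthClient(NATIVE_CLIENT_ID)
client.oauth2_start_flow(refresh_tokens=True)

print("Log in at:", client.oauth2_get_authorize_url())
auth_code = input("Paste the authorization code here: ").strip()

tokens = client.oauth2_exchange_code_for_tokens(auth_code)
transfer_tokens = tokens.by_resource_server["transfer.api.globus.org"]

# A RefreshTokenAuthorizer renews the access token without further logins
authorizer = globus_sdk.RefreshTokenAuthorizer(
    transfer_tokens["refresh_token"], client,
    access_token=transfer_tokens["access_token"],
    expires_at=transfer_tokens["expires_at_seconds"])
tc = globus_sdk.TransferClient(authorizer=authorizer)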
25. Globus Helper Pages
• Globus pages designed for use by your web apps
– Browse Endpoint
– Activate Endpoint
– Select Group
– Manage Identities
– Manage Consents
– Logout
docs.globus.org/api/helper-pages
29. Support resources
• Globus documentation: docs.globus.org
• Helpdesk and issue escalation: support@globus.org
• Mailing lists
– https://www.globus.org/mailing-lists
– developer-discuss@globus.org
• Globus professional services team
– Assist with portal/gateway/app architecture and design
– Develop custom applications that leverage the Globus platform
– Advise on customized deployment and integration scenarios