We provide a summary review of Globus features targeted at those new to Globus. We demonstrate how to transfer and share data, and how to install a Globus Connect Personal endpoint on your laptop.
Presented at a workshop at Oak Ridge National Laboratory on June 22, 2022.
Opening keynote at GlobusWorld 2022. Includes examples of projects where the Globus service has had substantial impact, and a summary of the latest product features and roadmap.
XSEDE is a major research infrastructure with collaborations worldwide supporting thousands of researchers across a wide range of domains. XSEDE has taken an integrative and holistic approach to supporting researchers in the use of the varying resources and services available via XSEDE. This presentation will briefly review XSEDE and its vision and provide a discussion of the efforts within XSEDE targeted at supporting research communities.
Building Streaming Data Applications Using Apache Kafka - Slim Baltagi
Apache Kafka evolved from an enterprise messaging system to a fully distributed streaming data platform for building real-time streaming data pipelines and streaming data applications without the need for other tools/clusters for data ingestion, storage and stream processing.
In this talk you will learn more about:
1. A quick introduction to Kafka Core, Kafka Connect and Kafka Streams: what they are and why they matter
2. Code and step-by-step instructions to build an end-to-end streaming data application using Apache Kafka
Nintex Presentation: Building Forms and Workflows - Netwoven Inc.
In this session, we will demonstrate how Nintex can streamline your time-consuming manual processes by creating desktop and mobile forms, and automated workflows in every department (HR, Operations, Finance, Sales, Marketing, IT) of your organization. Nintex is an intuitive, no-code, browser-based, drag-and-drop workflow/form environment that reduces the complexity and time involved in building and improving business processes. We will also look at how other companies are benefiting from Nintex by automating their business processes.
Healthcare Claim Reimbursement Using Apache Spark - Databricks
Optum Inc. helps hospitals accurately calculate claim reimbursements and detect underpayment by insurance companies. Optum receives millions of claims per day, which need to be evaluated in less than 8 hours, and the results need to be sent back to the hospitals for revenue recovery purposes.
This is a run-through at a 200 level of the Microsoft Azure Big Data Analytics for the Cloud data platform based on the Cortana Intelligence Suite offerings.
I promise that understanding NoSQL is as easy as playing with LEGO bricks! Google Bigtable, presented in 2006, is the inspiration for Apache HBase: let's take a deep dive into Bigtable to better understand HBase.
Cisco’s E-Commerce Transformation Using Kafka - Confluent
(Gaurav Goyal + Dharmesh Panchmatia, Cisco Systems) Kafka Summit SF 2018
Cisco’s e-commerce platform is a custom-built, mission-critical platform which accounts for $40+ billion of Cisco’s revenue annually. It’s a suite of 35 different applications and 300+ services that powers product configuration, pricing, quoting and order booking across all Cisco product lines including hardware, software, services and subscriptions. It’s a B2B platform used by the Cisco sales team, partners and direct customers, serving 140,000 unique users across the globe. To improve customer experience and business agility, Cisco decided to transition the platform to cloud-native technologies: MongoDB, Elasticsearch and Kafka.
In this session, we will share details around:
-Kafka architecture
-How we are experiencing significant resiliency advantages, zero-downtime deployment and improved performance
-How we’ve implemented Kafka to pass data to 20+ downstream applications, removing point-to-point integrations, batch jobs and standardizing the handshake
-How we are using Kafka to push data for machine learning and analytics use cases
-Best practices and lessons learned
Designing the Next Generation of Data Pipelines at Zillow with Apache Spark - Databricks
The trade-off between development speed and pipeline maintainability is a constant for data engineers, especially for those in a rapidly evolving organization.
Ensuring Data Storage Security in Cloud Computing - Uday Wankar
Cloud computing has been envisioned as the next-generation architecture of IT enterprise.
In contrast to traditional solutions, where IT services are under proper physical, logical and personnel controls, cloud computing moves application software and databases to large data centers, where the management of the data and services may not be fully trustworthy.
Moving data into the cloud offers great convenience to users since they don’t have to care about the complexities of direct hardware management.
Stateful, Stateless and Serverless - Running Apache Kafka® on Kubernetes - Confluent
Speakers: Joe Beda, Co-founder and CTO, Heptio + Gwen Shapira, Principal Data Architect, Confluent
With the rapid adoption of microservices, there is a growing need for solutions to manage deployment, resources and data for fleets of microservices. Kubernetes is a resource management framework for containers that is rapidly growing in popularity. Apache Kafka is a streaming platform that makes data accessible to the edges of an organization. It's no wonder the question of running Kafka on Kubernetes keeps coming up!
In this online talk, Joe Beda, CTO of Heptio and co-creator of Kubernetes, and Gwen Shapira, principal data architect at Confluent and Kafka PMC member, will help you navigate through the hype, address frequently asked questions and deliver critical information to help you decide if running Kafka on Kubernetes is the right approach for your organization.
You will:
-Get an introduction to the basic concepts you need to know as you plan to deploy services on Kubernetes.
-Learn which parts of the Kafka ecosystem fit Kubernetes like a glove, and which require special attention.
-Pick up useful tips for getting started.
-See why Confluent Platform for Kubernetes is the simplest solution to deploying and orchestrating Kafka on Kubernetes, using container images and a Kubernetes operator.
Watch the recording: https://videos.confluent.io/watch/yoZcuazDjDDTcj1sRnaD3J
Master the Multi-Clustered Data Warehouse - Snowflake - Matillion
Snowflake is one of the most powerful, efficient data warehouses on the market today—and we joined forces with the Snowflake team to show you how it works!
In this webinar:
- Learn how to optimize Snowflake
- Hear insider tips and tricks on how to improve performance
- Get expert insights from Craig Collier, Technical Architect from Snowflake, and Kalyan Arangam, Solution Architect from Matillion
- Find out how leading brands like Converse, Duo Security, and Pets at Home use Snowflake and Matillion ETL to make data-driven decisions
- Discover how Matillion ETL and Snowflake work together to modernize your data world
- Learn how to utilize the impressive scalability of Snowflake and Matillion
Self-Service Data Ingestion Using NiFi, StreamSets & Kafka - Guido Schmutz
Many Big Data and IoT use cases are based on combining data from multiple sources and making it available on a Big Data platform for analysis. The data sources are often very heterogeneous, from simple files and databases to high-volume event streams from sensors (IoT devices). It’s important to retrieve this data in a secure and reliable manner and integrate it with the Big Data platform so that it is available for analysis in real time (stream processing) as well as in batch (typical big data processing). In the past few years some new tools have emerged which are especially capable of handling the process of integrating data from outside, often called data ingestion. From an outside perspective, they are very similar to traditional Enterprise Service Bus infrastructures, which larger organizations often use to handle message-driven and service-oriented systems. But there are also important differences: they are typically easier to scale horizontally, offer a more distributed setup, are capable of handling high volumes of data/messages, provide very detailed monitoring at the message level and integrate very well with the Hadoop ecosystem. This session will present and compare Apache Flume, Apache NiFi, StreamSets and the Kafka ecosystem, and show how they handle data ingestion in a Big Data solution architecture.
Bollini, Andrea, Ballarini, Emanuele, Buso, Irene, Boychuk, Mykhaylo, Cortese, Claudio, Digilio, Giuseppe, Fazio, Riccardo, Fiorenza, Damiano, Giamminonni, Luca, Lombardi, Corrado, Maffei, Stefano, Negretti, Davide, Orlandi, Sara, Pascarelli, Luigi Andrea, Perelli, Matteo, Scancarello, Immacolata, Scognamiglio, Francesco Pio, & Mornati, Susanna. (2022, June 8). DSpace-CRIS, anticipating innovation. Open Repositories 2022 (OR2022), Denver, Colorado. Zenodo. https://doi.org/10.5281/zenodo.6733234
DSpace-CRIS is the first open-source CRIS/RIMS platform in the world. In 2022 the project reaches its 10th anniversary: the first open-source release, version 1.8.2 alpha, took place in November 2012.
Technically it is a fork of the DSpace platform, but the two communities have always walked together with the aim of bringing all the general-purpose features of DSpace-CRIS to the main community. With version 7 and, especially, with the introduction of configurable entities in DSpace, the gap between these two "cousin" projects has been drastically reduced. However, thanks to the DSpace-CRIS community's increased experience in dealing with very complex use cases that have only recently found their way into “simple” DSpace, there are still many areas where DSpace-CRIS provides more advanced and still unique functionality.
The presentation will summarize unique features and characteristics of DSpace-CRIS over DSpace in 7 minutes.
Kafka for Real-Time Replication between Edge and Hybrid Cloud - Kai Wähner
Not all workloads allow cloud computing. Low latency, cybersecurity, and cost-efficiency require a suitable combination of edge computing and cloud integration.
This session explores architectures and design patterns for software and hardware considerations to deploy hybrid data streaming with Apache Kafka anywhere. A live demo shows data synchronization from the edge to the public cloud across continents with Kafka on Hivecell and Confluent Cloud.
GlobusWorld 2021 Tutorial: Introduction to Globus - Globus
An introduction to the core features of the Globus data management service. This tutorial was presented at the GlobusWorld 2021 conference in Chicago, IL by Greg Nawrocki.
Scalable Data Management: Automation and the Modern Research Data Portal - Globus
Globus is an established service from the University of Chicago that is widely used for managing research data in national laboratories, campus computing centers, and HPC facilities. While its interactive web browser interface addresses simple file transfer and sharing scenarios, large scale automation typically requires integration of the research data management platform it provides into bespoke applications.
We will describe one such example, the Petrel data portal (https://petreldata.net), used by researchers to manage data in diverse fields including materials science, cosmology, machine learning, and serial crystallography. The portal facilitates automated ingest of data, extraction and addition of metadata for creating search indexes, assignment of persistent identifiers, faceted search for rapid data discovery, and point-and-click downloading of datasets by authorized users. As security and privacy are often critical requirements, the portal employs fine-grained permissions that control both visibility of metadata and access to the datasets themselves. It is based on the Modern Research Data Portal design pattern, jointly developed by the ESnet and Globus teams, and leverages capabilities such as the Science DMZ for enhanced performance and to streamline the user experience.
Simplified Research Data Management with the Globus Platform - Globus
Overview of the Globus research data management platform, as presented at the Fall 2018 Membership Meeting of the Coalition for Networked Information (CNI), held in Washington, D.C., December 10-11, 2018.
Globus: Research Data Management as Service and Platform - PEARC17 - Mary Bass
Scientists have embraced the use of specialized cloud-hosted services to perform data management operations. Globus offers a suite of data and user management capabilities to the community, encompassing data transfer and sharing, user identity and authorization, and data publication. Globus capabilities are accessible via both a web browser and REST APIs. Web access allows Globus to address the needs of research labs through a software-as-a-service model; the newer REST APIs address the needs of developers of research services, who can now use Globus as a platform, outsourcing complex user and data management tasks to Globus cloud-hosted services. Here we review Globus capabilities and outline how it is being applied as a platform for scientific services. Presentation by Steve Tuecke from The University of Chicago. Steve is Globus Founder and Project Lead.
We provide an overview of the Globus platform features, and demonstrate several data management features. Serving as an introductory session suitable for new users, we use the Globus web app to show data transfer and sharing, demonstrate Globus Connect Personal for laptop/desktop access, and introduce the Globus Command Line Interface for interactive use and scripting.
This material was presented at the Research Computing and Data Management Workshop, hosted by Rensselaer Polytechnic Institute on February 27-28, 2024.
Introduction to Data Transfer and Sharing for Researchers - Globus
We will provide a summary review of Globus features targeted at those new to Globus. We will present various use cases that illustrate the power of Globus data sharing capabilities.
Listen to the keynote address and hear about the latest developments from Rachana Ananthakrishnan and Ian Foster who review the updates to the Globus Platform and Service, and the relevance of Globus to the scientific community as an automation platform to accelerate scientific discovery.
Similar to Introduction to Globus for New Users
Globus Compute with IRI Workflows - GlobusWorld 2024 - Globus
As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. As part of the work, the team is investigating ways to speed up the time to solution for many different parts of the DIII-D workflow, including how they run jobs on HPC systems. One of these routes is looking at Globus Compute as a way to replace the current method for managing tasks, and we describe a brief proof of concept showing how Globus Compute could help to schedule jobs and be a tool to connect compute at different facilities.
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart... - Globus
The Earth System Grid Federation (ESGF) is a global network of data servers that archives and distributes the planet’s largest collection of Earth system model output for thousands of climate and environmental scientists worldwide. Many of these petabyte-scale data archives are located in proximity to large high-performance computing (HPC) or cloud computing resources, but the primary workflow for data users consists of transferring data and applying computations on a different system. As part of the ESGF 2.0 US project (funded by the United States Department of Energy Office of Science), we developed pre-defined data workflows, which can be run on demand, capable of applying many data reduction and data analysis operations to the large ESGF data archives, transferring only the resultant analysis (e.g., visualizations, smaller data files). In this talk, we will showcase a few of these workflows, highlighting how Globus Flows can be used for petabyte-scale climate analysis.
We describe the deployment and use of Globus Compute for remote computation. This content is aimed at researchers who wish to compute on remote resources using a unified programming interface, as well as system administrators who will deploy and operate Globus Compute services on their research computing infrastructure.
Globus Connect Server Deep Dive - GlobusWorld 2024 - Globus
We explore the Globus Connect Server (GCS) architecture and experiment with advanced configuration options and use cases. This content is targeted at system administrators who are familiar with GCS and currently operate—or are planning to operate—broader deployments at their institution.
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus... - Globus
Large Language Models (LLMs) are currently the center of attention in the tech world, particularly for their potential to advance research. In this presentation, we'll explore a straightforward and effective method for quickly initiating inference runs on supercomputers using the vLLM tool with Globus Compute, specifically on the Polaris system at ALCF. We'll begin by briefly discussing the popularity and applications of LLMs in various fields. Following this, we will introduce the vLLM tool, and explain how it integrates with Globus Compute to efficiently manage LLM operations on Polaris. Attendees will learn the practical aspects of setting up and remotely triggering LLMs from local machines, focusing on ease of use and efficiency. This talk is ideal for researchers and practitioners looking to leverage the power of LLMs in their work, offering a clear guide to harnessing supercomputing resources for quick and effective LLM inference.
Providing Globus Services to Users of JASMIN for Environmental Data Analysis - Globus
JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.
First Steps with Globus Compute Multi-User Endpoints - Globus
In this presentation we will share our experiences around getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that can offload computationally expensive steps in the researchers' workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Among the challenges we encountered: each researcher had to set up and manage their own single-user Globus Compute endpoint, and the workloads had varying resource requirements (CPUs, memory and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges, and we share an update on our progress here.
Enhancing Research Orchestration Capabilities at ORNL - Globus
Cross-facility research orchestration comes with ever-changing constraints regarding the availability and suitability of various compute and data resources. In short, a flexible data and processing fabric is needed to enable the dynamic redirection of data and compute tasks throughout the lifecycle of an experiment. In this talk, we illustrate how we easily leveraged Globus services to instrument the ACE research testbed at the Oak Ridge Leadership Computing Facility with flexible data and task orchestration capabilities.
Understanding Globus Data Transfers with NetSage - Globus
NetSage is an open, privacy-aware network measurement, analysis, and visualization service designed to help end users visualize and reason about large data transfers. NetSage traditionally has used a combination of passive measurements, including SNMP and flow data, as well as active measurements, mainly perfSONAR, to provide longitudinal network performance data visualization. It has been deployed by dozens of networks worldwide, and is supported domestically by the Engagement and Performance Operations Center (EPOC), NSF #2328479. We have recently expanded the NetSage data sources to include logs for Globus data transfers, following the same privacy-preserving approach as for flow data. Using the logs for the Texas Advanced Computing Center (TACC) as an example, this talk will walk through several different example use cases that NetSage can answer, including: Who is using Globus to share data with my institution, and what kind of performance are they able to achieve? How many transfers has Globus supported for us? Which sites are we sharing the most data with, and how is that changing over time? How is my site using Globus to move data internally, and what kind of performance do we see for those transfers? What percentage of data transfers at my institution used Globus, and how did the overall data transfer performance compare to the Globus users?
How to Position Your Globus Data Portal for Success: Ten Good Practices - Globus
Science gateways allow science and engineering communities to access shared data, software, computing services, and instruments. Science gateways have gained a lot of traction in the last twenty years, as evidenced by projects such as the Science Gateways Community Institute (SGCI) and the Center of Excellence on Science Gateways (SGX3) in the US, The Australian Research Data Commons (ARDC) and its platforms in Australia, and the projects around Virtual Research Environments in Europe. A few mature frameworks have evolved with their different strengths and foci and have been taken up by a larger community such as the Globus Data Portal, Hubzero, Tapis, and Galaxy. However, even when gateways are built on successful frameworks, they continue to face the challenges of ongoing maintenance costs and how to meet the ever-expanding needs of the community they serve with enhanced features. It is not uncommon that gateways with compelling use cases are nonetheless unable to get past the prototype phase and become a full production service, or if they do, they don't survive more than a couple of years. While there is no guaranteed pathway to success, it seems likely that for any gateway there is a need for a strong community and/or solid funding streams to create and sustain its success. With over twenty years of examples to draw from, this presentation goes into detail for ten factors common to successful and enduring gateways that effectively serve as best practices for any new or developing gateway.
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G... - Globus
The U.S. Geological Survey (USGS) has made substantial investments in meeting evolving scientific, technical, and policy driven demands on storing, managing, and delivering data. As these demands continue to grow in complexity and scale, the USGS must continue to explore innovative solutions to improve its management, curation, sharing, delivering, and preservation approaches for large-scale research data. Supporting these needs, the USGS has partnered with the University of Chicago-Globus to research and develop advanced repository components and workflows leveraging its current investment in Globus. The primary outcome of this partnership includes the development of a prototype enterprise repository, driven by USGS Data Release requirements, through exploration and implementation of the entire suite of the Globus platform offerings, including Globus Flow, Globus Auth, Globus Transfer, and Globus Search. This presentation will provide insights into this research partnership, introduce the unique requirements and challenges being addressed and provide relevant project progress.
Developing Distributed High-performance Computing Capabilities of an Open Sci... - Globus
COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and its broad response from the scientific community has forged new relationships among public health practitioners, mathematical modelers, and scientific computing specialists, while revealing critical gaps in exploiting advanced computing systems to support urgent decision making. Informed by our team’s work in applying high-performance computing in support of public health decision makers during the COVID-19 pandemic, we present how Globus technologies are enabling the development of an open science platform for robust epidemic analysis, with the goal of collaborative, secure, distributed, on-demand, and fast time-to-solution analyses to support public health.
The Department of Energy's Integrated Research Infrastructure (IRI) - Globus
We will provide an overview of DOE’s IRI initiative as it moves into early implementation, what drives the IRI vision, and the role of DOE in the larger national research ecosystem.
Enhancing Performance with Globus and the Science DMZ - Globus
ESnet has led the way in helping national facilities—and many other institutions in the research community—configure Science DMZs and troubleshoot network issues to maximize data transfer performance. In this talk we will present a summary of approaches and tips for getting the most out of your network infrastructure using Globus Connect Server.
Extending Globus into a Site-wide Automated Data Infrastructure - Globus
The Rosalind Franklin Institute hosts a variety of scientific instruments, which allow us to capture a multifaceted and multilevel view of biological systems, generating around 70 terabytes of data a month. Distributed solutions, such as Globus and Ceph, facilitate storage, access, and transfer of large amounts of data. However, we still must deal with the heterogeneity of the file formats and directory structure at acquisition, which is optimised for fast recording rather than for efficient storage and processing. Our data infrastructure includes local storage at the instruments and workstations, distributed object stores with POSIX and S3 access, remote storage on HPCs, and taped backup. This can pose a challenge in ensuring fast, secure, and efficient data transfer. Globus allows us to handle this heterogeneity, while its Python SDK allows us to automate our data infrastructure using Globus microservices integrated with our data access models. Our data management workflows are becoming increasingly complex and heterogeneous, including desktop PCs, virtual machines, and offsite HPCs, as well as several open-source software tools with different computing and data structure requirements. This complexity demands that data be annotated with enough details about the experiments and the analysis to ensure efficient and reproducible workflows. This talk explores how we extend Globus into different parts of our data lifecycle to create a secure, scalable, and high-performing automated data infrastructure that can provide FAIR[1,2] data for all our science.
1. https://doi.org/10.1038/sdata.2016.18
2. https://www.go-fair.org/fair-principles
Globus Compute with Integrated Research Infrastructure (IRI) Workflows - Globus
As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. As part of the work, the team is investigating ways to speed up the time to solution for many different parts of the DIII-D workflow, including how they run jobs on HPC systems. One of these routes is looking at Globus Compute as a way to replace the current method for managing tasks, and I will give a brief proof of concept showing how Globus Compute could help to schedule jobs and be a tool to connect compute at different facilities.
Reactive Documents and Computational Pipelines - Bridging the Gap - Globus
As scientific discovery and experimentation become increasingly reliant on computational methods, the static nature of traditional publications renders them progressively fragmented and unreproducible. How can workflow automation tools, such as Globus, be leveraged to address these issues and potentially create a new, higher-value form of publication? LivePublication leverages Globus’s custom Action Provider integrations and Compute nodes to capture semantic and provenance information during distributed flow executions. This information is then embedded within an RO-crate and interfaced with a programmatic document, creating a seamless pipeline from instruments, to computation, to publication.
Innovating Inference at Exascale - Remote Triggering of Large Language Models... - Globus
Large Language Models (LLMs) are currently the center of attention in the tech world, particularly for their potential to advance research. In this presentation, we'll explore a straightforward and effective method for quickly initiating inference runs on supercomputers using the vLLM tool with Globus Compute, specifically on the Polaris system at ALCF. We'll begin by briefly discussing the popularity and applications of LLMs in various fields. Following this, we will introduce the vLLM tool, and explain how it integrates with Globus Compute to efficiently manage LLM operations on Polaris. Attendees will learn the practical aspects of setting up and remotely triggering LLMs from local machines, focusing on ease of use and efficiency. This talk is ideal for researchers and practitioners looking to leverage the power of LLMs in their work, offering a clear guide to harnessing supercomputing resources for quick and effective LLM inference.
In software engineering, the right architecture is essential for robust, scalable platforms. Wix has undergone a pivotal shift from event sourcing to a CRUD-based model for its microservices. This talk will chart the course of this pivotal journey.
Event sourcing, which records state changes as immutable events, provided robust auditing and "time travel" debugging for Wix Stores' microservices. Despite its benefits, the complexity it introduced in state management slowed development. Wix responded by adopting a simpler, unified CRUD model. This talk will explore the challenges of event sourcing and the advantages of Wix's new "CRUD on steroids" approach, which streamlines API integration and domain event management while preserving data integrity and system resilience.
Participants will gain valuable insights into Wix's strategies for ensuring atomicity in database updates and event production, as well as caching, materialization, and performance optimization techniques within a distributed system.
Join us to discover how Wix has mastered the art of balancing simplicity and extensibility, and learn how the re-adoption of the modest CRUD has turbocharged their development velocity, resilience, and scalability in a high-growth environment.
Check out the webinar slides to learn more about how XfilesPro transforms Salesforce document management by leveraging its world-class applications. For more details, please connect with sales@xfilespro.com
If you want to watch the on-demand webinar, please click here: https://www.xfilespro.com/webinars/salesforce-document-management-2-0-smarter-faster-better/
GraphSummit Paris - The Art of the Possible with Graph Technology - Neo4j
Sudhir Hasbe, Chief Product Officer, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Large Language Models and the End of Programming - Matt Welsh
Talk by Matt Welsh at Craft Conference 2024 on the impact that Large Language Models will have on the future of software development. In this talk, I discuss the ways in which LLMs will impact the software industry, from replacing human software developers with AI, to replacing conventional software with models that perform reasoning, computation, and problem-solving.
Top Features to Include in Your Winzo Clone App for Business Growth - rickgrimesss22
Discover the essential features to incorporate in your Winzo clone app to boost business growth, enhance user engagement, and drive revenue. Learn how to create a compelling gaming experience that stands out in the competitive market.
Quarkus Hidden and Forbidden Extensions - Max Andersen
Quarkus has a vast extension ecosystem and is known for its subsonic and subatomic feature set. Some of these features are not as well known, and some extensions are less talked about, but that does not make them less interesting - quite the opposite.
Come join this talk to see some tips and tricks for using Quarkus and some of the lesser known features, extensions and development techniques.
How Recreation Management Software Can Streamline Your Operations - wottaspaceseo
Recreation management software streamlines operations by automating key tasks such as scheduling, registration, and payment processing, reducing manual workload and errors. It provides centralized management of facilities, classes, and events, ensuring efficient resource allocation and facility usage. The software offers user-friendly online portals for easy access to bookings and program information, enhancing customer experience. Real-time reporting and data analytics deliver insights into attendance and preferences, aiding in strategic decision-making. Additionally, effective communication tools keep participants and staff informed with timely updates. Overall, recreation management software enhances efficiency, improves service delivery, and boosts customer satisfaction.
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc. - Juraj Vysvader
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, and the like. I didn't get rich from it, but the extensions did see 63K downloads (likely powering tens of thousands of websites).
Code reviews are vital for ensuring good code quality. They serve as one of our last lines of defense against bugs and subpar code reaching production.
Yet, they often turn into annoying tasks riddled with frustration, hostility, unclear feedback and lack of standards. How can we improve this crucial process?
In this session we will cover:
- The Art of Effective Code Reviews
- Streamlining the Review Process
- Elevating Reviews with Automated Tools
By the end of this presentation, you'll know how to organize and improve your code review process.
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient... - Mind IT Systems
Healthcare providers often struggle with the complexities of chronic conditions and remote patient monitoring, as each patient requires personalized care and ongoing monitoring. Off-the-shelf solutions may not meet these diverse needs, leading to inefficiencies and gaps in care. This is where custom healthcare software offers a tailored solution, ensuring improved care and effectiveness.
Utilocate offers a comprehensive solution for locate ticket management by automating and streamlining the entire process. By integrating with Geospatial Information Systems (GIS), it provides accurate mapping and visualization of utility locations, enhancing decision-making and reducing the risk of errors. The system's advanced data analytics tools help identify trends, predict potential issues, and optimize resource allocation, making the locate ticket management process smarter and more efficient. Additionally, automated ticket management ensures consistency and reduces human error, while real-time notifications keep all relevant personnel informed and ready to respond promptly.
The system's ability to streamline workflows and automate ticket routing significantly reduces the time taken to process each ticket, making the process faster and more efficient. Mobile access allows field technicians to update ticket information on the go, ensuring that the latest information is always available and accelerating the locate process. Overall, Utilocate not only enhances the efficiency and accuracy of locate ticket management but also improves safety by minimizing the risk of utility damage through precise and timely locates.
Gamify Your Mind: The Secret Sauce to Delivering Success, Continuously Improv... - Shahin Sheidaei
Games are powerful teaching tools, fostering hands-on engagement and fun. But they require careful consideration to succeed. Join me to explore factors in running and selecting games, ensuring they serve as effective teaching tools. Learn to maintain focus on learning objectives while playing, and how to measure the ROI of gaming in education. Discover strategies for pitching gaming to leadership. This session offers insights, tips, and examples for coaches, team leads, and enterprise leaders seeking to teach from simple to complex concepts.
8. Globus delivers…
Fast and reliable big data transfer, sharing, and platform services… directly from your own storage systems… via software-as-a-service using existing identities, with the overarching goal of...
11. Globus SaaS / PaaS: Research data lifecycle
(Diagram: an Instrument, a Personal Computer, and a Compute Facility connected through the Globus service. Use a Web browser or platform services; access any storage; use an existing identity.)
Transfer:
1. Researcher initiates transfer request, or the transfer is requested automatically by a script or science gateway.
2. Globus transfers files reliably, securely.
Share:
3. Researcher selects files to share, selects a user or group, and sets access permissions.
4. Globus controls access to shared files on existing storage; no need to move files to cloud storage!
5. Collaborator logs in to Globus and accesses shared files; no local account required; download via Globus.
Build:
6. The Timer and Automation Services, the Command Line Interface, API sets, and Python SDK provide the tools…
7. …for building science gateways, portals, publication services…
8. …and automating research workflows, ensuring those that need access to the data have it.
14. Globus core security features
• Access Control
– Identities provided and managed by institution
– Institution controls all access policies
– Globus is identity broker; no access to/storage of user credentials
– Fine-grained access control on the collections
• Data remain at institutions, not stored by Globus
• Data do not flow through the Globus service but directly between endpoints and their collections
• Integrity checks of transferred data
• High availability and redundancy
• Encryption of user files and Globus control data
15. Collections and Globus Connect
• Globus Connect Server
– Multi-user Linux systems
– https://docs.globus.org/globus-connect-server/
• Globus Connect Personal
– Personal workstations and laptops
– https://www.globus.org/globus-connect-personal
– OS-specific instructions: https://docs.globus.org/how-to/
16. Demo time!
• Identities and Accounts
• Transfer
• Sharing
• Transfer Details
• Bookmarks
• The Console
• The Hamburger Menu
• The Activity Monitor
• Groups
• Roles
• Responsive Interface
17. Manage Protected Data
Higher assurance levels for HIPAA and other regulated data
• Support for protected data such as health-related information
• Share data with collaborators while meeting compliance requirements
• Includes BAA option
18. Globus for high assurance data management
• Restricted data handling
– PII (Personally Identifiable Information)
– CUI (Controlled Unclassified Information)
– PHI (Protected Health Information)
• University of Chicago security controls
– NIST SP 800-53
– Superset of NIST SP 800-171
• Business Associate Agreements (BAA) will be between the University of Chicago and our subscribers
– University of Chicago has a BAA with Amazon
19. High Assurance features
• Additional authentication assurance
– Per-storage-gateway policy on frequency of authentication with a specific identity for access to data (timeout)
– Ensure that the user authenticates with the specific identity that gives them access within the session (decoupling linked identities)
• Session/device isolation
– Authentication context is per application, per session (~browser session)
• Enforces encryption of all user data in transit
• Audit logging
20. One service, many interfaces
The same Globus service is exposed through the Web app, the CLI, and the REST API, e.g.:
GET /endpoint/go%23ep1
PUT /endpoint/vas#my_endpt
200 OK
X-Transfer-API-Version: 0.10
Content-Type: application/json
…
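To make the exchange above concrete, here is a minimal sketch of the same endpoint lookup in Python using the requests library. The access token is a placeholder obtained via Globus Auth, and the legacy endpoint name go#ep1 is illustrative; note how the # must be percent-encoded, which is why the slide shows go%23ep1.

# Hedged sketch of the GET /endpoint call shown above (Transfer API v0.10).
# ACCESS_TOKEN and the endpoint name are placeholders for illustration.
import urllib.parse
import requests

BASE = "https://transfer.api.globus.org/v0.10"
ACCESS_TOKEN = "..."  # obtained via Globus Auth

# "go#ep1" must be percent-encoded in the URL path, hence "go%23ep1"
endpoint = urllib.parse.quote("go#ep1", safe="")

resp = requests.get(
    f"{BASE}/endpoint/{endpoint}",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
)
resp.raise_for_status()
print(resp.headers.get("X-Transfer-API-Version"))  # e.g. "0.10"
print(resp.json().get("display_name"))             # endpoint display name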
21. Globus Automation Capabilities
• Timer Service: scheduled and recurring transfers (a.k.a. Globus cron)
• Command Line Interface: ad hoc scripting and integration
• Globus Flows service: comprehensive task (data and compute) orchestration with human-in-the-loop interactions
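As an illustration of the Timer Service ("Globus cron"), the following is a minimal, hedged sketch assuming globus-sdk v3's TimerClient and TimerJob helpers. The endpoint UUIDs, paths, function name, and daily interval are assumptions for illustration only, and the authorizers come from a Globus Auth login flow like the one sketched after the Python SDK slide below.

# Hedged sketch: schedule a recurring transfer with the Timer service,
# assuming globus-sdk v3's TimerClient / TimerJob helpers. Endpoint UUIDs,
# paths, and the daily interval are illustrative placeholders; the
# authorizers must come from your own Globus Auth flow.
from datetime import datetime, timedelta, timezone
import globus_sdk

def schedule_nightly_sync(transfer_authorizer, timer_authorizer, src, dst):
    tc = globus_sdk.TransferClient(authorizer=transfer_authorizer)
    timers = globus_sdk.TimerClient(authorizer=timer_authorizer)

    # describe the transfer to repeat (recursive directory sync)
    tdata = globus_sdk.TransferData(tc, src, dst, label="nightly sync")
    tdata.add_item("/data/run42/", "/ingest/run42/", recursive=True)

    # wrap it in a timer job: first run now, then once per day
    job = globus_sdk.TimerJob.from_transfer_data(
        tdata,
        start=datetime.now(timezone.utc),
        interval=timedelta(days=1),
        name="nightly-sync",
    )
    return timers.create_job(job)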
22. Custom portals? Science gateways? Unique workflows? Our open REST APIs and Python SDK empower you to create an integrated ecosystem of research data services and applications.
23. Globus APIs
• Auth
• Groups
• Transfer
• Search
• Timer
• Flows
• GCS Manager
• The Globus Web App consumes the public Transfer API
• Resources are named by URL (standard REST approach)
• Globus APIs use JSON for documents
docs.globus.org/api/transfer
24. Globus Python SDK
• Python client library for the Globus REST APIs
• Largely a direct mapping to the REST API
• The globus_sdk.TransferClient class handles connection management, security, framing, and marshaling
globus-sdk-python.readthedocs.io/en/stable/
globus.github.io/globus-sdk-python
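To ground the bullets above, here is a minimal, hedged sketch of the native-app login flow and a single-file transfer with globus-sdk v3. The client ID, endpoint UUIDs, and paths are placeholders, not values from this deck.

# Hedged sketch: native-app login via Globus Auth, then submit a transfer
# with globus-sdk v3. CLIENT_ID, endpoint UUIDs, and paths are placeholders.
import globus_sdk

CLIENT_ID = "YOUR-NATIVE-APP-CLIENT-ID"  # register at developers.globus.org
SRC = "SOURCE-ENDPOINT-UUID"
DST = "DESTINATION-ENDPOINT-UUID"

# interactive login: print a URL, paste back the authorization code
auth_client = globus_sdk.NativeAppAuthClient(CLIENT_ID)
auth_client.oauth2_start_flow()
print("Log in at:", auth_client.oauth2_get_authorize_url())
code = input("Paste the authorization code here: ").strip()
tokens = auth_client.oauth2_exchange_code_for_tokens(code)
transfer_tokens = tokens.by_resource_server["transfer.api.globus.org"]

# TransferClient handles connection management, security, and marshaling
tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(transfer_tokens["access_token"])
)

# build and submit a transfer task
tdata = globus_sdk.TransferData(tc, SRC, DST, label="SDK example")
tdata.add_item("/source/path/file.txt", "/destination/path/file.txt")
task = tc.submit_transfer(tdata)
print("Task ID:", task["task_id"])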
26. Developer References
• Globus API / SDK documentation
– Transfer API: docs.globus.org/api/transfer/
– SDK: globus-sdk-python.readthedocs.io/en/stable/
• Globus GitHub: github.com/globus/
– Jupyter notebooks (https://github.com/globus/globus-jupyter-notebooks): stand-alone notebooks and hub integrations that walk through much of the functionality of our SDK
– Automation examples (https://github.com/globus/automation-examples): shell-scripted CLI and Python module examples of common research data management use cases