We describe the automated data ingest scenario, referencing current and past research teams and their challenges. We demonstrate a web application that uses Globus to perform automated data ingest and present a faceted search interface that can be used by science gateways to simplify data discovery. We also walk through the application's GitHub repository and highlight relevant components.
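The faceted search interface described above is driven by structured query documents. As a minimal sketch (not taken from the tutorial's repository), the dictionary below shows the general shape of a faceted query a gateway might POST to the Globus Search API; the field names ("instrument", "subject_area") and all values are placeholders for illustration.

```python
import json

# A minimal Globus Search query document of the kind a gateway could POST
# to /v1/index/<index_uuid>/search. Field names are hypothetical metadata
# fields chosen for this sketch.
query = {
    "q": "*",        # free-text query; "*" matches everything
    "limit": 10,
    "filters": [
        {
            "type": "match_any",
            "field_name": "instrument",
            "values": ["CryoEM"],
        }
    ],
    "facets": [
        {
            "name": "Subject Area",
            "type": "terms",
            "field_name": "subject_area",
            "size": 10,
        }
    ],
}

print(json.dumps(query, indent=2))
```

The "facets" entries tell the service which term counts to return alongside results, which is what powers the clickable facet sidebar in a portal UI.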
Gateways 2020 Tutorial - Large Scale Data Transfer with Globus (Globus)
We describe the large-scale data transfer scenario, referencing current and past research teams and their challenges. We demonstrate a web application that uses Globus to perform large-scale data transfers, and walk through a code repository with the web application’s code.
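At bottom, a Globus transfer submission is a JSON task document POSTed to the Transfer API (the Python SDK's TransferData helper assembles the same structure). The sketch below builds such a document with the standard library only; the endpoint UUIDs and paths are placeholders, and a real client obtains the submission_id from the service rather than generating it locally.

```python
import json
import uuid

# Hypothetical endpoint UUIDs for illustration only.
SRC = "00000000-0000-0000-0000-000000000001"  # source endpoint (placeholder)
DST = "00000000-0000-0000-0000-000000000002"  # destination endpoint (placeholder)

task = {
    "DATA_TYPE": "transfer",
    "submission_id": str(uuid.uuid4()),  # stand-in; normally issued by the API
    "source_endpoint": SRC,
    "destination_endpoint": DST,
    "label": "Nightly instrument ingest",
    "sync_level": "checksum",            # only copy files whose checksums differ
    "DATA": [
        {
            "DATA_TYPE": "transfer_item",
            "source_path": "/instrument/run42/",
            "destination_path": "/project/archive/run42/",
            "recursive": True,
        }
    ],
}

print(json.dumps(task, indent=2))
```

The "sync_level" setting is what makes repeated ingest runs cheap: unchanged files are skipped on resubmission.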
Gateways 2020 Tutorial - Instrument Data Distribution with Globus (Globus)
We describe the requirements for, and challenges of, distributing datasets at scale, e.g. from instruments such as CryoEM and advanced light sources. We demonstrate a web application that uses Globus to perform large-scale data distribution. We introduce and walk through a Jupyter notebook highlighting the relevant code to incorporate into a science gateway.
Gateways 2020 Tutorial - Introduction to Globus (Globus)
We review the Globus web application, the Globus Connect software and Globus “endpoints,” Globus identity management, and the Globus command-line interface (CLI), Python software development kit (SDK), and RESTful APIs.
Automating Research Data Management at Scale with Globus (Globus)
Research computing facilities, such as the national supercomputing centers, and shared instruments, such as cryo electron microscopes and advanced light sources, are generating large volumes of data daily. These growing data volumes make it challenging for researchers to perform what should be mundane tasks: move data reliably, describe data for subsequent discovery, and make data accessible to geographically distributed collaborators. Most employ some set of ad hoc methods, which are not scalable, and it is clear that some level of automation is required for these tasks.
Globus is an established service from the University of Chicago that is widely used for managing research data in national laboratories, campus computing centers, and HPC facilities. While its intuitive web app addresses simple file transfer and sharing scenarios, automation at scale requires integrating Globus data management platform services into custom science gateways, data portals and other web applications in service of research. Such applications should enable automated ingest of data from diverse sources, launching of analysis runs on diverse computing resources, extraction and addition of metadata for creating search indexes, assignment of persistent identifiers, faceted search for rapid data discovery, and point-and-click downloading of datasets by authorized users — all protected by an authentication and authorization substrate that allows the implementation of flexible data access policies for both metadata and data alike.
We describe current and emerging Globus services that facilitate these automated data flows while ensuring a streamlined user experience. We also demonstrate Petreldata.net, a data management portal and gateway to multiple computing resources, that supports large-scale research at the Advanced Photon Source.
GlobusWorld 2021 Tutorial: Building with the Globus Platform (Globus)
Using Globus platform services like Search and Flows to build data portals, science gateways and data commons that facilitate data discovery and collaboration. This tutorial was presented at the GlobusWorld 2021 conference in Chicago, IL by Vas Vasiliadis.
An overview of developments in the Globus platform during 2020-2021, presented at a webinar hosted by Internet2. Includes an overview of Globus Connect Server v5, cloud storage connectors, and platform services for developers (e.g., Globus Search and Globus Flows).
Enabling Secure Data Discoverability (SC21 Tutorial) (Globus)
Major research instruments are generating orders of magnitude more data in relatively short timeframes. As a result, the research enterprise is increasingly challenged by what should be mundane tasks: describing data for discovery and making data securely accessible to the broader research community. The ad hoc methods currently employed place undue burden on scientists and system administrators alike, and it is clear that a more robust, scalable approach is required.
Bespoke data portals (and science gateways/data commons) are becoming more prominent as a means of enabling access to large datasets. In this tutorial we demonstrate how services for authentication, authorization, metadata management, and search may be integrated with popular web frameworks, and used in combination with fast, well-architected networks to make data discoverable and accessible. Outcomes: build a simple, but functional, data portal that facilitates flexible data description, faceted data search and secure data access.
Recent Upgrades to ARM Data Transfer and Delivery Using Globus (Globus)
This presentation was given at the 2019 GlobusWorld Conference in Chicago, IL by Giri Prakash from the ARM Data Center at Oak Ridge National Laboratory.
"What's New With Globus" Webinar: Spring 2018 (Globus)
In this presentation from June 26, 2018, Globus co-founder Steve Tuecke discussed Globus Connect Server 5.1 with HTTPS file access; plans for new premium storage connectors; upcoming publication services including the new Globus Search and Identifiers services; the new Globus Web App, SSH with Globus Auth, and more.
GlobusWorld 2021 Tutorial: Globus for System Administrators (Globus)
An overview of installing and configuring Globus endpoints on your storage system. This tutorial was presented at the GlobusWorld 2021 conference in Chicago, IL by Vas Vasiliadis.
This presentation was given at the GlobusWorld 2020 Virtual Conference, by Ian Foster, Rachana Ananthakrishnan, and Vas Vasiliadis from the University of Chicago.
Globus: Research Data Management as Service and Platform - PEARC17 (Mary Bass)
Scientists have embraced the use of specialized cloud-hosted services to perform data management operations. Globus offers a suite of data and user management capabilities to the community, encompassing data transfer and sharing, user identity and authorization, and data publication. Globus capabilities are accessible via both a web browser and REST APIs. Web access allows Globus to address the needs of research labs through a software-as-a-service model; the newer REST APIs address the needs of developers of research services, who can now use Globus as a platform, outsourcing complex user and data management tasks to Globus cloud-hosted services. Here we review Globus capabilities and outline how it is being applied as a platform for scientific services. Presentation by Steve Tuecke from The University of Chicago. Steve is Globus Founder and Project Lead.
Screenshots prepared by Ben Blaiszik and Kyle Chard, used in our Globus publication demo at GlobusWorld 2014. See https://www.globus.org/data-publication for more information and the notes on the slides for details.
Log Management
Log Monitoring
Log Analysis
Need for Log Analysis
Problem with Log Analysis
Some Log Management Tools
What is ELK Stack
ELK Stack Working
Beats
Different Types of Server Logs
Examples of Winlogbeat, Packetbeat, Apache2, and Nginx server log analysis
Mimikatz
Malicious File Detection using ELK
Practical Setup
Conclusion
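The outline above covers hunting for Mimikatz activity in Winlogbeat data with the ELK stack. As an illustrative sketch only, the dictionary below shows the kind of Elasticsearch query DSL such a detection might use; the ECS field names ("process.name", "process.command_line") and the indicator strings are assumptions for the example, not a complete detection rule.

```python
import json

# Illustrative Elasticsearch query for flagging possible Mimikatz activity
# in process-creation events shipped by Winlogbeat. Field names follow the
# Elastic Common Schema and are assumptions for this sketch.
query = {
    "query": {
        "bool": {
            "should": [
                {"match": {"process.name": "mimikatz.exe"}},
                {"match_phrase": {"process.command_line": "sekurlsa::logonpasswords"}},
            ],
            "minimum_should_match": 1,  # either indicator is enough to match
        }
    },
    "size": 50,
}

print(json.dumps(query, indent=2))
```

In practice such a query would be saved as a Kibana detection rule or run against a winlogbeat-* index pattern; attackers rename binaries, so command-line indicators are generally more durable than process names.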
Tutorial presented at Mini Gateways 2022. Demonstrates how to build data portals and science gateways with the Django Globus Portal Framework.
The broad scope of a typical science gateway—to simplify access to shared data, computing and other resources—makes building such a gateway from scratch a daunting task. Investigators must be able to stage data from instruments (or other sources), submit compute jobs to analyze data, move data to more persistent storage, describe data products, and provide a means for collaborators to search, discover, reuse and augment these data products. Myriad tools are available to enable all these tasks, but integrating them in a way that hides the complexity from users is a challenge.
In this tutorial we will describe an approach that bootstraps science gateway development based on the Modern Research Data Portal[1] design pattern. The solution uses a set of open source tools that build on the established Django web framework, the ubiquitous OAuth2/OpenID Connect standards for authentication/authorization, the widely deployed Globus service for research data management, and the nascent funcX functions-as-a-service platform. Attendees will learn how to rapidly deploy a science gateway that enables both automated computation at scale and data-enhanced discovery of resulting data products. The emphasis will be on automating many of the required tasks so that gateway developers can focus on building differentiated, discipline-specific functionality rather than low-value—yet critical—supporting infrastructure.
We will use the ALCF Community Data Co-Op as an exemplar to illustrate how these tools have been used to support large-scale collaborative research. We will describe the overall solution architecture and introduce attendees to the individual tools. Attendees will then use these tools to deploy and configure their own science gateway to support image analysis, description, indexing and search.
The tutorial will comprise a mix of lectures, demonstration and hands-on exercises. Virtual machines will be provided for computation and for hosting the science gateway. The objective is for attendees to develop a high-level understanding of the various components and leave with working code that can serve as the starting point for their own science gateway implementation.
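The Django Globus Portal Framework used in this tutorial wires a Globus Search index into a portal largely through Django settings. The fragment below is a sketch of the general shape of that configuration; the index key, UUID, and facet field names are placeholders, and the framework's documentation is the authority on the full set of supported options.

```python
# Sketch of a Django Globus Portal Framework search-index configuration,
# as it might appear in a project's settings.py. The UUID and field names
# are placeholders for illustration.
SEARCH_INDEXES = {
    "my-gateway": {
        "name": "My Gateway Index",
        "uuid": "00000000-0000-0000-0000-000000000000",  # your Search index UUID
        "facets": [
            {"name": "Subject", "field_name": "subject"},
            {"name": "File Type", "field_name": "mime_type"},
        ],
    }
}

print(sorted(SEARCH_INDEXES["my-gateway"]))
```

Each top-level key becomes a searchable collection in the portal, so adding a second index is a matter of adding a second entry rather than writing new views.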
Building Data Portals and Science Gateways with Globus (Globus)
Presented at GlobusWorld 2022 by the Globus professional services team. Describes the Modern Research Data Portal design pattern and an implementation using the Django framework.
Building RESTful Data Services with WebAPI (Gert Drapers)
Data services are a major building block inside a service-oriented architecture. Not only do they provide the abstraction and isolation between physical storage systems and the business layer, they can also provide the means for authentication, authorization, transformation, projection, scale (through, for example, sharding) and caching. This session will walk you through implementing your RESTful data service so that you can easily enable and integrate the described capabilities.
SOLID Programming with Portable Class Libraries (Vagif Abilov)
Developers often don't pay attention to code portability until they need to target multiple platforms. However, a large amount of non-portable code often hints at violations of clean code principles, so it is worth investigating which parts of the source code base are platform-specific and for what reasons.
In this session we will give an overview of portable class libraries, show how to extract PCL components from a real-world application and go through typical challenges that are faced when writing portable code. We will present the original tool that analyzes assemblies for portability compliance and can be used as a guard to prevent mixing business logic with infrastructure-specific functionality. Finally we will demonstrate how PCLs help targeting platforms such as Windows Store, Android and iOS.
Cerebro: Bringing together data scientists and BI users - Royal Caribbean - S... (Thomas W. Fry)
Cerebro: Bringing together data scientists and BI users on a common analytics platform in the cloud
https://conferences.oreilly.com/strata/strata-eu-2019/public/schedule/detail/77861
Denodo Partner Connect: Technical Webinar - Ask Me Anything (Denodo)
Watch full webinar here: https://buff.ly/47jH4lk
In this session, Denodo experts will cover a deeper dive into the top 5 differentiated use cases for Denodo by answering any questions since the previous session.
Additionally, we invite partners to bring any general questions related to Denodo, the Denodo Platform, or data management.
Building Research Applications with Globus PaaS (Globus)
We provide a brief introduction to the Globus platform-as-a-service for developers, with emphasis on building simple web applications for data distribution and discovery. We describe how to register an application with Globus and access platform APIs using the Globus Python SDK and a Jupyter Notebook. We also introduce the Globus Search service and demonstrate how it is used by an open source web portal framework that can jumpstart research application development.
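Registering an application, as described above, yields a client ID (and, for a confidential client, a secret) that the application uses to obtain tokens from Globus Auth. As a hedged sketch, the code below builds (but does not send) an OAuth2 client_credentials token request using only the standard library; the scope string shown is the Transfer scope, and the actual exchange additionally requires HTTP Basic authentication with the client ID and secret.

```python
import urllib.parse
import urllib.request

# Build (without sending) an OAuth2 client_credentials token request to
# Globus Auth, as a registered confidential client would. Client ID and
# secret are omitted; they go in an HTTP Basic Authorization header.
TOKEN_URL = "https://auth.globus.org/v2/oauth2/token"

body = urllib.parse.urlencode({
    "grant_type": "client_credentials",
    "scope": "urn:globus:auth:scope:transfer.api.globus.org:all",
})

req = urllib.request.Request(
    TOKEN_URL,
    data=body.encode(),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)

# urllib.request.urlopen(req) would perform the exchange once credentials
# are attached; the response contains the access token for the Transfer API.
print(req.full_url)
print(body)
```

The Globus Python SDK wraps this exchange in its client classes, so application code rarely constructs the request by hand; the sketch is meant to show what the SDK is doing underneath.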
This material was presented at the Research Computing and Data Management Workshop, hosted by Rensselaer Polytechnic Institute on February 27-28, 2024.
For our next ArcReady, we will explore a topic on everyone’s mind: cloud computing. Several industry companies have announced cloud computing services. In October 2008 at the Professional Developers Conference, Microsoft announced the next phase of our Software + Services vision: the Azure Services Platform. The Azure Services Platform provides a wide range of internet services that can be consumed from both on-premises environments and the internet.
Session 1: Cloud Services
In our first session we will explore the current state of cloud services. We will then look at how applications should be architected for the cloud and explore a reference application deployed on Windows Azure. We will also look at the services that can be built for on-premises applications, using .NET Services. We will also address some of the concerns that enterprises have about cloud services, such as regulatory and compliance issues.
Session 2: The Azure Platform
In our second session we will take a slightly different look at cloud-based services by exploring Live Mesh and Live Services. Live Mesh is a data synchronization client that has a rich API to build applications on. Live Services are a collection of APIs that can be used to create rich applications for your customers. Live Services are based on internet standard protocols and data formats.
We describe the various Globus APIs and demonstrate how developers can use them to integrate robust data management capabilities into their research applications. We also provide an overview of advanced services such as Globus Search and tools such as the data portal framework that you can use to simplify data search and discovery.
Presented at a workshop at KU Leuven on July 8, 2022.
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksMapR Technologies
From the Hadoop Summit 2015 Session with Nick Amato.
This session examines practical ways you can begin leveraging network data sources in Hadoop using familiar technologies like SQL and BI tools. Using the diverse sets of sources available, such as traces, routing protocol data, and direct packet captures from critical network locations, we will examine the capabilities of BI tools in the network context and examine cases for extracting value from data collected from the network infrastructure.
Similar to Gateways 2020 Tutorial - Automated Data Ingest and Search with Globus
Globus Compute with IRI Workflows - GlobusWorld 2024 (Globus)
As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. As part of the work the team is investigating ways to speed up the time to solution for many different parts of the DIII-D workflow, including how they run jobs on HPC systems. One of these routes is looking at Globus Compute as a way to replace the current method for managing tasks, and we describe a brief proof of concept showing how Globus Compute could help to schedule jobs and be a tool to connect compute at different facilities.
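The task-submission pattern described above can be sketched with the Globus Compute SDK. The function name below is a made-up stand-in for a DIII-D analysis step, the endpoint UUID is a placeholder, and the remote-submission helper is defined but not invoked, since running it requires an operating Globus Compute endpoint and authentication.

```python
# Sketch of offloading work via the Globus Compute SDK (globus-compute-sdk).
# efit_reconstruction is a hypothetical analysis step for illustration.

def efit_reconstruction(shot_number):
    """Stand-in for an analysis step; executes wherever it is submitted."""
    return {"shot": shot_number, "status": "reconstructed"}

def submit_remote(shot_number, endpoint_id):
    """Submit the function to a Globus Compute endpoint (requires login
    and a configured endpoint, so this is not called in the sketch)."""
    from globus_compute_sdk import Executor
    with Executor(endpoint_id=endpoint_id) as ex:
        future = ex.submit(efit_reconstruction, shot_number)
        return future.result()

# Locally, the function behaves like any Python callable:
print(efit_reconstruction(180000))
```

The appeal of this model for cross-facility work is that the same `submit_remote` call can target an endpoint at NERSC, ALCF, or elsewhere by swapping the endpoint UUID.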
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart... (Globus)
The Earth System Grid Federation (ESGF) is a global network of data servers that archives and distributes the planet’s largest collection of Earth system model output for thousands of climate and environmental scientists worldwide. Many of these petabyte-scale data archives are located in proximity to large high-performance computing (HPC) or cloud computing resources, but the primary workflow for data users consists of transferring data and applying computations on a different system. As a part of the ESGF 2.0 US project (funded by the United States Department of Energy Office of Science), we developed pre-defined data workflows, which can be run on-demand, capable of applying many data reduction and analysis operations to the large ESGF data archives, transferring only the resultant analysis (e.g., visualizations, smaller data files). In this talk, we will showcase a few of these workflows, highlighting how Globus Flows can be used for petabyte-scale climate analysis.
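A Globus flow of the kind described above is declared as a JSON state machine. The sketch below shows a minimal single-state definition that invokes the Transfer action provider; it is an illustration only, and the exact parameter names accepted by the provider should be checked against the current Globus Flows documentation, since they have varied across versions.

```python
import json

# A minimal Globus Flows definition: one Action state that calls the
# Transfer action provider. "$.input...." references are filled from the
# run-time input document; all endpoint values are supplied at run time.
flow_definition = {
    "StartAt": "MoveResults",
    "States": {
        "MoveResults": {
            "Type": "Action",
            "ActionUrl": "https://actions.globus.org/transfer/transfer",
            "Parameters": {
                "source_endpoint_id.$": "$.input.source_endpoint",
                "destination_endpoint_id.$": "$.input.destination_endpoint",
                "transfer_items": [
                    {
                        "source_path.$": "$.input.source_path",
                        "destination_path.$": "$.input.destination_path",
                        "recursive": True,
                    }
                ],
            },
            "End": True,
        }
    },
}

print(json.dumps(flow_definition, indent=2))
```

Real analysis flows chain additional states (compute, ingest-to-search, notification) after the transfer, but they follow the same StartAt/States structure.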
We describe the deployment and use of Globus Compute for remote computation. This content is aimed at researchers who wish to compute on remote resources using a unified programming interface, as well as system administrators who will deploy and operate Globus Compute services on their research computing infrastructure.
Globus Connect Server Deep Dive - GlobusWorld 2024 (Globus)
We explore the Globus Connect Server (GCS) architecture and experiment with advanced configuration options and use cases. This content is targeted at system administrators who are familiar with GCS and currently operate—or are planning to operate—broader deployments at their institution.
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus... (Globus)
Large Language Models (LLMs) are currently the center of attention in the tech world, particularly for their potential to advance research. In this presentation, we'll explore a straightforward and effective method for quickly initiating inference runs on supercomputers using the vLLM tool with Globus Compute, specifically on the Polaris system at ALCF. We'll begin by briefly discussing the popularity and applications of LLMs in various fields. Following this, we will introduce the vLLM tool, and explain how it integrates with Globus Compute to efficiently manage LLM operations on Polaris. Attendees will learn the practical aspects of setting up and remotely triggering LLMs from local machines, focusing on ease of use and efficiency. This talk is ideal for researchers and practitioners looking to leverage the power of LLMs in their work, offering a clear guide to harnessing supercomputing resources for quick and effective LLM inference.
Providing Globus Services to Users of JASMIN for Environmental Data Analysis (Globus)
JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.
First Steps with Globus Compute Multi-User Endpoints (Globus)
In this presentation we will share our experiences around getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that can offload computationally expensive steps in the researcher's workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Some of the challenges we have encountered were that each researcher had to set up and manage their own single-user Globus Compute endpoint and that the workloads had varying resource requirements (CPUs, memory and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges and share an update on our progress here.
Enhancing Research Orchestration Capabilities at ORNL (Globus)
Cross-facility research orchestration comes with ever-changing constraints regarding the availability and suitability of various compute and data resources. In short, a flexible data and processing fabric is needed to enable the dynamic redirection of data and compute tasks throughout the lifecycle of an experiment. In this talk, we illustrate how we easily leveraged Globus services to instrument the ACE research testbed at the Oak Ridge Leadership Computing Facility with flexible data and task orchestration capabilities.
Understanding Globus Data Transfers with NetSage (Globus)
NetSage is an open privacy-aware network measurement, analysis, and visualization service designed to help end-users visualize and reason about large data transfers. NetSage traditionally has used a combination of passive measurements, including SNMP and flow data, as well as active measurements, mainly perfSONAR, to provide longitudinal network performance data visualization. It has been deployed by dozens of networks world wide, and is supported domestically by the Engagement and Performance Operations Center (EPOC), NSF #2328479. We have recently expanded the NetSage data sources to include logs for Globus data transfers, following the same privacy-preserving approach as for Flow data. Using the logs for the Texas Advanced Computing Center (TACC) as an example, this talk will walk through several different example use cases that NetSage can answer, including: Who is using Globus to share data with my institution, and what kind of performance are they able to achieve? How many transfers has Globus supported for us? Which sites are we sharing the most data with, and how is that changing over time? How is my site using Globus to move data internally, and what kind of performance do we see for those transfers? What percentage of data transfers at my institution used Globus, and how did the overall data transfer performance compare to the Globus users?
How to Position Your Globus Data Portal for Success: Ten Good PracticesGlobus
Science gateways allow science and engineering communities to access shared data, software, computing services, and instruments. Science gateways have gained a lot of traction in the last twenty years, as evidenced by projects such as the Science Gateways Community Institute (SGCI) and the Center of Excellence on Science Gateways (SGX3) in the US, The Australian Research Data Commons (ARDC) and its platforms in Australia, and the projects around Virtual Research Environments in Europe. A few mature frameworks have evolved with their different strengths and foci and have been taken up by a larger community such as the Globus Data Portal, Hubzero, Tapis, and Galaxy. However, even when gateways are built on successful frameworks, they continue to face the challenges of ongoing maintenance costs and how to meet the ever-expanding needs of the community they serve with enhanced features. It is not uncommon that gateways with compelling use cases are nonetheless unable to get past the prototype phase and become a full production service, or if they do, they don't survive more than a couple of years. While there is no guaranteed pathway to success, it seems likely that for any gateway there is a need for a strong community and/or solid funding streams to create and sustain its success. With over twenty years of examples to draw from, this presentation goes into detail for ten factors common to successful and enduring gateways that effectively serve as best practices for any new or developing gateway.
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Globus
The U.S. Geological Survey (USGS) has made substantial investments in meeting evolving scientific, technical, and policy-driven demands on storing, managing, and delivering data. As these demands continue to grow in complexity and scale, the USGS must continue to explore innovative solutions to improve its approaches to managing, curating, sharing, delivering, and preserving large-scale research data. Supporting these needs, the USGS has partnered with the University of Chicago-Globus to research and develop advanced repository components and workflows, leveraging its current investment in Globus. The primary outcome of this partnership is the development of a prototype enterprise repository, driven by USGS Data Release requirements, through exploration and implementation of the entire suite of Globus platform offerings, including Globus Flow, Globus Auth, Globus Transfer, and Globus Search. This presentation will provide insights into this research partnership, introduce the unique requirements and challenges being addressed, and report on project progress.
Developing Distributed High-performance Computing Capabilities of an Open Sci...Globus
COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and its broad response from the scientific community has forged new relationships among public health practitioners, mathematical modelers, and scientific computing specialists, while revealing critical gaps in exploiting advanced computing systems to support urgent decision making. Informed by our team’s work in applying high-performance computing in support of public health decision makers during the COVID-19 pandemic, we present how Globus technologies are enabling the development of an open science platform for robust epidemic analysis, with the goal of collaborative, secure, distributed, on-demand, and fast time-to-solution analyses to support public health.
The Department of Energy's Integrated Research Infrastructure (IRI)Globus
We will provide an overview of DOE’s IRI initiative as it moves into early implementation, what drives the IRI vision, and the role of DOE in the larger national research ecosystem.
Listen to the keynote address from Rachana Ananthakrishnan and Ian Foster, who review the latest updates to the Globus platform and service, and the relevance of Globus to the scientific community as an automation platform for accelerating scientific discovery.
Enhancing Performance with Globus and the Science DMZGlobus
ESnet has led the way in helping national facilities—and many other institutions in the research community—configure Science DMZs and troubleshoot network issues to maximize data transfer performance. In this talk we will present a summary of approaches and tips for getting the most out of your network infrastructure using Globus Connect Server.
Extending Globus into a Site-wide Automated Data InfrastructureGlobus
The Rosalind Franklin Institute hosts a variety of scientific instruments, which allow us to capture a multifaceted and multilevel view of biological systems, generating around 70 terabytes of data a month. Distributed solutions, such as Globus and Ceph, facilitate storage, access, and transfer of large amounts of data. However, we still must deal with the heterogeneity of the file formats and directory structure at acquisition, which is optimised for fast recording rather than for efficient storage and processing. Our data infrastructure includes local storage at the instruments and workstations, distributed object stores with POSIX and S3 access, remote storage on HPCs, and tape backup. This can pose a challenge in ensuring fast, secure, and efficient data transfer. Globus allows us to handle this heterogeneity, while its Python SDK allows us to automate our data infrastructure using Globus microservices integrated with our data access models. Our data management workflows are becoming increasingly complex and heterogeneous, spanning desktop PCs, virtual machines, and offsite HPCs, as well as several open-source software tools with different computing and data structure requirements. This complexity demands that data be annotated with enough detail about the experiments and the analysis to ensure efficient and reproducible workflows. This talk explores how we extend Globus into different parts of our data lifecycle to create a secure, scalable, and high-performing automated data infrastructure that can provide FAIR[1,2] data for all our science.
1. https://doi.org/10.1038/sdata.2016.18
2. https://www.go-fair.org/fair-principles
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisGlobus
JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.
Globus Compute with Integrated Research Infrastructure (IRI) workflowsGlobus
As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. The team is investigating ways to speed up the time to solution for many different parts of the DIII-D workflow, including how jobs are run on HPC systems. One avenue is Globus Compute as a replacement for the current method of managing tasks; I will give a brief proof of concept showing how Globus Compute could help schedule jobs and serve as a tool to connect compute at different facilities.
Reactive Documents and Computational Pipelines - Bridging the GapGlobus
As scientific discovery and experimentation become increasingly reliant on computational methods, the static nature of traditional publications renders them progressively fragmented and unreproducible. How can workflow automation tools, such as Globus, be leveraged to address these issues and potentially create a new, higher-value form of publication? LivePublication leverages Globus’s custom Action Provider integrations and Compute nodes to capture semantic and provenance information during distributed flow executions. This information is then embedded within an RO-crate and interfaced with a programmatic document, creating a seamless pipeline from instruments, to computation, to publication.
Gateways 2020 Tutorial - Automated Data Ingest and Search with Globus
1. Simplifying Science Gateway Data Management with Globus
Part IV – Automated Data Ingest
October 2020, Gateways 2020
2. Phase 1 - Gather data
Gathering datasets from research partners
• Your project is gathering datasets from partners. Each dataset is several TBs and takes ~a day to transfer over the network.
• For the data to be useful, it needs descriptive metadata.
• Ultimately, the team needs to find datasets that match specific criteria.
3. What are the dataset ingest challenges?
• Getting very large datasets transferred from gateway users’ systems to the central repository
– (This is Scenario I - large-scale data transfer.)
• Generating persistent identifiers for the data in the central repository so we can link metadata to data
• Storing the metadata
• Indexing the metadata to enable searching
5. What needs to be in place for it to work?
• Data storage
– Globus Connect Server on Petrel
• Persistent identifiers
– FAIR Research Identifier Service
– Hosted by https://fair-research.org/
• Metadata storage, indexing, search
– Globus Search API
– Hosted by Globus
6. Globus Connect Server on Petrel
• Configured for self-service projects
– Researchers do not receive local (Linux) accounts!
– Uses Globus for authorization & management
• Guest collections and groups
– Project PIs request access by applying to join the “Petrel Project Owners” group (using the Globus web app)
– Admin creates Globus group, makes PI a group manager
– Admin creates guest collection, makes PI an access manager
– Admin sets a quota of 100TB for the guest collection
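The admin steps above can be automated with the Globus SDK. Below is a minimal sketch of granting a PI access to a guest collection by building an ACL rule in the Globus Transfer API's rule format; the identity UUID is a placeholder, and the commented-out client call shows where an authenticated `TransferClient` would apply it.

```python
# Sketch: grant a PI access to a guest collection via a Transfer ACL rule.
# The UUID below is a placeholder, not a real identity.

def make_acl_rule(identity_uuid, path="/", permissions="rw"):
    """Build an ACL rule dict in the Globus Transfer API "access" format."""
    return {
        "DATA_TYPE": "access",
        "principal_type": "identity",
        "principal": identity_uuid,
        "path": path,
        "permissions": permissions,
    }

rule = make_acl_rule("00000000-0000-0000-0000-000000000000")
# With an authenticated client, the rule would be applied like:
#   import globus_sdk
#   tc = globus_sdk.TransferClient(authorizer=...)
#   tc.add_endpoint_acl_rule(GUEST_COLLECTION_UUID, rule)
```

Quota enforcement is handled on the storage side; the ACL rule only controls who may read and write under the collection's path.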
7. FAIR Research Identifiers
• RESTful web service, written in Python, that stores identifier metadata
• Mints (creates) identifiers from external service providers using a unified service provider interface (SPI)
• Different identifiers supported through namespaces
• Client requests served as HTML landing pages or other machine-readable formats (e.g., JSON, JSON-LD)
[Architecture diagram: web browser and client APIs send requests to the REST API web server (Apache, Flask, Python); AuthN/AuthZ handled by Globus Auth and Globus Groups; an RDBMS ORM (SQLAlchemy) backed by Postgres on AWS RDS/EC2; the Registration SPI (Python) mints identifiers via DataCite (DOI), EZID (ARK), and Minid (Handle); responses rendered as HTML or as JSON, JSON-LD, and other extensible renderings.]
https://minid.readthedocs.io/en/develop/
8. FAIR Research Identifiers
• REST API provides a simple CRUD interface
• Has other capabilities, like finding identifiers by checksum
• JSON is used for request and response
• Namespaces may also have their own handlers, landing pages, and other customizations
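To illustrate the CRUD interface, here is a sketch of building a create-identifier request body. The field names (`namespace`, `location`, `checksums`, `metadata`, `visible_to`) are assumptions modeled loosely on the Minid service; the dataset URL and the endpoint in the comment are hypothetical, so consult the service documentation for the authoritative schema.

```python
import json

# Sketch: a minimal create-identifier request body for a FAIR identifier
# service. Field names and the endpoint URL are illustrative assumptions.

def make_identifier_request(dataset_url, sha256, title, namespace="minid"):
    """Build a JSON-serializable request for minting an identifier."""
    return {
        "namespace": namespace,
        "location": [dataset_url],
        "checksums": [{"function": "sha256", "value": sha256}],
        "metadata": {"title": title},
        "visible_to": ["public"],
    }

body = make_identifier_request(
    "https://example.org/data/sample-001.tar",  # hypothetical dataset URL
    "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    "Sample dataset 001",
)
payload = json.dumps(body)
# A client would POST this to the identifier service's REST API with a
# Globus Auth bearer token in the Authorization header.
```

The checksum in the request is also what enables the "find identifiers by checksum" lookup noted above.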
9. Globus Search API
• RESTful API for indexing & search
– Hosted by Globus (including the metadata & index storage!)
– Each project gets an “index” object (private tenancy)
– REST API, Python client package, Python CLI
• https://docs.globus.org/api/search/
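As a concrete example, a metadata record is pushed into a Globus Search index as an ingest document. The sketch below builds one in the "GIngest"/"GMetaEntry" shape described in the Search API docs; the subject URI and metadata fields are illustrative placeholders, not values from this project.

```python
# Sketch: build a Globus Search ingest document (GIngest with a GMetaEntry).
# The subject and metadata content are placeholders.

def make_gingest(subject, metadata, visible_to=("public",)):
    """Wrap a metadata record as a Globus Search GIngest document."""
    return {
        "ingest_type": "GMetaEntry",
        "ingest_data": {
            "subject": subject,
            "visible_to": list(visible_to),
            "content": metadata,
        },
    }

doc = make_gingest(
    "globus://GUEST_COLLECTION_UUID/datasets/sample-001",
    {"title": "Sample dataset 001", "instrument": "cryoEM", "size_tb": 2.4},
)
# With an authenticated client, ingest would look like:
#   import globus_sdk
#   sc = globus_sdk.SearchClient(authorizer=...)
#   sc.ingest(SEARCH_INDEX_UUID, doc)
```

The `visible_to` list is what drives the per-result access control: Search only returns entries the querying user is allowed to see.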
10. Globus Search API features
• Scalable: to billions of entries
• Schema agnostic: can use standard (e.g., DataCite) or custom metadata
• Fine-grain access control: only returns results that are visible to the user
• Plain text search: ranked results
• Faceted search: for data discovery
• Rich query language: ranges, expressions, regex, fuzzy, stemming, etc.
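The faceted-search feature works by attaching facet definitions to a query document. Here is a sketch of a query body combining plain-text search with a `terms` facet, following the Search API's query document format; the field name `instrument` is a placeholder for whatever schema a project actually indexes.

```python
# Sketch: a Globus Search query document with a terms facet.
# "instrument" is a hypothetical indexed field, not a fixed schema name.

def make_faceted_query(q, facet_field, limit=10):
    """Build a Search query body with one terms facet on facet_field."""
    return {
        "q": q,
        "limit": limit,
        "facets": [
            {
                "name": facet_field,
                "type": "terms",
                "field_name": facet_field,
                "size": 10,
            }
        ],
    }

query = make_faceted_query("cryoEM", "instrument")
# With an authenticated client:
#   sc.post_search(SEARCH_INDEX_UUID, query)
```

A gateway's faceted search UI can be driven directly from the facet buckets returned alongside the ranked results.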
11. Key ingredients
1. UUID and base path for the guest collection where data is gathered
2. Minid Python client
3. UUID for Globus Search index
4. Your choice of appropriate metadata schema for your project’s datasets
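Put together, the ingredients above suggest a simple ingest step: mint a persistent identifier for a newly arrived dataset, then index its metadata under that identifier. The sketch below shows the flow with the Minid and Search calls injected as hypothetical callables (here replaced by stubs), since the real clients require Globus Auth credentials.

```python
# Sketch: the automated ingest step, with stubs standing in for the Minid
# client and the Globus Search ingest call.

def ingest_dataset(dataset_path, metadata, mint_identifier, ingest_metadata):
    """Mint a persistent identifier for a dataset, then index its metadata."""
    identifier = mint_identifier(dataset_path)
    record = dict(metadata, identifier=identifier, path=dataset_path)
    ingest_metadata(subject=identifier, content=record)
    return identifier

# Example run with stub implementations:
indexed = {}
pid = ingest_dataset(
    "/projects/demo/sample-001",
    {"title": "Sample dataset 001"},
    mint_identifier=lambda path: "minid:0001",  # stub for the Minid client
    ingest_metadata=lambda subject, content: indexed.update({subject: content}),
)
```

Keeping the identifier as the Search subject is what lets a discovery hit in the gateway link straight back to the data on the guest collection.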
You’re working on a project with partners at other institutions, each of whom is analyzing unique samples and generating big datasets from them. You need to gather hundreds of terabytes of data on your campus’s HPC storage system. How can you make it easy for your partners to get the data from their labs to your server? And once it’s there, how are the partners going to understand each other’s datasets? First, they need to be able to see, in general, what’s been uploaded. Then, they need to find datasets that have specific features.
NOTE: We’re presenting this as a single project, but at Globus, we see this happening for dozens-to-hundreds of research projects on a continuous basis. Our end goal is to enable research teams to do this routinely, without special planning or extraordinary measures by individual projects.
Examples of “analysis on a community dataset”:
Examples of ”analyze user’s data”:
Examples of “download simulation results”:
Examples of “submit data to a repository”:
Petrel Data: https://petreldata.net/
Data storage is provided by the Argonne Leadership Computing Facility (ALCF) at Argonne National Laboratory.
Petrel offers 100TB allocations to approved projects, with a total of 3PB of storage.
Goal is to enable projects to manage themselves, including ingest, metadata management, indexing & search, and sharing permissions.
PIs request access by applying to join a Globus group
Petrel admin creates a project group for the PI and makes the PI a group manager
Petrel admin creates a Globus guest collection with access managed by the PI
Petrel admin also sets a quota of 100TB for the guest collection’s directory.