Introduction to security in the Open Science Grid - OSG School 2014 - Igor Sfiligoi
Introduction to Grid computing, including PKI-based security. With emphasis on Security in the Open Science Grid context.
Lecture given at the OSG User School 2014
https://twiki.opensciencegrid.org/bin/view/Education/OSGUserSchool2014
Using ssh as portal - The CMS CRAB over glideinWMS experience - Igor Sfiligoi
The User Analysis of the CMS experiment is performed in a distributed way using both Grid and dedicated resources. In order to insulate the users from the details of the computing fabric, CMS relies on the CRAB (CMS Remote Analysis Builder) package as an abstraction layer. CMS has recently switched from a client-server version of CRAB to a purely client-based solution, with ssh being used to interface with either HTCondor or glideinWMS batch systems. This switch has resulted in a significant improvement in user satisfaction, as well as a significant simplification of the CRAB code base. This presentation covers the reasoning behind the change as well as the new experience.
Presented at CHEP2013 in Amsterdam.
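The ssh-as-portal idea described above can be sketched as follows; the host name, submit-file path and the direct use of condor_submit are illustrative assumptions, not CRAB's actual interface.

```python
# Minimal sketch of ssh as a portal: the client composes an HTCondor
# submission command and runs it on a remote submit node via ssh, so
# no custom server component is needed on either side.
import shlex
import subprocess

def build_remote_submit(host, submit_file):
    """Compose the ssh command line that submits a job remotely."""
    remote_cmd = "condor_submit " + shlex.quote(submit_file)
    return ["ssh", host, remote_cmd]

def submit(host, submit_file):
    """Run the submission; returns the ssh exit code."""
    return subprocess.call(build_remote_submit(host, submit_file))
```

The client only needs ssh access to the submit node; everything else stays server-side, which is what simplifies the code base.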
Introduction to Distributed HTC and overlay systems - OSG User School 2014 - Igor Sfiligoi
Lecture about Distributed HTC and overlay systems, given at the OSG User School 2014.
https://twiki.opensciencegrid.org/bin/view/Education/OSGUserSchool2014
glideinWMS, The OSG overlay DHTC system - OSG School 2014 - Igor Sfiligoi
Lecture about glideinWMS, given in the DHTC training context of the OSG User School 2014.
https://twiki.opensciencegrid.org/bin/view/Education/OSGUserSchool2014
This document provides a high-level overview of how glideinWMS-based instances do matchmaking in CMS (a High Energy Physics experiment). The information is accurate as of early Dec 2012.
VMworld 2013: Performance and Capacity Management of DRS Clusters - VMworld
VMworld 2013
Anne Holler, VMware
Ganesha Shanmuganathan, VMware
Learn more about VMworld and register at http://www.vmworld.com/index.jspa?src=socmed-vmworld-slideshare
Where to find DHTC resources - OSG School 2014 - Igor Sfiligoi
A lecture about the available resources for Distributed Computing, including pros and cons.
Given at the OSG User School 2014:
https://twiki.opensciencegrid.org/bin/view/Education/OSGUserSchool2014
Augmenting Big Data Analytics with Nirvana - Igor Sfiligoi
It has been proven that Big Data Analytics can provide a major competitive advantage, yet most deployments cannot reach the vast majority of the data owned by an organization!
This presentation introduces the concept of tiered Big Data Analytics, with an eye on the use of Nirvana in this scenario.
Digibury: SciVisum - Making your website fast - and scalable - Lizzie Hodgson
Deri Jones is a renowned speaker and thought-leader in the Web performance arena. In his Digibury talk he not only covered war-stories from many years in the web performance space, he also gave tips on making any page fast, and explained how to use open-source tools in addressing the challenges of scaling.
OSDC 2018 | Migrating to the cloud by Devdas Bhagat - NETWAYS
This is an experience report of a migration from self-hosted services to running in the cloud. While there have been plenty of business case studies showing the benefits of a cloud migration, there are very few reports on the IT side of the migration. This talk covers the migration of Spilgames (a small Dutch games publisher) from a self-hosted OpenStack and hardware-based infrastructure to Google Cloud, the challenges, and the tooling (and lack thereof). This migration is still a work in progress, and the talk will cover as much detail as possible.
Moving from the Iron Age to the Cloud Age in computing is supposed to save us money yet many migrations seem to cost more in the long run and result in infrastructures as complex to manage as what we had before. This is often the result of the so called “lift & shift” approach many take – it’s a short term win that doesn’t address why you wanted to move to the cloud in the first place.
The Cloud Age affords us the opportunity to treat our infrastructure not as something special, but as something disposable. By applying the practices of continuous integration and delivery to our infrastructure and configuration management, we can build truly scalable infrastructures to host our applications' wildest dreams.
In this talk we will look at the tools and processes that can be adopted to truly make use of the possibilities of the Cloud.
Oracle SOA Suite Performance Tuning - UKOUG Application Server & Middleware SI... - C2B2 Consulting
Matt Brasier, C2B2 Head of Consulting, speaking at the UK Oracle User Group App Server & Middleware Special Interest Group Event on Wednesday, the 9th of October 2013.
#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, Criteo - Paris Open Source Summit
#Data management & #Blockchain - Track - Data : database
Delivering a database service is not a simple job; to ensure that everything is working correctly, your platform needs to be observable. In this talk, I’ll explain how we make our MySQL/MariaDB databases observable. We’ll talk about the RED and USE methods and the golden signals. You’ll discover how we dealt with questions like “We think the database is slow”. This talk will show you how to make your databases observable with open source solutions.
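As a toy illustration of the RED method (Rate, Errors, Duration) mentioned in the abstract, the database call path can be wrapped with a decorator that records the three signals; a real deployment would export them to a monitoring system rather than keep them in an in-process dict, and `run_query` is a made-up stand-in for an actual database call.

```python
import time

# In-process RED metrics: request count (rate), error count, durations.
metrics = {"requests": 0, "errors": 0, "durations": []}

def observed(fn):
    """Record Rate, Errors and Duration for every call to fn."""
    def wrapper(*args, **kwargs):
        metrics["requests"] += 1
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        except Exception:
            metrics["errors"] += 1
            raise
        finally:
            metrics["durations"].append(time.perf_counter() - start)
    return wrapper

@observed
def run_query(sql):
    # Stand-in for a real MySQL/MariaDB query.
    return "result of " + sql
```

From these three series one can derive the slow-query answer directly: rate and error ratio per interval, plus duration percentiles.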
Title: Java at Scale - What Works and What Doesn't Work Nearly so Well
Speaker: Matt Schuetze, Product Manager, Azul Systems
Abstract: Java gets used everywhere and for everything due to its efficiency, portability, the productivity it offers developers, and the platform it provides for application frameworks and non-Java languages. But all is not perfect; developers both benefit from and struggle against Java's greatest strength: its memory management. In this session, Matt will describe where Java needs help, the challenges it presents developers who need to provide reliable performance, the reasons those challenges exist, and how developers have traditionally worked around them. He will then discuss where Zing fits in the spectrum of use cases where large memory and predictable performance dominate essential application characteristics.
John Griffith, Block Storage Project PTL, outlines the changes made in the Icehouse release as well as upcoming updates for Juno.
Learn more about Block Storage (Cinder) here: https://wiki.openstack.org/wiki/Cinder
Presentation at the Plone Conference Brazil 2013.
How to create a Plone deployment that performs like crazy and survives not only a datacenter failure, but even keeps on running when all Plone heads are down.
OSMC 2019 | How to improve database observability by Charles Judith - NETWAYS
Delivering a database service is not a simple job; to ensure that everything is working correctly, your platform needs to be observable. In this talk, I’ll explain how we make our MySQL/MariaDB databases observable. We’ll talk about the RED and USE methods and the golden signals. You’ll discover how we dealt with questions like “We think the database is slow”. This talk will show you how to make your databases observable with open source solutions.
I recently presented this 2-hour session about the automation model developed at Videobet and the tools used in R&D, QA and operations:
Issue mgmt.: JIRA/Greenhopper
Build system and repository: Maven & Nexus
Build server: QuickBuild
Code quality: Sonar
Continuous Integration: Selenium Grid
Crash dump analysis: Socorro
Database versioning: Flyway DB
2012 Annual State of the Union for Mobile Ecommerce Performance [Velocity EU] - Strangeloop
On October 3 at Velocity EU, Strangeloop president Joshua Bixby unveiled the findings from the first study ever conducted of mobile performance over cellular networks.
In July and September 2012, Strangeloop conducted an industry first: a mobile performance survey of top ecommerce sites. The "2012 State of Mobile Ecommerce Performance" documents how Strangeloop tested top Alexa-ranked retail sites on a variety of mobile devices to find answers to questions like:
- How long does the median site take to load in mobile browsers?
- Which sites were fastest?
- Do some mobile OS/browsers/devices offer a consistently faster user experience than others?
- How much faster are pages served over LTE than over 3G?
- How do all of these findings compare to similar research conducted for desktop performance, published in Strangeloop’s annual Page Speed and Website Performance State of the Union reports?
The report is available for download at http://www.strangeloopnetworks.com/.
SaltConf14 - Thomas Jackson, LinkedIn - Safety with Power Tools - SaltStack
As infrastructure scales, simple tasks become increasingly difficult. For large infrastructures to be manageable, we use automation. But automation, like any power tool, comes with its own set of risks and challenges. Automation should be handled like production code, and great care should be exercised with power tools. This talk will cover how SaltStack is used at LinkedIn and offer tips and tricks for automating management with SaltStack at massive scale including a look at LinkedIn-inspired Salt features such as blacklist and prereq states. It will also cover Salt master and minion instrumentation and a compilation of how not to use Salt.
Comparing single-node and multi-node performance of an important fusion HPC c... - Igor Sfiligoi
Fusion simulations have traditionally required the use of leadership-scale High Performance Computing (HPC) resources in order to produce advances in physics. The impressive improvements in compute and memory capacity of many-GPU compute nodes are now allowing some problems that once required a multi-node setup to be solvable on a single node as well. When possible, the increased interconnect bandwidth can result in an order of magnitude higher science throughput, especially for communication-heavy applications. In this paper we analyze the performance of the fusion simulation tool CGYRO, an Eulerian gyrokinetic turbulence solver designed and optimized for collisional, electromagnetic, multiscale simulation, which is widely used in the fusion research community. Due to the nature of the problem, the application has to work on a large multi-dimensional computational mesh as a whole, requiring frequent exchange of large amounts of data between the compute processes. In particular, we show that the average-scale nl03 benchmark CGYRO simulation can be run at an acceptable speed on a single Google Cloud instance with 16 A100 GPUs, outperforming 8 NERSC Perlmutter Phase 1 nodes, 16 ORNL Summit nodes and 256 NERSC Cori nodes. Moving from a multi-node to a single-node GPU setup, we get comparable simulation times using less than half the number of GPUs. Larger benchmark problems, however, still require a multi-node HPC setup due to GPU memory capacity needs, since at the time of writing no vendor offers nodes with a sufficient GPU memory setup. The upcoming external NVSwitch does, however, promise to deliver an almost equivalent solution for up to 256 NVIDIA GPUs.
Presented at PEARC22.
Paper DOI: https://doi.org/10.1145/3491418.3535130
The anachronism of whole-GPU accounting - Igor Sfiligoi
NVIDIA has been making steady progress in increasing the compute performance of its GPUs, resulting in order-of-magnitude compute throughput improvements over the years. With several models of GPUs coexisting in many deployments, the traditional accounting method of treating all GPUs as equal no longer reflects compute output. Moreover, for applications that require significant CPU-based compute to complement the GPU-based compute, it is becoming harder and harder to make full use of the newer GPUs, requiring sharing of those GPUs between multiple applications in order to maximize the achievable science output. This further reduces the value of whole-GPU accounting, especially when the sharing is done at the infrastructure level. We thus argue that GPU accounting for throughput-oriented infrastructures should be expressed in GPU core hours, much like it is normally done for CPUs. While GPU core compute throughput does change between GPU generations, the variability is similar to what we expect to see among CPU cores. To validate our position, we present an extensive set of run time measurements of two IceCube photon propagation workflows on 14 GPU models, using both on-prem and Cloud resources. The measurements also outline the influence of GPU sharing at both the HTCondor and Kubernetes infrastructure levels.
Presented at PEARC22.
Document DOI: https://doi.org/10.1145/3491418.3535125
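The core-hour accounting the paper argues for reduces to simple arithmetic; the CUDA-core counts below are published figures for two common models, and the sharing scenario is illustrative.

```python
# GPU accounting in core hours rather than whole-GPU hours.
CUDA_CORES = {"V100": 5120, "A100": 6912}

def gpu_core_hours(model, wall_hours, share=1.0):
    """Core hours charged to a job holding `share` of one GPU."""
    return CUDA_CORES[model] * wall_hours * share

# Two jobs sharing an A100 evenly for 2 hours account for exactly the
# same total as one job using the whole GPU for 2 hours, which is the
# point of the scheme: sharing does not distort the books.
```

Under whole-GPU accounting the same two shared jobs would be billed as 4 GPU hours instead of 2, double-counting the hardware.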
Auto-scaling HTCondor pools using Kubernetes compute resources - Igor Sfiligoi
HTCondor has been very successful in managing globally distributed, pleasantly parallel scientific workloads, especially as part of the Open Science Grid. HTCondor system design makes it ideal for integrating compute resources provisioned from anywhere, but it has very limited native support for autonomously provisioning resources managed by other solutions. This work presents a solution that allows for autonomous, demand-driven provisioning of Kubernetes-managed resources. A high-level overview of the employed architectures is presented, paired with the description of the setups used in both on-prem and Cloud deployments in support of several Open Science Grid communities. The experience suggests that the described solution should be generally suitable for contributing Kubernetes-based resources to existing HTCondor pools.
Presented at PEARC22.
Paper DOI: https://doi.org/10.1145/3491418.3535123
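The demand-driven provisioning described above boils down to a control loop: inspect the HTCondor queue, derive a target worker-pod count, and resize a Kubernetes deployment. The sizing rule below is a hypothetical minimal policy, not the actual algorithm of the presented system.

```python
def desired_workers(idle_jobs, busy_workers, max_workers):
    """Target worker-pod count: keep the busy workers, add one pod
    per idle job, and never exceed the configured cap."""
    return min(max_workers, busy_workers + idle_jobs)

# The surrounding loop would query the HTCondor queue for idle_jobs
# and busy_workers, then patch the deployment's replica count to the
# returned value on every iteration.
```

Because HTCondor workers simply join the pool when their pods start, the loop needs no coordination beyond setting the replica count.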
Performance Optimization of CGYRO for Multiscale Turbulence Simulations - Igor Sfiligoi
Overview of the recent performance optimization of CGYRO, an Eulerian gyrokinetic fusion plasma solver, with emphasis on multiscale turbulence simulations.
Presented at the joint US-Japan Workshop on Exascale Computing Collaboration and the 6th workshop of the US-Japan Joint Institute for Fusion Theory (JIFT) program (Jan 18th, 2022).
Comparing GPU effectiveness for Unifrac distance compute - Igor Sfiligoi
Poster presented at PEARC21.
The poster contains the complete scaling plots for both unweighted and weighted normalized UniFrac compute for sample sizes ranging from 1k to 307k on both GPUs and CPUs.
Managing Cloud networking costs for data-intensive applications by provisioni... - Igor Sfiligoi
Presented at PEARC21.
Many scientific high-throughput applications can benefit from the elastic nature of Cloud resources, especially when there is a need to reduce time to completion. Cost considerations are usually a major issue in such endeavors, with networking often a major component; for data-intensive applications, egress networking costs can exceed the compute costs. Dedicated network links provide a way to lower the networking costs, but they do add complexity. In this paper we provide a description of a 100 fp32 PFLOPS Cloud burst in support of IceCube production compute, which used the Internet2 Cloud Connect service to provision several logically dedicated network links from the three major Cloud providers, namely Amazon Web Services, Microsoft Azure and Google Cloud Platform, that in aggregate enabled approximately 100 Gbps of egress capability to on-prem storage. It provides technical details about the provisioning process, the benefits and limitations of such a setup, and an analysis of the costs incurred.
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access - Igor Sfiligoi
Presented at PEARC21.
Most experimental sciences now rely on computing, and biological sciences are no exception. As datasets get bigger, so do the computing costs, making proper optimization of the codes used by scientists increasingly important. Many of the codes developed in recent years are based on the Python-based NumPy, due to its ease of use and good performance characteristics. The composable nature of NumPy, however, does not generally play well with the multi-tier nature of modern CPUs, making any non-trivial multi-step algorithm limited by the external memory access speeds, which are hundreds of times slower than the CPU’s compute capabilities. In order to fully utilize the CPU compute capabilities, one must keep the working memory footprint small enough to fit in the CPU caches, which requires splitting the problem into smaller portions and fusing together as many steps as possible. In this paper, we present changes based on these principles to two important functions in the scikit-bio library, principal coordinates analysis and the Mantel test, that resulted in over 100x speed improvement in these widely used, general-purpose tools.
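The cache-blocking principle from the abstract can be sketched in NumPy: instead of materializing full-size temporaries across several composed steps, process the array in blocks small enough to stay cache-resident and fuse the steps inside the loop. The function and block size here are illustrative, not the actual scikit-bio changes.

```python
import numpy as np

def centered_sumsq_naive(x, mu):
    d = x - mu                    # full-size temporary array
    return float((d * d).sum())   # another full-size temporary

def centered_sumsq_blocked(x, mu, block=4096):
    # Same result, but the working set of each iteration is one
    # cache-sized block, and the subtract/square/sum steps are fused
    # inside the loop instead of sweeping main memory three times.
    total = 0.0
    for i in range(0, x.size, block):
        d = x[i:i + block] - mu
        total += float((d * d).sum())
    return total
```

For arrays much larger than the CPU caches, the blocked variant trades a little Python loop overhead for far fewer trips to main memory.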
Using A100 MIG to Scale Astronomy Scientific Output - Igor Sfiligoi
Presented at GTC21.
The raw computing power of GPUs has been steadily increasing, significantly outpacing the CPU gains. This poses a problem for many GPU-enabled scientific applications that use CPU code paths to feed data to the GPU code, resulting in lower GPU utilization, and thus reduced gains in scientific output. Applications that are high-throughput in nature, such as astronomy-focused IceCube and LIGO, can partially work around the problem by running several instances of the executable on the same GPU. This approach, however, is sub-optimal both in terms of application performance and workflow management complexity. The recently introduced Multi-Instance GPU (MIG) capability, available on the NVIDIA A100 GPU, provides a much cleaner and easier-to-use alternative by allowing the logical slicing of the powerful GPU and assigning different slices to different applications. And at least in the case of IceCube, it can provide over 3x more scientific output on the same hardware.
Using commercial Clouds to process IceCube jobs - Igor Sfiligoi
Presented at EDUCAUSE CCCG March 2021.
The IceCube Neutrino Observatory is the world’s premier facility to detect neutrinos. Built at the South Pole in natural ice, it requires extensive and expensive calibration to properly track the neutrinos. Most of the required compute power comes from on-prem resources through the Open Science Grid, but IceCube can easily harness Cloud compute at any scale, too, as demonstrated by a series of Cloud bursts. This talk provides details of the performed Cloud bursts, as well as some insight into the science itself.
Fusion simulations have traditionally required the use of leadership-scale HPC resources in order to produce advances in physics. One such package is CGYRO, a premier tool for multi-scale plasma turbulence simulation. CGYRO is a typical HPC application that will not fit into a single node, as it requires several terabytes of memory and O(100) TFLOPS of compute capability for cutting-edge simulations. CGYRO also requires high-throughput and low-latency networking, due to its reliance on global FFT computations. While in the past such compute may have required hundreds, or even thousands, of nodes, recent advances in hardware capabilities allow just tens of nodes to deliver the necessary compute power. We explored the feasibility of running CGYRO on Cloud resources provided by Microsoft on their Azure platform, using the InfiniBand-connected HPC resources in spot mode. We observed both that the CPU-only resources were very efficient and that running in spot mode was doable, with minimal side effects. The GPU-enabled resources were less cost-effective but allowed for higher scaling.
For IceCube, a large amount of photon propagation simulation is needed to properly calibrate the natural ice. The simulation is compute-intensive and ideal for GPU compute. This Cloud run was more data-intensive than previous ones, producing 130 TB of output data. To keep egress costs in check, we created dedicated network links via the Internet2 Cloud Connect Service.
Scheduling a Kubernetes Federation with Admiralty - Igor Sfiligoi
Presented at OSG All-Hands Meeting 2020 - USCMS-USATLAS Session.
This talk presented the PRP experience with using Admiralty as a Kubernetes federation solution, discussing why we need it, why Admiralty is the best (if not the only) solution for our needs, and how it works.
Accelerating microbiome research with OpenACC - Igor Sfiligoi
Presented at OpenACC Summit 2020.
UniFrac is a commonly used metric in microbiome research for comparing microbiome profiles to one another. Computing UniFrac on modest sample sizes used to take a workday on a server class CPU-only node, while modern datasets would require a large compute cluster to be feasible. After porting to GPUs using OpenACC, the compute of the same modest sample size now takes only a few minutes on a single NVIDIA V100 GPU, while modern datasets can be processed on a single GPU in hours. The OpenACC programming model made the porting of the code to GPUs extremely simple; the first prototype was completed in just over a day. Getting full performance did however take much longer, since proper memory access is fundamental for this application.
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie... - Igor Sfiligoi
Presented at PEARC20.
This talk presents the expansion of IceCube’s production HTCondor pool using cost-effective GPU instances in preemptible mode gathered from the three major Cloud providers, namely Amazon Web Services, Microsoft Azure and the Google Cloud Platform. Using this setup, we sustained about 15k GPUs for a whole workday, corresponding to around 170 PFLOP32s, integrating over one EFLOP32 hour worth of science output for a price tag of about $60k. In this paper, we provide the reasoning behind the Cloud instance selection, a description of the setup and an analysis of the provisioned resources, as well as a short description of the actual science output of the exercise.
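The integrated science output quoted above follows directly from the sustained rate; a quick check, assuming an 8-hour workday (the exact duration is not stated here), shows that ~170 PFLOP32s sustained for a workday exceeds one EFLOP32 hour.

```python
def eflop32_hours(pflop32s, hours):
    """Integrated fp32 compute for a rate sustained over `hours`.
    1 EFLOPS = 1000 PFLOPS, so divide the PFLOP-hour product by 1000."""
    return pflop32s * hours / 1000.0
```

At 170 PFLOP32s, an 8-hour run integrates to 1.36 EFLOP32 hours, consistent with the "over one EFLOP32 hour" claim.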
Porting and optimizing UniFrac for GPUs - Igor Sfiligoi
Poster presented at PEARC20.
UniFrac is a commonly used metric in microbiome research for comparing microbiome profiles to one another (“beta diversity”). The recently implemented Striped UniFrac added the capability to split the problem into many independent subproblems and exhibits near-linear scaling. In this poster we describe the steps undertaken in porting and optimizing Striped UniFrac for GPUs. We reduced the run time of computing UniFrac on the published Earth Microbiome Project dataset from 13 hours on an Intel Xeon E5-2680 v4 CPU to 12 minutes on an NVIDIA Tesla V100 GPU, and to about one hour on a laptop with an NVIDIA GTX 1050 (with minor loss in precision). Computing UniFrac on a larger dataset containing 113k samples reduced the run time from over one month on the CPU to less than 2 hours on the V100 and 9 hours on an NVIDIA RTX 2080 Ti GPU (with minor loss in precision). This was achieved by using OpenACC to generate the GPU offload code and by improving the memory access patterns. A BSD-licensed implementation is available, which produces a C shared library linkable from any programming language.
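The split into independent subproblems can be illustrated with a generic block decomposition of the all-pairs distance computation; this is a sketch of the idea, not the exact stripe scheme used by Striped UniFrac.

```python
def pair_blocks(n, block):
    """Partition all i < j sample pairs into independent work blocks,
    each of which can be dispatched to a separate GPU or node.
    Each pair appears in exactly one block, so results can simply be
    concatenated into the full distance matrix."""
    blocks = []
    for start in range(0, n, block):
        rows = range(start, min(start + block, n))
        blocks.append([(i, j) for i in rows for j in range(i + 1, n)])
    return blocks
```

Because the blocks share no pairs, they need no synchronization, which is what gives the near-linear scaling mentioned above.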
Demonstrating 100 Gbps in and out of the public Clouds - Igor Sfiligoi
Poster presented at PEARC20.
There is increased awareness and recognition that public Cloud providers do provide capabilities not found elsewhere, with elasticity being a major driver. The value of elastic scaling is, however, tightly coupled to the capabilities of the networks that connect all involved resources, both in the public Clouds and at the various research institutions. This poster presents results of measurements involving file transfers inside public Cloud providers, fetching data from on-prem resources into public Cloud instances, and fetching data from public Cloud storage into on-prem nodes. The networking of the three major Cloud providers, namely Amazon Web Services, Microsoft Azure and the Google Cloud Platform, has been benchmarked. The on-prem nodes were either managed by the Pacific Research Platform or located at the University of Wisconsin – Madison. The observed sustained throughput was of the order of 100 Gbps in all the tests moving data in and out of the public Clouds, with throughput reaching into the Tbps range for data movements inside the public Cloud providers themselves. All the tests used HTTP as the transfer protocol.
TransAtlantic Networking using Cloud links - Igor Sfiligoi
Scientific communities have only a limited amount of bandwidth available for transferring data between the US and the EU. We know Cloud providers have plenty of bandwidth available, but at what cost?
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 - Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... - UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology pushes into IT, I was wondering, as an “infrastructure container Kubernetes guy”, how does this fancy AI technology get managed from an infrastructure operations view? Is it possible to apply our lovely cloud-native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and provide you with a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need to apply it to our own infrastructure and get it to work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and what could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I have already gotten working for real.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
How is glideinWMS different from vanilla HTCondor
1. Aug 2014 How are glideins different 1
glideinWMS Training
How is glideinWMS different from vanilla HTCondor
by Igor Sfiligoi, UC San Diego
2. Overview
● These slides provide an overview of why glideinWMS installations behave differently than dedicated, LAN-based HTCondor ones
3. Very heterogeneous resource pool
● Many user jobs have data constraints
– And data access varies from site to site
● Each site basically results in a different “type of resource”
– Making the resources very heterogeneous, for matchmaking purposes
– O(100) types of resources not unusual
● Leads to autocluster number explosion
– In dedicated HTCondor pools, 5 classes of resources is typically already a lot
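The autocluster explosion can be made concrete with a toy model: HTCondor groups idle jobs by the distinct combinations of their matchmaking-significant attribute values, so a per-site data constraint multiplies the count. This Python sketch is illustrative only; the attribute names (`RequestMemory`, `DESIRED_Sites`) stand in for whatever attributes matter in a real pool:

```python
from itertools import product

# Toy model: an "autocluster" is the set of jobs sharing identical values
# for every attribute that matters to matchmaking.
def count_autoclusters(jobs, significant_attrs):
    signatures = {tuple(job.get(a) for a in significant_attrs) for job in jobs}
    return len(signatures)

# In a LAN pool, jobs may differ only by memory request: few autoclusters.
lan_jobs = [{"RequestMemory": m} for m in (1024, 2048, 4096)] * 10
print(count_autoclusters(lan_jobs, ["RequestMemory"]))  # 3

# In a glideinWMS pool, a per-site data constraint multiplies the count.
sites = [f"Site{i}" for i in range(100)]
grid_jobs = [{"RequestMemory": m, "DESIRED_Sites": s}
             for m, s in product((1024, 2048, 4096), sites)]
print(count_autoclusters(grid_jobs, ["RequestMemory", "DESIRED_Sites"]))  # 300
```

The same three memory classes become 300 autoclusters once 100 site types enter the signature, which is why matchmaking cost grows so quickly.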
4. Provisioning vs matchmaking
● Glideins are provisioned (i.e. requested from sites) because some user jobs need more resources
– But once provisioned they may not match any jobs
● Two main reasons
– Trigger jobs already gone (i.e. not idle anymore)
– Mismatch between provisioning and matchmaking requirements
● Dedicated HTCondor installations don't have 2 levels of matchmaking
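The two decision levels can be sketched in Python: provisioning is triggered by coarse criteria over the idle job queue, while per-job matchmaking happens much later against the glidein that actually materializes, and is stricter. All names and predicates below are a toy model, not glideinWMS code:

```python
# Level 1: the provisioning decision — request a glidein if any idle job
# *could* run at the site (coarse, queue-level criteria).
def provision(idle_jobs, site):
    return any(site in j["desired_sites"] for j in idle_jobs)

# Level 2: per-job matchmaking against the glidein that shows up,
# evaluated much later and with stricter requirements.
def match(job, glidein):
    return (glidein["site"] in job["desired_sites"]
            and glidein["memory"] >= job["request_memory"])

idle = [{"desired_sites": {"SiteA"}, "request_memory": 8192}]
assert provision(idle, "SiteA")             # a glidein gets requested...

glidein = {"site": "SiteA", "memory": 4096}  # ...but arrives too small
assert not match(idle[0], glidein)           # ...and never matches the job
```

This is the second failure mode from the slide (requirements mismatch); the first one (trigger job already gone) simply corresponds to `idle` being empty by the time the glidein starts.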
5. Limited lease lifetime
● Glideins are basically leased execute nodes
– And they come with a limited lifetime
● Lease times usually on the order of one day
– Each glidein typically runs less than 10 user jobs
● User jobs must fit in the remaining lifetime
– Or they will be killed
● Makes for more complex matchmaking decisions
– And requires user help
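The "must fit in the remaining lifetime" check amounts to a simple predicate. glideinWMS glideins advertise their planned end-of-life (e.g. as a `GLIDEIN_ToDie` timestamp), which a job's requirements can compare against its estimated runtime; this Python version is a sketch of the logic, not the actual ClassAd expression:

```python
import time

# Sketch: a job only matches a glidein if its estimated runtime fits
# into the lease time remaining before the glidein's advertised
# end-of-life (GLIDEIN_ToDie-style timestamp, in seconds since epoch).
def fits_remaining_lifetime(glidein_to_die, estimated_runtime, now=None):
    now = time.time() if now is None else now
    return glidein_to_die - now > estimated_runtime

lease_end = 1_000_000 + 86_400  # glidein dies one day after "now"
assert fits_remaining_lifetime(lease_end, 6 * 3600, now=1_000_000)       # 6h job fits
assert not fits_remaining_lifetime(lease_end, 30 * 3600, now=1_000_000)  # 30h job doesn't
```

The "requires user help" bullet refers to the estimated runtime: only the user can supply a realistic value for it.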
6. Multicore and limited lifetimes
● Limited lifetimes particularly problematic for multi-core jobs, resulting in significant waste
– Since it is unlikely all jobs will terminate at exactly the same time
[Figure: jobs 1–9 packed across 4 CPUs over time; once no suitable user jobs remain, cores sit idle (waste) until the pilot job can terminate]
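The tail waste in the figure can be quantified with a toy calculation: each core goes idle when its last job finishes, but the pilot only terminates when the slowest core is done, so every earlier-finishing core contributes idle core-hours. A minimal sketch (the numbers are made up for illustration):

```python
# Toy calculation of the tail waste on a multi-core glidein: the pilot
# can only terminate once the slowest core finishes, so every core that
# drains earlier wastes the difference.
def tail_waste_core_hours(per_core_busy_hours):
    end = max(per_core_busy_hours)              # pilot termination time
    return sum(end - h for h in per_core_busy_hours)

# 4 cores whose job streams dry up at different times, as in the figure:
print(tail_waste_core_hours([20, 23, 18, 24]))  # 11 core-hours wasted
```

A single-core glidein has no such tail by construction, which is why the problem is specific to multi-core provisioning.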
7. Automatic shut down
● Glideins are configured to shut down automatically if not used for some time, unlike a dedicated HTCondor pool
– Those resources could be used by someone else
– HTCondor not the only user of the resources
● Default Unclaimed threshold quite low
– About 10 minutes
● This puts stringent limits on matchmaking
– If a Startd is not matched in time, it is “lost”
– And restarting glideins is expensive
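The shutdown policy boils down to a simple predicate over the slot state. A hedged sketch, where the 600-second threshold merely illustrates the roughly 10-minute default mentioned above (the real value is configurable):

```python
# Sketch of the glidein's idle-shutdown policy: if the startd has been
# Unclaimed for longer than a threshold, give the slot back to the site
# rather than keep holding resources HTCondor is not the only user of.
UNCLAIMED_THRESHOLD = 600  # seconds; illustrative of the ~10 min default

def should_shut_down(state, seconds_unclaimed):
    return state == "Unclaimed" and seconds_unclaimed > UNCLAIMED_THRESHOLD

assert not should_shut_down("Claimed", 3600)    # busy slots stay up
assert should_shut_down("Unclaimed", 900)       # idle too long: shut down
assert not should_shut_down("Unclaimed", 120)   # still within the grace window
```

The matchmaking pressure follows directly: the negotiator has at most this window to claim a freshly started glidein before it is lost.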
8. Strong end-to-end security
● A glideinWMS system will typically span many different locations
● x509 authentication between all nodes required
– At daemon startup, then security session cached
– With the exception of Schedd<->Startd, where security is mediated through the Collector
● All over-the-wire communication integrity checked
– Requires authentication; neither is typically used in LAN deployments
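Integrity checking can be illustrated generically: once two daemons share a session key, each message carries a MAC that the receiver verifies. This is a toy model using Python's standard `hmac` module, not HTCondor's actual wire protocol:

```python
import hashlib
import hmac

# Toy model of over-the-wire integrity checking: sign each message with
# a MAC keyed by the security session established at daemon startup.
def sign(key, message):
    return hmac.new(key, message, hashlib.sha256).hexdigest()

key = b"session-key-established-at-daemon-startup"  # illustrative key
msg = b"claim slot1@glidein_1234"                   # illustrative message
tag = sign(key, msg)

assert hmac.compare_digest(tag, sign(key, msg))              # intact message verifies
assert not hmac.compare_digest(tag, sign(key, msg + b"x"))   # tampered message fails
```

The cost hinted at in the slide comes from doing this (plus the initial authentication) for every daemon pair across the WAN, work that a trusted LAN deployment typically skips.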
9. Not privileged on execute side
● HTCondor daemons on the execute side do not have system privileges
– Limits what HTCondor can do
● UID switching can be achieved with glexec
– But requires proxy delegation from schedd
– Only possible if users collaborate
– Relatively expensive (at least one call per job startup)
● Many other functions not an option
– e.g. cgroups
10. Firewalls
● HTCondor is basically a P2P system
– But execute nodes are often behind firewalls
● Requires the use of CCB and shared_port_daemon to get around it
– But this adds complexity to the system
– Schedd particularly sensitive here
● CCB can become a single point of failure
– Either because temporarily overloaded
– Or if it dies and HA is not used
11. Very dynamic resource pool
● Startds tend to come and go often
– A side effect of limited lease lifetime
– And of provisioning due to new jobs being submitted
● Many HTCondor optimizations less effective
– e.g. security session caching
12. Increased resource pool size
● Most glideinWMS installations bigger than most LAN HTCondor installations
– At least at the peaks
● Increased scale puts more load on non-execute daemons
– Even before all the other considerations are applied