glideinWMS Frontend Internals - glideinWMS Training Jan 2012 - Igor Sfiligoi
This presentation provides a detailed insight into the internal workings of the glideinWMS Frontend. Part of the glideinWMS Training session held in Jan 2012 at UCSD.
Glidein startup Internals and Glidein configuration - glideinWMS Training Jan... - Igor Sfiligoi
This presentation provides a detailed insight into the internal workings of the glideinWMS glidein startup script and the glideins in general. Part of the glideinWMS Training session held in Jan 2012 at UCSD.
glideinWMS Frontend Monitoring - glideinWMS Training Jan 2012 - Igor Sfiligoi
This talk walks you through the monitoring options a glideinWMS Frontend operator has.
Part of the glideinWMS Training session held in Jan 2012 at UCSD.
Lean Software Production and Qualification Infrastructures - AdaCore
Florian Villoing presents the infrastructure AdaCore put in place to build and test its compilation tool chains and add-on technology on a daily basis.
He also presents the "qualification machine" that AdaCore created to ease the DO-178B tool qualification process.
These slides were used to support the talks Florian gave at the Agile Tour 2009 conferences in Grenoble (October 20, 2009) and Valence (October 22, 2009).
glideinWMS Training Jan 2012 - Condor tuning - Igor Sfiligoi
This talk walks you through the various knobs that need to be tuned to get Condor to work with glideinWMS.
Part of the glideinWMS Training session held in Jan 2012 at UCSD.
This document provides a high-level overview of how glideinWMS-based instances do matchmaking in CMS (a High Energy Physics experiment). The information is accurate as of early Dec 2012.
glideinWMS validation scripts - glideinWMS Training Jan 2012 - Igor Sfiligoi
Description of how to write custom validation scripts in glideinWMS, with an emphasis on the VO Frontend operations.
Part of the glideinWMS Training session held in Jan 2012 at UCSD.
This slide show illustrates preliminary work on the "pilot mechanism" using the Condor system. The goal is to create a uniform user interface to computational resources across the network and, in the meantime, increase the parallelism of user tasks towards optimal throughput in the long run.
Wedding convenience and control with RemoteCondor - Igor Sfiligoi
This presentation explains why Condor is not suitable for use on user-owned machines, and why RemoteCondor is the best available solution to the problem.
The glideinWMS approach to the ownership of System Images in the Cloud World - Igor Sfiligoi
Presentation at CLOSER 2012.
Scientific communities that are accustomed to using Grid resources are now considering the use of Cloud resources. However, moving from the Grid to the Cloud brings along the need for the creation and maintenance of the system image used to configure the provisioned resources, and this presents both opportunities and problems for the users. The impact is especially interesting in the context of glideinWMS due to its layered architecture. This presentation describes the various options available to the glideinWMS project team, their advantages and disadvantages, and explains why one of them is to be preferred.
Closer web page: http://closer.scitevents.org/
DevOpsDays Tel Aviv DEC 2022 | Building A Cloud-Native Platform Brick by Bric... - Haggai Philip Zagury
The overwhelming growth of technologies in the Cloud Native foundation overtook our toolbox and completely changed (well, really enhanced) the Developer Experience.
In this talk, I will share my personal journey from the "Operator to Developer's chair" and the practices which helped me along my journey as a Cloud-Native Dev ;)
An argument for moving the requirements out of user hands - The CMS Experience - Igor Sfiligoi
This talk makes an argument for why users should not write arbitrary Requirements expressions in Condor, but only express their desires, and let someone else write the actual policy.
Presentation at Condor Week 2012.
Condor overview - glideinWMS Training Jan 2012 - Igor Sfiligoi
An overview of the Condor Workload Management System, with emphasis on how it is used within the glideinWMS.
Part of the glideinWMS Training session held in Jan 2012 at UCSD.
The event page is http://hepuser.ucsd.edu/twiki2/bin/view/Main/GlideinFrontend1201
Video available at http://www.youtube.com/watch?v=tpaedg09VMM
In this presentation, you will learn how to build a Quick Deploy XML for Deployment Center (DC), and how to validate your XML using an XSD to reduce the number of iterations: you can ensure you have no errors, catch syntax errors, and even identify severe errors, usually with a recommendation on how to fix them!
- A Quick Deploy XML can be used to create and install a new environment in DC.
- Validate the XML using an XSD before running it.
- Reduce the number of iterations while building the XML.
A complete method.
The evolution of development practices, architectures, and infrastructures is a process that Drupal has embraced as well, transforming itself from a community CMS into a modern PHP framework.
Today Drupal makes it possible to create a developer-friendly experience and can be the foundation on which to build your cloud-native application.
Apache Spark - Running on a Cluster | Big Data Hadoop Spark Tutorial | CloudxLab - CloudxLab
Big Data with Hadoop & Spark Training: http://bit.ly/2IUsWca
This CloudxLab "Running on a Cluster" tutorial helps you understand running Spark on a cluster in detail. Below are the topics covered in this tutorial:
1) Spark Runtime Architecture
2) Driver Node
3) Scheduling Tasks on Executors
4) Understanding the Architecture
5) Cluster Managers
6) Executors
7) Launching a Program using spark-submit
8) Local Mode & Cluster-Mode
9) Installing Standalone Cluster
10) Cluster Mode - YARN
11) Launching a Program on YARN
12) Cluster Mode - Mesos and AWS EC2
13) Deployment Modes - Client and Cluster
14) Which Cluster Manager to Use?
15) Common flags for spark-submit
Ever since the “CloudNative revolution” took over our development environment (devenv), we have never been more challenged (or more excited). With Kubernetes, Docker (Containerd) & many other microservice-related technologies, we have a handful of technologies to master before we write the first line of code.
Similar to glideinWMS Frontend Installation - Part 1 - Condor Installation - glideinWMS Training Jan 2012
Comparing single-node and multi-node performance of an important fusion HPC c... - Igor Sfiligoi
Fusion simulations have traditionally required the use of leadership scale High Performance Computing (HPC) resources in order to produce advances in physics. The impressive improvements in compute and memory capacity of many-GPU compute nodes are now allowing for some problems that once required a multi-node setup to be also solvable on a single node. When possible, the increased interconnect bandwidth can result in order of magnitude higher science throughput, especially for communication-heavy applications. In this paper we analyze the performance of the fusion simulation tool CGYRO, an Eulerian gyrokinetic turbulence solver designed and optimized for collisional, electromagnetic, multiscale simulation, which is widely used in the fusion research community. Due to the nature of the problem, the application has to work on a large multi-dimensional computational mesh as a whole, requiring frequent exchange of large amounts of data between the compute processes. In particular, we show that the average-scale nl03 benchmark CGYRO simulation can be run at an acceptable speed on a single Google Cloud instance with 16 A100 GPUs, outperforming 8 NERSC Perlmutter Phase1 nodes, 16 ORNL Summit nodes and 256 NERSC Cori nodes. Moving from a multi-node to a single-node GPU setup we get comparable simulation times using less than half the number of GPUs. Larger benchmark problems, however, still require a multi-node HPC setup due to GPU memory capacity needs, since at the time of writing no vendor offers nodes with a sufficient GPU memory setup. The upcoming external NVSWITCH does however promise to deliver an almost equivalent solution for up to 256 NVIDIA GPUs.
Presented at PEARC22.
Paper DOI: https://doi.org/10.1145/3491418.3535130
The anachronism of whole-GPU accounting - Igor Sfiligoi
NVIDIA has been making steady progress in increasing the compute performance of its GPUs, resulting in order of magnitude compute throughput improvements over the years. With several models of GPUs coexisting in many deployments, the traditional accounting method of treating all GPUs as being equal is not reflecting compute output anymore. Moreover, for applications that require significant CPU-based compute to complement the GPU-based compute, it is becoming harder and harder to make full use of the newer GPUs, requiring sharing of those GPUs between multiple applications in order to maximize the achievable science output. This further reduces the value of whole-GPU accounting, especially when the sharing is done at the infrastructure level. We thus argue that GPU accounting for throughput-oriented infrastructures should be expressed in GPU core hours, much like it is normally done for the CPUs. While GPU core compute throughput does change between GPU generations, the variability is similar to what we expect to see among CPU cores. To validate our position, we present an extensive set of run time measurements of two IceCube photon propagation workflows on 14 GPU models, using both on-prem and Cloud resources. The measurements also outline the influence of GPU sharing at both HTCondor and Kubernetes infrastructure level.
Presented at PEARC22.
Document DOI: https://doi.org/10.1145/3491418.3535125
Auto-scaling HTCondor pools using Kubernetes compute resources - Igor Sfiligoi
HTCondor has been very successful in managing globally distributed, pleasantly parallel scientific workloads, especially as part of the Open Science Grid. HTCondor system design makes it ideal for integrating compute resources provisioned from anywhere, but it has very limited native support for autonomously provisioning resources managed by other solutions. This work presents a solution that allows for autonomous, demand-driven provisioning of Kubernetes-managed resources. A high-level overview of the employed architectures is presented, paired with the description of the setups used in both on-prem and Cloud deployments in support of several Open Science Grid communities. The experience suggests that the described solution should be generally suitable for contributing Kubernetes-based resources to existing HTCondor pools.
Presented at PEARC22.
Paper DOI: https://doi.org/10.1145/3491418.3535123
Performance Optimization of CGYRO for Multiscale Turbulence Simulations - Igor Sfiligoi
Overview of the recent performance optimization of CGYRO, an Eulerian GyroKinetic Fusion Plasma solver, with emphasis on Multiscale Turbulence Simulations.
Presented at the joint US-Japan Workshop on Exascale Computing Collaboration and 6th workshop of the US-Japan Joint Institute for Fusion Theory (JIFT) program (Jan 18th 2022).
Comparing GPU effectiveness for Unifrac distance compute - Igor Sfiligoi
Poster presented at PEARC21.
The poster contains the complete scaling plots for both unweighted and weighted normalized Unifrac compute for sample sizes ranging from 1k to 307k on both GPUs and CPUs.
Managing Cloud networking costs for data-intensive applications by provisioni... - Igor Sfiligoi
Presented at PEARC21.
Many scientific high-throughput applications can benefit from the elastic nature of Cloud resources, especially when there is a need to reduce time to completion. Cost considerations are usually a major issue in such endeavors, with networking often a major component; for data-intensive applications, egress networking costs can exceed the compute costs. Dedicated network links provide a way to lower the networking costs, but they do add complexity. In this paper we provide a description of a 100 fp32 PFLOPS Cloud burst in support of IceCube production compute, that used Internet2 Cloud Connect service to provision several logically-dedicated network links from the three major Cloud providers, namely Amazon Web Services, Microsoft Azure and Google Cloud Platform, that in aggregate enabled approximately 100 Gbps egress capability to on-prem storage. It provides technical details about the provisioning process, the benefits and limitations of such a setup and an analysis of the costs incurred.
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access - Igor Sfiligoi
Presented at PEARC21.
Most experimental sciences now rely on computing, and biological sciences are no exception. As datasets get bigger, so do the computing costs, making proper optimization of the codes used by scientists increasingly important. Many of the codes developed in recent years are based on the Python-based NumPy, due to its ease of use and good performance characteristics. The composable nature of NumPy, however, does not generally play well with the multi-tier nature of modern CPUs, making any non-trivial multi-step algorithm limited by the external memory access speeds, which are hundreds of times slower than the CPU's compute capabilities. In order to fully utilize the CPU compute capabilities, one must keep the working memory footprint small enough to fit in the CPU caches, which requires splitting the problem into smaller portions and fusing together as many steps as possible. In this paper, we present changes based on these principles to two important functions in the scikit-bio library, principal coordinates analysis and the Mantel test, that resulted in over 100x speed improvement in these widely used, general-purpose tools.
Using A100 MIG to Scale Astronomy Scientific Output - Igor Sfiligoi
Presented at GTC21.
The raw computing power of GPUs has been steadily increasing, significantly outpacing the CPU gains. This poses a problem for many GPU-enabled scientific applications that use CPU code paths to feed data to the GPU code, resulting in lower GPU utilization, and thus reduced gains in scientific output. Applications that are high-throughput in nature, such as astronomy-focused IceCube and LIGO, can partially work around the problem by running several instances of the executable on the same GPU. This approach, however, is sub-optimal both in terms of application performance and workflow management complexity. The recently introduced Multi-Instance GPU (MIG) capability, available on the NVIDIA A100 GPU, provides a much cleaner and easier-to-use alternative by allowing the logical slicing of the powerful GPU and assigning different slices to different applications. And at least in the case of IceCube, it can provide over 3x more scientific output on the same hardware.
Using commercial Clouds to process IceCube jobs - Igor Sfiligoi
Presented at EDUCAUSE CCCG March 2021.
The IceCube Neutrino Observatory is the world's premier facility to detect neutrinos. Built at the South Pole in natural ice, it requires extensive and expensive calibration to properly track the neutrinos. Most of the required compute power comes from on-prem resources through the Open Science Grid, but IceCube can easily harness Cloud compute at any scale, too, as demonstrated by a series of Cloud bursts. This talk provides both details of the performed Cloud bursts and some insight into the science itself.
Fusion simulations have traditionally required the use of leadership scale HPC resources in order to produce advances in physics. One such package is CGYRO, a premier tool for multi-scale plasma turbulence simulation. CGYRO is a typical HPC application that will not fit into a single node, as it requires several TeraBytes of memory and O(100) TFLOPS compute capability for cutting-edge simulations. CGYRO also requires high-throughput and low-latency networking, due to its reliance on global FFT computations. While in the past such compute may have required hundreds, or even thousands of nodes, recent advances in hardware capabilities allow for just tens of nodes to deliver the necessary compute power. We explored the feasibility of running CGYRO on Cloud resources provided by Microsoft on their Azure platform, using the infiniband-connected HPC resources in spot mode. We observed both that CPU-only resources were very efficient, and that running in spot mode was doable, with minimal side effects. The GPU-enabled resources were less cost effective but allowed for higher scaling.
For IceCube, a large amount of photon propagation simulation is needed to properly calibrate the natural ice. The simulation is compute intensive and ideal for GPU compute. This Cloud run was more data intensive than previous ones, producing 130 TB of output data. To keep egress costs in check, we created dedicated network links via the Internet2 Cloud Connect Service.
Scheduling a Kubernetes Federation with Admiralty - Igor Sfiligoi
Presented at OSG All-Hands Meeting 2020 - USCMS-USATLAS Session.
This talk presented the PRP experience with using Admiralty as a Kubernetes federation solution, discussing why we need it, why Admiralty is the best (if not actually the only) solution for our needs, and how it works.
Accelerating microbiome research with OpenACC - Igor Sfiligoi
Presented at OpenACC Summit 2020.
UniFrac is a commonly used metric in microbiome research for comparing microbiome profiles to one another. Computing UniFrac on modest sample sizes used to take a workday on a server class CPU-only node, while modern datasets would require a large compute cluster to be feasible. After porting to GPUs using OpenACC, the compute of the same modest sample size now takes only a few minutes on a single NVIDIA V100 GPU, while modern datasets can be processed on a single GPU in hours. The OpenACC programming model made the porting of the code to GPUs extremely simple; the first prototype was completed in just over a day. Getting full performance did however take much longer, since proper memory access is fundamental for this application.
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie... - Igor Sfiligoi
Presented at PEARC20.
This talk presents the expansion of IceCube's production HTCondor pool using cost-effective GPU instances in preemptible mode gathered from the three major Cloud providers, namely Amazon Web Services, Microsoft Azure and the Google Cloud Platform. Using this setup, we sustained for a whole workday about 15k GPUs, corresponding to around 170 PFLOP32s, integrating over one EFLOP32 hour worth of science output for a price tag of about $60k. We provide the reasoning behind Cloud instance selection, a description of the setup and an analysis of the provisioned resources, as well as a short description of the actual science output of the exercise.
Porting and optimizing UniFrac for GPUs - Igor Sfiligoi
Poster presented at PEARC20.
UniFrac is a commonly used metric in microbiome research for comparing microbiome profiles to one another ("beta diversity"). The recently implemented Striped UniFrac added the capability to split the problem into many independent subproblems and exhibits near linear scaling. In this poster we describe steps undertaken in porting and optimizing Striped UniFrac to GPUs. We reduced the run time of computing UniFrac on the published Earth Microbiome Project dataset from 13 hours on an Intel Xeon E5-2680 v4 CPU to 12 minutes on an NVIDIA Tesla V100 GPU, and to about one hour on a laptop with NVIDIA GTX 1050 (with minor loss in precision). Computing UniFrac on a larger dataset containing 113k samples reduced the run time from over one month on the CPU to less than 2 hours on the V100 and 9 hours on an NVIDIA RTX 2080TI GPU (with minor loss in precision). This was achieved by using OpenACC for generating the GPU offload code and by improving the memory access patterns. A BSD-licensed implementation is available, which produces a C shared library linkable by any programming language.
Demonstrating 100 Gbps in and out of the public Clouds - Igor Sfiligoi
Poster presented at PEARC20.
There is increased awareness and recognition that public Cloud providers do provide capabilities not found elsewhere, with elasticity being a major driver. The value of elastic scaling is however tightly coupled to the capabilities of the networks that connect all involved resources, both in the public Clouds and at the various research institutions. This poster presents results of measurements involving file transfers inside public Cloud providers, fetching data from on-prem resources into public Cloud instances and fetching data from public Cloud storage into on-prem nodes. The networking of the three major Cloud providers, namely Amazon Web Services, Microsoft Azure and the Google Cloud Platform, has been benchmarked. The on-prem nodes were managed by either the Pacific Research Platform or located at the University of Wisconsin – Madison. The observed sustained throughput was of the order of 100 Gbps in all the tests moving data in and out of the public Clouds and throughput reaching into the Tbps range for data movements inside the public Cloud providers themselves. All the tests used HTTP as the transfer protocol.
TransAtlantic Networking using Cloud links - Igor Sfiligoi
Scientific communities have only a limited amount of bandwidth available for transferring data between the US and the EU.
We know Cloud providers have plenty of bandwidth available, but at what cost?
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... - DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Key Trends Shaping the Future of Infrastructure.pdf - Cheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Welcome to ViralQR, your best QR code generator - ViralQR
Welcome to ViralQR, the best QR code generator available on the market!
At ViralQR, we design static and dynamic QR codes. Our mission is to make business operations easier and customer engagement more powerful through the use of QR technology. Be it a small-scale business or a huge enterprise, our easy-to-use platform provides multiple options that can be tailored to your company's branding and marketing strategies.
Our Vision
We are here to make the process of creating QR codes easy and smooth, thus enhancing customer interaction and making business more fluid. We very strongly believe in the ability of QR codes to change the world for businesses in their interaction with customers and are set on making that technology accessible and usable far and wide.
Our Achievements
Ever since its inception, we have successfully served many clients by offering QR codes for their marketing, service delivery, and feedback collection across various industries. Our platform has been recognized for its ease of use and powerful features, which help businesses create QR codes with ease.
Our Services
At ViralQR, we offer a comprehensive suite of services that caters to your every need:
Static QR Codes: Create free static QR codes. These QR codes are able to store significant information such as URLs, vCards, plain text, emails and SMS, Wi-Fi credentials, and Bitcoin addresses.
Dynamic QR codes: These also have all the advanced features but are subscription-based. They can directly link to PDF files, images, micro-landing pages, social accounts, review forms, business pages, and applications. In addition, they can be branded with CTAs, frames, patterns, colors, and logos to enhance your branding.
Pricing and Packages
Additionally, ViralQR comes with a 14-day free trial, an exceptional opportunity for new users to get a feel for the platform. From there, one can easily subscribe and experience the full power of dynamic QR codes. The subscription plans are not meant only for big businesses; they are priced flexibly so that practically every business can afford to benefit from our service.
Why choose us?
ViralQR provides services for marketing, advertising, catering, retail, and the like. QR codes can be placed on fliers, packaging, merchandise, and banners, and can even substitute for cash and cards in a restaurant or coffee shop. By integrating QR codes into your business, you can improve customer engagement and streamline operations.
Comprehensive Analytics
Subscribers of ViralQR receive detailed analytics and tracking tools that give a clear view of QR code performance. Our analytics dashboard shows aggregate views and unique views, as well as detailed information about each impression, including time, device, browser, and estimated location by city and country.
So, thank you for choosing ViralQR; we offer nothing but the best in QR code services to meet your business's diverse needs!
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... - UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
UiPath Test Automation using UiPath Test Suite series, part 4 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Securing your Kubernetes cluster: a step-by-step guide to success! - KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Accelerate your Kubernetes clusters with Varnish Caching - Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Essentials of Automations: Optimizing FME Workflows with Parameters - Safe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
glideinWMS Frontend Installation - Part 1 - Condor Installation - glideinWMS Training Jan 2012
1. glideinWMS Training @ UCSD
glideinWMS Frontend Installation
Part 1 – Condor Installation
by Igor Sfiligoi (UCSD)
2. Overview
● Introduction
● Planning and Common setup
● Central Manager Installation
● Submit node Installation
3. Refresher - Glideins
● A glidein is just a properly configured Condor execution node submitted as a Grid job
● glideinWMS provides glidein automation
[Diagram: the glideinWMS submits glideins through Grid interfaces (CREAM, Globus); each glidein runs a Startd on an Execution node that joins the Central manager (Collector, Negotiator) and serves the Submit nodes (Schedd, Job)]
4. Refresher - Glideins
● The glideinWMS triggers glidein submission
● The “regular” negotiator matches jobs to glideins
[Diagram: same architecture as the previous slide - Central manager (Collector, Negotiator), Submit nodes (Schedd, Job), and glidein Execution nodes (Startd) reached via CREAM and Globus]
5. Bottom line
Condor is king!
(glideinWMS just a small layer on top)
6. Condor installation
● Proper Condor installation and configuration is the most important task
● Condor will do most of the work
● … and is thus the most resource hungry
● GlideinWMS installation almost an afterthought
● Although it does require proper security config of Condor
● GlideinWMS installation proper will be described in a separate talk
7. Planning and Common setup
8. Refresher - Condor
● Two main node types
● Submit node(s)
● Central manager
● (execute nodes are dynamic – glideins)
● Public TCP/IP networking needed
● GSI used for network security
[Diagram: a glidein connecting to the Central manager (Collector, Negotiator) and to the Submit nodes (Schedd)]
9. Planning the setup
● In theory, all Condor daemons can be installed on a single node
● However, if at all possible, put Central Manager on a dedicated node
● i.e. do not use it as a submit node, too
● Both for security and stability reasons
● You may want/need more than one submit node
● Depends on expected use and available HW
● You do need at least one, though
10. Common system considerations
● Condor is supported on a wide variety of platforms
● Including Linux (e.g. RHEL5), MacOS and Windows
● Linux recommended in OSG (and assumed in the rest of the talk)
● GSI security requires
● Host or service certificate
● CAs & CRLs
– Typically delivered via OSG RPMs (but other means acceptable)
https://twiki.grid.iu.edu/bin/view/Documentation/Release3/InstallCertAuth
● Full Grid Client software recommended (for ease of ops)
https://twiki.grid.iu.edu/bin/view/Documentation/Release3/InstallOSGClient
11. OSG Grid Client
● Requires RHEL5-compatible Linux
● RHEL6 support promised for early 2012
● Procedure in a nutshell
● Add EPEL and OSG RPM repositories to sys conf.
● yum install osg-ca-certs
● yum install osg-client
● Enable CRL fetching crontab
(Other Grid clients (e.g. EGI/glite) will work just as well)
https://twiki.grid.iu.edu/bin/view/Documentation/Release3/InstallOSGClient
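For reference, the nutshell procedure above maps onto a short shell session roughly like the following. This is a sketch only: the release-RPM URLs/versions and the fetch-crl service name are assumptions that should be checked against the current EPEL/OSG documentation.

# Run as root on a RHEL5-compatible node
# 1. Add the EPEL and OSG yum repositories (URLs are illustrative placeholders)
rpm -Uvh http://download.fedoraproject.org/pub/epel/5/i386/epel-release-5-4.noarch.rpm
rpm -Uvh http://repo.grid.iu.edu/osg-el5-release-latest.rpm
# 2. Install the CA certificates and the full Grid client
yum install osg-ca-certs
yum install osg-client
# 3. Enable the CRL-fetching cron job (service name may differ by release)
service fetch-crl-cron start
chkconfig fetch-crl-cron on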
12. Requesting a host certificate
● OSG provides a script to talk to DOEGrids
https://twiki.grid.iu.edu/bin/view/Documentation/Release3/GetHostServiceCertificates
● Procedure in a nutshell
● Install OSG client
● yum install osg-cert-scripts
● cert-request …
● Wait for email
● cert-retrieve …
● cp into /etc/grid-security/
(If you have other ways to obtain a host cert, feel free to use them)
13. Condor Central Manager
14. Refresher - Central Manager
● Two (groups of) processes
● Collector
● Negotiator
● The Collector defines the Condor pool
● Knows about all the glideins it owns
● Knows about all the schedds
● The Negotiator does the matchmaking
● Decides who gets what resources
[Diagram: Central manager hosting the Collector and the Negotiator]
15. Condor Collector – considerations
● The Collector is the repository of all knowledge
● All other daemons report to it
● Including the glideins, who get its address at run-time
● Must process lots of info
● One update every 5 mins from each and every daemon
● With strong security → expensive
● Typically deployed as a tree of collectors
● All security handled in the leaves
● Top one still has the complete picture
[Diagram: glideins report to leaf Collectors, which forward to the top Collector sitting next to the Negotiator in the Central manager]
16. CCB – An additional cost
● The Condor collectors are also acting as CCBs
● Each glidein will open 5+ long-lived TCP sockets
● Make sure you have enough file descriptors
● Default OS limit is 1024 per process
● Plan on having one CCB per 100 glideins
[Diagram: a glidein registers with a CCB ("Call me back"); the submit node asks the CCB "I want to connect to the execute node" in order to transfer files; the leaves in the tree of collectors serve as CCBs]
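The arithmetic behind the 1-per-100 rule: 100 glideins times 5+ long-lived sockets is already 500+ file descriptors, half of the default 1024 limit, before counting the collector's regular traffic. A minimal sketch of raising the limit on a Linux host, assuming Condor runs as the condor user (the value 65536 is an example, not a recommendation):

# /etc/security/limits.conf - raise the per-process fd limit for the condor user
condor soft nofile 65536
condor hard nofile 65536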
17. High availability (theory)
● Central manager can be a single point of failure
● If it dies, the Condor pool dies with it!
● To avoid this, one can deploy multiple CMs
● All daemons will advertise to 2 (or more) Collectors
(Currently not supported by glideinWMS)
● All CMs will have the same view of the world
● There can only be one Negotiator, though
● One negotiator will be Active, all others in standby
● More details on Condor man page
http://www.cs.wisc.edu/condor/manual/v7.6/3_11High_Availability.html#SECTION004112000000000000000
18. Hardware needs
● Tree of collectors spreads the load over multiple processes
● So several CPUs come in handy
● Negotiator single threaded
● Will benefit from fast CPU
● Memory usage not terrible
● O(100k) per glidein to store ClassAds
(Exact footprint depends on how many additional attributes the VO defines)
● Concrete CMS example: 25k glideins ~ 6G memory
● Negligible disk IO
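A quick sanity check of the CMS figure: 6 GB across 25k glideins is about 240 KB per glidein, consistent with the O(100k) baseline once the extra CMS-defined attributes are added on top.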
19. System considerations
● Does not need to run as root (although it can)
● Minimizes risk due to Condor bugs
● Make sure the host cert is readable by that user
● Must be on the public IP network
● Each collector listens on its own well defined port, must be reachable by all glideins (WAN)
● Negotiator has a dynamic port, must be reachable by submit nodes (schedds)
(Must open the firewall for these, at least)
● Will use a large number of network sockets
● Will overwhelm most firewalls
● Consider disabling stateful firewalls (e.g. iptables)
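On a RHEL5-era host the stateful firewall can be disabled as sketched below; only do this if filtering is handled elsewhere (e.g. at the border router), since the node sits on the public network.

# Run as root: stop iptables now and keep it off across reboots
service iptables stop
chkconfig iptables off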
20. Security considerations
● Cannot be firewalled → endpoint security
● GSI security used (i.e. x509 certs) for networking
● Limit administrative rights to local users (FS auth)
● The Collector is the central trust point of the pool
● The DNs of all other daemons are whitelisted here, including:
– Schedds
– Glideins (i.e. pilot proxies)
– Clients (e.g. glideinWMS Frontend)
21. Installing the CM
● Two major burdens (for basic install)
● Collector tree
● Security setup
● The glideinWMS installer helps with both
● Starting from Condor tarball
● As any user (e.g. as non-root)
● Highly recommended
(Easy-to-use update cmdline tool available, too)
● RPM install also an option
● Easy to keep up-to-date (i.e. yum update)
● But you will need to configure by hand
● And will run as root (unless you hack the startup script)
22. Collector tree setup
● In a nutshell
● For each secondary collector:
– Tell Master to start a collector on a different port
– repeat
● Forward ClassAds to main Collector

...
COLLECTORXXX = $(COLLECTOR)
COLLECTORXXX_ENVIRONMENT = "_CONDOR_COLLECTOR_LOG=$(LOG)/CollectorXXXLog"
COLLECTORXXX_ARGS = -f -p YYYY
DAEMON_LIST = $(DAEMON_LIST) COLLECTORXXX
(repeat the block above xN, once per secondary collector)
…
# forward ads to the main collector
# (this is ignored by the main collector, since the address matches itself)
CONDOR_VIEW_HOST = $(COLLECTOR_HOST)
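To make the template concrete, here is a sketch with two secondary collectors; the names COLLECTOR9620/COLLECTOR9621 and the ports 9620/9621 are arbitrary examples, not prescribed values.

# condor_config.local - two secondary collectors (illustrative)
COLLECTOR9620 = $(COLLECTOR)
COLLECTOR9620_ENVIRONMENT = "_CONDOR_COLLECTOR_LOG=$(LOG)/Collector9620Log"
COLLECTOR9620_ARGS = -f -p 9620
COLLECTOR9621 = $(COLLECTOR)
COLLECTOR9621_ENVIRONMENT = "_CONDOR_COLLECTOR_LOG=$(LOG)/Collector9621Log"
COLLECTOR9621_ARGS = -f -p 9621
DAEMON_LIST = $(DAEMON_LIST) COLLECTOR9620 COLLECTOR9621
# secondaries forward their ads to the main collector
CONDOR_VIEW_HOST = $(COLLECTOR_HOST)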
23. Security setup (1)
● In a nutshell
● Configure basic GSI (i.e. point to CAs and host cert)
● Set up authorization (i.e. switch to whitelist)
● Whitelist all DNs
● Enable GSI
● DN whitelisting a bit annoying
● Must be done in two places
– in condor_config, and
– in condor_mapfile (and is a regexp here!)
● glideinWMS provides a cmdline tool
24. Security setup (2)
# condor_config.local
# Configure GSI
CERTIFICATE_MAPFILE=/home/condor/glidecondor/certs/condor_mapfile
GSI_DAEMON_TRUSTED_CA_DIR=/etc/grid-security/certificates
GSI_DAEMON_CERT = /home/condor/.globus/hostcert.pem
GSI_DAEMON_KEY = /home/condor/.globus/hostkey.pem
# Force whitelisting
DENY_WRITE = anonymous@*
DENY_ADMINISTRATOR = anonymous@*
DENY_DAEMON = anonymous@*
DENY_NEGOTIATOR = anonymous@*
DENY_CLIENT = anonymous@*
ALLOW_ADMINISTRATOR = $(CONDOR_HOST)
ALLOW_WRITE = *
USE_VOMS_ATTRIBUTES = False # use only pilot DN, not FQAN
# list all DNs
...
GSI_DAEMON_NAME=$(GSI_DAEMON_NAME),DNXXX    (xN)
...
# enable GSI (FS kept in the list to also enable local auth)
SEC_DEFAULT_AUTHENTICATION_METHODS = FS,GSI
SEC_DEFAULT_AUTHENTICATION = REQUIRED
SEC_DEFAULT_ENCRYPTION = OPTIONAL
SEC_DEFAULT_INTEGRITY = REQUIRED
# optionally, relax client and read settings

# condor_mapfile
...
GSI "^DNXXX$" UIDXXX    (xN)
...
GSI (.*) anonymous
FS (.*) \1
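As a concrete illustration of the mapfile whitelist, with a made-up service DN and local name (note the DN is treated as a regexp, so it is anchored and its dots are escaped):

# condor_mapfile - hypothetical example entries
GSI "^/DC=org/DC=doegrids/OU=Services/CN=cm\.example\.edu$" condor_cm
GSI (.*) anonymous
FS (.*) \1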
25. Installing with Q&A installer
~/glideinWMS/install$ ./glideinWMS_install
...
Please select: 4
[4] User Pool Collector
...
Where do you have the Condor tarball? /home/condor/Downloads/condor-7.6.4-x86_rhap_5-stripped.tar.gz
Where do you want to install it?: [/home/condor/glidecondor] /home/condor/glidecondor
If something goes wrong with Condor, who should get email about it?: me@myemail
Do you want to split the config files between condor_config and condor_config.local?: (y/n) [y] y
...
Do you want to get it from VDT?: (y/n) y
Do you have already a VDT installation?: (y/n) y
Where is the VDT installed?: /etc/osg/wn-client
...
Will you be using a proxy or a cert? (proxy/cert) cert
Where is your certificate located?: /home/condor/.globus/hostcert.pem
Where is your certificate key located?: /home/condor/.globus/hostkey.pem
My DN = 'DN1'
...
DN: DNXXX
nickname: [condor001] uidXXX
Is this a trusted Condor daemon?: (y/n) y
(repeated xN; you can also add the DNs as an independent step)
...
DN:
How many slave collectors do you want?: [5] 200
What name would you like to use for this pool?: [My pool] MyVO
What port should the collector be running?: [9618] 9618
26. Maintenance
● If you need to add more DNs, use cmdline tool glidecondor_addDN
~/glideinWMS/install$ ./glidecondor_addDN -daemon "DN of Schedd A" "DNA" UIDA
Configuration files changed.
Remember to reconfig the affected Condor daemons.
● To upgrade the Condor binaries, use cmdline tool glidecondor_upgrade
~/glideinWMS/install$ ./glidecondor_upgrade ~/Downloads/condor-7.6.5-x86_rhap_5-stripped.tar.gz
Will update Condor in /home/condor/glidecondor
..
Creating backup dir
Putting new binaries in place
Finished successfully
Old binaries can be found in /home/condor/glidecondor/old.120102_13
27. Starting Condor
● The installer will start Condor for you, but you still should know how to stop and start it by hand
● To start Condor, run:
~/glidecondor/start_condor.sh
● To stop Condor, use
condor_off -daemon master
● Finally, to force Condor to re-read the config:
~/glidecondor/sbin/condor_reconfig
29. Refresher - Submit node(s)
● Submit node defined by the schedd
● Which holds user jobs
● Shadows will be started as the jobs are matched to glideins
● One per running job
● At least one submit node is needed
● But there may be many
[Diagram: a Submit node running the Schedd and one Shadow per running job]
30. Network use
● Glideins must contact the submit node in order to run jobs
● Both with standard protocol and CCB
● Each shadow normally uses 2 random ports
● Not firewall friendly (although firewalls can get overwhelmed anyhow)
● Can be a problem over O(10k) jobs (see CM slides)
● Newer versions of Condor support “shared port daemon”
● Listens on a single port (does not reduce the number of sockets)
● Forwards the sockets to the appropriate local process
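To put the O(10k) figure in perspective: 10k running jobs at 2 ports per shadow means roughly 20k open ports on the submit node, a sizable slice of the ~64k TCP port space and far more connection state than most firewalls are sized to track.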
31. Security considerations
● Like with CM, must use endpoint security
● Schedd and CM must whitelist each other
● Certificate DN based
● AuthZ with glideins indirect
● No need to whitelist glidein DN(s)
● Collector trusts glidein, Schedd trusts Collector
● Schedd also must whitelist any clients (e.g. VO Frontend)
● Local users use FS auth (i.e. UID based)
● Only startds can use indirect AuthZ
http://research.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:SecEnableMatchPasswordAuthentication
[Diagram: Central manager (Collector, Negotiator), Submit node (Schedd), local users, and a glidein authorized indirectly]
32. Hardware needs
● Submit node is memory hungry
● 1M per running job due to shadows
● O(10k) per job in queue for ClassAds
(Actual need depends on how many additional VO attributes are used)
● Schedd can use a fast CPU (single threaded)
● Shadows very light CPU users
● Jobs may put substantial IO load on HDD
● Depends on how much data is being produced
● Depends on how short the jobs are
● And the above is just for Condor
● VO may have portal software or actual interactive users
(Make sure the remaining HW is adequate for these)
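For scale: 10k running jobs translate into roughly 10k × 1 MB ≈ 10 GB of memory for the shadows alone, plus about 10k × 10 KB ≈ 100 MB for the queued-job ClassAds, before any VO portal software is counted.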
33. User account considerations
● Users must be able to launch condor_submit locally on the submit node
● Remote submission not recommended (and disabled by default)
● VO must decide how to do it
● SSHd (i.e. interactive use)
● Portal (e.g. CMS CRABServer)
(Still local from the Condor point of view)
● Will need one UID per user
● Non-UID based auth possible, but not recommended (and not supported out of the box)
(No need to create user accounts before installing Condor, but do plan for it)
34. Schedd is a superuser
● Schedd must run as root (euid==0, even as it drops ruid to “condor”)
● So it can switch UID as needed
● To access user files
● Same for shadows (but ruid set to job user)
● Host cert thus must be owned by root
35. Installing the submit node
● Two major burdens (for basic install)
● Shared port daemon
● Security setup
● The glideinWMS installer helps with both
● Starting from Condor tarball
● Should be run as root
● Highly recommended
(Easy-to-use update cmdline tool available, too)
● RPM install also an option
● Easy to keep up-to-date (i.e. yum update)
● But you will need to configure by hand
36. Shared port daemon
● Not enabled by default in Condor
● In a nutshell
● Pick a port for it
● Enable it
● Add it to the list of Daemons to start

# condor_config.local
# Enable shared_port_daemon
SHARED_PORT_ARGS = -p 9615
USE_SHARED_PORT = True
DAEMON_LIST = $(DAEMON_LIST) SHARED_PORT
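After a condor_reconfig, a quick sanity check with the standard Condor query tools (the exact address format in the output varies by Condor version):

# Verify the knob took effect
condor_config_val USE_SHARED_PORT
# Daemon addresses should now advertise the single shared port (9615 above)
condor_status -schedd -long | grep MyAddress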
37. Security setup (1)
● In a nutshell
● Configure basic GSI (i.e. point to CAs and host cert)
● Enable match authentication
● Set up authorization (i.e. switch to whitelist)
● Whitelist all DNs
● Enable GSI
● DN whitelisting a bit annoying
● Must be done in two places
– in condor_config, and
– in condor_mapfile (and is a regexp here!)
● glideinWMS provides a cmdline tool
38. Security setup (2)
# condor_config.local
# Configure GSI
CERTIFICATE_MAPFILE=/opt/glidecondor/certs/condor_mapfile
GSI_DAEMON_TRUSTED_CA_DIR=/etc/grid-security/certificates
GSI_DAEMON_CERT = /etc/grid-security/hostcert.pem
GSI_DAEMON_KEY = /etc/grid-security/hostkey.pem
# Enable match authentication
SEC_ENABLE_MATCH_PASSWORD_AUTHENTICATION = TRUE
# Force whitelisting
DENY_WRITE = anonymous@*
... # see CM slides for details
# list all DNs (repeat xN)
GSI_DAEMON_NAME=$(GSI_DAEMON_NAME),DNXXX
# enable GSI (FS also enables local auth)
SEC_DEFAULT_AUTHENTICATION_METHODS = FS,GSI
SEC_DEFAULT_AUTHENTICATION = REQUIRED
SEC_DEFAULT_ENCRYPTION = OPTIONAL
SEC_DEFAULT_INTEGRITY = REQUIRED
# optionally, relax client and read settings

# condor_mapfile
...
# one line per DN (repeat xN)
GSI "^DNXXX$" UIDXXX
...
GSI (.*) anonymous
FS (.*) \1
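To exercise the resulting authentication end to end, recent Condor versions ship condor_ping; a sketch (run with valid credentials in your environment):
# check that this client is authorized at the WRITE level on the local schedd
condor_ping -verbose -type SCHEDD WRITE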
39. Network optimization settings
● Since glideins are often behind firewalls
● The glidein Startd setup is optimized to avoid
incoming connections and UDP
● The Schedd must also play along
# condor_config.local
# Reverse protocol direction
STARTD_SENDS_ALIVES = True
# Avoid UDP
SCHEDD_SEND_VACATE_VIA_TCP = True
40. Installing with Q&A installer
~/glideinWMS/install$ ./glideinWMS_install
...
[5] User Schedd
Please select: 5
…
Which user should Condor run under?: [condor] condor
Where do you have the Condor tarball? /root/condor-7.6.4-x86_rhap_5-stripped.tar.gz
Where do you want to install it?: [/home/condor/glidecondor] /opt/glidecondor
If something goes wrong with Condor, who should get email about it?: me@myemail
Do you want to split the config files between condor_config and condor_config.local?: (y/n) [y] y
...
Do you want to get it from VDT?: (y/n) y
Do you have already a VDT installation?: (y/n) y
Where is the VDT installed?: /etc/osg/wn-client
Will you be using a proxy or a cert? (proxy/cert) cert
Where is your certificate located?: /etc/grid-security/hostcert.pem
Where is your certificate key located?: /etc/grid-security/hostkey.pem
My DN = 'DN1'
...
(repeat xN; the DNs can also be added as an independent step)
DN: DNXXX
nickname: [condor001] uidXXX
Is this a trusted Condor daemon?: (y/n) y
...
DN:
What node is the collector running (i.e. CONDOR_HOST)?: collectornode.mydomain
Do you want to enable the shared_port_daemon?: (y/n) y
What port should it use?: [9615] 9615
How many secondary schedds do you want?: [9] 0
41. Maintenance
● If you need to add more DNs, use
● cmdline tool glidecondor_addDN
(do not use -daemon for a client's DN)

~/glideinWMS/install$ ./glidecondor_addDN -daemon "DN of Schedd A" "DNA" UIDA
Configuration files changed.
Remember to reconfig the affected Condor daemons.

● To upgrade the Condor binaries, use
● cmdline tool glidecondor_upgrade

~/glideinWMS/install$ ./glidecondor_upgrade ~/Downloads/condor-7.6.5-x86_rhap_5-stripped.tar.gz
Will update Condor in /home/condor/glidecondor
..
Creating backup dir
Putting new binaries in place
Finished successfully
Old binaries can be found in /home/condor/glidecondor/old.120102_13
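After an upgrade it is prudent to restart the daemons so the new binaries are actually picked up, using the init script described on the next slide:
/etc/init.d/condor stop
/etc/init.d/condor start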
42. Starting Condor
● The installer will start Condor for you, but you still
should know how to stop and start it by hand
● The installer has created an init.d script for you
/etc/init.d/condor start|stop
● To force Condor to reload its config, still use
/opt/glidecondor/sbin/condor_reconfig
(all of the above as root)
44. Fine tuning
● The previous slides provide only a basic setup
● Although glideinWMS does some basic tuning
● You will likely want to tune the system further
● Proper limits on the submit node
● Default job attributes
● Sanity checks
● Priority tuning
● Not part of this talk
● Will go into details tomorrow (a sketch of such knobs follows below)
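Just to give a flavor of such knobs, a hedged sketch (the parameter names are real Condor settings, but the values and the 2-day cutoff are purely illustrative, and the config path assumes the /opt/glidecondor install used earlier):
# append illustrative tuning knobs, then reload the config
cat >> /opt/glidecondor/etc/condor_config.local <<'EOF'
# cap concurrently running jobs per schedd
MAX_JOBS_RUNNING = 6000
# sanity check: hold jobs that have been running for more than 2 days
SYSTEM_PERIODIC_HOLD = (JobStatus == 2) && ((time() - EnteredCurrentStatus) > 2*24*3600)
EOF
/opt/glidecondor/sbin/condor_reconfig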
45. Integration with OSG Accounting
46. OSG Accounting
● OSG tries to keep accurate accounting information
on who used what resources
● Using GRATIA
https://twiki.grid.iu.edu/twiki/bin/view/Accounting/WebHome
http://gratia-osg-prod-reports.opensciencegrid.org/gratia-reporting/
47. Per-user accounting
● OSG has per-user accounting, too
● With glideins, this level of detail is lost
● Only the pilot proxy is seen by OSG (sites)
48. The glidein GRATIA probe
● OSG thus asks glidein operators to install a
dedicated probe alongside the glidein schedd(s)
● Which will provide per-user accounting info
to the OSG GRATIA server
● Optimized for use with OSG glidein factory
https://twiki.grid.iu.edu/bin/view/Accounting/ProbeConfigGlideinWMS
[Diagram: Submit node hosting the Schedd and the GRATIA Probe, reporting to the OSG GRATIA Server]
49. Installing the GRATIA probe
● In a nutshell
● Register submit node with GOC
● Tweak condor config
● yum install gratia-probe-condor
● Configure GRATIA
https://twiki.grid.iu.edu/bin/view/Accounting/ProbeConfigGlideinWMS
50. Condor changes for GRATIA
● GRATIA gets information from history logs
● Requires one file per terminated job for efficiency
● GRATIA needs to know where the job ran
● Additional attribute added to the job ClassAd
(more general details on this tomorrow)
# condor_config.local
PER_JOB_HISTORY_DIR = /var/lib/gratia/data
JOBGLIDEIN_ResourceName = \
 "$$([IfThenElse(IsUndefined(TARGET.GLIDEIN_ResourceName), \
      IfThenElse(IsUndefined(TARGET.GLIDEIN_Site), \
                 FileSystemDomain, TARGET.GLIDEIN_Site), \
      TARGET.GLIDEIN_ResourceName)])"
SUBMIT_EXPRS = $(SUBMIT_EXPRS) JOBGLIDEIN_ResourceName
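Once this is active, every terminated job should leave one history file behind for the probe to pick up; a quick check:
# one file per completed job (typically named history.<cluster>.<proc>)
ls /var/lib/gratia/data/ | head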
51. GRATIA configuration
● Essentially just tell GRATIA what name you have registered with the GOC
● Then enable it
● You also need to tell it where to find Condor

# /etc/gratia/condor/ProbeConfig
SiteName="VOX_glidein_node1"
EnableProbe="1"
# add this line to allow user jobs without a proxy
MapUnknownToGroup="1"

# /root/setup.sh
source /etc/profile.d/condor.sh
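A quick way to confirm all three edits took effect before the probe first runs:
# all three values should reflect your edits
grep -E 'SiteName|EnableProbe|MapUnknownToGroup' /etc/gratia/condor/ProbeConfig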
53. Pointers
● The official glideinWMS project Web page is
http://tinyurl.com/glideinWMS
● glideinWMS development team is reachable at
glideinwms-support@fnal.gov
● Condor Home Page
http://www.cs.wisc.edu/condor/
● Condor support
condor-user@cs.wisc.edu
condor-admin@cs.wisc.edu
54. Acknowledgments
● The glideinWMS is a CMS-led project
developed mostly at FNAL, with contributions
from UCSD and ISI
● The glideinWMS factory operations at UCSD are
sponsored by OSG
● The funding comes from NSF, DOE and the
UC system