glideinWMS Frontend Internals - glideinWMS Training Jan 2012 - Igor Sfiligoi
This presentation provides a detailed insight into the internal workings of the glideinWMS Frontend. Part of the glideinWMS Training session held in Jan 2012 at UCSD.
Glidein startup Internals and Glidein configuration - glideinWMS Training Jan 2012 - Igor Sfiligoi
This presentation provides a detailed insight into the internal workings of the glideinWMS glidein startup script and glideins in general. Part of the glideinWMS Training session held in Jan 2012 at UCSD.
glideinWMS Frontend Monitoring - glideinWMS Training Jan 2012 - Igor Sfiligoi
This talk walks you through the monitoring options a glideinWMS Frontend operator has.
Part of the glideinWMS Training session held in Jan 2012 at UCSD.
This document provides a high-level overview of how glideinWMS-based instances do matchmaking in CMS (a High Energy Physics experiment). The information is accurate as of early Dec 2012.
An argument for moving the requirements out of user hands - The CMS Experience - Igor Sfiligoi
This talk makes an argument for why users should not write arbitrary Requirements expressions in Condor, but should only express their desires and let someone else write the actual policy.
Presentation at Condor Week 2012.
This document provides an overview of glidein internals, including how glideins work, what glidein_startup does to configure and start Condor, security considerations for multi-user pilot jobs, and how glexec can help address security issues. It describes the key tasks of glidein_startup such as downloading files, validating nodes, configuring Condor, starting Condor daemons, collecting monitoring info, and cleaning up. It also discusses limiting glidein lifetime and addressing sources of waste.
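The glidein_startup task sequence described above can be sketched roughly as follows. This is a minimal Python sketch for illustration only; the real glidein_startup is a shell script, and every name below is a hypothetical stub.

```python
# Illustrative sketch of the glidein_startup task sequence described above.
# The real glidein_startup is a shell script; all names here are hypothetical stubs.
import time

def run_glidein(max_lifetime_s=1.0, poll_s=0.2):
    log = []
    log.append("download")      # 1. download config and validation files from the factory
    log.append("validate")      # 2. run validation scripts on the worker node
    log.append("configure")     # 3. write the Condor configuration for this glidein
    log.append("start_condor")  # 4. start the Condor daemons (master, startd)
    start = time.time()
    while time.time() - start < max_lifetime_s:  # 5. enforce the glidein lifetime
        log.append("monitor")                    #    periodic monitoring collection
        time.sleep(poll_s)
    log.append("stop_condor")   # 6. shut Condor down before the site batch slot expires
    log.append("cleanup")       # 7. remove the working directory
    return log

print(run_glidein())
```

The explicit lifetime bound in step 5 is the point the talk makes about limiting waste: a glidein that outlives its batch slot, or idles without work, consumes allocation without producing output.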
glideinWMS validation scripts - glideinWMS Training Jan 2012 - Igor Sfiligoi
Description of how to write custom validation scripts in glideinWMS, with an emphasis on VO Frontend operations.
Part of the glideinWMS Training session held in Jan 2012 at UCSD.
This document provides an introduction to glideinWMS for users with experience in grid computing. It explains that glideinWMS addresses the problems of scheduling many user jobs across multiple grid sites in a fair manner. It does this by using "pilot jobs" that create an "overlay batch system" where user jobs can run. This allows flexible job scheduling policies. The document provides high-level overviews of how glideinWMS interfaces with grid sites, the glidein factory, VO frontend, and user experience.
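The "overlay batch system" idea above can be illustrated with a toy matchmaking loop: pilot jobs register as slots in a central pool, and user jobs are matched to slots by a policy the VO controls. All data structures and the policy below are illustrative inventions, not HTCondor's actual ClassAd mechanism.

```python
# Toy illustration of the overlay-pool matchmaking idea described above.
# Slots are advertised by running glideins (pilots); user jobs state simple desires.
slots = [
    {"site": "UCSD", "memory_mb": 4096},
    {"site": "FNAL", "memory_mb": 2048},
]
jobs = [
    {"id": 1, "memory_mb": 1024},
    {"id": 2, "memory_mb": 3000},
]

def match(job, slot):
    # The VO-defined scheduling policy lives here, not in user-written expressions.
    return slot["memory_mb"] >= job["memory_mb"]

for job in jobs:
    slot = next((s for s in slots if match(job, s)), None)
    print(job["id"], slot["site"] if slot else "idle")
```

Because the policy lives in one place (the overlay pool) rather than in each user's job description, it can be changed centrally without touching user jobs, which is the flexibility the document refers to.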
Pilot Factory uses schedd glideins to submit pilot jobs locally from a remote resource. A pilot generator program communicates with a database and periodically submits pilots with desired configurations to matchmake and run various job types. This allows bypassing Condor-G and its overhead for large job submissions while taking advantage of local scheduling on the remote resource. Future work includes integrating pilots more directly with Condor startds for additional functionality.
Monitoring and troubleshooting a glideinWMS-based HTCondor pool - Igor Sfiligoi
This document discusses tools for monitoring and troubleshooting jobs in a glideinWMS-based HTCondor pool. It describes how to determine where jobs are running, why jobs may not be starting, and why jobs are taking a long time to finish. The key tools mentioned are condor_q, condor_history, and the job event log. The document also provides guidance on checking job requirements, user priorities, supported sites, and restarting jobs.
Solving Grid problems through glidein monitoring - Igor Sfiligoi
This document provides an overview of common problems that can occur in a grid and how they are diagnosed and addressed through glidein monitoring. It discusses issues that may happen at various points such as compute elements refusing glideins, validation failures on worker nodes, authentication problems, and job startup failures due to issues like gLExec configuration. The document aims to help understand the debugging process for grid problems and how glidein monitoring plays a key role in solving grid issues.
Wedding convenience and control with RemoteCondor - Igor Sfiligoi
This presentation explains why Condor is not suitable for use on user-owned machines, and why RemoteCondor is the best available solution to the problem.
The glideinWMS approach to the ownership of System Images in the Cloud World - Igor Sfiligoi
Presentation at CLOSER 2012.
Scientific communities that are accustomed to using Grid resources are now considering the use of Cloud resources. However, moving from the Grid to the Cloud brings along the need for the creation and maintenance of the system image used to configure the provisioned resources, and this presents both opportunities and problems for the users. The impact is especially interesting in the context of glideinWMS due to its layered architecture. This presentation describes the various options available to the glideinWMS project team, their advantages and disadvantages, and explains why one of them is to be preferred.
Closer web page: http://closer.scitevents.org/
Similar to glideinWMS Architecture - glideinWMS Training Jan 2012 (10)
Preparing Fusion codes for Perlmutter - CGYRO - Igor Sfiligoi
The document discusses the CGYRO simulation tool, which is used for fusion plasma turbulence simulations. CGYRO is optimized for multi-scale simulations and is both memory and compute intensive. It is inherently parallel and uses OpenMP, OpenACC, and MPI for parallelization across CPU and GPU cores. While initial runs on Perlmutter had communication bottlenecks, improved networking with Slingshot 11 has helped increase performance, though it can interfere with MPS. Overall, CGYRO users are pleased with the transition from Cori to Perlmutter, finding it much faster for equivalent hardware.
Comparing single-node and multi-node performance of an important fusion HPC c... - Igor Sfiligoi
Fusion simulations have traditionally required the use of leadership scale High Performance Computing (HPC) resources in order to produce advances in physics. The impressive improvements in compute and memory capacity of many-GPU compute nodes are now allowing for some problems that once required a multi-node setup to be also solvable on a single node. When possible, the increased interconnect bandwidth can result in order of magnitude higher science throughput, especially for communication-heavy applications. In this paper we analyze the performance of the fusion simulation tool CGYRO, an Eulerian gyrokinetic turbulence solver designed and optimized for collisional, electromagnetic, multiscale simulation, which is widely used in the fusion research community. Due to the nature of the problem, the application has to work on a large multi-dimensional computational mesh as a whole, requiring frequent exchange of large amounts of data between the compute processes. In particular, we show that the average-scale nl03 benchmark CGYRO simulation can be run at an acceptable speed on a single Google Cloud instance with 16 A100 GPUs, outperforming 8 NERSC Perlmutter Phase1 nodes, 16 ORNL Summit nodes and 256 NERSC Cori nodes. Moving from a multi-node to a single-node GPU setup we get comparable simulation times using less than half the number of GPUs. Larger benchmark problems, however, still require a multi-node HPC setup due to GPU memory capacity needs, since at the time of writing no vendor offers nodes with a sufficient GPU memory setup. The upcoming external NVSWITCH does however promise to deliver an almost equivalent solution for up to 256 NVIDIA GPUs.
Presented at PEARC22.
Paper DOI: https://doi.org/10.1145/3491418.3535130
The anachronism of whole-GPU accounting - Igor Sfiligoi
NVIDIA has been making steady progress in increasing the compute performance of its GPUs, resulting in order of magnitude compute throughput improvements over the years. With several models of GPUs coexisting in many deployments, the traditional accounting method of treating all GPUs as being equal is not reflecting compute output anymore. Moreover, for applications that require significant CPU-based compute to complement the GPU-based compute, it is becoming harder and harder to make full use of the newer GPUs, requiring sharing of those GPUs between multiple applications in order to maximize the achievable science output. This further reduces the value of whole-GPU accounting, especially when the sharing is done at the infrastructure level. We thus argue that GPU accounting for throughput-oriented infrastructures should be expressed in GPU core hours, much like it is normally done for the CPUs. While GPU core compute throughput does change between GPU generations, the variability is similar to what we expect to see among CPU cores. To validate our position, we present an extensive set of run time measurements of two IceCube photon propagation workflows on 14 GPU models, using both on-prem and Cloud resources. The measurements also outline the influence of GPU sharing at both HTCondor and Kubernetes infrastructure level.
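The accounting argument above can be made concrete with a small calculation. The core counts and sharing fraction below are illustrative placeholders, not an official accounting table.

```python
# Sketch of the accounting argument above: whole-GPU hours vs GPU core hours.
# Core counts and the sharing fraction are illustrative, not real accounting data.
def gpu_core_hours(wall_hours, gpu_cores, share=1.0):
    """Charge usage in GPU core hours, scaled by the fraction of the GPU used."""
    return wall_hours * gpu_cores * share

# Two jobs running 10 wall hours each, on very different GPUs:
old_gpu = gpu_core_hours(10, 2560)        # an older-generation card, used whole
new_gpu = gpu_core_hours(10, 10752, 0.5)  # a newer card shared between two jobs

# Whole-GPU accounting would charge both jobs the same 10 GPU hours;
# core-hour accounting reflects the very different compute actually delivered.
print(old_gpu, new_gpu)  # 25600.0 53760.0
```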
Presented at PEARC22.
Document DOI: https://doi.org/10.1145/3491418.3535125
Auto-scaling HTCondor pools using Kubernetes compute resources - Igor Sfiligoi
HTCondor has been very successful in managing globally distributed, pleasantly parallel scientific workloads, especially as part of the Open Science Grid. HTCondor system design makes it ideal for integrating compute resources provisioned from anywhere, but it has very limited native support for autonomously provisioning resources managed by other solutions. This work presents a solution that allows for autonomous, demand-driven provisioning of Kubernetes-managed resources. A high-level overview of the employed architectures is presented, paired with the description of the setups used in both on-prem and Cloud deployments in support of several Open Science Grid communities. The experience suggests that the described solution should be generally suitable for contributing Kubernetes-based resources to existing HTCondor pools.
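The demand-driven provisioning idea can be sketched as a simple control policy. This is a minimal sketch under assumed names; the real system described in the paper would query the HTCondor collector for idle jobs and drive the Kubernetes API to create or delete worker pods.

```python
# Minimal sketch of demand-driven provisioning: scale the number of
# Kubernetes-managed worker pods toward the number of idle HTCondor jobs,
# within fixed bounds. All names here are hypothetical.
def desired_workers(idle_jobs, running_workers, min_workers=0, max_workers=100):
    target = idle_jobs                  # simplest policy: one worker per idle job
    target = max(target, min_workers)   # never scale below the floor
    target = min(target, max_workers)   # never exceed the quota
    return target - running_workers     # positive: scale up; negative: scale down

print(desired_workers(idle_jobs=25, running_workers=10))   # scale up by 15
print(desired_workers(idle_jobs=0, running_workers=10))    # scale down by 10
print(desired_workers(idle_jobs=500, running_workers=80))  # capped at max_workers
```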
Presented at PEARC22.
Paper DOI: https://doi.org/10.1145/3491418.3535123
Performance Optimization of CGYRO for Multiscale Turbulence Simulations - Igor Sfiligoi
Overview of the recent performance optimization of CGYRO, an Eulerian gyrokinetic fusion plasma solver, with emphasis on multiscale turbulence simulations.
Presented at the joint US-Japan Workshop on Exascale Computing Collaboration and the 6th workshop of the US-Japan Joint Institute for Fusion Theory (JIFT) program (Jan 18th 2022).
Comparing GPU effectiveness for Unifrac distance compute - Igor Sfiligoi
Poster presented at PEARC21.
The poster contains the complete scaling plots for both unweighted and weighted normalized Unifrac compute for sample sizes ranging from 1k to 307k on both GPUs and CPUs.
Managing Cloud networking costs for data-intensive applications by provisioni... - Igor Sfiligoi
Presented at PEARC21.
Many scientific high-throughput applications can benefit from the elastic nature of Cloud resources, especially when there is a need to reduce time to completion. Cost considerations are usually a major issue in such endeavors, with networking often a major component; for data-intensive applications, egress networking costs can exceed the compute costs. Dedicated network links provide a way to lower the networking costs, but they do add complexity. In this paper we provide a description of a 100 fp32 PFLOPS Cloud burst in support of IceCube production compute, that used Internet2 Cloud Connect service to provision several logically-dedicated network links from the three major Cloud providers, namely Amazon Web Services, Microsoft Azure and Google Cloud Platform, that in aggregate enabled approximately 100 Gbps egress capability to on-prem storage. It provides technical details about the provisioning process, the benefits and limitations of such a setup and an analysis of the costs incurred.
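The cost trade-off above can be illustrated with back-of-the-envelope arithmetic. All prices below are hypothetical placeholders, not actual Cloud list prices or figures from the paper.

```python
# Back-of-the-envelope sketch of the egress cost trade-off described above.
# All prices are hypothetical placeholders, not actual Cloud list prices.
def egress_cost(tb_moved, usd_per_gb, fixed_fee=0.0):
    """Total egress cost for moving tb_moved terabytes at a per-GB rate."""
    return tb_moved * 1024 * usd_per_gb + fixed_fee

# Moving 100 TB of results back to on-prem storage:
internet_egress = egress_cost(100, 0.08)              # standard internet egress rate
dedicated_link = egress_cost(100, 0.02, fixed_fee=500)  # discounted rate + link fee

print(f"internet: ${internet_egress:,.0f}  dedicated: ${dedicated_link:,.0f}")
```

Under these assumed numbers the dedicated link wins easily at this data volume; the paper's point is that the break-even depends on volume, rates, and the added provisioning complexity.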
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access - Igor Sfiligoi
Presented at PEARC21.
Most experimental sciences now rely on computing, and biological sciences are no exception. As datasets get bigger, so do the computing costs, making proper optimization of the codes used by scientists increasingly important. Many of the codes developed in recent years are based on the Python-based NumPy, due to its ease of use and good performance characteristics. The composable nature of NumPy, however, does not generally play well with the multi-tier nature of modern CPUs, making any non-trivial multi-step algorithm limited by the external memory access speeds, which are hundreds of times slower than the CPU's compute capabilities. In order to fully utilize the CPU compute capabilities, one must keep the working memory footprint small enough to fit in the CPU caches, which requires splitting the problem into smaller portions and fusing together as many steps as possible. In this paper, we present changes based on these principles to two important functions in the scikit-bio library, principal coordinates analysis and the Mantel test, that resulted in over 100x speed improvement in these widely used, general-purpose tools.
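The blocking-and-fusion principle described above can be illustrated with a toy NumPy example. This is not the actual scikit-bio code; the block size and the operations are arbitrary, chosen only to show the pattern of fusing steps per cache-sized chunk instead of making full passes over memory.

```python
# Toy illustration of the cache-blocking principle described in the abstract:
# process a large array in blocks small enough to stay cache-resident, fusing
# two steps per block instead of making two full passes over main memory.
import numpy as np

def two_pass(x):
    y = x * 2.0        # full pass 1: materializes a large temporary in main memory
    return np.sqrt(y)  # full pass 2: reads the whole temporary back

def blocked_fused(x, block=32768):
    out = np.empty_like(x)
    for i in range(0, x.size, block):
        chunk = x[i:i + block]
        out[i:i + block] = np.sqrt(chunk * 2.0)  # both steps while chunk is hot in cache
    return out

x = np.arange(1_000_000, dtype=np.float64)
assert np.allclose(two_pass(x), blocked_fused(x))
```

Both functions compute the same result; the blocked version trades NumPy's convenient whole-array composition for a working set that fits in cache, which is exactly the trade the paper describes.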
Using A100 MIG to Scale Astronomy Scientific OutputIgor Sfiligoi
The document discusses how Nvidia's A100 GPU with Multi-Instance GPU (MIG) capability can help scale up scientific output for astronomy projects like IceCube and LIGO. The A100 is much faster than previous GPUs, but MIG allows it to be partitioned so multiple jobs or processes can leverage the GPU simultaneously. This results in 200-600% higher throughput compared to using a single GPU, by better utilizing the massive parallelism of the A100. MIG makes the powerful A100 GPU practical for these CPU-bound scientific workloads.
Using commercial Clouds to process IceCube jobsIgor Sfiligoi
Presented at EDUCAUSE CCCG March 2021.
The IceCube Neutrino Observatory is the world’s premier facility to detect neutrinos.
Built at the south pole in natural ice, it requires extensive and expensive calibration to properly track the neutrinos.
Most of the required compute power comes from on-prem resources through the Open Science Grid,
but IceCube can easily harness the Cloud compute at any scale, too, as demonstrated by a series of Cloud bursts.
This talk provides both details of the performed Cloud bursts, as well as some insight in the science itself.
Fusion simulations have traditionally required the use of leadership scale HPC resources in order to produce advances in physics. One such package is CGYRO, a premier tool for multi-scale plasma turbulence simulation. CGYRO is a typical HPC application that will not fit into a single node, as it requires several TeraBytes of memory and O(100) TFLOPS compute capability for cutting-edge simulations. CGYRO also requires high-throughput and low-latency networking, due to its reliance on global FFT computations. While in the past such compute may have required hundreds, or even thousands of nodes, recent advances in hardware capabilities allow for just tens of nodes to deliver the necessary compute power. We explored the feasibility of running CGYRO on Cloud resources provided by Microsoft on their Azure platform, using the infiniband-connected HPC resources in spot mode. We observed both that CPU-only resources were very efficient, and that running in spot mode was doable, with minimal side effects. The GPU-enabled resources were less cost effective but allowed for higher scaling.
This document discusses a large-scale GPU-based cloud burst simulation run by the IceCube collaboration to calibrate simulations of natural ice. The simulation was data-intensive, producing over 130 TB of data and exceeding 10 Gbps of egress bandwidth. Internet2 Cloud Connect service was used to provision over 20 dedicated network links between collaborators' institutions and cloud providers to enable high-throughput data transfer at a lower cost than commercial routes. Careful planning was required to smoothly ramp up the burst and avoid overloading individual network links.
Scheduling a Kubernetes Federation with AdmiraltyIgor Sfiligoi
This document discusses using Admiralty to federate the Pacific Research Platform (PRP) Kubernetes cluster, called Nautilus, with other clusters. The key points are:
1) PRP/Nautilus has been growing and now has nodes in multiple regions, requiring federation to integrate resources.
2) Admiralty provides a native Kubernetes solution for federation without centralized control. It allows clusters to participate in multiple federations.
3) Installing Admiralty on PRP/Nautilus and other clusters being federated was straightforward using Helm. Pods can be scheduled across clusters automatically.
4) Initial federation is working well between PRP/Nautilus and other clusters for expanded resource sharing
Accelerating microbiome research with OpenACCIgor Sfiligoi
Presented at OpenACC Summit 2020.
UniFrac is a commonly used metric in microbiome research for comparing microbiome profiles to one another. Computing UniFrac on modest sample sizes used to take a workday on a server class CPU-only node, while modern datasets would require a large compute cluster to be feasible. After porting to GPUs using OpenACC, the compute of the same modest sample size now takes only a few minutes on a single NVIDIA V100 GPU, while modern datasets can be processed on a single GPU in hours. The OpenACC programming model made the porting of the code to GPUs extremely simple; the first prototype was completed in just over a day. Getting full performance did however take much longer, since proper memory access is fundamental for this application.
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...Igor Sfiligoi
Presented at PEARC20.
This talk presents expanding the IceCube’s production HTCondor pool using cost-effective GPU instances in preemptible mode gathered from the three major Cloud providers, namely Amazon Web Services, Microsoft Azure and the Google Cloud Platform. Using this setup, we sustained for a whole workday about 15k GPUs, corresponding to around 170 PFLOP32s, integrating over one EFLOP32 hour worth of science output for a price tag of about $60k. In this paper, we provide the reasoning behind Cloud instance selection, a description of the setup and an analysis of the provisioned resources, as well as a short description of the actual science output of the exercise.
Porting and optimizing UniFrac for GPUsIgor Sfiligoi
Poster presented at PEARC20.
UniFrac is a commonly used metric in microbiome research for comparing microbiome profiles to one another (“beta diversity”). The recently implemented Striped UniFrac added the capability to split the problem into many independent subproblems and exhibits near linear scaling. In this poster we describe steps undertaken in porting and optimizing Striped Unifrac to GPUs. We reduced the run time of computing UniFrac on the published Earth Microbiome Project dataset from 13 hours on an Intel Xeon E5-2680 v4 CPU to 12 minutes on an NVIDIA Tesla V100 GPU, and to about one hour on a laptop with NVIDIA GTX 1050 (with minor loss in precision). Computing UniFrac on a larger dataset containing 113k samples reduced the run time from over one month on the CPU to less than 2 hours on the V100 and 9 hours on an NVIDIA RTX 2080TI GPU (with minor loss in precision). This was achieved by using OpenACC for generating the GPU offload code and by improving the memory access patterns. A BSD-licensed implementation is available, which produces a Cshared library linkable by any programming language.
Demonstrating 100 Gbps in and out of the public CloudsIgor Sfiligoi
Poster presented at PEARC20.
There is increased awareness and recognition that public Cloud providers do provide capabilities not found elsewhere, with elasticity being a major driver. The value of elastic scaling is however tightly coupled to the capabilities of the networks that connect all involved resources, both in the public Clouds and at the various research institutions. This poster presents results of measurements involving file transfers inside public Cloud providers, fetching data from on-prem resources into public Cloud instances and fetching data from public Cloud storage into on-prem nodes. The networking of the three major Cloud providers, namely Amazon Web Services, Microsoft Azure and the Google Cloud Platform, has been benchmarked. The on-prem nodes were managed by either the Pacific Research Platform or located at the University of Wisconsin – Madison. The observed sustained throughput was of the order of 100 Gbps in all the tests moving data in and out of the public Clouds and throughput reaching into the Tbps range for data movements inside the public Cloud providers themselves. All the tests used HTTP as the transfer protocol.
TransAtlantic Networking using Cloud linksIgor Sfiligoi
Scientific communities have only limited amount of bandwidth available for transferring data between the US and the EU.
We know Cloud providers have plenty of bandwidth available, but at what cost?
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Building RAG with self-deployed Milvus vector database and Snowpark Container...Zilliz
This talk will give hands-on advice on building RAG applications with an open-source Milvus database deployed as a docker container. We will also introduce the integration of Milvus with Snowpark Container Services.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...Zilliz
Join us to introduce Milvus Lite, a vector database that can run on notebooks and laptops, share the same API with Milvus, and integrate with every popular GenAI framework. This webinar is perfect for developers seeking easy-to-use, well-integrated vector databases for their GenAI apps.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIVladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications.He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Speck&Tech
ABSTRACT: A prima vista, un mattoncino Lego e la backdoor XZ potrebbero avere in comune il fatto di essere entrambi blocchi di costruzione, o dipendenze di progetti creativi e software. La realtà è che un mattoncino Lego e il caso della backdoor XZ hanno molto di più di tutto ciò in comune.
Partecipate alla presentazione per immergervi in una storia di interoperabilità, standard e formati aperti, per poi discutere del ruolo importante che i contributori hanno in una comunità open source sostenibile.
BIO: Sostenitrice del software libero e dei formati standard e aperti. È stata un membro attivo dei progetti Fedora e openSUSE e ha co-fondato l'Associazione LibreItalia dove è stata coinvolta in diversi eventi, migrazioni e formazione relativi a LibreOffice. In precedenza ha lavorato a migrazioni e corsi di formazione su LibreOffice per diverse amministrazioni pubbliche e privati. Da gennaio 2020 lavora in SUSE come Software Release Engineer per Uyuni e SUSE Manager e quando non segue la sua passione per i computer e per Geeko coltiva la sua curiosità per l'astronomia (da cui deriva il suo nickname deneb_alpha).
20240605 QFM017 Machine Intelligence Reading List May 2024
glideinWMS Architecture - glideinWMS Training Jan 2012
1. glideinWMS training @ UCSD
glideinWMS architecture
by Igor Sfiligoi (UCSD)
UCSD Jan 17th 2012 glideinWMS architecture 1
2. Outline
● A high level overview of the glideinWMS
● Description of the components
3. glideinWMS from 10k feet
4. Refresher - Condor
● A Condor pool is composed of 3 pieces
[Diagram: a central manager running the Collector and Negotiator, several submit nodes running Schedds with queued jobs, and several execution nodes running Startds]
5. What is a glidein?
● A glidein is just a properly configured execution node submitted as a Grid job
[Diagram: the same Condor pool, where each execution node is a glidein; the glidein Startds join the pool's Collector and run jobs from the Schedds]
6. What is glideinWMS?
● glideinWMS is an automated tool for submitting glideins on demand
[Diagram: the Condor pool as above, with glideinWMS submitting glideins to Grid sites through CREAM and Globus interfaces]
8. glideinWMS architecture
● glideinWMS has 3 logical pieces
● glidein_startup – Configures and starts Condor execution daemons (runtime environment discovery and validation)
● Factory – Knows about the sites and does the submission (Grid knowledge and troubleshooting)
● Frontend – Knows about user jobs and requests glideins (site selection logic and job monitoring)
9. Cardinality
● N-to-M relationship
● Each Frontend can talk to many Factories
● Each Factory may serve many Frontends
[Diagram: two VO Frontends and two Glidein Factories cross-connected; each VO pool has its own Collector, Negotiator, Schedds with user jobs, and Startds]
10. Many operators
● Factory and Frontend are usually operated by different people
● Frontends are VO-specific
● Operated by VO admins
● Each sets policies for its users
● Factories are generic
● Do not need to be affiliated with any group
● Factory ops' main task is Grid monitoring and troubleshooting
11. A (sort of) detailed view of glidein_startup
13. glidein_startup tasks
● Validate node (environment)
● Download Condor binaries
● Configure Condor
(the tasks above are performed by plugins)
● Start Condor daemon(s)
● Collect post-mortem monitoring info
● Cleanup
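The task list above can be sketched as a simple ordered pipeline with a guaranteed cleanup step. This is only an illustrative Python sketch (the real glidein_startup is a shell script, and the task names below are made up):

```python
# Minimal sketch of the glidein_startup task sequence.
# Hypothetical names; the real glidein_startup is a shell script.

def run_glidein_startup(tasks, log):
    """Run the startup tasks in order; always run cleanup, even on failure."""
    ok = True
    try:
        for name, task in tasks:
            log.append(name)
            if not task():          # a task returning False aborts the startup
                ok = False
                break
    finally:
        log.append("cleanup")       # cleanup runs no matter what
    return ok

log = []
tasks = [
    ("validate_node", lambda: True),      # node environment checks
    ("download_binaries", lambda: True),  # fetch Condor tarball via HTTP
    ("configure_condor", lambda: True),   # write the Condor config
    ("start_daemons", lambda: True),      # condor_master / startd
    ("collect_monitoring", lambda: True), # post-mortem monitoring info
]
print(run_glidein_startup(tasks, log), log)
```

The try/finally mirrors the important property from the slide: the cleanup step runs even if an earlier task (e.g. validation) fails.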
14. glidein_startup plugins
● Config files and scripts loaded via HTTP
● From both the factory and the frontend Web servers
● Can use local Web proxy (e.g. Squid)
● Mechanism tamper proof and cache coherent
[Diagram: glidein_startup loads files from the factory and frontend Web servers, optionally through Squid, then runs the executables, starts the Condor Startd, and cleans up]
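The tamper-proof property comes from checking every downloaded file against a signed description of expected checksums, so a compromised Web cache cannot inject modified scripts. A minimal sketch of that check (simplified; the real mechanism signs a description file listing all files and their checksums, and the function name here is hypothetical):

```python
# Sketch of the tamper-proof download check used by glidein_startup:
# each fetched file must match the checksum from the signed description.
import hashlib

def verify_download(data: bytes, expected_sha1: str) -> bool:
    """Reject any file whose checksum does not match the signed description."""
    return hashlib.sha1(data).hexdigest() == expected_sha1

payload = b"#!/bin/sh\necho validation ok\n"
good = hashlib.sha1(payload).hexdigest()
print(verify_download(payload, good))          # untouched file passes
print(verify_download(payload + b"x", good))   # tampered file is rejected
```

Because only checksums need to be trusted, the files themselves can safely be served through any HTTP cache (e.g. Squid), which is what makes the mechanism cache coherent as well.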
15. glidein_startup scripts
● Standard plugins
● Basic Grid node validation (certs, disk space, etc.)
● Setup Condor (glexec, CCB, etc.)
● VO provided plugins
● Optional, but can be anything
● CMS@UCSD checks for CMS SW
● Factory admin can also provide them
● Details about the plugins can be found at http://tinyurl.com/glideinWMS/doc.prd/factory/custom_scripts.html
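A validation plugin just checks something about the node and signals success or failure. Real plugins are typically shell scripts; the sketch below models the same idea in Python, with made-up paths and thresholds (the real CMS check looks for the CMS software area):

```python
# Hedged sketch of a VO validation plugin: check that the VO software
# directory exists and that there is enough scratch space for jobs.
# Paths and thresholds are illustrative, not the real CMS values.
import os
import shutil

def node_ok(sw_dir: str, min_free_bytes: int, workdir: str = ".") -> bool:
    """Return True if the node passes validation; a real plugin would
    signal failure to glidein_startup via a nonzero exit code."""
    if not os.path.isdir(sw_dir):     # VO software must be present
        return False
    free = shutil.disk_usage(workdir).free
    return free >= min_free_bytes     # enough scratch space for jobs

print(node_ok("/tmp", 1024))                       # a directory that exists
print(node_ok("/no/such/vo/software/area", 1024))  # missing SW area fails
```

If any plugin fails, the glidein reports the problem and shuts down instead of pulling in user jobs that would only fail on the bad node.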
16. A (sort of) detailed view of the glidein factory
17. Refresher – glideinWMS arch.
● The factory knows about the grid and submits glideins
[Diagram: the Frontend monitors the Condor submit nodes and central manager, matches idle jobs to entries, and requests glideins; the Factory submits glideins to Grid sites (CREAM, Globus), where glidein_startup configures and starts a Startd on the worker node]
18. Glidein factory
● Glidein factory knows how to contact sites
● List in a local config
● Only trusted and tested sites should be included
● For each site (called entry)
● Contact info (Node, grid type, jobmanager)
● Site config (startup dir, glexec, OS type, …)
● VOs supported
● Other attributes (Site name, closest SE, ...)
● Admin maintained, periodically compared to BDII
http://tinyurl.com/glideinWMS/doc.prd/factory/configuration.html
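The per-entry information listed above can be pictured as a small record. The real factory keeps this in an XML config file (see the URL above); the field names in this Python sketch are illustrative, not the actual schema:

```python
# Illustrative sketch of what the factory knows per site ("entry").
# The real config is XML and uses different attribute names.
entry = {
    "name": "UCSD_Test",                                   # entry name
    "gatekeeper": "osg-gw.example.edu/jobmanager-condor",  # contact info
    "gridtype": "gt5",                                     # grid flavor
    "work_dir": "OSG",                                     # startup dir
    "glexec": "NONE",                                      # site glexec setup
    "allowed_vos": ["CMS", "OSG"],                         # VOs supported
    "attrs": {"GLIDEIN_Site": "UCSD",                      # other attributes
              "closest_SE": "se.example.edu"},
}

def serves_vo(entry, vo):
    """The factory only submits glideins for VOs an entry supports."""
    return vo in entry["allowed_vos"]

print(serves_vo(entry, "CMS"))
print(serves_vo(entry, "ATLAS"))
```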
19. Glidein factory role
● The glidein factory is just a slave
● The frontend(s) tell it how many glideins to submit where
● Once the glideins start to run, they report to the VO collector and the factory is not involved
● The communication is based on ClassAds
● The factory has a Collector for this purpose
[Diagram: the Frontend and the Factory exchange ClassAds through the factory node's Collector]
21. Frontends
● The factory admin decides which Frontends to serve
● Valid proxy with known DN needed to talk to the collector
● Factory config has further fine-grained controls
[Diagram: two Frontend nodes authenticate to the Collector on the factory node]
22. Glidein submission
● The glidein factory (entry) uses Condor-G to submit glideins
● Condor-G does the heavy lifting
● The factory just monitors the progress
[Diagram: each entry on the factory node submits glideins through a Condor-G Schedd to CREAM and Globus gatekeepers, and monitors the submitted glideins]
23. Credentials/Proxy
● Proxy typically provided by the frontend
● Although the factory can provide a default one (rarely used)
● Proxy delivered encrypted in the ClassAd
● Factory (entry) provides the encryption key (PKI)
● Proxy stored on disk
● Each VO mapped to a different UID
[Diagram: the Frontend gets the key from the factory's Collector and delivers the encrypted proxy to the entry via the Schedd]
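Structurally, the hand-off works like this: the frontend fetches the entry's key from the factory Collector, encrypts the proxy, and ships it inside its request ClassAd. The sketch below shows only that flow; the "encryption" is a base64 stand-in, NOT real crypto (the real system uses PKI, and the ClassAd attribute names here are made up):

```python
# Structural sketch of the encrypted proxy delivery. base64 is used as a
# stand-in so the data flow is visible; the real mechanism uses PKI keys
# advertised by the factory entry.
import base64

def encrypt(data: bytes, key: str) -> str:
    # Stand-in only: tag the payload with the key so decrypt can check it.
    return base64.b64encode(key.encode() + b"|" + data).decode()

def decrypt(blob: str, key: str) -> bytes:
    raw = base64.b64decode(blob)
    k, _, data = raw.partition(b"|")
    assert k == key.encode()          # wrong key -> delivery rejected
    return data

factory_key = "entry-public-key"      # fetched from the factory Collector
request_ad = {                        # the frontend's request ClassAd
    "ReqName": "UCSD_Test",
    "GlideinEncProxy": encrypt(b"PROXY-PEM-DATA", factory_key),
}
print(decrypt(request_ad["GlideinEncProxy"], factory_key))
```

On the factory side the decrypted proxy is then written to disk under the UID assigned to that VO, keeping different VOs' credentials separated.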
24. A (sort of) detailed view of the VO frontend
25. Refresher – glideinWMS arch.
● The frontend monitors the user Condor pool, does the matchmaking and requests glideins
[Diagram: the same architecture with the frontend domain highlighted: the Frontend monitors the submit nodes and central manager, matches, and requests glideins from the factory node]
26. VO frontend
● The VO frontend is the brain of a glideinWMS-based pool
● Like a site-level “negotiator”
[Diagram: within the VO domain, the Frontend monitors the submit nodes and central manager to find idle jobs, finds matching entries, and requests glideins from the factory node]
27. Two level matchmaking
● The frontend triggers glidein submission
● The “regular” negotiator matches jobs to glideins
[Diagram: the Frontend requests glideins from the Factory, which submits them via CREAM and Globus; once the glidein Startds join the pool, the Negotiator matches the Schedd's jobs to them]
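The two levels can be sketched side by side. Level 1 is the frontend matching idle jobs against factory entries (to decide where to request glideins); level 2 is the ordinary Condor negotiator later matching the same jobs against the glideins that joined the pool. The data shapes below are illustrative, with matching reduced to a single "site" attribute:

```python
# Sketch of the two matchmaking levels; real matching uses full ClassAd
# expressions, not a single "site" field.

def frontend_match(jobs, entries):
    """Level 1: count idle jobs matching each entry; this count drives
    the frontend's glidein requests to the factory."""
    return {e["name"]: sum(1 for j in jobs if j["site"] == e["site"])
            for e in entries}

def negotiator_match(jobs, glideins):
    """Level 2: the regular negotiator pairs each job with a free glidein."""
    pairs, free = [], list(glideins)
    for j in jobs:
        for g in free:
            if g["site"] == j["site"]:
                pairs.append((j["id"], g["name"]))
                free.remove(g)
                break
    return pairs

jobs = [{"id": 1, "site": "UCSD"}, {"id": 2, "site": "UCSD"}]
entries = [{"name": "entry_UCSD", "site": "UCSD"}]
print(frontend_match(jobs, entries))     # level 1: request glideins
glideins = [{"name": "glidein_a", "site": "UCSD"}]
print(negotiator_match(jobs, glideins))  # level 2: run jobs on glideins
```

Note that the two levels never talk to each other directly: level 1 only creates Startds, and level 2 consumes them like any other Condor pool.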
28. Frontend logic
● The glideinWMS glidein request logic is based on the principle of “constant pressure”
● Frontend requests a certain number of “idle glideins” in the factory queue at all times
● It does not request a specific number of glideins
● This is done due to the asynchronous nature of the system
● Both the factory and the frontend are in a polling loop and talk to each other indirectly
29. Frontend logic
● Frontend matches job attrs against entry attrs
● It then counts the matched idle jobs
● A fraction of this number becomes the “pressure request” (up to 1/3)
● The matchmaking expression is defined by the frontend admin
● Not the user
● Debatable if it is better or worse, but it does reduce frontend code complexity
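The pressure calculation above amounts to a one-liner: request that a fraction of the matched idle jobs be kept as idle glideins in the factory queue, capped by a configured limit. A sketch (the cap and its default value are assumptions for illustration; the real frontend config has several such limits):

```python
# "Constant pressure" sketch: the frontend asks the factory to keep a
# number of idle glideins queued, derived from the matched idle jobs
# (up to 1/3 of them, per the slide), not a fixed total.
import math

def requested_idle(matched_idle_jobs: int, divisor: int = 3,
                   max_idle: int = 100) -> int:
    """Idle glideins to keep in the factory queue for one entry.
    max_idle is a hypothetical cap, not the real config parameter."""
    return min(math.ceil(matched_idle_jobs / divisor), max_idle)

print(requested_idle(0))      # no matched idle jobs -> no pressure
print(requested_idle(30))     # a third of the idle jobs
print(requested_idle(10000))  # large backlog hits the cap
```

Because the request is recomputed on every polling cycle, glideins that start running (and thus leave the factory queue) are automatically replaced as long as matched idle jobs remain.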
30. Frontend config
● The frontend owns the “glidein proxy”
● And delegates it to the factory(s) when requesting glideins
● Must keep it valid at all times (usually at OS level)
● The VO frontend can (and should) provide VO-specific validation scripts
● The VO frontend can (and should) set the glidein start expression
● Used by the VO negotiator for final matchmaking
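The start expression is just a boolean policy evaluated against each job's attributes during the final matchmaking. Real Condor evaluates it in the ClassAd language; the sketch below uses a Python expression as a stand-in, and the attribute names and policy are made up:

```python
# Stand-in for a glidein start expression: a boolean expression over the
# job's ClassAd attributes, set by the frontend admin rather than users.
# Real Condor evaluates ClassAd expressions, not Python.

def start_allows(start_expr: str, job_ad: dict) -> bool:
    """Evaluate the illustrative start expression against a job ad."""
    return bool(eval(start_expr, {"__builtins__": {}}, dict(job_ad)))

start_expr = "Owner in ('cmsprod', 'cmsuser') and RequestMemory <= 2048"
print(start_allows(start_expr, {"Owner": "cmsprod", "RequestMemory": 1024}))
print(start_allows(start_expr, {"Owner": "someone", "RequestMemory": 1024}))
```

Since the admin (not the user) writes this policy, the same expression is applied uniformly across all the VO's glideins.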
31. And the summary
32. Summary
● Glideins are just properly configured Condor execute nodes submitted as Grid jobs
● The glideinWMS is a mechanism to automate glidein submission
● The glideinWMS is composed of three logical entities, two being actual services:
● Glidein factories know about the Grid
● VO frontends know about the users and drive the factories
33. Pointers
● glideinWMS development team is reachable at glideinwms-support@fnal.gov
● The official project Web page is http://tinyurl.com/glideinWMS
● CMS frontend at UCSD
http://glidein-collector.t2.ucsd.edu:8319/vofrontend/monitor/frontend_UCSD-v5_2/frontendStatus.html
● OSG glidein factory at UCSD
http://hepuser.ucsd.edu/twiki2/bin/view/UCSDTier2/OSGgfactory
http://glidein-1.t2.ucsd.edu:8319/glidefactory/monitor/glidein_Production_v4_1/factoryStatus.html
34. Acknowledgments
● The glideinWMS is a CMS-led project developed mostly at FNAL, with contributions from UCSD and ISI
● The glideinWMS factory operations at UCSD are sponsored by OSG
● The funding comes from NSF, DOE and the UC system