SlideShare a Scribd company logo
Aug 2014 How are glideins different 1
glideinWMS Training
How is glideinWMS different
from vanilla HTCondor
by Igor Sfiligoi, UC San Diego
Aug 2014 How are glideins different 2
Overview
● These slides provide an overview of why
glideinWMS installations behave differently
than dedicated, LAN-based HTCondor ones
Aug 2014 How are glideins different 3
Very heterogeneous res. pool
● Many user jobs have data constraints
– And data access varies from site to site
● Each site basically results in a different
“type of resource”
– Making the resources very heterogeneous,
for matchmaking purposes
– O(100) types of resources not unusual
● Leads to autocluster
number explosion
In dedicated HTCondor
pools, 5 classes of resources
is typically already a lot
Aug 2014 How are glideins different 4
Provisioning vs matchmaking
● Glideins are provisioned
(i.e. requested from sites)
because some user jobs need more resources
– But once provisioned they may not match any jobs
● Two main reasons
– Trigger jobs already gone (i.e. not idle anymore)
– Mismatch between provisioning and
matchmaking requirements Dedicated HTCondor
installations don't
have 2 levels of
matchmaking
Aug 2014 How are glideins different 5
Limited lease lifetime
● Glideins are basically leased execute nodes
– And they come with a limited lifetime
● Lease times usually in the order of one day
– Each glidein typically runs less than 10 user jobs
● User jobs must fit in the remaining lifetime
– Or they will be killed
● Makes for more complex matchmaking decisions
– And requires user help
Aug 2014 How are glideins different 6
Multicore and limited lifetimes
● Limited lifetimes particularly problematic for
multi-core jobs, resulting in significant waste
– Since it is unlikely all jobs will terminate
at exactly the same time
job3
job2
job1
job5
job6
job8
job4 job7 job9
WASTE
1
2
3
4
CPU
time
No suitable user jobs anymore
Pilot job
can terminate
Aug 2014 How are glideins different 7
Automatic shut down
● Glideins are configured to shut down
automatically if not used for some time
– Those resources could be used by someone else
– HTCondor not the only user of the resources
● Default Unclaimed threshold quite low
– About 10 minutes
● This puts stringent limits on Matchmaking
– If a Startd is not matched in time, it is “lost”
– And restarting glideins is expensive
Unlike a dedicated HTCondor pool
Aug 2014 How are glideins different 8
Strong end-to-end security
● A glideinWMS system will typically span many
different locations
● x509 authentication between all nodes required
– At daemon startup, then sec. session cached
– With the exception of Schedd<->Startd, where
security mediated through the Collector
● All over-the-wire communication
Integrity checked
– Requires auth. Neither typically used
in LAN deployments
Aug 2014 How are glideins different 9
Not privileged on execute side
● HTCondor daemons on the execute side
do not have system privileges
– Limits what HTCondor can do
● UID switching can be achieved with glexec
– But requires proxy delegation from schedd
– Only possible if users collaborate
– Relatively expensive (at least one per job startup)
● Many other functions not an option
– e.g. cgroups
Aug 2014 How are glideins different 10
Firewalls
● HTCondor basically a P2P system
– But execute nodes are often behind firewalls
● Requires the use of CCB and
shared_port_daemon to get around it
– But this adds complexity to the system
– Schedd particularly sensible here
● CCB can become single point of failure
– Either because temp. overloaded
– Or if it dies and HA not used
Aug 2014 How are glideins different 11
Very dynamic resource pool
● Startds tend to come and go often
– A side effect of limited lease lifetime
– And provisioning due to new jobs being submitted
● Many HTCondor optimizations less effective
– e.g. Security session caching
Aug 2014 How are glideins different 12
Increased resource pool size
● Most glideinWMS installations bigger than
most LAN HTCondor installations
– At least at the peaks
● Increased scale puts more load on non-execute
daemons
– Even before all the other considerations are applied
Aug 2014 How are glideins different 13
The end

More Related Content

Viewers also liked

Understanding priorities in HTCondor
Understanding priorities in HTCondorUnderstanding priorities in HTCondor
Understanding priorities in HTCondor
Igor Sfiligoi
 
Where to find DHTC resources - OSG School 2014
Where to find DHTC resources - OSG School 2014Where to find DHTC resources - OSG School 2014
Where to find DHTC resources - OSG School 2014
Igor Sfiligoi
 
Presentation 15 condor-v1
Presentation 15 condor-v1Presentation 15 condor-v1
Presentation 15 condor-v1Simon Kim
 
Augmenting Big Data Analytics with Nirvana
Augmenting Big Data Analytics with NirvanaAugmenting Big Data Analytics with Nirvana
Augmenting Big Data Analytics with Nirvana
Igor Sfiligoi
 
Known HTCondor break points
Known HTCondor break pointsKnown HTCondor break points
Known HTCondor break points
Igor Sfiligoi
 

Viewers also liked (6)

distcom-short-20140112-1600
distcom-short-20140112-1600distcom-short-20140112-1600
distcom-short-20140112-1600
 
Understanding priorities in HTCondor
Understanding priorities in HTCondorUnderstanding priorities in HTCondor
Understanding priorities in HTCondor
 
Where to find DHTC resources - OSG School 2014
Where to find DHTC resources - OSG School 2014Where to find DHTC resources - OSG School 2014
Where to find DHTC resources - OSG School 2014
 
Presentation 15 condor-v1
Presentation 15 condor-v1Presentation 15 condor-v1
Presentation 15 condor-v1
 
Augmenting Big Data Analytics with Nirvana
Augmenting Big Data Analytics with NirvanaAugmenting Big Data Analytics with Nirvana
Augmenting Big Data Analytics with Nirvana
 
Known HTCondor break points
Known HTCondor break pointsKnown HTCondor break points
Known HTCondor break points
 

Similar to How is glideinWMS different from vanilla HTCondor

Introduction to glideinWMS
Introduction to glideinWMSIntroduction to glideinWMS
Introduction to glideinWMS
Igor Sfiligoi
 
Android Security: Defending Your Users
Android Security: Defending Your UsersAndroid Security: Defending Your Users
Android Security: Defending Your Users
CommonsWare
 
Digibury: SciVisum - Making your website fast - and scalable
Digibury: SciVisum - Making your website fast - and scalableDigibury: SciVisum - Making your website fast - and scalable
Digibury: SciVisum - Making your website fast - and scalable
Lizzie Hodgson
 
Java concurrecny
Java concurrecnyJava concurrecny
Java concurrecnynadeembtech
 
OSDC 2018 | Migrating to the cloud by Devdas Bhagat
OSDC 2018 | Migrating to the cloud by Devdas BhagatOSDC 2018 | Migrating to the cloud by Devdas Bhagat
OSDC 2018 | Migrating to the cloud by Devdas Bhagat
NETWAYS
 
Disposable infrastructure
Disposable infrastructureDisposable infrastructure
Disposable infrastructure
Mike Fowler
 
Oracle SOA Suite Performance Tuning- UKOUG Application Server & Middleware SI...
Oracle SOA Suite Performance Tuning- UKOUG Application Server & Middleware SI...Oracle SOA Suite Performance Tuning- UKOUG Application Server & Middleware SI...
Oracle SOA Suite Performance Tuning- UKOUG Application Server & Middleware SI...
C2B2 Consulting
 
#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, Criteo
#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, Criteo#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, Criteo
#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, Criteo
Paris Open Source Summit
 
Java at Scale, Dallas JUG, October 2013
Java at Scale, Dallas JUG, October 2013Java at Scale, Dallas JUG, October 2013
Java at Scale, Dallas JUG, October 2013
Azul Systems Inc.
 
Block Storage Updates - Juno Edition
Block Storage Updates - Juno EditionBlock Storage Updates - Juno Edition
Block Storage Updates - Juno Edition
OpenStack Foundation
 
High-performance high-availability Plone
High-performance high-availability PloneHigh-performance high-availability Plone
High-performance high-availability Plone
Guido Stevens
 
Evolving to Cloud-Native - Anand Rao
Evolving to Cloud-Native - Anand RaoEvolving to Cloud-Native - Anand Rao
Evolving to Cloud-Native - Anand Rao
VMware Tanzu
 
Cloud hosting your ePortfolio
Cloud hosting your ePortfolioCloud hosting your ePortfolio
Cloud hosting your ePortfolio
Mahara Hui
 
OSMC 2019 | How to improve database Observability by Charles Judith
OSMC 2019 | How to improve database Observability by Charles JudithOSMC 2019 | How to improve database Observability by Charles Judith
OSMC 2019 | How to improve database Observability by Charles Judith
NETWAYS
 
"The cloud" - No longer a joke - Simon Story
"The cloud" - No longer a joke - Simon Story"The cloud" - No longer a joke - Simon Story
"The cloud" - No longer a joke - Simon Story
Ireland & UK Moodlemoot 2012
 
Monitoring and troubleshooting a glideinWMS-based HTCondor pool
Monitoring and troubleshooting a glideinWMS-based HTCondor poolMonitoring and troubleshooting a glideinWMS-based HTCondor pool
Monitoring and troubleshooting a glideinWMS-based HTCondor pool
Igor Sfiligoi
 
Building hadoop based big data environment
Building hadoop based big data environmentBuilding hadoop based big data environment
Building hadoop based big data environmentEvans Ye
 
DevOps / Agile Tools Seminar 2013
DevOps / Agile Tools Seminar 2013DevOps / Agile Tools Seminar 2013
DevOps / Agile Tools Seminar 2013
Ethan Ram
 
2012 Annual State of the Union for Mobile Ecommerce Performance [Velocity EU]
2012 Annual State of the Union for Mobile Ecommerce Performance [Velocity EU]2012 Annual State of the Union for Mobile Ecommerce Performance [Velocity EU]
2012 Annual State of the Union for Mobile Ecommerce Performance [Velocity EU]
Strangeloop
 
SaltConf14 - Thomas Jackson, LinkedIn - Safety with Power Tools
SaltConf14 - Thomas Jackson, LinkedIn - Safety with Power ToolsSaltConf14 - Thomas Jackson, LinkedIn - Safety with Power Tools
SaltConf14 - Thomas Jackson, LinkedIn - Safety with Power Tools
SaltStack
 

Similar to How is glideinWMS different from vanilla HTCondor (20)

Introduction to glideinWMS
Introduction to glideinWMSIntroduction to glideinWMS
Introduction to glideinWMS
 
Android Security: Defending Your Users
Android Security: Defending Your UsersAndroid Security: Defending Your Users
Android Security: Defending Your Users
 
Digibury: SciVisum - Making your website fast - and scalable
Digibury: SciVisum - Making your website fast - and scalableDigibury: SciVisum - Making your website fast - and scalable
Digibury: SciVisum - Making your website fast - and scalable
 
Java concurrecny
Java concurrecnyJava concurrecny
Java concurrecny
 
OSDC 2018 | Migrating to the cloud by Devdas Bhagat
OSDC 2018 | Migrating to the cloud by Devdas BhagatOSDC 2018 | Migrating to the cloud by Devdas Bhagat
OSDC 2018 | Migrating to the cloud by Devdas Bhagat
 
Disposable infrastructure
Disposable infrastructureDisposable infrastructure
Disposable infrastructure
 
Oracle SOA Suite Performance Tuning- UKOUG Application Server & Middleware SI...
Oracle SOA Suite Performance Tuning- UKOUG Application Server & Middleware SI...Oracle SOA Suite Performance Tuning- UKOUG Application Server & Middleware SI...
Oracle SOA Suite Performance Tuning- UKOUG Application Server & Middleware SI...
 
#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, Criteo
#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, Criteo#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, Criteo
#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, Criteo
 
Java at Scale, Dallas JUG, October 2013
Java at Scale, Dallas JUG, October 2013Java at Scale, Dallas JUG, October 2013
Java at Scale, Dallas JUG, October 2013
 
Block Storage Updates - Juno Edition
Block Storage Updates - Juno EditionBlock Storage Updates - Juno Edition
Block Storage Updates - Juno Edition
 
High-performance high-availability Plone
High-performance high-availability PloneHigh-performance high-availability Plone
High-performance high-availability Plone
 
Evolving to Cloud-Native - Anand Rao
Evolving to Cloud-Native - Anand RaoEvolving to Cloud-Native - Anand Rao
Evolving to Cloud-Native - Anand Rao
 
Cloud hosting your ePortfolio
Cloud hosting your ePortfolioCloud hosting your ePortfolio
Cloud hosting your ePortfolio
 
OSMC 2019 | How to improve database Observability by Charles Judith
OSMC 2019 | How to improve database Observability by Charles JudithOSMC 2019 | How to improve database Observability by Charles Judith
OSMC 2019 | How to improve database Observability by Charles Judith
 
"The cloud" - No longer a joke - Simon Story
"The cloud" - No longer a joke - Simon Story"The cloud" - No longer a joke - Simon Story
"The cloud" - No longer a joke - Simon Story
 
Monitoring and troubleshooting a glideinWMS-based HTCondor pool
Monitoring and troubleshooting a glideinWMS-based HTCondor poolMonitoring and troubleshooting a glideinWMS-based HTCondor pool
Monitoring and troubleshooting a glideinWMS-based HTCondor pool
 
Building hadoop based big data environment
Building hadoop based big data environmentBuilding hadoop based big data environment
Building hadoop based big data environment
 
DevOps / Agile Tools Seminar 2013
DevOps / Agile Tools Seminar 2013DevOps / Agile Tools Seminar 2013
DevOps / Agile Tools Seminar 2013
 
2012 Annual State of the Union for Mobile Ecommerce Performance [Velocity EU]
2012 Annual State of the Union for Mobile Ecommerce Performance [Velocity EU]2012 Annual State of the Union for Mobile Ecommerce Performance [Velocity EU]
2012 Annual State of the Union for Mobile Ecommerce Performance [Velocity EU]
 
SaltConf14 - Thomas Jackson, LinkedIn - Safety with Power Tools
SaltConf14 - Thomas Jackson, LinkedIn - Safety with Power ToolsSaltConf14 - Thomas Jackson, LinkedIn - Safety with Power Tools
SaltConf14 - Thomas Jackson, LinkedIn - Safety with Power Tools
 

More from Igor Sfiligoi

Preparing Fusion codes for Perlmutter - CGYRO
Preparing Fusion codes for Perlmutter - CGYROPreparing Fusion codes for Perlmutter - CGYRO
Preparing Fusion codes for Perlmutter - CGYRO
Igor Sfiligoi
 
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
Igor Sfiligoi
 
Comparing single-node and multi-node performance of an important fusion HPC c...
Comparing single-node and multi-node performance of an important fusion HPC c...Comparing single-node and multi-node performance of an important fusion HPC c...
Comparing single-node and multi-node performance of an important fusion HPC c...
Igor Sfiligoi
 
The anachronism of whole-GPU accounting
The anachronism of whole-GPU accountingThe anachronism of whole-GPU accounting
The anachronism of whole-GPU accounting
Igor Sfiligoi
 
Auto-scaling HTCondor pools using Kubernetes compute resources
Auto-scaling HTCondor pools using Kubernetes compute resourcesAuto-scaling HTCondor pools using Kubernetes compute resources
Auto-scaling HTCondor pools using Kubernetes compute resources
Igor Sfiligoi
 
Speeding up bowtie2 by improving cache-hit rate
Speeding up bowtie2 by improving cache-hit rateSpeeding up bowtie2 by improving cache-hit rate
Speeding up bowtie2 by improving cache-hit rate
Igor Sfiligoi
 
Performance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence SimulationsPerformance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence Simulations
Igor Sfiligoi
 
Comparing GPU effectiveness for Unifrac distance compute
Comparing GPU effectiveness for Unifrac distance computeComparing GPU effectiveness for Unifrac distance compute
Comparing GPU effectiveness for Unifrac distance compute
Igor Sfiligoi
 
Managing Cloud networking costs for data-intensive applications by provisioni...
Managing Cloud networking costs for data-intensive applications by provisioni...Managing Cloud networking costs for data-intensive applications by provisioni...
Managing Cloud networking costs for data-intensive applications by provisioni...
Igor Sfiligoi
 
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory AccessAccelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Igor Sfiligoi
 
Using A100 MIG to Scale Astronomy Scientific Output
Using A100 MIG to Scale Astronomy Scientific OutputUsing A100 MIG to Scale Astronomy Scientific Output
Using A100 MIG to Scale Astronomy Scientific Output
Igor Sfiligoi
 
Using commercial Clouds to process IceCube jobs
Using commercial Clouds to process IceCube jobsUsing commercial Clouds to process IceCube jobs
Using commercial Clouds to process IceCube jobs
Igor Sfiligoi
 
Modest scale HPC on Azure using CGYRO
Modest scale HPC on Azure using CGYROModest scale HPC on Azure using CGYRO
Modest scale HPC on Azure using CGYRO
Igor Sfiligoi
 
Data-intensive IceCube Cloud Burst
Data-intensive IceCube Cloud BurstData-intensive IceCube Cloud Burst
Data-intensive IceCube Cloud Burst
Igor Sfiligoi
 
Scheduling a Kubernetes Federation with Admiralty
Scheduling a Kubernetes Federation with AdmiraltyScheduling a Kubernetes Federation with Admiralty
Scheduling a Kubernetes Federation with Admiralty
Igor Sfiligoi
 
Accelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACCAccelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACC
Igor Sfiligoi
 
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Igor Sfiligoi
 
Porting and optimizing UniFrac for GPUs
Porting and optimizing UniFrac for GPUsPorting and optimizing UniFrac for GPUs
Porting and optimizing UniFrac for GPUs
Igor Sfiligoi
 
Demonstrating 100 Gbps in and out of the public Clouds
Demonstrating 100 Gbps in and out of the public CloudsDemonstrating 100 Gbps in and out of the public Clouds
Demonstrating 100 Gbps in and out of the public Clouds
Igor Sfiligoi
 
TransAtlantic Networking using Cloud links
TransAtlantic Networking using Cloud linksTransAtlantic Networking using Cloud links
TransAtlantic Networking using Cloud links
Igor Sfiligoi
 

More from Igor Sfiligoi (20)

Preparing Fusion codes for Perlmutter - CGYRO
Preparing Fusion codes for Perlmutter - CGYROPreparing Fusion codes for Perlmutter - CGYRO
Preparing Fusion codes for Perlmutter - CGYRO
 
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
 
Comparing single-node and multi-node performance of an important fusion HPC c...
Comparing single-node and multi-node performance of an important fusion HPC c...Comparing single-node and multi-node performance of an important fusion HPC c...
Comparing single-node and multi-node performance of an important fusion HPC c...
 
The anachronism of whole-GPU accounting
The anachronism of whole-GPU accountingThe anachronism of whole-GPU accounting
The anachronism of whole-GPU accounting
 
Auto-scaling HTCondor pools using Kubernetes compute resources
Auto-scaling HTCondor pools using Kubernetes compute resourcesAuto-scaling HTCondor pools using Kubernetes compute resources
Auto-scaling HTCondor pools using Kubernetes compute resources
 
Speeding up bowtie2 by improving cache-hit rate
Speeding up bowtie2 by improving cache-hit rateSpeeding up bowtie2 by improving cache-hit rate
Speeding up bowtie2 by improving cache-hit rate
 
Performance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence SimulationsPerformance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence Simulations
 
Comparing GPU effectiveness for Unifrac distance compute
Comparing GPU effectiveness for Unifrac distance computeComparing GPU effectiveness for Unifrac distance compute
Comparing GPU effectiveness for Unifrac distance compute
 
Managing Cloud networking costs for data-intensive applications by provisioni...
Managing Cloud networking costs for data-intensive applications by provisioni...Managing Cloud networking costs for data-intensive applications by provisioni...
Managing Cloud networking costs for data-intensive applications by provisioni...
 
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory AccessAccelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
 
Using A100 MIG to Scale Astronomy Scientific Output
Using A100 MIG to Scale Astronomy Scientific OutputUsing A100 MIG to Scale Astronomy Scientific Output
Using A100 MIG to Scale Astronomy Scientific Output
 
Using commercial Clouds to process IceCube jobs
Using commercial Clouds to process IceCube jobsUsing commercial Clouds to process IceCube jobs
Using commercial Clouds to process IceCube jobs
 
Modest scale HPC on Azure using CGYRO
Modest scale HPC on Azure using CGYROModest scale HPC on Azure using CGYRO
Modest scale HPC on Azure using CGYRO
 
Data-intensive IceCube Cloud Burst
Data-intensive IceCube Cloud BurstData-intensive IceCube Cloud Burst
Data-intensive IceCube Cloud Burst
 
Scheduling a Kubernetes Federation with Admiralty
Scheduling a Kubernetes Federation with AdmiraltyScheduling a Kubernetes Federation with Admiralty
Scheduling a Kubernetes Federation with Admiralty
 
Accelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACCAccelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACC
 
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
 
Porting and optimizing UniFrac for GPUs
Porting and optimizing UniFrac for GPUsPorting and optimizing UniFrac for GPUs
Porting and optimizing UniFrac for GPUs
 
Demonstrating 100 Gbps in and out of the public Clouds
Demonstrating 100 Gbps in and out of the public CloudsDemonstrating 100 Gbps in and out of the public Clouds
Demonstrating 100 Gbps in and out of the public Clouds
 
TransAtlantic Networking using Cloud links
TransAtlantic Networking using Cloud linksTransAtlantic Networking using Cloud links
TransAtlantic Networking using Cloud links
 

Recently uploaded

Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 

Recently uploaded (20)

Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 

How is glideinWMS different from vanilla HTCondor

  • 1. Aug 2014 How are glideins different 1 glideinWMS Training How is glideinWMS different from vanilla HTCondor by Igor Sfiligoi, UC San Diego
  • 2. Aug 2014 How are glideins different 2 Overview ● These slides provide an overview of why glideinWMS installations behave differently than dedicated, LAN-based HTCondor ones
  • 3. Aug 2014 How are glideins different 3 Very heterogeneous res. pool ● Many user jobs have data constraints – And data access varies from site to site ● Each site basically results in a different “type of resource” – Making the resources very heterogeneous, for matchmaking purposes – O(100) types of resources not unusual ● Leads to autocluster number explosion In dedicated HTCondor pools, 5 classes of resources is typically already a lot
  • 4. Aug 2014 How are glideins different 4 Provisioning vs matchmaking ● Glideins are provisioned (i.e. requested from sites) because some user jobs need more resources – But once provisioned they may not match any jobs ● Two main reasons – Trigger jobs already gone (i.e. not idle anymore) – Mismatch between provisioning and matchmaking requirements Dedicated HTCondor installations don't have 2 levels of matchmaking
  • 5. Aug 2014 How are glideins different 5 Limited lease lifetime ● Glideins are basically leased execute nodes – And they come with a limited lifetime ● Lease times usually in the order of one day – Each glidein typically runs less than 10 user jobs ● User jobs must fit in the remaining lifetime – Or they will be killed ● Makes for more complex matchmaking decisions – And requires user help
  • 6. Aug 2014 How are glideins different 6 Multicore and limited lifetimes ● Limited lifetimes particularly problematic for multi-core jobs, resulting in significant waste – Since it is unlikely all jobs will terminate at exactly the same time job3 job2 job1 job5 job6 job8 job4 job7 job9 WASTE 1 2 3 4 CPU time No suitable user jobs anymore Pilot job can terminate
  • 7. Aug 2014 How are glideins different 7 Automatic shut down ● Glideins are configured to shut down automatically if not used for some time – Those resources could be used by someone else – HTCondor not the only user of the resources ● Default Unclaimed threshold quite low – About 10 minutes ● This puts stringent limits on Matchmaking – If a Startd is not matched in time, it is “lost” – And restarting glideins is expensive Unlike a dedicated HTCondor pool
  • 8. Aug 2014 How are glideins different 8 Strong end-to-end security ● A glideinWMS system will typically span many different locations ● x509 authentication between all nodes required – At daemon startup, then sec. session cached – With the exception of Schedd<->Startd, where security mediated through the Collector ● All over-the-wire communication Integrity checked – Requires auth. Neither typically used in LAN deployments
  • 9. Aug 2014 How are glideins different 9 Not privileged on execute side ● HTCondor daemons on the execute side do not have system privileges – Limits what HTCondor can do ● UID switching can be achieved with glexec – But requires proxy delegation from schedd – Only possible if users collaborate – Relatively expensive (at least one per job startup) ● Many other functions not an option – e.g. cgroups
  • 10. Aug 2014 How are glideins different 10 Firewalls ● HTCondor basically a P2P system – But execute nodes are often behind firewalls ● Requires the use of CCB and shared_port_daemon to get around it – But this adds complexity to the system – Schedd particularly sensible here ● CCB can become single point of failure – Either because temp. overloaded – Or if it dies and HA not used
  • 11. Aug 2014 How are glideins different 11 Very dynamic resource pool ● Startds tend to come and go often – A side effect of limited lease lifetime – And provisioning due to new jobs being submitted ● Many HTCondor optimizations less effective – e.g. Security session caching
  • 12. Aug 2014 How are glideins different 12 Increased resource pool size ● Most glideinWMS installations bigger than most LAN HTCondor installations – At least at the peaks ● Increased scale puts more load on non-execute daemons – Even before all the other considerations are applied
  • 13. Aug 2014 How are glideins different 13 The end