HEP, Cavendish Laboratory
Configuring & Enabling Condor
in LHC Computing Grid
Condor is a specialized workload management system for compute-intensive jobs that can effectively manage a variety of
clusters of dedicated compute nodes. Today, there are grid schedulers, resource managers, and workload management
systems that either provide the functionality of a traditional batch queuing system (e.g. Torque/PBS) or provide the
ability to harness cycles from idle desktop workstations. Condor addresses both of these areas with a single tool. In a
Grid-style computing environment, Condor's "flocking" technology allows multiple Condor installations to work
together and opens a wide range of options for resource sharing.
[Figure: daemons running on each node type. Central Manager: master, collector, negotiator, schedd, startd. Submit Node: master, schedd. Execute Node: master, startd. Regular Node: master, schedd, startd. Legend: process spawned; ClassAd communication pathway.]
Condor Working Model
Like other full-featured batch systems, Condor provides the traditional
job queuing mechanism, scheduling policy, and priority scheme, along with
resource classifications. In a nutshell, a job is submitted to Condor via a
machine running a scheduler (schedd). The scheduler communicates
with the collector process on the Central Manager (CM). The
negotiator on the CM performs matchmaking and sends the job
to an available machine on the network, which begins running it.
Machines that can run jobs (Execute Nodes) also
communicate with the collector (via a startd process). A shadow
process on the Submit Node keeps communicating with the running job,
so Condor can detect when the job stops executing (e.g. if the job or
the machine crashes). If checkpointing is not in use, Condor can
restart such jobs from the beginning, if requested and allowed.
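
The matchmaking step described above can be illustrated with a small sketch. This is a toy model written for this poster, not the Condor API: the ad attributes (`memory_mb`, `os`, `owner`), the node and job names, and the first-fit policy are all assumptions chosen for illustration. Real ClassAds are expressions in Condor's own language, and the real negotiator also applies priorities and ranking.

```python
# Toy model of ClassAd-style matchmaking; NOT the Condor API.
# All attribute names and values below are illustrative assumptions.

def matches(job, machine):
    """A match requires that each side's requirements accept the other ad."""
    return job["requirements"](machine) and machine["requirements"](job)


def negotiate(jobs, machines):
    """Pair each queued job with the first free machine that matches it."""
    free = list(machines)
    assignments = {}
    for job in jobs:
        for machine in free:
            if matches(job, machine):
                assignments[job["name"]] = machine["name"]
                free.remove(machine)  # safe: we leave the inner loop at once
                break
    return assignments


# Machine ads, as a startd might advertise them to the collector.
machines = [
    {"name": "node01", "memory_mb": 2048, "os": "LINUX",
     "requirements": lambda job: job["owner"] != "blocked_user"},
    {"name": "node02", "memory_mb": 8192, "os": "LINUX",
     "requirements": lambda job: True},
]

# Job ads, as a schedd might advertise them.
jobs = [
    {"name": "job_a", "owner": "alice",
     "requirements": lambda m: m["os"] == "LINUX" and m["memory_mb"] >= 4096},
    {"name": "job_b", "owner": "bob",
     "requirements": lambda m: m["memory_mb"] >= 1024},
    {"name": "job_c", "owner": "carol",
     "requirements": lambda m: m["memory_mb"] >= 16384},
]

assignments = negotiate(jobs, machines)
print(assignments)
```

In this sketch `job_a` lands on `node02`, `job_b` on `node01`, and `job_c` stays unmatched because no machine offers enough memory. The point is that the match is two-sided: the job constrains the machine and the machine constrains the job.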
Although Condor is officially supported as a batch system by gLite/EGEE, various parts of the middleware are still limited to
PBS/Torque in terms of transparent integration. We have extended the support so that the middleware works seamlessly
with Condor and can interact with local/university compute clusters. We provide details of the configuration,
implementation, and testing of Condor for LCG in a multi-cultural environment, where a common cluster is used for different
types of jobs. The system is presented as an extension to the default LCG/gLite configuration that provides transparent
access to the common resource for both LCG and local jobs. Using Condor and Chirp/Parrot, we have extended the
possibilities for using university clusters for LCG/gLite jobs in a non-privileged way.
[Figure: CamGrid Project Model. Labels: pools maintained by individual group/department; local users; grid users; HEP cluster (execute nodes); HEP CM + gLite submit node; local (HEP) submit node; central submit nodes. Legend: HEP submission; CamGrid submission; LCG/gLite submission.]
Condor provides a High Throughput Computing (HTC) environment. In addition to the typical usage scenario, Condor can
also effectively manage non-dedicated resources by taking advantage of spare cycles when those resources are idle. The
ClassAd mechanism in Condor provides an extremely flexible and expressive framework for matching resource requests
(jobs) with resource offers (machines). This is why we have chosen Condor as the primary batch system for our LCG farm.
The same cluster is also part of the CamGrid project. The Condor central manager is also configured to act as a submit
node, but only for LCG/gLite submission. The additional submit node is for our local users and for CamGrid submission.
CamGrid is made up of a number of Condor pools that belong to individual departments, which share their resources
through Condor's flocking mechanism. This federated approach means that there is no single point of failure in this
environment, and the grid does not depend on any individual pool to continue working. Grid jobs run only on the HEP
cluster, but CamGrid jobs, submitted through the central submit hosts or the local submit host, run everywhere across
the CamGrid infrastructure.
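
The routing rule above (grid jobs stay on the HEP cluster, CamGrid jobs may run anywhere) can be sketched as follows. This is a toy model, not Condor code: the pool names `dept_a` and `dept_b`, the slot counts, and the ordering of the flock lists are assumptions made for illustration. In a real deployment the behaviour comes from Condor's flocking configuration on the submit hosts and pools.

```python
# Toy model of the submission routing described above; NOT Condor code.
# Pool names, slot counts, and the flock lists are illustrative assumptions.

def place_job(job_kind, free_slots, flock_lists):
    """Return the first pool in the job's flock list with a free slot.

    The list is tried in order, so the local pool comes first and remote
    pools are used only when the earlier ones are full.
    """
    for pool in flock_lists[job_kind]:
        if free_slots.get(pool, 0) > 0:
            free_slots[pool] -= 1
            return pool
    return None  # the job stays idle in the queue


flock_lists = {
    # LCG/gLite jobs are confined to the HEP cluster.
    "grid": ["hep"],
    # Assumed order: CamGrid jobs try HEP first, then other department pools.
    "camgrid": ["hep", "dept_a", "dept_b"],
}

free_slots = {"hep": 1, "dept_a": 1, "dept_b": 0}

placements = [
    place_job("camgrid", free_slots, flock_lists),  # takes the HEP slot
    place_job("grid", free_slots, flock_lists),     # HEP is full: waits
    place_job("camgrid", free_slots, flock_lists),  # flocks to dept_a
    place_job("camgrid", free_slots, flock_lists),  # nothing free anywhere
]
print(placements)
```

Because every pool keeps its own queue and slots, losing one department pool in this model only removes one entry from the CamGrid list. That is the sense in which the federated setup has no single point of failure.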
Santanu Das. 3rd EGEE User Forum, 11-14 February 2008, Clermont-Ferrand, FRANCE. santanu@hep.phy.cam.ac.uk
