glideinWMS training



glideinWMS - The Larger Picture
i.e. Is it something you would be interested in?

by Igor Sfiligoi (UCSD)




glideinWMS training             glideinWMS - The Larger Picture      1
Why this talk?


         If you have never heard of glideinWMS before,
         you likely have no idea whether it is
         a product you would be interested in using.

         This talk presents glideinWMS in a larger context,
         so you can understand what the product is all about.


The basics
 ●   glideinWMS has been designed to address the
     needs of High Throughput Computing (HTC)
      ●   Better known as batch processing
 ●   In a nutshell, we are trying to facilitate
     the effective use
     of a large number of CPUs
     by a large number of users




High Throughput Computing
 ●   The basic premise of HTC is that there is
     always more demand than available CPUs
 ●   We should make good use of those CPUs
      ●   Keep them busy, ideally, 24x7x365
 ●   Sustained utilization is thus
     more important than peak performance
      ●   The measure of success is
          FLOPY = Floating Point Operations per Year
          not
          FLOPS = Floating Point Operations per Second
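A rough illustration of why sustained utilization can matter more than peak speed. The two clusters and all their numbers below are made up for the example; they are not measurements of any real system.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def flopy(peak_flops: float, utilization: float) -> float:
    """Floating point operations delivered in one year.

    peak_flops  -- peak rate in operations per second (hypothetical value)
    utilization -- fraction of the year the CPUs are actually busy (0..1)
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return peak_flops * utilization * SECONDS_PER_YEAR


# Two hypothetical clusters: a fast one that sits idle most of the time,
# and a slower one that is kept busy almost around the clock.
fast_but_idle = flopy(peak_flops=10e12, utilization=0.30)
slow_but_busy = flopy(peak_flops=5e12, utilization=0.90)

print(f"fast but idle: {fast_but_idle:.2e} FLOPY")
print(f"slow but busy: {slow_but_busy:.2e} FLOPY")
```

With these assumed figures the slower cluster delivers more work per year, because it is busy three times as often.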
HTC from the user point of view
 ●   As a side effect, users must be HTC-aware
 ●   There are some negative aspects
      ●   No interactive access, only process queuing
            –   Usually referred to as user jobs
      ●   Waiting in line to get access to CPUs
 ●   But the payoff is potentially huge
      ●   A single user can use 1000s of CPUs at a time
      ●   Performing in a few days
          computations that would
          take several years on a single machine
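The payoff can be estimated with simple arithmetic. The sketch below assumes the work splits into independent jobs; the workload size and the `efficiency` parameter are hypothetical, introduced only for illustration.

```python
def wall_clock_days(cpu_years: float, cpus: int, efficiency: float = 1.0) -> float:
    """Days needed to finish `cpu_years` of independent jobs on `cpus` CPUs.

    efficiency -- fraction of the held CPU time spent on useful work
                  (hypothetical knob covering queue waits and idle slots)
    """
    if cpus < 1 or not 0.0 < efficiency <= 1.0:
        raise ValueError("need cpus >= 1 and 0 < efficiency <= 1")
    return cpu_years * 365.0 / (cpus * efficiency)


# Hypothetical workload: 5 CPU-years of independent jobs.
print(wall_clock_days(5, cpus=1))                     # five years on one CPU
print(wall_clock_days(5, cpus=1000))                  # under two days on 1000 CPUs
print(wall_clock_days(5, cpus=1000, efficiency=0.5))  # under four days at 50% efficiency
```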
HTC in simplified picture

[Figure: a Repository and a Scheduler; note: user scheduling is usually not FIFO]
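One common alternative to FIFO is fair-share ordering, where the next job comes from whichever user has consumed the least so far. The sketch below is a toy model of that idea only; it is not how any particular batch system implements its policy, and the function name and the one-unit-per-job accounting are assumptions.

```python
from collections import deque


def fair_share_order(queued_jobs, usage=None):
    """Return the order in which jobs would be started, one at a time.

    queued_jobs -- list of (user, job_name) tuples in submission order
    usage       -- optional dict of units already consumed per user
    Each started job counts as one unit of usage (a simplification).
    """
    usage = dict(usage or {})
    per_user = {}
    for user, job in queued_jobs:
        per_user.setdefault(user, deque()).append(job)
        usage.setdefault(user, 0)

    order = []
    while per_user:
        # Pick the user with the lowest usage so far; ties go to the
        # alphabetically first user to keep the result deterministic.
        user = min(per_user, key=lambda u: (usage[u], u))
        order.append((user, per_user[user].popleft()))
        usage[user] += 1
        if not per_user[user]:
            del per_user[user]
    return order


# alice submitted three jobs before bob submitted one.
queue = [("alice", "a1"), ("alice", "a2"), ("alice", "a3"), ("bob", "b1")]
print(fair_share_order(queue))
```

Under FIFO bob would wait behind all of alice's jobs; here his job starts second.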




HTC products
 ●   There are many HTC products available
      ●   Although most call themselves “batch systems”
 ●   A non-exhaustive list:
      ●   Condor
      ●   PBS, with variants like Torque/Maui
      ●   LSF
      ●   SGE, also known as Oracle Grid Engine




Why another system?
 ●   All of the mentioned HTC systems
     assume full control
     of the compute resources (i.e. CPUs)
      ●   And there are many places where this is the case
 ●   glideinWMS was developed to
     support non-dedicated use
     of compute resources
      ●   i.e. when CPUs are given
          to the system only
          for a limited duration at a time

Non-dedicated resources
 ●   In the past decade, two paradigms emerged
      ●   Grid computing
      ●   Cloud computing
 ●   Both allow a user community to use compute
     resources they don't own
      ●   Often called resource elasticity
 ●   Managing a large number of Grid and Cloud
     resources by hand is impractical
      ●   glideinWMS creates an HTC system using them

Grid vs Cloud
(a short summary)

 ●   Grid computing is basically a federation of HTC clusters
      ●   Thus recently called Distributed HTC
      ●   Job queuing is a native paradigm
 ●   (Commercial) Clouds are about leasing resources on a pay-as-you-go basis
      ●   And they happen to use virtualization
      ●   Instances expected to start almost immediately
 ●   So-called “scientific clouds” are typically
     just Grid systems that use virtualization
     (and a different middleware stack)
 ●   glideinWMS is currently optimized for the Grid model
glideinWMS and the Grid
(Cloud resources are used in a similar way)

 ●   glideinWMS creates
     an overlay system on top of
     the various HTC clusters
      ●   From the user community
          point of view,
          a single HTC system
      ●   Just a dynamic one
 ●   glideinWMS
     completely automates
     the process

[Figure: glideinWMS in the middle of several HTC clusters]
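To give a feel for what automating such an overlay involves, here is a toy sizing rule. It is a sketch under stated assumptions, not glideinWMS code: the function, its parameters, the term "pilot" (used here for the Condor process started on a remote cluster), and the one-job-per-pilot rule are all invented for illustration.

```python
def pilots_to_request(idle_jobs, running_pilots, pending_pilots, max_pilots):
    """How many additional pilots to request from one cluster (toy model).

    idle_jobs      -- user jobs waiting in the queue
    running_pilots -- pilots already running on this cluster
    pending_pilots -- pilots requested but not yet started
    max_pilots     -- hypothetical cap on pilots allowed at this cluster
    Assumes each pilot serves exactly one job at a time.
    """
    if min(idle_jobs, running_pilots, pending_pilots, max_pilots) < 0:
        raise ValueError("all arguments must be non-negative")
    # Pilots already waiting in the cluster's queue will absorb some idle jobs.
    unmet = max(0, idle_jobs - pending_pilots)
    # Never exceed the cap, counting both running and pending pilots.
    headroom = max(0, max_pilots - running_pilots - pending_pilots)
    return min(unmet, headroom)


print(pilots_to_request(idle_jobs=100, running_pilots=10,
                        pending_pilots=20, max_pilots=50))
```

Calling a rule like this periodically for every cluster is what makes the resulting pool dynamic: it grows while jobs are waiting and stops growing once demand is covered.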

Implementation and support
 ●   glideinWMS is heavily based on Condor
      ●   Essentially a thin layer on top of it
 ●   Most of the software support thus comes
     from the Condor development team
      ●   At University of Wisconsin – Madison
          http://research.cs.wisc.edu/condor/
 ●   The glideinWMS-specific layer is supported by a
     team spanning Fermilab, UCSD and ISI
     http://tinyurl.com/glideinWMS



glideinWMS and Condor
 ●   Condor handles the HTC system
      ●   Most Condor features are thus available
 ●   The glideinWMS role is limited to scheduling,
     configuring and starting the Condor process
     on the compute resources

[Figure: glideinWMS, the Condor Job Repository, and a Condor CPU Handler running a User Job on an HTC resource]

Summary
 ●   glideinWMS is an HTC product
      ●   i.e. enables effective use of a large number
          of CPUs by a large number of users
 ●   glideinWMS creates an HTC system out of
     non-dedicated compute resources
      ●   e.g. Grid and Cloud resources
 ●   glideinWMS is heavily based on Condor
      ●   thus benefits from the Condor team support


Pointers
 ●   The glideinWMS development team is reachable at
     glideinwms-support@fnal.gov
 ●   The official project Web page is
     http://tinyurl.com/glideinWMS
 ●   OSG glidein factory at UCSD
     http://hepuser.ucsd.edu/twiki2/bin/view/UCSDTier2/OSGgfactory
     http://glidein-1.t2.ucsd.edu:8319/glidefactory/monitor/glidein_Production_v4_1/factoryStatus.html




Acknowledgments
 ●   This document was sponsored by grants from
     the US NSF and US DOE,
     and by the UC system





More Related Content

Similar to glideinWMS - The Larger Picture

Web scale with-nutanix_rev
Web scale with-nutanix_revWeb scale with-nutanix_rev
Web scale with-nutanix_revScalar Decisions
 
Up is Down, Black is White: Using SCCM for Wrong and Right
Up is Down, Black is White: Using SCCM for Wrong and RightUp is Down, Black is White: Using SCCM for Wrong and Right
Up is Down, Black is White: Using SCCM for Wrong and Rightenigma0x3
 
VMworld 2015: The “Snappy” Virtual Desktop User Experience
VMworld 2015: The “Snappy” Virtual Desktop User ExperienceVMworld 2015: The “Snappy” Virtual Desktop User Experience
VMworld 2015: The “Snappy” Virtual Desktop User ExperienceVMworld
 
DevOpsDays Tel Aviv DEC 2022 | Building A Cloud-Native Platform Brick by Bric...
DevOpsDays Tel Aviv DEC 2022 | Building A Cloud-Native Platform Brick by Bric...DevOpsDays Tel Aviv DEC 2022 | Building A Cloud-Native Platform Brick by Bric...
DevOpsDays Tel Aviv DEC 2022 | Building A Cloud-Native Platform Brick by Bric...Haggai Philip Zagury
 
Kernel Recipes 2017 - The state of kernel self-protection - Kees Cook
Kernel Recipes 2017 - The state of kernel self-protection - Kees CookKernel Recipes 2017 - The state of kernel self-protection - Kees Cook
Kernel Recipes 2017 - The state of kernel self-protection - Kees CookAnne Nicolas
 
Webinar: Enterprise Cloud Migration - 4 Problems to Solve
Webinar: Enterprise Cloud Migration - 4 Problems to SolveWebinar: Enterprise Cloud Migration - 4 Problems to Solve
Webinar: Enterprise Cloud Migration - 4 Problems to SolveStorage Switzerland
 
Where Did All These Cycles Go?
Where Did All These Cycles Go?Where Did All These Cycles Go?
Where Did All These Cycles Go?ScyllaDB
 
Red Hat Enterprise Linux: Open, hyperconverged infrastructure
Red Hat Enterprise Linux: Open, hyperconverged infrastructureRed Hat Enterprise Linux: Open, hyperconverged infrastructure
Red Hat Enterprise Linux: Open, hyperconverged infrastructureRed_Hat_Storage
 
Cloud 101: The Basics of Cloud Computing
Cloud 101: The Basics of Cloud ComputingCloud 101: The Basics of Cloud Computing
Cloud 101: The Basics of Cloud ComputingHostway|HOSTING
 
glideinWMS, The OSG overlay DHTC system - OSG School 2014
glideinWMS, The OSG overlay DHTC system - OSG School 2014glideinWMS, The OSG overlay DHTC system - OSG School 2014
glideinWMS, The OSG overlay DHTC system - OSG School 2014Igor Sfiligoi
 
Building Android for the Cloud: Android as a Server (Mobile World Congress 2014)
Building Android for the Cloud: Android as a Server (Mobile World Congress 2014)Building Android for the Cloud: Android as a Server (Mobile World Congress 2014)
Building Android for the Cloud: Android as a Server (Mobile World Congress 2014)Ron Munitz
 
Top 10 DevOps Areas Need To Focus
Top 10 DevOps Areas Need To FocusTop 10 DevOps Areas Need To Focus
Top 10 DevOps Areas Need To Focusdevopsjourney
 
Why Cloud Computing has to go the FOSS way
Why Cloud Computing has to go the FOSS wayWhy Cloud Computing has to go the FOSS way
Why Cloud Computing has to go the FOSS wayAhmed Mekkawy
 
Cloud Computing & Windows Azure
Cloud Computing & Windows AzureCloud Computing & Windows Azure
Cloud Computing & Windows Azureyeschandana
 
Virtualization and its importance and implementation levels
Virtualization and its importance and implementation levelsVirtualization and its importance and implementation levels
Virtualization and its importance and implementation levelsMianMubeen3
 
Building android for the Cloud: Android as a Server (AnDevConBoston 2014)
Building android for the Cloud: Android as a Server (AnDevConBoston 2014)Building android for the Cloud: Android as a Server (AnDevConBoston 2014)
Building android for the Cloud: Android as a Server (AnDevConBoston 2014)Ron Munitz
 
Cloud computing basics
Cloud computing basicsCloud computing basics
Cloud computing basicsAkshay Guleria
 
A guide to modern software development 2018
A guide to modern software development 2018A guide to modern software development 2018
A guide to modern software development 2018Peter Bittner
 
Webinar: Preparing for Disasters that Will Actually Happen
Webinar: Preparing for Disasters that Will Actually HappenWebinar: Preparing for Disasters that Will Actually Happen
Webinar: Preparing for Disasters that Will Actually HappenStorage Switzerland
 

Similar to glideinWMS - The Larger Picture (20)

Web scale with-nutanix_rev
Web scale with-nutanix_revWeb scale with-nutanix_rev
Web scale with-nutanix_rev
 
Up is Down, Black is White: Using SCCM for Wrong and Right
Up is Down, Black is White: Using SCCM for Wrong and RightUp is Down, Black is White: Using SCCM for Wrong and Right
Up is Down, Black is White: Using SCCM for Wrong and Right
 
VMworld 2015: The “Snappy” Virtual Desktop User Experience
VMworld 2015: The “Snappy” Virtual Desktop User ExperienceVMworld 2015: The “Snappy” Virtual Desktop User Experience
VMworld 2015: The “Snappy” Virtual Desktop User Experience
 
DevOpsDays Tel Aviv DEC 2022 | Building A Cloud-Native Platform Brick by Bric...
DevOpsDays Tel Aviv DEC 2022 | Building A Cloud-Native Platform Brick by Bric...DevOpsDays Tel Aviv DEC 2022 | Building A Cloud-Native Platform Brick by Bric...
DevOpsDays Tel Aviv DEC 2022 | Building A Cloud-Native Platform Brick by Bric...
 
Kernel Recipes 2017 - The state of kernel self-protection - Kees Cook
Kernel Recipes 2017 - The state of kernel self-protection - Kees CookKernel Recipes 2017 - The state of kernel self-protection - Kees Cook
Kernel Recipes 2017 - The state of kernel self-protection - Kees Cook
 
Webinar: Enterprise Cloud Migration - 4 Problems to Solve
Webinar: Enterprise Cloud Migration - 4 Problems to SolveWebinar: Enterprise Cloud Migration - 4 Problems to Solve
Webinar: Enterprise Cloud Migration - 4 Problems to Solve
 
Where Did All These Cycles Go?
Where Did All These Cycles Go?Where Did All These Cycles Go?
Where Did All These Cycles Go?
 
Red Hat Enterprise Linux: Open, hyperconverged infrastructure
Red Hat Enterprise Linux: Open, hyperconverged infrastructureRed Hat Enterprise Linux: Open, hyperconverged infrastructure
Red Hat Enterprise Linux: Open, hyperconverged infrastructure
 
Cloud 101: The Basics of Cloud Computing
Cloud 101: The Basics of Cloud ComputingCloud 101: The Basics of Cloud Computing
Cloud 101: The Basics of Cloud Computing
 
glideinWMS, The OSG overlay DHTC system - OSG School 2014
glideinWMS, The OSG overlay DHTC system - OSG School 2014glideinWMS, The OSG overlay DHTC system - OSG School 2014
glideinWMS, The OSG overlay DHTC system - OSG School 2014
 
Building Android for the Cloud: Android as a Server (Mobile World Congress 2014)
Building Android for the Cloud: Android as a Server (Mobile World Congress 2014)Building Android for the Cloud: Android as a Server (Mobile World Congress 2014)
Building Android for the Cloud: Android as a Server (Mobile World Congress 2014)
 
Top 10 DevOps Areas Need To Focus
Top 10 DevOps Areas Need To FocusTop 10 DevOps Areas Need To Focus
Top 10 DevOps Areas Need To Focus
 
Why Cloud Computing has to go the FOSS way
Why Cloud Computing has to go the FOSS wayWhy Cloud Computing has to go the FOSS way
Why Cloud Computing has to go the FOSS way
 
Cloud Computing & Windows Azure
Cloud Computing & Windows AzureCloud Computing & Windows Azure
Cloud Computing & Windows Azure
 
Virtualization and its importance and implementation levels
Virtualization and its importance and implementation levelsVirtualization and its importance and implementation levels
Virtualization and its importance and implementation levels
 
Building android for the Cloud: Android as a Server (AnDevConBoston 2014)
Building android for the Cloud: Android as a Server (AnDevConBoston 2014)Building android for the Cloud: Android as a Server (AnDevConBoston 2014)
Building android for the Cloud: Android as a Server (AnDevConBoston 2014)
 
Cloud computing basics
Cloud computing basicsCloud computing basics
Cloud computing basics
 
A Seminar on Cloud Computing
A Seminar on Cloud ComputingA Seminar on Cloud Computing
A Seminar on Cloud Computing
 
A guide to modern software development 2018
A guide to modern software development 2018A guide to modern software development 2018
A guide to modern software development 2018
 
Webinar: Preparing for Disasters that Will Actually Happen
Webinar: Preparing for Disasters that Will Actually HappenWebinar: Preparing for Disasters that Will Actually Happen
Webinar: Preparing for Disasters that Will Actually Happen
 

More from Igor Sfiligoi

Preparing Fusion codes for Perlmutter - CGYRO
Preparing Fusion codes for Perlmutter - CGYROPreparing Fusion codes for Perlmutter - CGYRO
Preparing Fusion codes for Perlmutter - CGYROIgor Sfiligoi
 
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...Igor Sfiligoi
 
Comparing single-node and multi-node performance of an important fusion HPC c...
Comparing single-node and multi-node performance of an important fusion HPC c...Comparing single-node and multi-node performance of an important fusion HPC c...
Comparing single-node and multi-node performance of an important fusion HPC c...Igor Sfiligoi
 
The anachronism of whole-GPU accounting
The anachronism of whole-GPU accountingThe anachronism of whole-GPU accounting
The anachronism of whole-GPU accountingIgor Sfiligoi
 
Auto-scaling HTCondor pools using Kubernetes compute resources
Auto-scaling HTCondor pools using Kubernetes compute resourcesAuto-scaling HTCondor pools using Kubernetes compute resources
Auto-scaling HTCondor pools using Kubernetes compute resourcesIgor Sfiligoi
 
Speeding up bowtie2 by improving cache-hit rate
Speeding up bowtie2 by improving cache-hit rateSpeeding up bowtie2 by improving cache-hit rate
Speeding up bowtie2 by improving cache-hit rateIgor Sfiligoi
 
Performance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence SimulationsPerformance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence SimulationsIgor Sfiligoi
 
Comparing GPU effectiveness for Unifrac distance compute
Comparing GPU effectiveness for Unifrac distance computeComparing GPU effectiveness for Unifrac distance compute
Comparing GPU effectiveness for Unifrac distance computeIgor Sfiligoi
 
Managing Cloud networking costs for data-intensive applications by provisioni...
Managing Cloud networking costs for data-intensive applications by provisioni...Managing Cloud networking costs for data-intensive applications by provisioni...
Managing Cloud networking costs for data-intensive applications by provisioni...Igor Sfiligoi
 
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory AccessAccelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory AccessIgor Sfiligoi
 
Using A100 MIG to Scale Astronomy Scientific Output
Using A100 MIG to Scale Astronomy Scientific OutputUsing A100 MIG to Scale Astronomy Scientific Output
Using A100 MIG to Scale Astronomy Scientific OutputIgor Sfiligoi
 
Using commercial Clouds to process IceCube jobs
Using commercial Clouds to process IceCube jobsUsing commercial Clouds to process IceCube jobs
Using commercial Clouds to process IceCube jobsIgor Sfiligoi
 
Modest scale HPC on Azure using CGYRO
Modest scale HPC on Azure using CGYROModest scale HPC on Azure using CGYRO
Modest scale HPC on Azure using CGYROIgor Sfiligoi
 
Data-intensive IceCube Cloud Burst
Data-intensive IceCube Cloud BurstData-intensive IceCube Cloud Burst
Data-intensive IceCube Cloud BurstIgor Sfiligoi
 
Scheduling a Kubernetes Federation with Admiralty
Scheduling a Kubernetes Federation with AdmiraltyScheduling a Kubernetes Federation with Admiralty
Scheduling a Kubernetes Federation with AdmiraltyIgor Sfiligoi
 
Accelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACCAccelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACCIgor Sfiligoi
 
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...Igor Sfiligoi
 
Porting and optimizing UniFrac for GPUs
Porting and optimizing UniFrac for GPUsPorting and optimizing UniFrac for GPUs
Porting and optimizing UniFrac for GPUsIgor Sfiligoi
 
Demonstrating 100 Gbps in and out of the public Clouds
Demonstrating 100 Gbps in and out of the public CloudsDemonstrating 100 Gbps in and out of the public Clouds
Demonstrating 100 Gbps in and out of the public CloudsIgor Sfiligoi
 
TransAtlantic Networking using Cloud links
TransAtlantic Networking using Cloud linksTransAtlantic Networking using Cloud links
TransAtlantic Networking using Cloud linksIgor Sfiligoi
 

More from Igor Sfiligoi (20)

Preparing Fusion codes for Perlmutter - CGYRO
Preparing Fusion codes for Perlmutter - CGYROPreparing Fusion codes for Perlmutter - CGYRO
Preparing Fusion codes for Perlmutter - CGYRO
 
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
 
Comparing single-node and multi-node performance of an important fusion HPC c...
Comparing single-node and multi-node performance of an important fusion HPC c...Comparing single-node and multi-node performance of an important fusion HPC c...
Comparing single-node and multi-node performance of an important fusion HPC c...
 
The anachronism of whole-GPU accounting
The anachronism of whole-GPU accountingThe anachronism of whole-GPU accounting
The anachronism of whole-GPU accounting
 
Auto-scaling HTCondor pools using Kubernetes compute resources
Auto-scaling HTCondor pools using Kubernetes compute resourcesAuto-scaling HTCondor pools using Kubernetes compute resources
Auto-scaling HTCondor pools using Kubernetes compute resources
 
Speeding up bowtie2 by improving cache-hit rate
Speeding up bowtie2 by improving cache-hit rateSpeeding up bowtie2 by improving cache-hit rate
Speeding up bowtie2 by improving cache-hit rate
 
Performance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence SimulationsPerformance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence Simulations
 
Comparing GPU effectiveness for Unifrac distance compute
Comparing GPU effectiveness for Unifrac distance computeComparing GPU effectiveness for Unifrac distance compute
Comparing GPU effectiveness for Unifrac distance compute
 
Managing Cloud networking costs for data-intensive applications by provisioni...
Managing Cloud networking costs for data-intensive applications by provisioni...Managing Cloud networking costs for data-intensive applications by provisioni...
Managing Cloud networking costs for data-intensive applications by provisioni...
 
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory AccessAccelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
 
Using A100 MIG to Scale Astronomy Scientific Output
Using A100 MIG to Scale Astronomy Scientific OutputUsing A100 MIG to Scale Astronomy Scientific Output
Using A100 MIG to Scale Astronomy Scientific Output
 
Using commercial Clouds to process IceCube jobs
Using commercial Clouds to process IceCube jobsUsing commercial Clouds to process IceCube jobs
Using commercial Clouds to process IceCube jobs
 
Modest scale HPC on Azure using CGYRO
Modest scale HPC on Azure using CGYROModest scale HPC on Azure using CGYRO
Modest scale HPC on Azure using CGYRO
 
Data-intensive IceCube Cloud Burst
Data-intensive IceCube Cloud BurstData-intensive IceCube Cloud Burst
Data-intensive IceCube Cloud Burst
 
Scheduling a Kubernetes Federation with Admiralty
Scheduling a Kubernetes Federation with AdmiraltyScheduling a Kubernetes Federation with Admiralty
Scheduling a Kubernetes Federation with Admiralty
 
Accelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACCAccelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACC
 
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
 
Porting and optimizing UniFrac for GPUs
Porting and optimizing UniFrac for GPUsPorting and optimizing UniFrac for GPUs
Porting and optimizing UniFrac for GPUs
 
Demonstrating 100 Gbps in and out of the public Clouds
Demonstrating 100 Gbps in and out of the public CloudsDemonstrating 100 Gbps in and out of the public Clouds
Demonstrating 100 Gbps in and out of the public Clouds
 
TransAtlantic Networking using Cloud links
TransAtlantic Networking using Cloud linksTransAtlantic Networking using Cloud links
TransAtlantic Networking using Cloud links
 

Recently uploaded

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
glideinWMS - The Larger Picture

  • 1. glideinWMS training: glideinWMS - The Larger Picture, i.e. is it something you would be interested in? By Igor Sfiligoi (UCSD)
  • 2. Why this talk?
      If you have never heard of glideinWMS before, you likely have no idea whether it is a product you would be interested in using. This talk presents glideinWMS in a larger context, allowing you to understand what this product is all about.
  • 3. The basics
      ● glideinWMS has been designed to address the needs of High Throughput Computing (HTC)
      ● Better known as batch processing
      ● In a nutshell, we are trying to facilitate the effective use of a large number of CPUs by a large number of users
  • 4. High Throughput Computing
      ● The basic premise of HTC is that there is always more demand than available CPUs
      ● We should make good use of those CPUs
      ● Keep them busy, ideally, 24x7x365
      ● Sustained utilization is thus more important than peak performance
      ● Measure of success is FLOPY (floating-point operations per year), not FLOPS (floating-point operations per second)
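The contrast between sustained utilization and peak performance on slide 4 can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name, the peak rates and the utilization figures are made-up assumptions, not numbers from the talk.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # ignoring leap years


def ops_per_year(peak_ops_per_second: float, utilization: float) -> float:
    """Operations actually delivered in a year at an average utilization in [0, 1]."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return peak_ops_per_second * utilization * SECONDS_PER_YEAR


# Hypothetical systems: A has twice the peak rate but sits idle most of the time.
system_a = ops_per_year(peak_ops_per_second=2e12, utilization=0.30)
system_b = ops_per_year(peak_ops_per_second=1e12, utilization=0.95)

print(f"A: {system_a:.3e} ops/year, B: {system_b:.3e} ops/year")
```

Under these assumed numbers the slower but busier system B delivers more work per year than A, which is the sense in which per-year throughput, rather than per-second peak, measures success in HTC.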
  • 5. HTC from the user point of view
      ● As a side effect, users must be HTC-aware
      ● There are some negative aspects
      ● No interactive access, only process queuing
      – Usually referred to as user jobs
      ● Waiting in line to get access to CPUs
      ● But the payoff is potentially huge
      ● A single user can use 1000s of CPUs at a time
      ● Performing in a few days computations that would take several years on a single machine
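The payoff claim on slide 5 (years of single-machine work finished in days) is simple division, sketched below. The function name and the `efficiency` parameter are assumptions introduced for illustration; `efficiency` is a crude stand-in for time lost to queue waits and scheduling overhead, which the talk mentions but does not quantify.

```python
def wall_clock_days(cpu_years_of_work: float, cpus: int, efficiency: float = 1.0) -> float:
    """Days needed to finish a workload that splits into independent jobs."""
    if cpus < 1:
        raise ValueError("need at least one CPU")
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return cpu_years_of_work * 365.0 / (cpus * efficiency)


# Hypothetical workload: 5 CPU-years spread over 1000 CPUs.
print(wall_clock_days(5.0, 1000))
# The same workload if only half of the wall-clock time is spent computing.
print(wall_clock_days(5.0, 1000, efficiency=0.5))
```

The estimate only holds when the work really does break into many independent jobs, which is the HTC-awareness the slide asks of users.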
  • 6. HTC in simplified picture
      [Diagram; labels: User, Repository, Scheduler; annotation: scheduling usually not FIFO]
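Slide 6 notes that scheduling is usually not FIFO but does not say which policy is used instead. One common alternative is fair-share, where the next job comes from the user who has consumed the least so far. The sketch below is a toy version of that idea under that assumption; the class and method names are hypothetical and it does not reproduce the algorithm of any particular batch system.

```python
from collections import defaultdict, deque


class FairShareQueue:
    """Toy scheduler: the next job comes from the user with the least CPU time so far."""

    def __init__(self):
        self._pending = defaultdict(deque)  # user -> that user's jobs, in submit order
        self._usage = defaultdict(float)    # user -> CPU hours already charged

    def submit(self, user, job, cpu_hours):
        self._pending[user].append((job, cpu_hours))

    def next_job(self):
        waiting = [u for u, jobs in self._pending.items() if jobs]
        if not waiting:
            return None
        # Least-served user goes first; ties are broken by name for determinism.
        user = min(waiting, key=lambda u: (self._usage[u], u))
        job, cpu_hours = self._pending[user].popleft()
        self._usage[user] += cpu_hours
        return user, job


q = FairShareQueue()
for name in ("a1", "a2", "a3"):
    q.submit("alice", name, cpu_hours=10)
q.submit("bob", "b1", cpu_hours=10)

order = []
while (picked := q.next_job()) is not None:
    order.append(picked[1])
print(order)
```

With a FIFO queue bob's job would run last, after all three of alice's; here it runs second, because bob has used nothing when alice's first job has already been charged.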
  • 7. HTC products
      ● There are many HTC products available
      ● Although most call themselves “batch systems”
      ● A non-exhaustive list:
      ● Condor
      ● PBS, with variants like Torque/Maui
      ● LSF
      ● SGE, also known as Oracle Grid Engine
  • 8. Why another system?
      ● All of the mentioned HTC systems assume full control of the compute resources (i.e. CPUs)
      ● And there are many places where this is the case
      ● glideinWMS developed to support non-dedicated use of compute resources
      ● i.e. when CPUs are given to the system only for limited duration at a time
  • 9. Non-dedicated resources
      ● In the past decade, two paradigms emerged
      ● Grid computing
      ● Cloud computing
      ● Both allow a user community to use compute resources they don't own
      ● Often called resource elasticity
      ● Managing a large number of Grid and Cloud resources by hand is impractical
      ● glideinWMS creates an HTC system using them
  • 10. Grid vs Cloud (a short summary)
      ● Grid computing is basically a federation of HTC clusters
      ● Thus recently called Distributed HTC
      ● Job queuing is a native paradigm
      ● (Commercial) Clouds are about leasing resources on a pay-as-you-go basis
      ● And they happen to use virtualization
      ● Instances expected to start almost immediately
      ● So-called “scientific clouds” are typically just Grid systems that use virtualization (and a different middleware stack)
  • 11. Grid vs Cloud (a short summary, repeated from slide 10)
      ● glideinWMS currently optimized for the Grid model
  • 12. glideinWMS and the Grid (Cloud resources are used in a similar way)
      ● glideinWMS creates an overlay system on top of the various HTC clusters
      ● From the user community point of view, a single HTC system
      ● Just a dynamic one
      ● glideinWMS completely automates the process
      [Diagram; labels: glideinWMS, HTC (repeated six times)]
  • 13. Implementation and support
      ● glideinWMS heavily based on Condor
      ● Essentially a thin layer on top of it
      ● Most of the software support thus coming from the Condor development team
      ● At University of Wisconsin – Madison: http://research.cs.wisc.edu/condor/
      ● The glideinWMS-specific layer supported by a team spanning Fermilab, UCSD and ISI: http://tinyurl.com/glideinWMS
  • 14. glideinWMS and Condor
      ● Condor handles the HTC system
      ● Most Condor features thus available
      ● glideinWMS role limited to scheduling, configuring and starting the Condor process on the compute resources
      [Diagram; label text in extraction order: HTC glideinWMS Condor CPU Handler User Job Condor Job Repository]
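The division of labour on slide 14, together with the limited-duration CPUs of slide 8, can be pictured with a toy model: something places a placeholder process on a borrowed CPU, and that process then pulls user jobs from a shared queue until its lease runs out. The sketch below is only such a picture. `Pilot`, `provision`, the site names and the lease length are hypothetical; no Condor or glideinWMS API is used; and in the real system Condor, not this loop, matches jobs to CPUs.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Pilot:
    """Placeholder process that holds one borrowed CPU for a limited lease."""
    site: str
    lease_hours: float

    def run(self, queue: deque) -> list:
        """Pull user jobs from the shared queue until the next one no longer fits."""
        done = []
        remaining = self.lease_hours
        while queue and queue[0][1] <= remaining:
            name, hours = queue.popleft()
            remaining -= hours
            done.append(name)
        return done


def provision(queue: deque, sites: list, lease_hours: float) -> dict:
    """Start one pilot per site for as long as user jobs are still waiting."""
    placed = {}
    for site in sites:
        if not queue:
            break  # no demand left, so no further pilots are requested
        placed[site] = Pilot(site, lease_hours).run(queue)
    return placed


# Hypothetical user jobs as (name, hours) and two hypothetical sites.
jobs = deque([("j1", 2), ("j2", 3), ("j3", 4), ("j4", 1)])
placed = provision(jobs, ["site_a", "site_b"], lease_hours=5)
print(placed)
```

The pilots here run one after another for simplicity; the point is only that user jobs never see the individual sites, just one queue served by whatever CPUs are currently leased.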
  • 15. Summary
      ● glideinWMS is an HTC product
      ● i.e. enables effective use of a large number of CPUs by a large number of users
      ● glideinWMS creates an HTC system out of non-dedicated compute resources
      ● e.g. Grid and Cloud resources
      ● glideinWMS is heavily based on Condor
      ● thus benefits from the Condor team support
  • 16. Pointers
      ● glideinWMS development team is reachable at glideinwms-support@fnal.gov
      ● The official project Web page is http://tinyurl.com/glideinWMS
      ● OSG glidein factory at UCSD:
        http://hepuser.ucsd.edu/twiki2/bin/view/UCSDTier2/OSGgfactory
        http://glidein-1.t2.ucsd.edu:8319/glidefactory/monitor/glidein_Production_v4_1/factoryStatus.html
  • 17. Acknowledgments
      ● This document was sponsored by grants from the US NSF and US DOE, and by the UC system