SlideShare a Scribd company logo
1 of 47
Download to read offline
Introduction to Distributed HTC
and overlay systems
Tuesday morning session
Igor Sfiligoi <isfiligoi@ucsd.edu>
University of California San Diego
2014 OSG User School DHTC and Overlays
Logistical reminder
• It is OK to ask questions
- During the lecture
- During the demos
- During the exercises
- During the breaks
• If I don't know the answer,
I will find someone who likely does
2014 OSG User School DHTC and Overlays
High Throughput Computing
• Yesterday you were introduced to HTC
- Often called batch system computing
- A paradigm that emphasizes
maximizing the amount of
useful computing
over
long periods
of time
Scheduler
Jobs
Batch slots
2014 OSG User School DHTC and Overlays
Local HTC
• What you have really
experienced so far
is local HTC
• i.e. computing on dedicated cluster of
dedicated resources
- Managed by a single admin group
- Co-located in a single location
2014 OSG User School DHTC and Overlays
Is there anything else?
• As you might expect,
there are several insulated
“local HTC” clusters
installed around the world
• And there are
non-HTC systems
out there, too
2014 OSG User School DHTC and Overlays
Is there anything else?
• As you might expect,
there are several insulated
“local HTC” clusters
installed around the world
• And there are
non-HTC systems
out there, too
And you point is …???
2014 OSG User School DHTC and Overlays
Just local HTC
• You moved from a single PC
- O(1) cores
• To a local HTC cluster
- Say, O(100) cores daily avg
Great!
I can have my results back
in 1/100th
of the time
2014 OSG User School DHTC and Overlays
Just local HTC
• You moved from a single PC
- O(1) cores
• To a local HTC cluster
- Say, O(100) cores daily avg
• But is it fast enough?
It will still take me
over a month to get
the results back!
The result of the
10 body simulation
is very promising!
I want to run a
100 body one!
2014 OSG User School DHTC and Overlays
How to get more?
• If you find out that you are
resource constrained
- What do you do?
2014 OSG User School DHTC and Overlays
How to get more?
• If you find out that you are
resource constrained
- What do you do?
• Beg for a larger share of the local pool
- i.e. better priority compared
to the other users of the same pool
There are likely
100s of users
But you better be
doing something
really important
2014 OSG User School DHTC and Overlays
How to get more?
• If you find out that you are
resource constrained
- What do you do?
• Beg for a larger share of the local pool
• Pay to get more resources bought
into the pool
- Great for long term needs
 If you can afford it
- But will not help you in the short term
Getting new machines
installed can take months!
2014 OSG User School DHTC and Overlays
How to get more?
• If you find out that you are
resource constrained
- What do you do?
• Beg for a larger share of the local pool
• Pay to get more resources bought
• Get the needed resources
somewhere else
- i.e. not locally
2014 OSG User School DHTC and Overlays
How to get more?
• If you find out that you are
resource constrained
- What do you do?
• Beg for a larger share of the local pool
• Pay to get more resources bought
• Get the needed resources
somewhere else
- i.e. not locally
This is what
this lecture
is all about!
2014 OSG User School DHTC and Overlays
Distributed HTC
• A computing paradigm that aims at
maximizing useful computation
using any available resource
located anywhere on the planet
• As a corollary
- Compute resources are
owned and operated by
several independent groups
This place is huge!
2014 OSG User School DHTC and Overlays
Implications of distributed computing
• We will be dealing with multiple
independent compute systems
- That do not know about each other
They are owned
by several
independent groups,
remember?
2014 OSG User School DHTC and Overlays
Implications of distributed computing
• We will be dealing with multiple
independent compute systems
- That do not know about each other
• No global HTC scheduler
out of the box
Scheduler
2014 OSG User School DHTC and Overlays
Implications of distributed computing
• We will be dealing with multiple
independent compute systems
- That do not know about each other
• No global HTC scheduler
out of the box
- Will have to stitch them
all together
2014 OSG User School DHTC and Overlays
The naïve way
• The simplest way is
to partition your jobs and
submit a subset to each cluster
Scheduler
Scheduler
Scheduler
2014 OSG User School DHTC and Overlays
The naïve way
• The simplest way is to partition your jobs
and submit a subset to each cluster
• The drawbacks
- You may need multiple user accounts
- You may need several different tools
- Doing a good partitioning is hard
 Want proportional to the resources you will get
- And what if those CPUs are not
in a HTC setup at all?
2014 OSG User School DHTC and Overlays
The naïve way
• The simplest way is to partition your jobs
and submit a subset to each cluster
• The drawbacks
- You may need multiple user accounts
- You may need several different tools
- Doing a good partitioning hard
 Want proportional to the resources you will get
- And what if those CPUs are not
in a HTC setup at all?
What's the
alternative???
2014 OSG User School DHTC and Overlays
Use an overlay system
• Use a systems that
looks and feels
like a regular HTC to users
- But has compute nodes all over the world
I like the
sound of this!
2014 OSG User School DHTC and Overlays
Use an overlay system
• Use a systems that
looks and feels
like a regular HTC to users
- But has compute nodes all over the world
I like the
sound of this!
But...
didn't you just say
there was no such thing???
2014 OSG User School DHTC and Overlays
Use an overlay system
• Use a systems that
looks and feels
like a regular HTC to users
- But has compute nodes all over the world
But...
didn't you just say
there was
no such thing???
I said
“out of the box”
But one can
create one.
2014 OSG User School DHTC and Overlays
Creating a dynamic overlay sys
• No single person cannot manage
all the existing resources
Scheduler
2014 OSG User School DHTC and Overlays
Creating a dynamic overlay sys
• But we can lease a subset of them
- We discuss the how later
2014 OSG User School DHTC and Overlays
Creating a dynamic overlay sys
• But we can lease a subset of them
• And instantiate a HTC system on them
Scheduler
2014 OSG User School DHTC and Overlays
Creating a dynamic overlay sys
• But we can lease a subset of them
• And instantiate a HTC system on them
- Now we can schedule user jobs on them
- Only “our” resources are considered
Scheduler
Jobs
2014 OSG User School DHTC and Overlays
DHTC through an overlay sys
• Just another HTC system
- Well, almost
- More details in
a few slides
Scheduler
Jobs
Cool!Cool!Cool!Cool!
2014 OSG User School DHTC and Overlays
Overlay system ownership
• Setting up an overlay a major task
- Comparable with installing a
dedicated cluster
• Long term maintenance is also costly
• Not something a final user
would want to do
2014 OSG User School DHTC and Overlays
Typical overlay sys operators
• Existing HTC admins, e.g.
- The UW HTC cluster can “overflow” into OSG
- The UCSD operates one for local users
• Scientific communities, e.g.
- The CMS LHC experiment
• The Open Science Grid itself
- With the OSG Connect
More on this
later today
2014 OSG User School DHTC and Overlays
Is DHTC really just HTC?
• Even with overlays, there are
some differences between
DHTC and HTC
• With or without overlays,
the core reasons are:
- Multiple independent HW operators
- Not all resources are co-located
2014 OSG User School DHTC and Overlays
The multiple owners problem
• In the “Grid” world, the resource owner
decides which Operating System, which
OS services and which libraries to install
- A way smaller problem in the “Cloud” world
- But most of the current DHTC landscape
is based on the Grid paradigm
• Different clusters likely configured differently
2014 OSG User School DHTC and Overlays
The multiple owners problem
• In the “Grid” world, the resource owner
decides which Operating System, which
OS services and which libraries to install
- A way smaller problem in the “Cloud” world
- But most of the current DHTC landscape
is based on the Grid paradigm
• Different clusters likely configured differently
• Even if you could get your pet library/service
installed on some clusters, you cannot
expect to get it installed everywhere
2014 OSG User School DHTC and Overlays
Heterogeneity
• DHTC systems are way more
heterogeneous than “local HTC” ones
• Two ways to approach this:
- Minimize external dependencies in
your compute jobs
 Make them self-contained
 Adapt to the running environment
(e.g. multiple binaries, one per platform)
 Do not use licensed software
2014 OSG User School DHTC and Overlays
Heterogeneity
• DHTC systems are way more
heterogeneous than “local HTC” ones
• Two ways to approach this:
- Minimize external dependencies in
your compute jobs
 Make them self-contained
 Adapt to the running environment
(e.g. multiple binaries, one per platform)
 Do not use licensed software
Sounds like
a lot of work!
2014 OSG User School DHTC and Overlays
Heterogeneity
• DHTC systems are way more
heterogeneous than “local HTC” ones
• Two ways to approach this:
- Minimize external dependencies
- Use only a subset of the resources
 Restrict where your jobs can run
 Your job will of course take longer to finish
 May still get you the result sooner
2014 OSG User School DHTC and Overlays
Heterogeneity
• DHTC systems are way more
heterogeneous than “local HTC” ones
• Two ways to approach this:
- Minimize external dependencies
- Use only a subset of the resources
 Restrict where your jobs can run
 Your job will of course take longer to finish
 May still get you the result sooner
So long for
DHTC!
2014 OSG User School DHTC and Overlays
Heterogeneity
• DHTC systems are way more
heterogeneous than “local HTC” ones
• Two ways to approach this:
- Minimize external dependencies
- Use only a subset of the resources
 Restrict where your jobs can run
 Your job will of course take longer to finish
 May still get you the result sooner
So long for
DHTC!
Unfortunately,
those are the only two
alternatives.
2014 OSG User School DHTC and Overlays
No shared file system
• As a side effect, you cannot expect a
globally shared file system
- It's just “yet another user requested service”
• You will have to deal with data explicitly
- Either using the HTC scheduler capabilities
(remember yesterday's lecture)
- Or, embed file transfer to and from
permanent storage into your own jobs
• More details tomorrow
2014 OSG User School DHTC and Overlays
The location problem
• Nodes in different locations need a way
to talk to each other
- This is what networks are for
• If your computation is mostly about CPU
cycles, with little input and output data
- Node location is not an issue at all
• If you have lots of data
- Remember, throughput is typically
inversely proportional with the distance
2014 OSG User School DHTC and Overlays
The data problem
• Transferring large amounts of data
over Wide Area Network
can take a lot of time
• You should try to compute
close to the data source and/or sink
- Network-wise
- More about this tomorrow
2014 OSG User School DHTC and Overlays
Bottom line
• For simple computation,
DHTC is very similar to HTC
• As soon as you require any
complex setup for your jobs,
you are in for a rough ride
- This includes large datasets
2014 OSG User School DHTC and Overlays
Infrastructure considerations
• DHTC is likely to give you access to
many more compute slots
• Which is mostly a good thing
- You get your results faster
• But could crash your HTC system, e.g.
- Can your storage system handle more
data traffic?
- Can the job scheduling system handle an
order of magnitude more nodes?
2014 OSG User School DHTC and Overlays
Infrastructure considerations
• DHTC is likely to give you access to
many more compute slots
• Which is mostly a good thing
- You get your results faster
• But could crash your HTC system, e.g.
- Can your storage system handle more
data traffic?
- Can the job scheduling system handle an
order of magnitude more nodes?
Hopefully not something
final users should deal with
but it is good to keep in mind.
2014 OSG User School DHTC and Overlays
Questions?
• Questions? Comments?
- Feel free to ask me questions later:
Igor Sfiligoi <isfiligoi@ucsd.edu>
• Upcoming sessions
- Where to get the needed resources
- glideinWMS – the OSG overlay software
- Hands-on exercises
- Tour
2014 OSG User School DHTC and Overlays
Beware the power
Courtesy of fanpop.com
2014 OSG User School DHTC and Overlays
Copyright statement
• Some images contained in this
presentation are the copyrighted
property of ToonClipart.
• As such, these images are being
used under a license from said entities,
and may not be copied or downloaded
without explicit permission from
ToonClipart.

More Related Content

Viewers also liked

VMworld 2013: Performance and Capacity Management of DRS Clusters
VMworld 2013: Performance and Capacity Management of DRS Clusters VMworld 2013: Performance and Capacity Management of DRS Clusters
VMworld 2013: Performance and Capacity Management of DRS Clusters VMworld
 
Understanding priorities in HTCondor
Understanding priorities in HTCondorUnderstanding priorities in HTCondor
Understanding priorities in HTCondorIgor Sfiligoi
 
Presentation 15 condor-v1
Presentation 15 condor-v1Presentation 15 condor-v1
Presentation 15 condor-v1Simon Kim
 
Augmenting Big Data Analytics with Nirvana
Augmenting Big Data Analytics with NirvanaAugmenting Big Data Analytics with Nirvana
Augmenting Big Data Analytics with NirvanaIgor Sfiligoi
 
Known HTCondor break points
Known HTCondor break pointsKnown HTCondor break points
Known HTCondor break pointsIgor Sfiligoi
 

Viewers also liked (6)

distcom-short-20140112-1600
distcom-short-20140112-1600distcom-short-20140112-1600
distcom-short-20140112-1600
 
VMworld 2013: Performance and Capacity Management of DRS Clusters
VMworld 2013: Performance and Capacity Management of DRS Clusters VMworld 2013: Performance and Capacity Management of DRS Clusters
VMworld 2013: Performance and Capacity Management of DRS Clusters
 
Understanding priorities in HTCondor
Understanding priorities in HTCondorUnderstanding priorities in HTCondor
Understanding priorities in HTCondor
 
Presentation 15 condor-v1
Presentation 15 condor-v1Presentation 15 condor-v1
Presentation 15 condor-v1
 
Augmenting Big Data Analytics with Nirvana
Augmenting Big Data Analytics with NirvanaAugmenting Big Data Analytics with Nirvana
Augmenting Big Data Analytics with Nirvana
 
Known HTCondor break points
Known HTCondor break pointsKnown HTCondor break points
Known HTCondor break points
 

Similar to Introduction to Distributed HTC and overlay systems - OSG User School 2014

Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...Larry Smarr
 
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...Larry Smarr
 
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...Larry Smarr
 
BIO IT 15 - Are Your Researchers Paying Too Much for Their Cloud-Based Data B...
BIO IT 15 - Are Your Researchers Paying Too Much for Their Cloud-Based Data B...BIO IT 15 - Are Your Researchers Paying Too Much for Their Cloud-Based Data B...
BIO IT 15 - Are Your Researchers Paying Too Much for Their Cloud-Based Data B...Dirk Petersen
 
You Don't Need IT To Do That - The World of Outsourcing and SaaS
You Don't Need IT To Do That - The World of Outsourcing and SaaSYou Don't Need IT To Do That - The World of Outsourcing and SaaS
You Don't Need IT To Do That - The World of Outsourcing and SaaSKyle James
 
USG Rock Eagle 2017 - PWP at 1000 Days
USG Rock Eagle 2017 - PWP at 1000 DaysUSG Rock Eagle 2017 - PWP at 1000 Days
USG Rock Eagle 2017 - PWP at 1000 DaysEric Sembrat
 
Using AWS, Eucalyptus and Chef for the Optimal Hybrid Cloud
Using AWS, Eucalyptus and Chef for the Optimal Hybrid CloudUsing AWS, Eucalyptus and Chef for the Optimal Hybrid Cloud
Using AWS, Eucalyptus and Chef for the Optimal Hybrid Clouddboze
 
An Experimentation Toolkit for Robotics Control and Manipulation Tasks using ...
An Experimentation Toolkit for Robotics Control and Manipulation Tasks using ...An Experimentation Toolkit for Robotics Control and Manipulation Tasks using ...
An Experimentation Toolkit for Robotics Control and Manipulation Tasks using ...Ashwin Reddy
 
Introduction to DevOps
Introduction to DevOpsIntroduction to DevOps
Introduction to DevOpsJulien Pivotto
 
CephDay Argentina 2019
CephDay Argentina 2019CephDay Argentina 2019
CephDay Argentina 2019Alvaro Soto
 
How can i... reduce my backup window.
How can i... reduce my backup window.How can i... reduce my backup window.
How can i... reduce my backup window.Andrew Nicholson
 
Moodle self-hosting - some things to consider Mike Hughes, Amanda Doughty, ...
Moodle self-hosting - some things to consider  	Mike Hughes, Amanda Doughty, ...Moodle self-hosting - some things to consider  	Mike Hughes, Amanda Doughty, ...
Moodle self-hosting - some things to consider Mike Hughes, Amanda Doughty, ...Ireland & UK Moodlemoot 2012
 
Jeremy Engle's slides from Redshift / Big Data meetup on July 13, 2017
Jeremy Engle's slides from Redshift / Big Data meetup on July 13, 2017Jeremy Engle's slides from Redshift / Big Data meetup on July 13, 2017
Jeremy Engle's slides from Redshift / Big Data meetup on July 13, 2017AWS Chicago
 
Choosing Automation for DevOps & Continuous Delivery in the Enterprise
Choosing Automation for DevOps & Continuous Delivery in the EnterpriseChoosing Automation for DevOps & Continuous Delivery in the Enterprise
Choosing Automation for DevOps & Continuous Delivery in the EnterpriseXebiaLabs
 
SQL 2014 hybrid platform - Azure and on premise
SQL 2014 hybrid platform - Azure and on premise SQL 2014 hybrid platform - Azure and on premise
SQL 2014 hybrid platform - Azure and on premise Shy Engelberg
 
Mortar: Hadoop-as-a-Service + Open Source Framework | AWS re: Invent public …
Mortar: Hadoop-as-a-Service + Open Source Framework | AWS re: Invent public …Mortar: Hadoop-as-a-Service + Open Source Framework | AWS re: Invent public …
Mortar: Hadoop-as-a-Service + Open Source Framework | AWS re: Invent public …mortardata
 
cloud session uklug
cloud session uklugcloud session uklug
cloud session uklugdominion
 
Serving HTC Users in Kubernetes by Leveraging HTCondor
Serving HTC Users in Kubernetes by Leveraging HTCondorServing HTC Users in Kubernetes by Leveraging HTCondor
Serving HTC Users in Kubernetes by Leveraging HTCondorIgor Sfiligoi
 

Similar to Introduction to Distributed HTC and overlay systems - OSG User School 2014 (20)

Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
 
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
 
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
 
BIO IT 15 - Are Your Researchers Paying Too Much for Their Cloud-Based Data B...
BIO IT 15 - Are Your Researchers Paying Too Much for Their Cloud-Based Data B...BIO IT 15 - Are Your Researchers Paying Too Much for Their Cloud-Based Data B...
BIO IT 15 - Are Your Researchers Paying Too Much for Their Cloud-Based Data B...
 
You Don't Need IT To Do That - The World of Outsourcing and SaaS
You Don't Need IT To Do That - The World of Outsourcing and SaaSYou Don't Need IT To Do That - The World of Outsourcing and SaaS
You Don't Need IT To Do That - The World of Outsourcing and SaaS
 
USG Rock Eagle 2017 - PWP at 1000 Days
USG Rock Eagle 2017 - PWP at 1000 DaysUSG Rock Eagle 2017 - PWP at 1000 Days
USG Rock Eagle 2017 - PWP at 1000 Days
 
Using AWS, Eucalyptus and Chef for the Optimal Hybrid Cloud
Using AWS, Eucalyptus and Chef for the Optimal Hybrid CloudUsing AWS, Eucalyptus and Chef for the Optimal Hybrid Cloud
Using AWS, Eucalyptus and Chef for the Optimal Hybrid Cloud
 
An Experimentation Toolkit for Robotics Control and Manipulation Tasks using ...
An Experimentation Toolkit for Robotics Control and Manipulation Tasks using ...An Experimentation Toolkit for Robotics Control and Manipulation Tasks using ...
An Experimentation Toolkit for Robotics Control and Manipulation Tasks using ...
 
Going open source first
Going open source firstGoing open source first
Going open source first
 
Introduction to DevOps
Introduction to DevOpsIntroduction to DevOps
Introduction to DevOps
 
CephDay Argentina 2019
CephDay Argentina 2019CephDay Argentina 2019
CephDay Argentina 2019
 
From 1 to 100
From 1 to 100From 1 to 100
From 1 to 100
 
How can i... reduce my backup window.
How can i... reduce my backup window.How can i... reduce my backup window.
How can i... reduce my backup window.
 
Moodle self-hosting - some things to consider Mike Hughes, Amanda Doughty, ...
Moodle self-hosting - some things to consider  	Mike Hughes, Amanda Doughty, ...Moodle self-hosting - some things to consider  	Mike Hughes, Amanda Doughty, ...
Moodle self-hosting - some things to consider Mike Hughes, Amanda Doughty, ...
 
Jeremy Engle's slides from Redshift / Big Data meetup on July 13, 2017
Jeremy Engle's slides from Redshift / Big Data meetup on July 13, 2017Jeremy Engle's slides from Redshift / Big Data meetup on July 13, 2017
Jeremy Engle's slides from Redshift / Big Data meetup on July 13, 2017
 
Choosing Automation for DevOps & Continuous Delivery in the Enterprise
Choosing Automation for DevOps & Continuous Delivery in the EnterpriseChoosing Automation for DevOps & Continuous Delivery in the Enterprise
Choosing Automation for DevOps & Continuous Delivery in the Enterprise
 
SQL 2014 hybrid platform - Azure and on premise
SQL 2014 hybrid platform - Azure and on premise SQL 2014 hybrid platform - Azure and on premise
SQL 2014 hybrid platform - Azure and on premise
 
Mortar: Hadoop-as-a-Service + Open Source Framework | AWS re: Invent public …
Mortar: Hadoop-as-a-Service + Open Source Framework | AWS re: Invent public …Mortar: Hadoop-as-a-Service + Open Source Framework | AWS re: Invent public …
Mortar: Hadoop-as-a-Service + Open Source Framework | AWS re: Invent public …
 
cloud session uklug
cloud session uklugcloud session uklug
cloud session uklug
 
Serving HTC Users in Kubernetes by Leveraging HTCondor
Serving HTC Users in Kubernetes by Leveraging HTCondorServing HTC Users in Kubernetes by Leveraging HTCondor
Serving HTC Users in Kubernetes by Leveraging HTCondor
 

More from Igor Sfiligoi

Preparing Fusion codes for Perlmutter - CGYRO
Preparing Fusion codes for Perlmutter - CGYROPreparing Fusion codes for Perlmutter - CGYRO
Preparing Fusion codes for Perlmutter - CGYROIgor Sfiligoi
 
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...Igor Sfiligoi
 
Comparing single-node and multi-node performance of an important fusion HPC c...
Comparing single-node and multi-node performance of an important fusion HPC c...Comparing single-node and multi-node performance of an important fusion HPC c...
Comparing single-node and multi-node performance of an important fusion HPC c...Igor Sfiligoi
 
The anachronism of whole-GPU accounting
The anachronism of whole-GPU accountingThe anachronism of whole-GPU accounting
The anachronism of whole-GPU accountingIgor Sfiligoi
 
Auto-scaling HTCondor pools using Kubernetes compute resources
Auto-scaling HTCondor pools using Kubernetes compute resourcesAuto-scaling HTCondor pools using Kubernetes compute resources
Auto-scaling HTCondor pools using Kubernetes compute resourcesIgor Sfiligoi
 
Speeding up bowtie2 by improving cache-hit rate
Speeding up bowtie2 by improving cache-hit rateSpeeding up bowtie2 by improving cache-hit rate
Speeding up bowtie2 by improving cache-hit rateIgor Sfiligoi
 
Performance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence SimulationsPerformance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence SimulationsIgor Sfiligoi
 
Comparing GPU effectiveness for Unifrac distance compute
Comparing GPU effectiveness for Unifrac distance computeComparing GPU effectiveness for Unifrac distance compute
Comparing GPU effectiveness for Unifrac distance computeIgor Sfiligoi
 
Managing Cloud networking costs for data-intensive applications by provisioni...
Managing Cloud networking costs for data-intensive applications by provisioni...Managing Cloud networking costs for data-intensive applications by provisioni...
Managing Cloud networking costs for data-intensive applications by provisioni...Igor Sfiligoi
 
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory AccessAccelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory AccessIgor Sfiligoi
 
Using A100 MIG to Scale Astronomy Scientific Output
Using A100 MIG to Scale Astronomy Scientific OutputUsing A100 MIG to Scale Astronomy Scientific Output
Using A100 MIG to Scale Astronomy Scientific OutputIgor Sfiligoi
 
Using commercial Clouds to process IceCube jobs
Using commercial Clouds to process IceCube jobsUsing commercial Clouds to process IceCube jobs
Using commercial Clouds to process IceCube jobsIgor Sfiligoi
 
Modest scale HPC on Azure using CGYRO
Modest scale HPC on Azure using CGYROModest scale HPC on Azure using CGYRO
Modest scale HPC on Azure using CGYROIgor Sfiligoi
 
Data-intensive IceCube Cloud Burst
Data-intensive IceCube Cloud BurstData-intensive IceCube Cloud Burst
Data-intensive IceCube Cloud BurstIgor Sfiligoi
 
Scheduling a Kubernetes Federation with Admiralty
Scheduling a Kubernetes Federation with AdmiraltyScheduling a Kubernetes Federation with Admiralty
Scheduling a Kubernetes Federation with AdmiraltyIgor Sfiligoi
 
Accelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACCAccelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACCIgor Sfiligoi
 
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...Igor Sfiligoi
 
Porting and optimizing UniFrac for GPUs
Porting and optimizing UniFrac for GPUsPorting and optimizing UniFrac for GPUs
Porting and optimizing UniFrac for GPUsIgor Sfiligoi
 
Demonstrating 100 Gbps in and out of the public Clouds
Demonstrating 100 Gbps in and out of the public CloudsDemonstrating 100 Gbps in and out of the public Clouds
Demonstrating 100 Gbps in and out of the public CloudsIgor Sfiligoi
 
TransAtlantic Networking using Cloud links
TransAtlantic Networking using Cloud linksTransAtlantic Networking using Cloud links
TransAtlantic Networking using Cloud linksIgor Sfiligoi
 

More from Igor Sfiligoi (20)

Preparing Fusion codes for Perlmutter - CGYRO
Preparing Fusion codes for Perlmutter - CGYROPreparing Fusion codes for Perlmutter - CGYRO
Preparing Fusion codes for Perlmutter - CGYRO
 
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
 
Comparing single-node and multi-node performance of an important fusion HPC c...
Comparing single-node and multi-node performance of an important fusion HPC c...Comparing single-node and multi-node performance of an important fusion HPC c...
Comparing single-node and multi-node performance of an important fusion HPC c...
 
The anachronism of whole-GPU accounting
The anachronism of whole-GPU accountingThe anachronism of whole-GPU accounting
The anachronism of whole-GPU accounting
 
Auto-scaling HTCondor pools using Kubernetes compute resources
Auto-scaling HTCondor pools using Kubernetes compute resourcesAuto-scaling HTCondor pools using Kubernetes compute resources
Auto-scaling HTCondor pools using Kubernetes compute resources
 
Speeding up bowtie2 by improving cache-hit rate
Speeding up bowtie2 by improving cache-hit rateSpeeding up bowtie2 by improving cache-hit rate
Speeding up bowtie2 by improving cache-hit rate
 
Performance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence SimulationsPerformance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence Simulations
 
Comparing GPU effectiveness for Unifrac distance compute
Comparing GPU effectiveness for Unifrac distance computeComparing GPU effectiveness for Unifrac distance compute
Comparing GPU effectiveness for Unifrac distance compute
 
Managing Cloud networking costs for data-intensive applications by provisioni...
Managing Cloud networking costs for data-intensive applications by provisioni...Managing Cloud networking costs for data-intensive applications by provisioni...
Managing Cloud networking costs for data-intensive applications by provisioni...
 
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory AccessAccelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
 
Using A100 MIG to Scale Astronomy Scientific Output
Using A100 MIG to Scale Astronomy Scientific OutputUsing A100 MIG to Scale Astronomy Scientific Output
Using A100 MIG to Scale Astronomy Scientific Output
 
Using commercial Clouds to process IceCube jobs
Using commercial Clouds to process IceCube jobsUsing commercial Clouds to process IceCube jobs
Using commercial Clouds to process IceCube jobs
 
Modest scale HPC on Azure using CGYRO
Modest scale HPC on Azure using CGYROModest scale HPC on Azure using CGYRO
Modest scale HPC on Azure using CGYRO
 
Data-intensive IceCube Cloud Burst
Data-intensive IceCube Cloud BurstData-intensive IceCube Cloud Burst
Data-intensive IceCube Cloud Burst
 
Scheduling a Kubernetes Federation with Admiralty
Scheduling a Kubernetes Federation with AdmiraltyScheduling a Kubernetes Federation with Admiralty
Scheduling a Kubernetes Federation with Admiralty
 
Accelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACCAccelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACC
 
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
 
Porting and optimizing UniFrac for GPUs
Porting and optimizing UniFrac for GPUsPorting and optimizing UniFrac for GPUs
Porting and optimizing UniFrac for GPUs
 
Demonstrating 100 Gbps in and out of the public Clouds
Demonstrating 100 Gbps in and out of the public CloudsDemonstrating 100 Gbps in and out of the public Clouds
Demonstrating 100 Gbps in and out of the public Clouds
 
TransAtlantic Networking using Cloud links
TransAtlantic Networking using Cloud linksTransAtlantic Networking using Cloud links
TransAtlantic Networking using Cloud links
 

Recently uploaded

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 

Recently uploaded (20)

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 

Introduction to Distributed HTC and overlay systems - OSG User School 2014

  • 1. Introduction to Distributed HTC and overlay systems Tuesday morning session Igor Sfiligoi <isfiligoi@ucsd.edu> University of California San Diego
  • 2. 2014 OSG User School DHTC and Overlays Logistical reminder • It is OK to ask questions - During the lecture - During the demos - During the exercises - During the breaks • If I don't know the answer, I will find someone who likely does
  • 3. 2014 OSG User School DHTC and Overlays High Throughput Computing • Yesterday you were introduced to HTC - Often called batch system computing - A paradigm that emphasizes maximizing the amount of useful computing over long periods of time Scheduler Jobs Batch slots
  • 4. 2014 OSG User School DHTC and Overlays Local HTC • What you have really experienced so far is local HTC • i.e. computing on dedicated cluster of dedicated resources - Managed by a single admin group - Co-located in a single location
  • 5. 2014 OSG User School DHTC and Overlays Is there anything else? • As you might expect, there are several insulated “local HTC” clusters installed around the world • And there are non-HTC systems out there, too
  • 6. 2014 OSG User School DHTC and Overlays Is there anything else? • As you might expect, there are several insulated “local HTC” clusters installed around the world • And there are non-HTC systems out there, too And you point is …???
  • 7. 2014 OSG User School DHTC and Overlays Just local HTC • You moved from a single PC - O(1) cores • To a local HTC cluster - Say, O(100) cores daily avg Great! I can have my results back in 1/100th of the time
  • 8. 2014 OSG User School DHTC and Overlays Just local HTC • You moved from a single PC - O(1) cores • To a local HTC cluster - Say, O(100) cores daily avg • But is it fast enough? It will still take me over a month to get the results back! The result of the 10 body simulation is very promising! I want to run a 100 body one!
  • 9. 2014 OSG User School DHTC and Overlays How to get more? • If you find out that you are resource constrained - What do you do?
  • 10. 2014 OSG User School DHTC and Overlays How to get more? • If you find out that you are resource constrained - What do you do? • Beg for a larger share of the local pool - i.e. better priority compared to the other users of the same pool There are likely 100s of users But you better be doing something really important
  • 11. 2014 OSG User School DHTC and Overlays How to get more? • If you find out that you are resource constrained - What do you do? • Beg for a larger share of the local pool • Pay to get more resources bought into the pool - Great for long term needs  If you can afford it - But will not help you in the short term Getting new machines installed can take months!
  • 12. 2014 OSG User School DHTC and Overlays How to get more? • If you find out that you are resource constrained - What do you do? • Beg for a larger share of the local pool • Pay to get more resources bought • Get the needed resources somewhere else - i.e. not locally
  • 13. 2014 OSG User School DHTC and Overlays How to get more? • If you find out that you are resource constrained - What do you do? • Beg for a larger share of the local pool • Pay to get more resources bought • Get the needed resources somewhere else - i.e. not locally This is what this lecture is all about!
  • 14. 2014 OSG User School DHTC and Overlays Distributed HTC • A computing paradigm that aims at maximizing useful computation using any available resource located anywhere on the planet • As a corollary - Compute resources are owned and operated by several independent groups This place is huge!
  • 15. 2014 OSG User School DHTC and Overlays Implications of distributed computing • We will be dealing with multiple independent compute systems - That do not know about each other They are owned by several independent groups, remember?
  • 16. 2014 OSG User School DHTC and Overlays Implications of distributed computing • We will be dealing with multiple independent compute systems - That do not know about each other • No global HTC scheduler out of the box Scheduler
  • 17. 2014 OSG User School DHTC and Overlays Implications of distributed computing • We will be dealing with multiple independent compute systems - That do not know about each other • No global HTC scheduler out of the box - Will have to stitch them all together
  • 18. 2014 OSG User School DHTC and Overlays The naïve way • The simplest way is to partition your jobs and submit a subset to each cluster Scheduler Scheduler Scheduler
  • 19. 2014 OSG User School DHTC and Overlays The naïve way • The simplest way is to partition your jobs and submit a subset to each cluster • The drawbacks - You may need multiple user accounts - You may need several different tools - Doing a good partitioning is hard  Want proportional to the resources you will get - And what if those CPUs are not in a HTC setup at all?
  • 20. 2014 OSG User School DHTC and Overlays The naïve way • The simplest way is to partition your jobs and submit a subset to each cluster • The drawbacks - You may need multiple user accounts - You may need several different tools - Doing a good partitioning hard  Want proportional to the resources you will get - And what if those CPUs are not in a HTC setup at all? What's the alternative???
  • 21. 2014 OSG User School DHTC and Overlays Use an overlay system • Use a systems that looks and feels like a regular HTC to users - But has compute nodes all over the world I like the sound of this!
  • 22. 2014 OSG User School DHTC and Overlays Use an overlay system • Use a systems that looks and feels like a regular HTC to users - But has compute nodes all over the world I like the sound of this! But... didn't you just say there was no such thing???
  • 23. 2014 OSG User School DHTC and Overlays Use an overlay system • Use a systems that looks and feels like a regular HTC to users - But has compute nodes all over the world But... didn't you just say there was no such thing??? I said “out of the box” But one can create one.
  • 24. 2014 OSG User School DHTC and Overlays Creating a dynamic overlay sys • No single person cannot manage all the existing resources Scheduler
  • 25. 2014 OSG User School DHTC and Overlays Creating a dynamic overlay sys • But we can lease a subset of them - We discuss the how later
  • 26. 2014 OSG User School DHTC and Overlays Creating a dynamic overlay sys • But we can lease a subset of them • And instantiate a HTC system on them Scheduler
  • 27. 2014 OSG User School DHTC and Overlays Creating a dynamic overlay sys • But we can lease a subset of them • And instantiate a HTC system on them - Now we can schedule user jobs on them - Only “our” resources are considered Scheduler Jobs
  • 28. 2014 OSG User School DHTC and Overlays DHTC through an overlay sys • Just another HTC system - Well, almost - More details in a few slides Scheduler Jobs Cool!Cool!Cool!Cool!
  • 29. 2014 OSG User School DHTC and Overlays Overlay system ownership • Setting up an overlay a major task - Comparable with installing a dedicated cluster • Long term maintenance is also costly • Not something a final user would want to do
  • 30. 2014 OSG User School DHTC and Overlays Typical overlay sys operators • Existing HTC admins, e.g. - The UW HTC cluster can “overflow” into OSG - The UCSD operates one for local users • Scientific communities, e.g. - The CMS LHC experiment • The Open Science Grid itself - With the OSG Connect More on this later today
  • 31. 2014 OSG User School DHTC and Overlays Is DHTC really just HTC? • Even with overlays, there are some differences between DHTC and HTC • With or without overlays, the core reasons are: - Multiple independent HW operators - Not all resources are co-located
  • 32. 2014 OSG User School DHTC and Overlays The multiple owners problem • In the “Grid” world, the resource owner decides which Operating System, which OS services and which libraries to install - A way smaller problem in the “Cloud” world - But most of the current DHTC landscape is based on the Grid paradigm • Different clusters likely configured differently
  • 33. 2014 OSG User School DHTC and Overlays The multiple owners problem • In the “Grid” world, the resource owner decides which Operating System, which OS services and which libraries to install - A way smaller problem in the “Cloud” world - But most of the current DHTC landscape is based on the Grid paradigm • Different clusters likely configured differently • Even if you could get your pet library/service installed on some clusters, you cannot expect to get it installed everywhere
  • 34. 2014 OSG User School DHTC and Overlays Heterogeneity • DHTC systems are way more heterogeneous than “local HTC” ones • Two ways to approach this: - Minimize external dependencies in your compute jobs  Make them self-contained  Adapt to the running environment (e.g. multiple binaries, one per platform)  Do not use licensed software
  • 35. 2014 OSG User School DHTC and Overlays Heterogeneity • DHTC systems are way more heterogeneous than “local HTC” ones • Two ways to approach this: - Minimize external dependencies in your compute jobs  Make them self-contained  Adapt to the running environment (e.g. multiple binaries, one per platform)  Do not use licensed software Sounds like a lot of work!
  • 36. 2014 OSG User School DHTC and Overlays Heterogeneity • DHTC systems are way more heterogeneous than “local HTC” ones • Two ways to approach this: - Minimize external dependencies - Use only a subset of the resources  Restrict where your jobs can run  Your job will of course take longer to finish  May still get you the result sooner
  • 37. 2014 OSG User School DHTC and Overlays Heterogeneity • DHTC systems are way more heterogeneous than “local HTC” ones • Two ways to approach this: - Minimize external dependencies - Use only a subset of the resources  Restrict where your jobs can run  Your job will of course take longer to finish  May still get you the result sooner So long for DHTC!
  • 38. 2014 OSG User School DHTC and Overlays Heterogeneity • DHTC systems are way more heterogeneous than “local HTC” ones • Two ways to approach this: - Minimize external dependencies - Use only a subset of the resources  Restrict where your jobs can run  Your job will of course take longer to finish  May still get you the result sooner So long for DHTC! Unfortunately, those are the only two alternatives.
  • 39. 2014 OSG User School DHTC and Overlays No shared file system • As a side effect, you cannot expect a globally shared file system - It's just “yet another user requested service” • You will have to deal with data explicitly - Either using the HTC scheduler capabilities (remember yesterday's lecture) - Or, embed file transfer to and from permanent storage into your own jobs • More details tomorrow
  • 40. 2014 OSG User School DHTC and Overlays The location problem • Nodes in different locations need a way to talk to each other - This is what networks are for • If your computation is mostly about CPU cycles, with little input and output data - Node location is not an issue at all • If you have lots of data - Remember, throughput is typically inversely proportional with the distance
  • 41. 2014 OSG User School DHTC and Overlays The data problem • Transferring large amounts of data over Wide Area Network can take a lot of time • You should try to compute close to the data source and/or sink - Network-wise - More about this tomorrow
  • 42. 2014 OSG User School DHTC and Overlays Bottom line • For simple computation, DHTC is very similar to HTC • As soon as you require any complex setup for your jobs, you are in for a rough ride - This includes large datasets
  • 43. 2014 OSG User School DHTC and Overlays Infrastructure considerations • DHTC is likely to give you access to many more compute slots • Which is mostly a good thing - You get your results faster • But could crash your HTC system, e.g. - Can your storage system handle more data traffic? - Can the job scheduling system handle an order of magnitude more nodes?
  • 44. 2014 OSG User School DHTC and Overlays Infrastructure considerations • DHTC is likely to give you access to many more compute slots • Which is mostly a good thing - You get your results faster • But could crash your HTC system, e.g. - Can your storage system handle more data traffic? - Can the job scheduling system handle an order of magnitude more nodes? Hopefully not something final users should deal with but it is good to keep in mind.
  • 45. 2014 OSG User School DHTC and Overlays Questions? • Questions? Comments? - Feel free to ask me questions later: Igor Sfiligoi <isfiligoi@ucsd.edu> • Upcoming sessions - Where to get the needed resources - glideinWMS – the OSG overlay software - Hands-on exercises - Tour
  • 46. 2014 OSG User School DHTC and Overlays Beware the power Courtesy of fanpop.com
  • 47. 2014 OSG User School DHTC and Overlays Copyright statement • Some images contained in this presentation are the copyrighted property of ToonClipart. • As such, these images are being used under a license from said entities, and may not be copied or downloaded without explicit permission from ToonClipart.